Exam 2 Review
PART I (your data):
Identify each of the following for your data listed on SX PO #1.
Experimental unit: (3 pts)
– A single antique grandfather clock
Dependent variable (give units of measurement): (3 pts)
– The price of an antique grandfather clock (dollars)
Independent variable (give units of measurement): (3 pts)
– The age of an antique grandfather clock (years)
Multiple Choice. (Circle your answer.)
Consider the hypothesized straight-line model, E(y) = β0 + β1x, for your data.
Examine the least squares line shown on your scatterplot, SX PO #2. Which of the following is NOT a property of the line? (3 pts)
A. The line yields a sum of squared errors (SSE) that is minimum.
B. The line yields an average error of prediction equal to 0.
C. It is considered the “best” fitting line through the data.
D. It is the true line of means for the dependent variable y.
E. The line has a positive slope.
Answer: D (the least squares line only estimates the true line of means)
Practically interpret the value of the correlation coefficient relating y to x. (5 pts) (SX PO #3)
• There is a strong positive linear relationship
between the age and price of an antique
grandfather clock.
Least Squares Linear Regression of PRICE

Predictor
Variables    Coefficient    Std Error        T         P
Constant        -192.047      264.372    -0.73    0.4732
AGE              10.4798      1.79327     5.84    0.0000
Practically interpret the y-intercept of the least squares line.
• There is no practical interpretation, since Age = 0 does not make sense for an antique clock.
Practically interpret the estimated slope of the line.
• For every one-year increase in the age of the clock, we estimate the
price of the clock to increase by $10.48.
Test: H0: β1 = 0 versus Ha: β1 > 0
Test Statistic: t = 5.84
P-value: p = .0000/2 ≈ .0000
Conclusion: At α = .05, we reject H0. There is sufficient evidence to indicate that the age of the clock is a useful positive linear predictor of price.
TRUE or FALSE. (Circle your answer.) For inferences derived from the above
test to be valid, the regression errors must be independent and normally
distributed, with mean 0 and constant variance. (2 pts.)
TRUE
A company produces Fresh, a brand of laundry detergent. The marketing manager is investigating the
factors that impact monthly demand for the product. A sample of 60 recent months was selected and
the following variables were recorded for each month: demand (in hundred thousand bottles), price
at which Fresh is sold during the month (dollars), and advertising expenditure (hundred thousand
dollars). The manager used STATISTIX to build a regression model for predicting monthly demand.
Use the following STATISTIX printouts to answer the exam questions in Part II.
STUDENT EDITION OF STATISTIX
Unweighted Least Squares Linear Regression of DEMAND

Predictor
Variables    Coefficient    Std Error        T         P
Constant         4.78565      2.49760     1.92    0.0604
PRICE           -0.54697      0.57983    -0.94    0.3495
ADVEXP           0.86195      0.09141     9.43    0.0001

R-Squared             0.6881    Resid. Mean Square (MSE)    0.12368
Adjusted R-Squared    0.6772    Standard Deviation          0.35168

Source        DF         SS          MS         F         P
Regression     2    15.5564     7.77822     62.89    0.0002
Residual      57     7.0499     0.12368
Total         59    22.6063

Hypothesized Equation: E(y) = β0 + β1x1 + β2x2
Prediction Equation: ŷ = 4.786 − 0.547x1 + 0.862x2
β interpretations!
Global F-test:
Test: H0: β1 = β2 = 0 versus Ha: At least one β not equal to 0
Test Statistic: F = 62.89
P-value: p = .0002
Conclusion: At α = .05, we reject H0. There is sufficient evidence to indicate that the model with Price and Advertising Expense is statistically useful for predicting monthly Fresh demand.
68.81% of the sample variation in the monthly demand for Fresh can be explained by the regression model that includes price and advertising expenditure.
2s = 2(0.35168) = 0.70336 hundred thousand bottles: we expect most (about 95%) of the monthly Fresh demand values to fall within 70,336 bottles of their least squares predicted values.
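The R², adjusted R², s, and 2s values for the Fresh model can be recomputed from the ANOVA table alone. A minimal sketch, assuming only the SS values shown in the printout, n = 60 months and k = 2 predictors; the function name `fit_summary` is hypothetical:

```python
import math

def fit_summary(ss_reg, ss_resid, n, k):
    """Return R^2, adjusted R^2, s, and 2s from ANOVA sums of squares.

    ss_reg, ss_resid: regression and residual sums of squares
    n: number of observations; k: number of predictor terms (constant excluded)
    """
    ss_total = ss_reg + ss_resid
    r2 = ss_reg / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    mse = ss_resid / (n - k - 1)  # s^2, the residual mean square
    s = math.sqrt(mse)
    return r2, adj_r2, s, 2 * s

# Fresh detergent, first-order model (SS values from the STATISTIX printout)
r2, adj_r2, s, two_s = fit_summary(ss_reg=15.5564, ss_resid=7.0499, n=60, k=2)
print(f"R^2 = {r2:.4f}, adj R^2 = {adj_r2:.4f}, s = {s:.5f}")
# DEMAND is recorded in hundred thousand bottles, so scale 2s to bottles
print(f"2s = {two_s:.5f} hundred thousand bottles = {two_s * 100_000:,.0f} bottles")
```

The bottle count agrees with the 70,336 figure up to rounding of s.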
Predicted/Fitted Values of DEMAND

Lower Predicted Bound    7.1920     Lower Fitted Bound    7.7875
Predicted Value          7.9062     Fitted Value          7.9062
Upper Predicted Bound    8.6204     Upper Fitted Bound    8.0250
SE (Predicted Value)     0.3566     SE (Fitted Value)     0.0593

Unusualness (Leverage)   0.0284
Percent Coverage         95.0
Corresponding T          2.00

Predictor Values: PRICE = 3.7500, ADVEXP = 6.0000
We are 95% confident that the demand for Fresh laundry detergent in a single month with a price of $3.75 and an advertising budget of $600,000 will fall between 719,200 and 862,040 bottles.
We are 95% confident that the average monthly demand for Fresh laundry detergent over all months with a price of $3.75 and an advertising budget of $600,000 falls between 778,750 and 802,500 bottles.
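Both intervals come from the same point estimate ŷ plus or minus T × SE, using a different SE for each. A minimal sketch that plugs the printout's coefficients and SEs into those formulas and converts to bottles; because the printout shows T rounded to 2.00, the computed bounds land within roughly 100 bottles of the printed ones:

```python
# Coefficients from the first-order DEMAND printout
b0, b_price, b_adv = 4.78565, -0.54697, 0.86195
price, advexp = 3.75, 6.0  # $3.75; ADVEXP in hundred thousand dollars ($600,000)

# Point estimate (same value for predicting y and for estimating E(y))
y_hat = b0 + b_price * price + b_adv * advexp  # hundred thousand bottles

t_crit = 2.00    # "Corresponding T" as printed (rounded)
se_pred = 0.3566 # SE (Predicted Value): a single month
se_fit = 0.0593  # SE (Fitted Value): mean of all such months

pred_int = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
conf_int = (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)

to_bottles = 100_000
print(f"Predicted demand: {y_hat * to_bottles:,.0f} bottles")
print("95% PI (bottles):", [round(v * to_bottles) for v in pred_int])
print("95% CI (bottles):", [round(v * to_bottles) for v in conf_int])
```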
Unweighted Least Squares Linear Regression of DEMAND

Predictor
Variables       Coefficient    Std Error        T         P
Constant            4.78565      2.49760     1.92    0.0604
PRICE              -0.54697      0.57983    -0.94    0.3495
ADVEXP              0.86195      0.09141     9.43    0.0001
PRICE_ADVEXP        2.64000      2.18444     1.21    0.1554

R-Squared             0.7022    Resid. Mean Square (MSE)    0.12022
Adjusted R-Squared    0.6881    Standard Deviation          0.34672

Source        DF         SS          MS         F         P
Regression     3    15.8741     5.29137     44.01    0.0017
Residual      56     6.7322     0.12022
Total         59    22.6063
Give a practical interpretation of the phrase “price and advertising
expenditure interact”.
The relationship between monthly demand and price depends on the
amount of advertising expenditure for that month.
Interaction test:
Test: H0: β3 = 0 versus Ha: β3 ≠ 0
Test Statistic: t = 1.21
P-value: p = .1554
Conclusion: At α = .05, we fail to reject H0. There is insufficient evidence to indicate that Price and Advertising Expense interact when predicting monthly Fresh demand.
Write a model for demand (y) that proposes a curvilinear relationship with price (x1).
E(y) = β0 + β1x1 + β2x1²
where y = demand and x1 = price
Write a model for demand (y) as a function of whether or not the temperature during the month exceeds 70°.
E(y) = β0 + β1x2
where y = demand and
x2 = 1 if temp over 70°, 0 if not
Regression Analysis
Model Terms – why???
Lecture 12: Interaction Model
GF Clocks: y=Price($), x1=Age(years), x2=# bidders
Alternative Theory:
(1) Price increases linearly with Age & # Bidders
(2) Rate of increase of Price with Age is steeper for a larger # of Bidders
Interaction Model (continued)
Theory 1: E(y) = β0 + β1x1 + β2x2 (1st-order model)
Slope of y vs. x1 line = β1 for fixed x2
Slope of y vs. x2 line = β2 for fixed x1
Interaction Model:
Theory 2: E(y) = β0 + β1x1 + β2x2 + β3x1x2
Slope of y vs. x1 line = (β1 + β3x2)
Slope of y vs. x2 line = (β2 + β3x1)
Interaction Model (continued)
Interpretation of “x1 and x2 interact”:
The relationship between y and x1 depends on x2
Why different slopes for interaction model?
E(y) = β0 + β1x1 + β2x2 + β3x1x2
x2 = 5: E(y) = β0 + β1x1 + β2(5) + β3x1(5) = (β0 + 5β2) + (β1 + 5β3)x1, so slope = β1 + 5β3
x2 = 10: E(y) = β0 + β1x1 + β2(10) + β3x1(10) = (β0 + 10β2) + (β1 + 10β3)x1, so slope = β1 + 10β3
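Plugging numbers into the slope formula β1 + β3x2 makes the "steeper with more bidders" idea concrete. A minimal sketch, using the AGE and AGEBIDS estimates from the clock interaction-model printout as β̂1 and β̂3; the helper `age_slope` is hypothetical:

```python
# Estimated betas from the interaction-model printout for the GF clocks
b1_age = 0.87814       # AGE
b3_age_bids = 1.29785  # AGEBIDS (the x1*x2 term)

def age_slope(num_bidders):
    """Estimated change in mean price ($) per extra year of age, with # bidders held fixed."""
    return b1_age + b3_age_bids * num_bidders

for x2 in (5, 10):
    print(f"x2 = {x2:2d} bidders: slope of price/age line = {age_slope(x2):.2f} $/year")
```

With 10 bidders the estimated price/age slope is almost twice the slope with 5 bidders, which is what a positive β̂3 implies.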
Interaction Model (continued)
Interaction (2nd-order) Model:
E(y) = β0 + β1×1 + β2×2 + β3x1x2
Slope of price/age line: β1 + β3x2
Slope of price/bidders line: β2 + β3x1
Global F-test: H0: β1 = β2 = β3 = 0
Important t-test: H0: β3 = 0
STATISTIX software results:
Least Squares Linear Regression of PRICE

Predictor
Variables    Coefficient    Std Error        T         P      VIF
Constant         320.458      295.141     1.09    0.2868      0.0
AGE              0.87814      2.03216     0.43    0.6690     12.2
NUMBIDS         -93.2648      29.8916    -3.12    0.0042     28.3
AGEBIDS          1.29785      0.21233     6.11    0.0000     30.5

R²             0.9539    Mean Square Error (MSE)    7905.79
Adjusted R²    0.9489    Standard Deviation         88.9145
AICc           295.25
PRESS          288487

Source        DF          SS          MS         F         P
Regression     3     4578427     1526142    193.04    0.0000
Residual      28      221362     7905.79
Total         31     4799789

Cases Included 32    Missing Cases 0
Interaction Model (continued)
Step 4 (a): Assess Overall Model Adequacy
Model: E(y) = β0 + β1x1 + β2x2 + β3x1x2
Test H0: β1 = β2 = β3= 0
Ha: At least one βi is not 0
Global F=193, p-value= 0
Conclusion: α=.05 > p-value= 0 → Reject H0
Sufficient evidence (at α=.05) to conclude that the
interaction model with Age (x1) & # Bidders (x2) is
statistically useful for predicting Price (y).
Interaction Model (continued)
Step 4 (b): Test Important Beta (interaction term)
Model: E(y) = β0 + β1x1 + β2x2 + β3x1x2
Test H0: β3 = 0 (no interaction)
Ha: β3 > 0 (positive interaction-slope increases)
t-value =6.11, p-value=0/2 = 0
Conclusion: α=.05 > p-value=0 → Reject H0
Sufficient evidence (at α=.05) of positive interaction
between Age (x1) & # Bidders (x2) in predicting Price
(y) – Theory #2 supported.
Interaction Model (continued)
Caveat #1: Avoid interpreting other t-tests
Drop Age (x1) from the model since p-value = .669?
No; x1 must be important since x1 and x2 interact. KEEP AGE IN MODEL!
Caveat #2: Be careful when interpreting beta estimates
β̂2 = -93.26; Price decreases as # bidders increases?
Not for all ages (??); slope of price/bidders line is β2 + β3x1
slope ≈ -93 + 1.3(100) = +37 for a 100-year-old clock
What happens to the slope for number of bidders when the age of the clock is between 71 and 72 years? Why?
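The question above can be answered by solving β̂2 + β̂3x1 = 0 for x1. A minimal sketch using the NUMBIDS and AGEBIDS estimates from the interaction printout; the helper `bidders_slope` is hypothetical:

```python
b2_bids = -93.2648     # NUMBIDS
b3_age_bids = 1.29785  # AGEBIDS

def bidders_slope(age):
    """Estimated change in mean price ($) per extra bidder for a clock of the given age (years)."""
    return b2_bids + b3_age_bids * age

# The price/bidders slope is zero where b2 + b3*x1 = 0
crossover_age = -b2_bids / b3_age_bids
print(f"Slope changes sign at about {crossover_age:.1f} years")
for age in (71, 72, 100):
    print(f"age {age}: {bidders_slope(age):+.2f} $ per bidder")
```

Because the crossover falls between 71 and 72 years, the estimated slope switches from slightly negative to slightly positive in that range: for younger clocks more bidders are associated with lower prices, for older clocks with higher prices.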
Interaction Model (continued)
Step 4 (c): Evaluate adj-R2 and 2s
Adjusted-R2 = .949 (increased?)
94.9% of the sample variation in auction prices (y)
can be explained by the interaction model with Age
(x1) and Number of Bidders (x2).
2s = 2(89) = 178 (decreased?)
95% of the sampled auction prices will fall within
$178 of their predicted values using the
interaction model with Age (x1) and Number of
Bidders (x2).
Scorecard
MODEL               P-value    Std. Deviation    R-Square    PI for y
X1 = age            0.0000     273.5             .532        (812, 1947)
X2 = bids           0.0252     367.4             .156        (286, 1870)
x1 and x2           0.0000     133.5             .892        (713, 1290)
x1, x2, and x1x2    0.0000     88.9              .954        (766, 1152)
Regression Analysis
Lecture 13: Other Models
(1) Curvilinear Model with QN-x:
Allows for nonlinear (e.g., curvature) relationship
between y and x
(2) Model with a QL-x variable:
Allows mean of y to differ for different levels of the
QL-x variable
Quadratic (Curvilinear) Model
Section 12.6: E(y) = β0 + β1x + β2x² (2nd-order model)
Graphs as a “quadratic” or curve relating y to x
Quadratic Model (continued)
Experimental unit = home (sample n = 15 homes)
Dependent Variable: Monthly Electric Usage (QN)
Independent Variable: Size of Home (QN)
Quadratic Model (continued)
Theory:
Rate of increase of usage (y) with
size (x) is slower for larger homes
[Figure: Scatter Plot of USAGE (y-axis) vs SIZE (x-axis)]
Quadratic Model (continued)
Step 1: Propose model E(y) = β0 + β1x + β2x²
Step 2: Estimate unknown betas
Step 3: Assumptions on error
Step 4: Evaluate model adequacy (game plan)
Global F-test, H0: β1 = β2 = 0
Important t-test, H0: β2 = 0
Evaluate adj-R2 and 2s
Step 5: Use model for prediction
STATISTIX software results:
Least Squares Linear Regression of USAGE

Predictor
Variables    Coefficient     Std Error         T         P      VIF
Constant        -806.717       166.872     -4.83    0.0004      0.0
SIZE             1.96162       0.15252     12.86    0.0000     74.2
SIZESQ        -3.404E-04     3.212E-05    -10.60    0.0000     74.2

R²             0.9773    Mean Square Error (MSE)    2520.02
Adjusted R²    0.9735    Standard Deviation         50.1998
AICc           126.13
PRESS          56695

Source        DF          SS          MS         F         P
Regression     2     1300900      650450    258.11    0.0000
Residual      12       30240     2520.02
Total         14     1331140

Cases Included 15    Missing Cases 0
Quadratic Model (continued)
Step 2 – Estimate betas: E(y) = β0 + β1x + β2x²
β̂0 = -806.7, β̂1 = 1.96, β̂2 = -.00034
Interpretations:
β0: y-intercept of curve
No practical interpretation since Size (x) = 0 is nonsensical
β1: not a slope, but a shift parameter
Shifts parabola right or left along the x-axis; no practical interpretation
β2: Rate of curvature; the larger its absolute value, the faster the rate of curvature; a negative sign indicates downward curvature
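In a quadratic model the rate of change of E(y) with x is the derivative β1 + 2β2x, so it shrinks as x grows when β2 < 0. This matches the theory that usage rises more slowly for larger homes. A minimal sketch using the estimates from the USAGE printout; the helper names are hypothetical, and size is in whatever units the data set records:

```python
b0, b1, b2 = -806.717, 1.96162, -3.404e-4  # from the USAGE printout

def usage_hat(size):
    """Predicted monthly electric usage for a home of the given size."""
    return b0 + b1 * size + b2 * size ** 2

def usage_slope(size):
    """Rate of change of predicted usage per unit of size: the derivative b1 + 2*b2*size."""
    return b1 + 2 * b2 * size

for size in (1500, 2000, 2500):
    print(f"size {size}: predicted usage {usage_hat(size):.1f}, slope {usage_slope(size):.3f}")

# The fitted parabola peaks where the derivative is zero; its location
# depends on b1, which is why b1 acts as a shift parameter.
vertex = -b1 / (2 * b2)
print(f"vertex of fitted curve at size = {vertex:.0f}")
```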
Quadratic Model (continued)
Step 4 (a): Assess Overall Model Adequacy
Model: E(y) = β0 + β1x + β2x²
Test: H0: β1 = β2 = 0
Ha: At least one β is not zero
Global-F Test Statistic = 258, p-value=0
Conclusion: α=.05 > p-value=0 → Reject H0
Sufficient evidence (at α=.05) to conclude that the
quadratic model with Size (x) is statistically
useful for predicting Usage (y).
Quadratic Model (continued)
Step 4 (b): Test Important Beta (squared term)
Model: E(y) = β0 + β1x + β2x²
Test: H0: β2 = 0 (no curvature)
Ha: β2 < 0 (downward curvature – why??)
t-value =-10.60, p-value=0/2 = 0
Conclusion: α=.05 > p-value=0 → Reject H0
Sufficient evidence (at α=.05) of downward
curvature between Size (x) & Usage (y)
– Theory supported.
Quadratic Model (continued)
Caveat #1: Avoid interpreting other t-tests
Why not test β1 for Size (x)?
No need; x must be important since the curvature term was tested and found significant
Caveat #2: Be careful when predicting y
Avoid “extrapolation” – selecting x outside the range of the sample
Quadratic Model (continued)
Step 4 (c): Evaluate adj-R2 and 2s
Adjusted-R2 = .973
97.3% of the sample variation in Usage (y) values
can be explained by the quadratic model with
Size (x)
2s = 2(50) = 100
95% of the sampled Usage (y) values will fall within 100 kilowatt-hours of their predicted values using the quadratic model with Size (x)
Modeling Qualitative Data (2 levels)
Discrimination in the workplace example:
Model Salary of USF professor based on Gender (M,F)
Dependent Var: y=salary; Independent Var: Gender (M,F)
Experimental Unit = a single USF professor
QL Var with 2 levels: Model using x = {1 if Female, 0 if Male}
– referred to as a “dummy” variable
– the value assigned zero is called the “base” level
Model: E(y) = β0 + β1x
-where E(y) is the mean salary
Modeling Qualitative Data (2 levels)
Interpreting betas in dummy variable model:
E(y) = β0 + β1x , where x = {1 if Female, 0 if Male}
x=0: E(y) = β0 = Mean salary for Males (μM)
x=1: E(y) = β0 + β1 = Mean salary for Females (μF)
β1 = μF – μM = Difference between mean salary of Females and mean salary of Males
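The claim that β̂0 is the base-level mean and β̂1 is the difference in means can be checked by fitting least squares to a 0/1 predictor. A minimal sketch; the salaries below are hypothetical numbers made up purely for illustration, not USF data:

```python
from statistics import mean

# Hypothetical salaries (in $1,000s), for illustration only
salaries = [72, 80, 75, 90, 68, 70, 66, 74]
female = [0, 0, 0, 0, 1, 1, 1, 1]  # x = 1 if Female, 0 if Male (base level)

# Least squares estimates for E(y) = b0 + b1*x with a single predictor
x_bar, y_bar = mean(female), mean(salaries)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(female, salaries))
sxx = sum((x - x_bar) ** 2 for x in female)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

mu_m = mean(y for y, x in zip(salaries, female) if x == 0)
mu_f = mean(y for y, x in zip(salaries, female) if x == 1)
print(f"b0 = {b0:.2f}, male mean = {mu_m:.2f}")
print(f"b1 = {b1:.2f}, female mean - male mean = {mu_f - mu_m:.2f}")
```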
Test to Conduct:
H0: β1 = 0 (μF = μM; no discrimination)
Ha: β1 < 0 (μF < μM)
… p-value = 0 → Reject H0
There is sufficient evidence (at α=.05) to conclude that the 1st-order
model with Age (x1) & # Bidders (x2) is statistically useful for
predicting Price (y).
WARNING: Rejecting H0 does not necessarily imply that the model is “best” for predicting y. It only implies that the model is better than a model with no x’s.
Suppose p-value = .29 for overall F-test at α=.05 :
α=.05 < p-value=.29 → Fail to reject H0
There is insufficient evidence (at α=.05) to conclude that the 1st-order
model with Age (x1) & # Bidders (x2) is statistically useful for
predicting Price (y).
MR Step 4 (continued)
Game plan for Step 4 in Multiple Regression:
1) Conduct Global F-test
if model is “statistically useful”, continue …
2) Test only “important” betas with t-tests
(we will give examples in future lectures)
3) Interpret R2 and 2s for “practical” utility
Desire R2 “high” and 2s “small”
STATISTIX software results:
Least Squares Linear Regression of PRICE

Predictor
Variables    Coefficient    Std Error        T         P      VIF
Constant        -1338.95      173.809    -7.70    0.0000      0.0
AGE              12.7406      0.90474    14.08    0.0000      1.1
NUMBIDS          85.9530      8.72852     9.85    0.0000      1.1

R²             0.8923    Mean Square Error (MSE)    17818.2
Adjusted R²    0.8849    Standard Deviation         133.485
AICc           319.55
PRESS          646070

Source        DF          SS          MS         F         P
Regression     2     4283063     2141531    120.19    0.0000
Residual      29      516727     17818.2
Total         31     4799790

Cases Included 32    Missing Cases 0
MR Step 4 (continued)
In MR, look at R2 = .892 (SX printout)
Interpretation:
89.2% of the sample variation in auction prices (y)
can be explained by the 1st-order model with Age (x1)
and Number of Bidders (x2).
Look at 2s = 2(133.5) = 267 (SX printout)
Interpretation:
95% of the sampled auction prices will fall within $267
of their predicted values using the 1st-order model
with Age (x1) and Number of Bidders (x2).
MR Example (Continued)
Step 5: Use the model for prediction/estimation
1) Predict Price (y) for a GF clock with …
Age(x1)=150 years and # Bidders (x2) =5 bidders
2) Estimate mean Price for all GF clocks with …
Age(x1)=150 years and # Bidders (x2) =5 bidders
STATISTIX software results:
Predicted/Fitted Values of PRICE

Lower Predicted Bound    713.61     Lower Fitted Bound    909.29
Predicted Value          1001.9     Fitted Value          1001.9
Upper Predicted Bound    1290.2     Upper Fitted Bound    1094.5
SE (Predicted Value)     140.95     SE (Fitted Value)     45.279

Unusualness (Leverage)   0.1151
Percent Coverage         95
Corresponding T          2.05

Predictor Values: AGE = 150.00, NUMBIDS = 5.0000
MR Step 5 (continued)
95% PI for y: (714, 1290)
Interpretation: We are 95% confident that the price
of a single 150 year old GF clock with 5 bidders will
fall between $714 & $1290.
95% CI for E(y): (909, 1095)
Interpretation: We are 95% confident that the
average price of all 150 year old GF clocks with 5
bidders will fall between $909 & $1,095.
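The PI is wider than the CI because predicting one clock adds that clock's own random error on top of the uncertainty in the estimated mean: SE(pred)² = MSE + SE(fit)². A minimal sketch checking this against the printout; it uses the printed T of 2.05 (rounded), so the bounds differ from the printout by under a dollar:

```python
import math

# Values from the 1st-order PRICE printouts
mse = 17818.2    # Mean Square Error (MSE), 29 residual df
y_hat = 1001.9   # predicted value = fitted value at AGE = 150, NUMBIDS = 5
se_fit = 45.279  # SE (Fitted Value): uncertainty in the estimated mean E(y)
t_crit = 2.05    # "Corresponding T" as printed (rounded)

# A single new clock also carries its own random error (variance estimated by MSE)
se_pred = math.sqrt(mse + se_fit ** 2)
# The printed leverage is consistent with h = SE(fit)^2 / MSE
leverage = se_fit ** 2 / mse

pred_int = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
conf_int = (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)
print(f"SE(pred) = {se_pred:.2f}, leverage = {leverage:.4f}")
print("95% PI for y:   ", tuple(round(v) for v in pred_int))
print("95% CI for E(y):", tuple(round(v) for v in conf_int))
```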
Regression Analysis
Simple Linear Regression (cont’d)
Step 4: Statistically evaluate utility of the model
E(y) = β0 + β1x
Confidence Interval and/or Hypothesis Test on β1
Test: H0: β1 = 0 (model not useful)
Ha: β1 > 0 (model is useful, beta-value positive)
Test statistic: t = estimated β1/std. error of est.
Printout!!!!
STATISTIX software results:
Least Squares Linear Regression of STOCKRR

Predictor
Variables    Coefficient    Std Error        T         P
Constant        -0.93651      0.43329    -2.16    0.0739
MARKETRR         1.71825      0.07272    23.63    0.0000

R²             0.9894    Mean Square Error (MSE)    1.33267
Adjusted R²    0.9876    Standard Deviation         1.15441
AICc           11.996
PRESS          20.566

Source        DF         SS          MS         F         P
Regression     1    744.004     744.004    558.28    0.0000
Residual       6      7.996     1.33267
Total          7    752.000

Cases Included 8    Missing Cases 0

Test statistic: t = 23.63 (SX printout)
P-value: p-value (1-tailed) = .0000/2 ≈ 0
Simple Linear Regression (cont’d)
Step 4 (cont’d): α = .05 > p-value = 0.0000 → Reject H0
Conclusion: There is sufficient evidence (at α = .05) to indicate that StockRR is positively related to MarketRR. Therefore, the model is statistically useful for predicting StockRR (y).
If “Fail to Reject H0”, insufficient evidence to indicate…
Do not say x is not a useful predictor (a more complex model
with x may fit better)
Simple Linear Regression (cont’d)
95% CI for slope, β1: Estimate ± tα/2 (Std. error of est.)
Using SX printout, with t.025 = 2.447 for n − 2 = 6 df: 1.718 ± 2.447(.0727) = (1.54, 1.90)
Interpretation:
We are 95% confident that for every 1% increase in MarketRR (x), StockRR (y) will increase between 1.54% & 1.90%
[Note: Since the interval is entirely above 1, we are 95% confident that the stock is “risky”/“aggressive”]
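With only n − 2 = 6 degrees of freedom, the multiplier for a 95% interval is t.025 = 2.447 from a t-table, noticeably larger than the rough value 2 used for 2s. A minimal sketch of the slope interval, with that table value hardcoded rather than computed:

```python
b1_hat = 1.71825  # MARKETRR coefficient from the STOCKRR printout
se_b1 = 0.07272   # its standard error
t_crit = 2.447    # t-table value for alpha/2 = .025 with n - 2 = 6 df

half_width = t_crit * se_b1
lower, upper = b1_hat - half_width, b1_hat + half_width
print(f"95% CI for beta1: ({lower:.2f}, {upper:.2f})")
print("Entire interval above 1 (aggressive stock)?", lower > 1)
```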
Simple Linear Regression (cont’d)
Step 4 (cont’d): Numerical Measures of Model Fit
1) Coefficient of Determination (R2)
2) Coefficient of Correlation (r)
Coefficient of Determination – measures percentage of
variation in y “explained” by the model
Coefficient of Correlation – measures the strength of the
linear relationship between y and x
Coefficient of Determination (R2)
0 ≤ R² ≤ 1
R² = (SSyy – SSE)/SSyy
where SSyy = Σ(y − ȳ)² and SSE = Σ(y − ŷ)²
Coefficient of Determination (R2) (cont’d)
Interpretation:
≈ 99% of the sample variation in StockRR (y) values can
be “explained” by the linear model with MarketRR (x)
Warning: R² can be artificially inflated (forced to 1) by adding x’s to the model until the number of x’s equals n − 1
Coefficient of Determination (R2) (cont’d)
Recommendation for Step 4:
a) Test Ho: β1 = 0 and/or confidence interval for β1
b) If model is “statistically” useful, interpret R2 and Std Dev
Coefficient of Correlation (r)
Note: r is NOT on Statistix regression printout
r = +√R² = +√.9894 = .9947 (the sign matches the sign of the estimated slope, which is positive)
−1 ≤ r ≤ 1
r measures the strength of the linear relationship
between y and x
Interpretation of r = .9947 :
Evidence of strong, positive, linear relationship
between MarketRR (x) & StockRR (y)
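Both measures can be computed from the SLR ANOVA table: R² from SSyy (the Total SS) and SSE (the Residual SS), then r as the square root of R² carrying the sign of the slope. A minimal sketch using the STOCKRR printout values:

```python
import math

ss_yy = 752.000   # Total SS (SSyy)
sse = 7.996       # Residual SS (SSE)
b1_hat = 1.71825  # estimated slope; its sign gives the sign of r

r_squared = (ss_yy - sse) / ss_yy
r = math.copysign(math.sqrt(r_squared), b1_hat)
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
```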
[Figure: Values of r]
Coefficient of Correlation (r) (cont’d)
Caveats:
1) Dangerous to use r to imply a causal relationship
between y and x!
2) Correlation r is limited to linear relationships
3) All information in r is contained in slope β1
(plus more)
No need to include interpretation of r into SLR
steps (as an interpretation for the estimated slope
says more/serves our purpose)
Regression Analysis
Lecture 8: (Chapter 11) Simple Linear Regression
Regression Goal: Predict the value of one QN
variable from values of related variables.
Think inputs and outputs!
Dependent variable (DV): QN variable to be
predicted (y)
Independent variables (IVs): predictor variables
(x1,x2...)
Experimental units: objects upon which
measurements (y, x's) are taken
Simple Linear Regression (cont’d)
Example 1 (Real estate): Predict sale price of a home
DV=Sale Price (y)
IVs: Size (x1), Age (x2), Neighborhood (x3)
Exp. Unit = Home
Example 2 (Racing): Predict time of winning dog
DV=Winning Time (y)
IVs: Time at 1st turn(x1), Weight(x2), Class(x3)
Exp. Unit = Winning Greyhound Racer
Simple Linear Regression (cont’d)
“Simple” → single IV (x), QN in nature
“Linear” → use a straight-line model to relate y to x
Two types of linear models:
Deterministic: y = β0 + β1x
β0 = y-intercept
β1 = slope
Probabilistic: y = β0 + β1x + ε
ε = random error
Simple Linear Regression (cont’d)
We will use the more realistic probabilistic model
Using a linear model of the form
y = E(y) + ε
where
E(y) = β0 + β1x = Expected (mean) value of y
ε = random error
Model implies that the "average
values of y" fall on a straight
line --- not the y's themselves.
Simple Linear Regression (cont’d)
Steps to follow in SLR
1) Hypothesize the model: E(y) = β0 + β1x
2) Collect data; estimate β0 and β1
Yields prediction equation: ŷ = β̂0 + β̂1x
e.g., ŷ = 10 + 2x
3) Assumptions on random error (ε)
4) Test model utility (Is model useful for predicting y?)
5) If yes, use model for prediction/inferences
Why the term “regression”?
Francis Galton (1886) – “Law of universal regression”
Studied heights of fathers & sons
Tall (short) fathers tended to have tall (short) sons,
but not as tall (short) as the fathers
Why the term “regression”?
Karl Pearson – analyzed the data
y = Son’s height, x = Father’s height (inches)
Estimated equation: ŷ = 33 + .5x
Supported Galton’s law
Pearson known as the “father of regression”
SLR Example (Finance)
Goal --predict a particular stock’s annual rate of return
for 2017 based on average annual rate of return for a
market portfolio of stocks (also determine stock’s “risk
Index” – beta value)
DV: y = STOCKRR = Annual rate of return (%) of
Stock named STATLOGIC
IV: x = MARKETRR = Annual rate of return (%) of
Market Portfolio of stocks
Experimental unit = Year
SLR Example (Finance)
Step 1: y = β0+ β1x + ε
Step 2: Estimate βo and β1 -- need data to do this;
Sample of n=8 years (2009 – 2016)
[Figure: Scatter Plot of STOCKRR (y-axis) vs MARKETRR (x-axis)]
SLR Example (Finance)
How to estimate y-intercept & slope from data?
Find line with the “best fit” based on error = (y − ŷ)
“Best” line: (1) average error = 0
(2) sum of squared errors (SSE) minimized
Use Method of Least Squares (MLS)
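The method of least squares has a closed-form solution for a straight line: β̂1 = SSxy/SSxx and β̂0 = ȳ − β̂1x̄. A minimal sketch that applies these formulas and checks the two “best line” properties; the (x, y) pairs are hypothetical, made up for illustration, and are not the STATLOGIC data:

```python
from statistics import mean

def least_squares(xs, ys):
    """Closed-form least squares estimates for y = b0 + b1*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared errors (y - y_hat)^2 for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration only
xs = [-4, 0, 2, 5, 8, 10]
ys = [-8, -1, 3, 7, 13, 17]

b0, b1 = least_squares(xs, ys)
errors = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"y_hat = {b0:.3f} + {b1:.3f}x")
print("average error (zero up to rounding):", mean(errors))  # property (1)
print("SSE at LS line:", sse(xs, ys, b0, b1))
print("SSE with slope nudged:", sse(xs, ys, b0, b1 + 0.1))   # property (2): larger
```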
SLR Example (Finance)
SX results: β̂0 = -.94, β̂1 = 1.72
LS line: ŷ = -.94 + 1.72x; SSE = 7.996
Practical interpretation of y-intercept:
For a year when MarketRR (x) = 0%,
we predict StockRR (y) = -.94%
Practical interpretation of slope (beta value):
For every 1% increase in MarketRR (x),
we estimate StockRR (y) to increase 1.72%
[Note: if slope > 1, then stock is “risky”/“aggressive”]
Lecture 9: SLR (Cont’d)
Finance Example (cont’d)
Step 3: Assumptions on error (ε)
1) E(ε) = 0, i.e., mean error is zero
2) Var(ε) = σ² is constant for all x-values
3) ε is Normally distributed
4) ε’s are Independent
Simple Linear Regression (cont’d)
Estimate σ² and σ:
s² = MSE = SSE/(n-2) = 7.996/6 = 1.33 (SX printout)
s = √s² = √1.33 = 1.15 (SX printout)
Practical Interpretation of s:
We expect about 95% of the years to have
actual stock rates of return that fall within
2s = 2.3% of their predicted rates of return.
Model will be practically useful if 2s is “small”