A dataset is provided as an attachment. Your job is to read the data, perform any needed data manipulation, conduct several analyses, and interpret and describe the results.
The final project consists of several statistical problems based on the statistical methods covered in this course.
Final Take Home Project
INTG1-GC 1055
Final Project – take-home data analysis
Final project consists of several statistical problems based on the statistical methods covered in
this course.
• To receive maximum points, you need to have:
o Accurate analysis
o Appropriate tables/graphs and reports
o Correct wording when reporting results
o Appropriate interpretation of the results
o Adequate formatting of the document
• Report should be submitted as a PDF document with 1-inch margins
Open Nightclub Price Fairness FINAL exam.sav in SPSS, check all variables, and examine the
data. Test each hypothesis and use the 7-step approach where applicable.
This dataset includes information about customers’ perceptions of the pricing strategies that
nightclubs use for entrance fees.
This study used a 5×2 full-factorial experimental design.
Participants were randomly assigned to one of 10 scenarios and asked to imagine that they are a
customer in that scenario.
Two variables were manipulated:
• Customer type, based on how many times they have been to the nightclub in the scenario
(CustomerType). Participants were asked to imagine that they are one of the following
two types:
o Frequent customer
o First time customer
• Pricing strategies (variable Pricing_Strategy) with 5 conditions:
o Day of the week – pricing strategy that calls for change in price of entrance fee
depending on the day of the week (higher prices on weekends)
o Time of a day – pricing strategy that calls for change in price of entrance fee
depending on the time when the customer shows up at the door (higher prices
for late arrivals)
o Reservation in advance – pricing strategy that calls for change in price for
customers that make reservations (higher prices for customers without
reservation)
o VIP entrance – pricing strategy that calls for change in price for customers that
do not wait in the line and those that wait in line (higher prices to go around the
line)
o Flat pricing – pricing strategy that always includes the same price for all
customers.
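The 10 scenarios follow from crossing the two manipulated variables. A minimal sketch in Python, assuming the level labels exactly as listed above (the list names are illustrative and not taken from the data file):

```python
from itertools import product

# Level labels are copied from the scenario description; the list names are
# illustrative stand-ins for Pricing_Strategy and CustomerType.
pricing_strategies = [
    "Day of the week",
    "Time of a day",
    "Reservation in advance",
    "VIP entrance",
    "Flat pricing",
]
customer_types = ["Frequent customer", "First time customer"]

# A full-factorial design crosses every level of one factor with every
# level of the other, so 5 x 2 gives 10 cells.
scenarios = list(product(pricing_strategies, customer_types))

for number, (strategy, customer) in enumerate(scenarios, start=1):
    print(f"Scenario {number}: {strategy} x {customer}")
```

Each participant sees exactly one of these cells, which is what makes the later 5×2 ANOVA a between-subjects analysis.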
After learning about the pricing strategy, participants were asked to answer a series of questions
regarding their perception of the strategy.
They were asked to evaluate:
• Perceived price fairness – measured with 3 items (Fair1, Fair2 and Fair3)
• Word-of-mouth (would they recommend this nightclub to others) – measured with 3
items (WOM1, WOM2 and WOM3)
• Return Intention (would they return to this nightclub) – measured with 3 items (RI1, RI2
and RI3)
• Familiarity with the pricing strategy from the scenario – measured with 2 items (FAM1 and
FAM2)
After that, participants were asked how often they go to nightclubs and a series of demographic
questions (gender, age, ethnicity, income, employment status, and education).
Part 1 – Descriptives – 60 points
• Create descriptive statistics for all demographic variables. Report means or frequencies
depending on the type of variable and include graphs when applicable
o Gender
o Age
o Ethnicity
o Income
o Employment status
o Education
• Compute new variables:
o Perceived_Fairness – Average of Fair 1, 2 and 3
o Word_of_mouth – Average of WOM 1, 2 and 3
o Return_Intention – Average of RI 1, 2 and 3
o Familiarity – Average of FAM 1 and 2
• Create descriptive statistics for newly computed variables and report means and
standard deviations.
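The composite variables are plain item averages. The project itself is done in SPSS; the following is only a Python sketch of the same arithmetic, using made-up responses for three participants (the item and composite names come from the assignment, the values do not):

```python
from statistics import mean, stdev

# Hypothetical 1-7 responses; only the column names come from the assignment.
rows = [
    {"Fair1": 5, "Fair2": 6, "Fair3": 4, "FAM1": 2, "FAM2": 4},
    {"Fair1": 3, "Fair2": 3, "Fair3": 3, "FAM1": 6, "FAM2": 7},
    {"Fair1": 7, "Fair2": 6, "Fair3": 5, "FAM1": 1, "FAM2": 1},
]

# Each composite is the average of its items.
composites = {
    "Perceived_Fairness": ["Fair1", "Fair2", "Fair3"],
    "Familiarity": ["FAM1", "FAM2"],
}

for row in rows:
    for name, items in composites.items():
        row[name] = mean(row[item] for item in items)

# Report the mean and sample standard deviation of each new variable.
for name in composites:
    values = [row[name] for row in rows]
    print(f"{name}: M = {mean(values):.2f}, SD = {stdev(values):.2f}")
```

Word_of_mouth and Return_Intention would be added to the `composites` mapping in the same way.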
Part 2 – Factor analysis – 60 points
• Conduct a principal component analysis with Oblimin rotation on 9 variables (Fair1, Fair2,
Fair3, WOM1, WOM2, WOM3, RI1, RI2, RI3)
• Suppress coefficients with an absolute value below 0.4
• Recognize and name the extracted factors.
• Save factor scores as variables using the Regression method
• Calculate Cronbach’s alpha for each new factor
• Report results
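For reference, Cronbach's alpha for a set of k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the summed score). A small Python sketch of that formula, with made-up scores for five respondents (the function name and the data are illustrative assumptions, not SPSS output):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one equal-length score list per item."""
    k = len(items)
    sum_item_variances = sum(variance(scores) for scores in items)
    # Total score per respondent across all items.
    totals = [sum(person) for person in zip(*items)]
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))

# Made-up scores for five respondents on three items of one factor.
fair1 = [5, 3, 7, 4, 6]
fair2 = [6, 3, 6, 4, 5]
fair3 = [4, 3, 5, 5, 6]

alpha = cronbach_alpha([fair1, fair2, fair3])
print(f"Cronbach's alpha = {alpha:.3f}")
```

In the project, the items passed in would be the ones that load on each extracted factor.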
Part 3 – Hypothesis testing 1 – 60 points
• Use appropriate tests to test the following hypotheses:
o H1: The average age is higher than 25 years.
o H2: Frequent customers have higher word-of-mouth (use the variable that you
calculated for Part 1) than first-time customers.
o H3: Familiarity with the type of pricing strategy (FAM1) has a median lower than
4 on a scale from 1-7.
o H4: Familiarity with the type of pricing strategy (FAM1) has a higher median for
frequent customers compared to first-time customers.
o H5: Female customers are more likely to be first-time customers, whereas
male customers are more likely to be frequent customers.
• Use the 7-step approach and report results for each hypothesis.
Part 4 – Hypothesis testing 2 – Analysis of Variance – 60 points
• Conduct a 5×2 full-factorial ANOVA
• Test the following hypotheses:
o H6: First-time customers have higher overall perceived price fairness compared
to frequent customers.
o H7: Customer type (frequent customer vs. first-time customer) moderates the
relationship between type of pricing strategy and perceived price fairness
(calculated in Part 1).
• Use the 7-step approach and report results for each hypothesis.
• Conduct a 5×2 full-factorial MANCOVA
• Test the following hypothesis:
o H7: Customer type (frequent customer vs. first-time customer) moderates the
relationship between type of pricing strategy and the combination of variables
(perceived price fairness, word-of-mouth and return intention – all calculated in
Part 1) when controlling for familiarity (calculated in Part 1).
• Use the 7-step approach and report results.
Part 5 – Hypothesis testing 3 – Regression – 60 points
• Conduct a regression analysis
• Build a model that includes the following predictors of word-of-mouth:
o Age
o Gender
o Familiarity
o Customer type
o Perceived fairness
• Propose hypotheses for each potential relationship between the independent and
dependent variables
• Use the 7-step approach and report results.
Part 6 – Clustering – Optional
• Conduct a k-means clustering analysis with 2 clusters
• Use the following clustering variables:
o Perceived_Fairness – Average of Fair 1, 2 and 3
o Word_of_mouth – Average of WOM 1, 2 and 3
o Return_Intention – Average of RI 1, 2 and 3
o Familiarity – Average of FAM 1 and 2
• Run the analysis even if assumptions have not been met
• Use 7-step approach and report results
• Interpret clustering solution
• Name clusters
• Report differences in means between clusters
• Report demographic differences between clusters
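To make the clustering step concrete, here is a minimal k-means sketch in plain Python. It is an illustration only: the six data points are made up, the first k points are used as starting centers (an arbitrary choice, and not how SPSS picks its initial centers), and the function name is hypothetical.

```python
from statistics import mean

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [mean(dim) for dim in zip(*members)]
    return labels, centers

# Made-up rows: (Perceived_Fairness, Word_of_mouth, Return_Intention, Familiarity)
points = [
    (6.0, 6.3, 6.0, 5.5),
    (2.0, 1.7, 2.3, 3.0),
    (5.7, 6.0, 6.3, 6.0),
    (1.7, 2.0, 2.0, 2.5),
    (6.3, 5.7, 6.0, 5.0),
    (2.3, 2.3, 1.7, 3.5),
]

labels, centers = kmeans(points, k=2)
print(labels)
for c, center in enumerate(centers):
    print(f"Cluster {c} center:", [round(value, 2) for value in center])
```

The final cluster centers are the per-cluster means on each variable, which is the starting point for naming the clusters and reporting how they differ.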
Minhui Zhang
Paul A. Wardle
Statistical Measurement, Analysis & Research, Sec015
11/27/2022
Assignment 5: T-test, ANOVA, ANCOVA, Hypothesis Testing
I. SPSS Hypothesis Testing
A. Hypothesis 1
B. Hypothesis 2
C. Hypothesis 3
D. Hypothesis 4
E. Hypothesis 5
II. SPSS ANOVA with Post-hoc analysis
III. SPSS Two-way ANOVA/ANCOVA
A. Research Question 1
B. Research Question 2
1) SPSS Hypothesis Testing
a) H1: Average monthly earnings before Covid19 crisis (EarningsPreCovid) of service
industry workers were above $3000 – Compare mean to predetermined value and
check direction.
i. Step 1: Formulate hypothesis
𝐻0 : 𝜇 = $3000 Average monthly earnings before Covid19 crisis of service
industry workers were $3000.
𝐻1 : 𝜇 > $3000 Average monthly earnings before Covid19 crisis of service
industry workers were above $3000.
ii. Step 2: Significance level = 0.05
Test variable = EarningsPreCovid
Variable level of measure is scale
iii. Step 3: Determine if Samples Are Paired or Independent
There is a single sample, compared against a fixed value.
iv. Step 4: Choose a test and Check Assumptions
One-sample t-test
The sample is larger than 30 hence we assume normality.
v. Step 5: Conduct the Test
vi. Step 6: Interpretation
The average monthly earnings before Covid19 (M = 3100.68, SD = 2063.84)
was higher than the hypothesized value of $3000, but the difference was not
statistically significant, t(413) = 0.993, p = 0.161.
vii. Step 7: Reporting results
The sample mean was not significantly above $3000 (p = 0.161).
Therefore, we fail to reject the null hypothesis: there is insufficient evidence
that average monthly earnings before Covid19 crisis of service industry workers
were above $3000.
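The reported t statistic can be checked by hand from the summary statistics in Step 6. A short Python sketch, assuming n = 414 (implied by the reported degrees of freedom, 413); it reproduces only the test statistic, not the SPSS p-value:

```python
from math import sqrt

# Summary statistics reported in Step 6.
sample_mean = 3100.68
sample_sd = 2063.84
n = 414               # assumption: implied by t(413)
hypothesized = 3000

# One-sample t statistic: distance from the hypothesized mean in standard errors.
standard_error = sample_sd / sqrt(n)
t_statistic = (sample_mean - hypothesized) / standard_error
df = n - 1

print(f"t({df}) = {t_statistic:.3f}")
```

A sample mean about one standard error above $3000 is well within what chance alone would produce, which is why the test does not reach significance.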
b) H2: Average monthly earnings before Covid19 crisis (EarningsPreCovid) of service
industry workers were higher than average monthly earnings after Covid19 crisis
(EarningsPostCovid) – Compare means and check direction
i. Step 1: Formulate hypothesis
𝐻0 : 𝜇1 = 𝜇2 Average monthly earnings before Covid19 crisis
(EarningsPreCovid) of service industry workers were the same as average
monthly earnings after Covid19 crisis (EarningsPostCovid).
𝐻1 : 𝜇1 > 𝜇2 Average monthly earnings before Covid19 crisis
(EarningsPreCovid) of service industry workers were higher than average
monthly earnings after Covid19 crisis (EarningsPostCovid).
ii. Step 2: Significance level = 0.05
Variables level of measure is scale
iii. Step 3: Determine if Samples Are Paired or Independent
The samples are paired (dependent): the same workers were measured before and
after the crisis.
iv. Step 4: Choose a test and Check Assumptions
Paired samples t-test
The sample is larger than 30 hence we assume normality.
Equality of variance is not an assumption of the paired samples t-test, which
works on the within-pair differences.
v. Step 5: Conduct the Test
vi. Step 6: Interpretation
The average monthly earnings before Covid19 (M = 3100.68, SD = 2063.84)
was higher than the average monthly earnings after Covid19 crisis (M =
1788.06, SD = 2073.49). The average monthly earnings before Covid19 were
highly positively correlated with average monthly earnings after Covid19
crisis (r = 0.71, p < 0.001). There was a significant difference between the
average monthly earnings before Covid19 and the average monthly earnings
after Covid19, t(413) = 17.04, p < 0.001.
vii. Step 7: Reporting results
There was a statistically significant difference between the average monthly
earnings before Covid19 and the average monthly earnings after Covid19 (p <
0.001). Therefore, we reject the null hypothesis and conclude that average monthly
earnings before Covid19 crisis (EarningsPreCovid) of service industry
workers were higher than average monthly earnings after Covid19 crisis
(EarningsPostCovid).
c) H3: Service industry workers that are paid annual salary had a bigger reduction in
earnings (EarningsPostCovidPercent) than those that were on hourly wage - Compare
means and check direction
i. Step 1: Formulate hypothesis
𝐻0 : 𝜇1 = 𝜇2 Service industry workers that are paid annual salary had similar
reduction in earnings (EarningsPostCovidPercent) to those that were on hourly
wage.
𝐻1 : 𝜇1 > 𝜇2 Service industry workers that are paid annual salary had a bigger
reduction in earnings (EarningsPostCovidPercent) than those that were on
hourly wage.
ii. Step 2: Significance level = 0.05
EarningsType variable is nominal measure.
EarningsPostCovidPercent is scale measure.
iii. Step 3: Determine if Samples Are Paired or Independent
The samples are independent
iv. Step 4: Choose a test and Check Assumptions
Independent samples t-test
The sample is larger than 30 hence we assume normality.
The two samples are balanced (1:4 or less ratio) hence we assume equality of
variance.
v. Step 5: Conduct the Test
vi. Step 6: Interpretation
The mean EarningsPostCovidPercent for workers on hourly wage (M = 48.75,
SD = 41.36) was lower than for workers on annual salary (M = 65.21,
SD = 38.76). The difference between the two groups was significant,
t(412) = -3.721, p < 0.001.
vii. Step 7: Reporting results
There was a statistically significant difference between service industry
workers that are paid annual salary and those that were on hourly wage (p <
0.001). Therefore, taking EarningsPostCovidPercent as the reduction in earnings,
as the hypothesis does, we reject the null hypothesis and conclude that service
industry workers that are paid annual salary had a bigger reduction in earnings
(EarningsPostCovidPercent) than those that were on hourly wage.
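The t statistic for this comparison can be approximately reproduced from the group summaries with the pooled-variance formula. In the sketch below the group sizes (296 hourly, 118 annual) are an assumption taken from the frequencies reported under H4; small differences from the SPSS value come from rounding in the reported means and standard deviations.

```python
from math import sqrt

# Group summaries for EarningsPostCovidPercent as reported in Step 6.
mean_hourly, sd_hourly, n_hourly = 48.75, 41.36, 296   # n assumed from H4
mean_annual, sd_annual, n_annual = 65.21, 38.76, 118   # n assumed from H4

df = n_hourly + n_annual - 2

# Pooled variance weights each group's variance by its degrees of freedom.
pooled_variance = (
    (n_hourly - 1) * sd_hourly ** 2 + (n_annual - 1) * sd_annual ** 2
) / df

standard_error = sqrt(pooled_variance * (1 / n_hourly + 1 / n_annual))
t_statistic = (mean_hourly - mean_annual) / standard_error

print(f"t({df}) = {t_statistic:.2f}")
```

The negative sign only reflects the order of subtraction (hourly minus annual): the hourly group has the lower mean.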
d) H4 – More service industry workers are paid wage than salary (EarningsType) - Compare frequencies to expected values and check direction
i. Step 1: Formulate hypothesis
𝐻0 : 𝑝1 = 𝑝2 The proportions of service industry workers paid wages and paid
salaries are equal.
𝐻1 : 𝑝1 > 𝑝2 More service industry workers are paid wage than salary
(EarningsType).
ii. Step 2: Significance level = 0.05
Variable level of measure is nominal
EarningsType is nominal
iii. Step 3: Determine if Samples Are Paired or Independent
There is a single sample, compared against expected frequencies.
iv. Step 4: Choose a test and Check Assumptions
Chi-square goodness-of-fit test
The expected frequency in each category (414 / 2 = 207) is above 5.
v. Step 5: Conduct the Test
vi. Step 6: Interpretation
More service industry workers were paid in hourly wages (N = 296) than in
annual salary (N = 118). The difference between the number of workers on
hourly wage and on annual salary was significant, 𝜒2(1, N = 414) = 76.531,
p < 0.001.
vii. Step 7: Reporting results
There was a statistically significant difference between the number of service
industry workers that are paid in wages and in salary (p < 0.001). Therefore,
we reject the null hypothesis and conclude that more service industry workers
are paid wage than salary (EarningsType).
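The chi-square value follows directly from the two observed counts when equal frequencies are expected. A short Python sketch of the goodness-of-fit calculation (the dictionary labels are illustrative; the p-value uses the closed form that holds for one degree of freedom):

```python
from math import erfc, sqrt

# Observed counts from Step 6.
observed = {"hourly wage": 296, "annual salary": 118}

total = sum(observed.values())
expected = total / len(observed)   # equal split under the null hypothesis

# Sum of (observed - expected)^2 / expected over the categories.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

# For df = 1, the upper-tail probability is erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_square / 2))

print(f"chi-square({df}, N = {total}) = {chi_square:.3f}, p < 0.001: {p_value < 0.001}")
```

The test itself is non-directional; the direction of the hypothesis is confirmed by comparing the two counts.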
e) H5 - Service industry workers that are paid annual salary are less likely to be on
unpaid leave (EmploymentStatus) than those that are on hourly wage - Compare
frequencies in crosstabulation table and look at subscript to determine specific
differences in frequencies/proportions between groups
i. Step 1: Formulate hypothesis
𝐻0 : There is no association between earnings type and employment status
among service industry workers.
𝐻1 : There is an association between earnings type and employment status
among service industry workers.
ii. Step 2: Significance level = 0.05
Variables level of measure is nominal
iii. Step 3: Determine if Samples Are Paired or Independent
The sample is independent
iv. Step 4: Choose a test and Check Assumptions
Chi-square test of independence
Expected frequencies for each cell are at least 1.
Expected frequencies are at least 5 for the majority (80%) of the cells.
v. Step 5: Conduct the Test
vi. Step 6: Interpretation
The Chi-square test for independence indicated a significant association with a
small effect size between earnings type and employment status, 𝜒2(2, n=414) =
7.498, p-value = 0.024, phi = 0.135. More service workers on annual salary were
on paid leave (23.1%) than on unpaid leave (21.2%).
vii. Step 7: Reporting results
There was a statistically significant association between earnings type and
employment status among service industry workers (p = 0.024). Therefore, we
reject the null hypothesis and conclude that service industry workers that are
paid annual salary are less likely to be on unpaid leave (EmploymentStatus)
than those that are on hourly wage.
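The effect size reported with the chi-square statistic can be recomputed from the statistic and the sample size. A minimal sketch, assuming a 2×3 crosstabulation (two earnings types, and three employment statuses implied by df = 2); for a table with two rows, Cramér's V reduces to the same value as phi:

```python
from math import sqrt

chi_square = 7.498
n = 414
rows, cols = 2, 3   # assumption: implied by df = (rows - 1) * (cols - 1) = 2

# Cramer's V; with min(rows - 1, cols - 1) == 1 this equals sqrt(chi2 / n).
effect_size = sqrt(chi_square / (n * min(rows - 1, cols - 1)))

print(f"effect size = {effect_size:.3f}")
```

A value of this size is conventionally read as a small association.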
2) SPSS ANOVA with Post-hoc analysis
i. Step 1: Recognize research question and formulate hypotheses
Research question: Do customers that travel for different reasons (business,
leisure and mixed) have different levels of customer satisfaction with hotel
rooms?
Formulate Hypothesis:
𝐻0 : 𝜇1 = 𝜇2 = 𝜇3 Customers that travel for different reasons (business, leisure
and mixed) have similar levels of customer satisfaction with hotel rooms.
𝐻1 : At least one 𝜇𝑖 is different. Business travelers have lower satisfaction
than leisure travelers and travelers that stay in hotel for mixed reasons.
ii. Step 2: Determine how many variables you are comparing
Significance level = 0.05.
Independent variable is Travel Reason – nominal measure
Dependent variable is SAT – scale measure
iii. Step 3: Determine sample size for each group (minimum of 20 per group)
Customers travelling for business reasons: N = 125
Customers travelling for leisure reasons: N = 589
Customers travelling for mixed reasons: N = 48
iv. Step 4: Check Assumptions
The sample is larger than 30 hence we assume normality for the dependent
variable, SAT.
The variances are equal across the groups.
v. Step 5: Analysis
A one-way between-groups analysis of variance was conducted to explore the
association of travel reason and SAT. There was a significant difference at the
p < 0.05 level in SAT for the three travel reason categories: F(2, 759) = 5.136,
p-value = 0.006. The effect size for the mean difference between SAT scores was
small (eta-squared = 0.013). Post-hoc comparisons using the Tukey HSD test
indicated that the mean SAT score for business (Mean=4.81, SD=1.22) was
significantly different from leisure (Mean=5.18, SD=1.23). The mixed travel
reason (Mean=4.91, SD=1.32) did not differ significantly from either business
or leisure.
vi. Step 6: Interpretation
The mean SAT score for business was (Mean=4.81, SD=1.22), leisure
(Mean=5.18, SD=1.23), and mixed travel reason (Mean=4.91, SD=1.32). The
ANOVA table shows there was significant difference between the different
travel reasons among the customers, F(2, 759) = 5.136, p-value = 0.006.
$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{1149.77}{1165.331} = 0.0134$$
The 𝑅2 = 0.0134 indicates that about 1.3% of the variation in SAT
scores is explained by travel reason, in line with the eta-squared of 0.013.
vii. Step 7: Reporting results
Since the ANOVA showed there was significant difference between the
different travel reasons among the customers, we reject the null hypothesis.
The Tukey HSD comparisons support the conclusion that business travelers have
lower satisfaction than leisure travelers; travelers that stay in hotel for
mixed reasons did not differ significantly from either group.
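The one-way ANOVA quantities can be cross-checked from the sums of squares and group sizes given in the report. A Python sketch of the arithmetic (the variable names are illustrative; the inputs are the reported SSE, SST and group sizes):

```python
# Sums of squares and group sizes as reported in the analysis.
ss_total = 1165.331
ss_error = 1149.77
group_sizes = {"business": 125, "leisure": 589, "mixed": 48}

ss_between = ss_total - ss_error
n = sum(group_sizes.values())
df_between = len(group_sizes) - 1   # groups - 1
df_within = n - len(group_sizes)    # N - groups

# Share of total variation explained by the grouping (equals 1 - SSE/SST).
eta_squared = ss_between / ss_total

# F is the ratio of the between-group to the within-group mean square.
f_statistic = (ss_between / df_between) / (ss_error / df_within)

print(f"eta-squared = {eta_squared:.3f}")
print(f"F({df_between}, {df_within}) = {f_statistic:.3f}")
```

With 762 customers in three groups, the within-group degrees of freedom are 759, and the explained share of variance is about 1.3%.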
3) SPSS Two-way ANOVA/ANCOVA
Research Question 1
i. Step 1: Recognize research question and formulate hypotheses.
• Research Question 1 – Do Age Groups (Younger vs Older) moderate the
relationship between Design Style (Modern vs Traditional) and desire to stay?
• Hypothesis
H0: Age Groups (Younger vs Older) have no effect on the relationship between
Design Style (Modern vs Traditional) and desire to stay.
H1: Age Groups (Younger vs Older) moderate the relationship between Design
Style (Modern vs Traditional) and desire to stay.
ii. Step 2: Determine how many variables you are comparing
Significance level = 0.05.
DTS (desire to stay) is the dependent variable, a scale measure.
Age_M and DesignMT are the independent variables, both nominal measures.
iii. Step 3: Determine sample size for each cell (minimum of 20 per cell). Sample
size should also increase with addition of covariates.
Older age group: traditional design n = 180, modern design n = 189.
Younger age group: traditional design n = 201, modern design n = 192.
iv. Step 4: Check Assumptions
The Levene’s test of equality of error variance was not significant (p=0.104). We
fail to reject the null hypothesis of equal error variances and proceed with the test.
v. Step 5: ANOVA/ANCOVA analysis.
vi. Interpretation
• Means
Mean desire to stay (DTS) in the older age group was M = 4.52 (SD = 1.28) for
the traditional design and M = 4.49 (SD = 1.31) for the modern design. In the
younger age group it was M = 4.67 (SD = 1.44) for the traditional design and
M = 5.15 (SD = 1.22) for the modern design.
• ANOVA/ANCOVA tables
• Interpret main effects
Firstly, the main effect of Age_M was statistically significant, F(1, 758) =
18.437, p < 0.001.
Secondly, the main effect of DesignMT was statistically significant, F(1, 758)
= 5.630, p = 0.018.
• Interpret interaction effect
The interaction effect of Age_M*DesignMT was statistically significant
F(1,758) = 12.666, p = 0.007.
• Effect size (Partial-eta-Squared)
The partial eta squared for main effect of Age_M was 0.024 hence small
effect.
The partial eta squared for main effect of DesignMT was 0.007 hence little or
no effect.
The partial eta squared for the interaction effect of Age_M*DesignMT was
0.010 hence little or no effect.
• Check Plots
The plot shows that younger customers had a higher desire to stay with the
modern hotel room design than older customers, while the two age groups were
close for the traditional design.
vii. Reporting results
The two-way ANOVA showed a statistical significance in the main effect of
Age_M and main effect of DesignMT. Similarly, there was statistical significance
in the interaction effect of Age_M*DesignMT. Therefore, we reject the null
hypothesis and conclude that Age Groups (Younger vs Older) moderate the
relationship between Design Style (Modern vs Traditional) and desire to stay.
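Partial eta squared can be recovered from an F statistic and its degrees of freedom, which gives a quick consistency check on the effect sizes quoted for the main effects. A minimal sketch (the function name is illustrative; the F values and error degrees of freedom are the ones reported for the two main effects):

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effects as reported: F(1, 758) for Age_M and DesignMT.
main_effects = {"Age_M": 18.437, "DesignMT": 5.630}

effect_sizes = {
    name: round(partial_eta_squared(f_value, 1, 758), 3)
    for name, f_value in main_effects.items()
}

for name, value in effect_sizes.items():
    print(f"{name}: partial eta squared = {value}")
```

The same function can be applied to any other F value in the table to confirm that it agrees with the partial eta squared shown beside it.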
Research Question 2
viii. Step 1: Recognize research question and formulate hypotheses.
• Research Question 2 – Do Age Groups (Younger vs Older) moderate the
relationship between Design Style (Modern vs Traditional) and desire to stay
when controlling for scenario realism?
• Hypothesis
H0: Age Groups (Younger vs Older) have no effect on the relationship between
Design Style (Modern vs Traditional) and desire to stay when controlling for
scenario realism.
H1: Age Groups (Younger vs Older) moderate the relationship between Design
Style (Modern vs Traditional) and desire to stay when controlling for scenario
realism.
ix. Step 2: Determine how many variables you are comparing
Significance level = 0.05.
DTS (desire to stay) is the dependent variable, a scale measure.
Age_M and DesignMT are the independent variables, both nominal measures.
Realism is the covariate, a scale measure.
x. Step 3: Determine sample size for each cell (minimum of 20 per cell). Sample
size should also increase with addition of covariates.
Older age group: traditional design n = 180, modern design n = 189.
Younger age group: traditional design n = 201, modern design n = 192.
xi. Step 4: Check Assumptions
The Levene’s test of equality of error variance was significant (p=0.025), so we
reject the null hypothesis of equal error variances. Because the cell sizes are
balanced (180 to 201 per cell), we continue with the test.
xii. Step 5: ANOVA/ANCOVA analysis.
xiii. Interpretation
• Means
Mean desire to stay (DTS) in the older age group was M = 4.52 (SD = 1.28) for
the traditional design and M = 4.49 (SD = 1.31) for the modern design. In the
younger age group it was M = 4.67 (SD = 1.44) for the traditional design and
M = 5.15 (SD = 1.22) for the modern design.
• ANOVA/ANCOVA tables
• Interpret main effects
Firstly, the main effect of Age_M was statistically significant, F(1, 757) =
20.12, p < 0.001.
Secondly, the main effect of DesignMT was statistically significant, F(1, 757)
= 10.993, p < 0.001.
The effect of the covariate Realism was also statistically significant, F(1, 757) =
287.975, p < 0.001.
• Interpret interaction effect
The interaction effect of Age_M*DesignMT was statistically significant
F(1,757) = 4.958, p = 0.026.
• Effect size (Partial-eta-Squared)
The partial eta squared for main effect of Age_M was 0.026 hence small
effect.
The partial eta squared for main effect of DesignMT was 0.014 hence small
effect.
The partial eta squared for main effect of Realism was 0.276 hence medium
effect.
The partial eta squared for the interaction effect of Age_M*DesignMT was
0.010 hence little or no effect.
• Check Plots
The plot shows that younger customers had a higher desire to stay with the
modern hotel room design than older customers, while the two age groups were
close for the traditional design.
xiv. Reporting results
The two-way ANOVA while controlling for scenario realism showed a statistical
significance in the main effect of Age_M and main effect of DesignMT. Similarly,
there was statistical significance in the interaction effect of Age_M*DesignMT.
The partial eta squared showed that Realism had a medium effect on the DTS.
Therefore, we reject the null hypothesis and conclude that Age Groups (Younger
vs Older) moderate the relationship between Design Style (Modern vs Traditional)
and desire to stay when controlling for scenario realism.
Assignment 5: T-test, one-way ANOVA, two-way ANOVA and hypotheses testing INTG1-GC 1055
Complete the following exercises. Make sure to include ALL calculations and outputs to receive
full credit. Be concise and specific in your answers. Please provide all tables and interpretations
of results where needed in Word or PDF file. Provide Output file as PDF.
1) SPSS hypothesis testing
Open CovidEmployeesFIN1055.sav in SPSS, check all variables, and examine the data.
Please test the following hypotheses and use the 7-step approach for each hypothesis. Each hypothesis
carries the same number of points:
a) H1: Average monthly earnings before Covid19 crisis (EarningsPreCovid) of service
industry workers were above $3000 - Compare mean to predetermined value and check
direction
b) H2: Average monthly earnings before Covid19 crisis (EarningsPreCovid) of service
industry workers were higher than average monthly earnings after Covid19 crisis
(EarningsPostCovid) - Compare means and check direction
c) H3: Service industry workers that are paid annual salary had a bigger reduction in
earnings (EarningsPostCovidPercent) than those that were on hourly wage - Compare
means and check direction
d) H4: More service industry workers are paid an hourly wage than a salary (EarningsType) - Compare
frequencies to expected values and check direction
e) H5: Service industry workers that are paid annual salary are less likely to be on unpaid
leave (EmploymentStatus) than those that are on hourly wage - Compare frequencies in
crosstabulation table and look at the subscripts to determine specific differences in
frequencies/proportions between groups
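The tests named in H1 to H5 can be sketched outside SPSS to see what each one computes. The block below is a minimal illustration in plain Python, not the SPSS procedure itself: all data values are hypothetical, the function names are made up for this sketch, and it returns only the test statistic and degrees of freedom (SPSS also reports the p-value).

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """H1 style: t statistic and df for comparing a mean to a fixed value."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error, n - 1

def paired_t(first, second):
    """H2 style: paired-samples t test, computed on the pairwise differences."""
    diffs = [a - b for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)

def independent_t(group_a, group_b):
    """H3 style: independent-samples t test assuming equal variances."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, na + nb - 2

def chi_square_gof(observed, expected):
    """H4 style: chi-square goodness of fit against expected frequencies."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

def chi_square_independence(table):
    """H5 style: chi-square test of independence for a crosstab (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical monthly earnings, NOT taken from CovidEmployeesFIN1055.sav
pre = [3200, 3500, 2900, 4100, 3300, 3600]
post = [2500, 3000, 2800, 3300, 2600, 3100]

t1, df1 = one_sample_t(pre, 3000)
t2, df2 = paired_t(pre, post)
t3, df3 = independent_t([1, 2, 3], [1, 2, 3])
chi_gof, df4 = chi_square_gof([70, 30], [50, 50])
chi_ind, df5 = chi_square_independence([[10, 20], [20, 10]])
print(t1, df1, t2, df2, chi_gof, df4, chi_ind, df5)
```

A positive t for H1 or H2 only shows that the difference is in the hypothesized direction; the decision still depends on the p-value at the chosen significance level.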
Suggested steps in analysis:
• Step 1: Formulate hypotheses
• Step 2: Determine how many variables you are comparing and choose significance
o Choose the significance level: 0.05
o Recognize independent and dependent variables
o Report variable(s) and their types – Variable View in SPSS shows whether a
variable is nominal, ordinal or scale – SPSS can misclassify a variable, so you
should make the final judgment.
o Treat all Likert scale variables as scale.
• Step 3: Determine if Samples Are Paired or Independent
o Report type(s) of sample(s)
• Step 4: Choose a test and Check Assumptions
o Normality test
▪ Shapiro-Wilk test
▪ Visually check for normality
▪ If a sample is larger than 30 assume normality for continuous variables
o Equality of variances test
▪ Levene’s test
▪ If two samples are balanced (1:4 or less ratio) assume equality of variance
o Report assumptions
• Step 5: Conduct the Test
• Step 6: Interpretation
• Step 7: Reporting results
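As an illustration of the equality of variances check in Step 4, the sketch below computes the mean-based Levene statistic by hand: it is a one-way ANOVA F ratio calculated on each score's absolute deviation from its own group mean. The data are hypothetical and the function name is an assumption of this sketch; SPSS reports the same kind of statistic together with its p-value.

```python
import statistics

def levene_w(groups):
    """Mean-based Levene statistic for a list of groups (lists of numbers).

    Raises ZeroDivisionError if the absolute deviations do not vary
    within the groups.
    """
    # Absolute deviation of every score from its own group mean
    deviations = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    all_dev = [d for g in deviations for d in g]
    grand_mean = statistics.mean(all_dev)
    k, n = len(groups), len(all_dev)
    between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in deviations)
    within = sum((d - statistics.mean(g)) ** 2 for g in deviations for d in g)
    return ((n - k) / (k - 1)) * between / within

# Hypothetical groups with the same mean but different spread
narrow = [1, 2, 3, 4, 5]
wide = [-1, 1, 3, 5, 7]
w = levene_w([narrow, wide])
print(round(w, 4))
```

A large W relative to the F distribution with (k - 1, N - k) degrees of freedom suggests unequal variances; a p-value above 0.05 is read as no evidence against equal variances.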
2) SPSS ANOVA with Post-hoc analysis
Open Hotel Room Design.sav in SPSS and check all variables and examine data.
Please test hypothesis and use 7-step approach.
• Variables used in the analysis:
o Travel_reason
o SAT
• Step 1: Recognize research question and formulate hypotheses
o Question – Do customers that travel for different reasons (business, leisure and
mixed) have different levels of customer satisfaction with hotel rooms?
o Formulate Hypothesis:
▪ Business travelers have lower satisfaction than leisure travelers and
travelers that stay in a hotel for mixed reasons.
• Step 2: Determine how many variables you are comparing
o Choose the Significance Level – common α = 0.05
o Recognize independent and dependent variables
o Report variable(s) and their types
• Step 3: Determine sample size for each group (minimum of 20 per group)
• Step 4: Check Assumptions
o Normality test
o Equality of variances test
o Report assumptions
• Step 5: Analysis
o ANOVA
o Post hoc analysis (Tukey)
• Step 6: Interpretation
o Means
o ANOVA Table
o R-squared
o Post-Hoc
• Step 7: Reporting results
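To make the ANOVA table and R-squared in Step 6 concrete, the sketch below computes the one-way ANOVA sums of squares, the F ratio and R-squared (eta squared) by hand. The satisfaction scores are hypothetical and far below the 20-per-group minimum; they are kept tiny so the arithmetic can be followed. The Tukey post hoc test is not covered here.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Between-groups: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: how far each score is from its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_ratio, df_between, df_within, eta_squared

# Hypothetical SAT scores by Travel_reason
business = [3, 4, 5]
leisure = [5, 6, 7]
mixed = [7, 8, 9]

f_ratio, df_b, df_w, eta_sq = one_way_anova([business, leisure, mixed])
print(f_ratio, df_b, df_w, eta_sq)
```

Here the between-groups sum of squares is 24 and the within-groups sum of squares is 6, so 80% of the variance in the scores is associated with group membership.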
3) SPSS Two-way ANOVA/ANCOVA
Open Hotel Room Design.sav in SPSS and check all variables and examine data
Please formulate hypotheses based on research question and then test hypotheses using 7-step
approach for each hypothesis.
• Two-way ANOVA/ANCOVA Hotel Room Design.sav
• Variables used in the analysis:
o Age_M
o DesignMT
o DTS
o Realism
• Step 1: Recognize research question and formulate hypotheses
o Research Question 1 – Do Age Groups (Younger vs Older) moderate the
relationship between Design Style (Modern vs Traditional) and desire to stay?
o Research Question 2 – Do Age Groups (Younger vs Older) moderate the
relationship between Design Style (Modern vs Traditional) and desire to stay
when controlling for scenario realism?
o You will need to do a separate analysis for each question.
o Formulate Hypotheses
• Step 2: Determine how many variables you are comparing
o Choose the Significance Level – common α = 0.05
o Recognize all independent and dependent variables
o Report variable(s) and their types
• Step 3: Determine sample size for each cell (minimum of 20 per cell). Sample size should
also increase with addition of covariates.
• Step 4: Check Assumptions
o Levene’s test of equality of error variances – should have p-value above 0.05
o Report assumptions
• Step 5: ANOVA/ANCOVA analysis.
o ANOVA/ANCOVA
o Define Model
• Step 6: Interpretation
o Means
o ANOVA/ANCOVA tables
o Interpret main effects
o Interpret interaction effect
o Effect size (Partial-eta-Squared)
o Check Plots
• Step 7: Reporting results
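The main effects, interaction effect and partial eta squared in Step 6 can be illustrated with a hand computation for a balanced two-way design. This is only a sketch under stated assumptions: equal cell sizes, hypothetical DTS scores, and no covariate, so it corresponds to Research Question 1 and not to the ANCOVA with Realism.

```python
import statistics

def two_way_anova_balanced(cells):
    """cells maps (level_a, level_b) to a list of scores, all of equal length.

    Returns, per effect, the sum of squares, df, F and partial eta squared.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles equal cell sizes")
    grand = statistics.mean([x for v in cells.values() for x in v])
    a_mean = {a: statistics.mean([x for b in b_levels for x in cells[(a, b)]])
              for a in a_levels}
    b_mean = {b: statistics.mean([x for a in a_levels for x in cells[(a, b)]])
              for b in b_levels}
    cell_mean = {k: statistics.mean(v) for k, v in cells.items()}
    ss = {
        "A": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "AxB": n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                       for a in a_levels for b in b_levels),
    }
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1,
          "AxB": (len(a_levels) - 1) * (len(b_levels) - 1)}
    df_error = len(cells) * (n - 1)
    return {
        effect: {
            "SS": ss[effect],
            "df": df[effect],
            "F": (ss[effect] / df[effect]) / (ss_error / df_error),
            # Partial eta squared = SS_effect / (SS_effect + SS_error)
            "partial_eta_sq": ss[effect] / (ss[effect] + ss_error),
        }
        for effect in ss
    }

# Hypothetical DTS scores; A = Age_M, B = DesignMT
scores = {
    ("Younger", "Modern"): [6, 7, 8],
    ("Younger", "Traditional"): [3, 4, 5],
    ("Older", "Modern"): [4, 5, 6],
    ("Older", "Traditional"): [5, 6, 7],
}
result = two_way_anova_balanced(scores)
print(result["AxB"])
```

In this made-up data the age main effect is zero while the interaction is large, which is the pattern a moderation hypothesis looks for: the effect of design style differs by age group.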
SESSIONS 8 – 14
RESEARCH DESIGN & MEASUREMENTS
X Session 8: Data Preparation, Descriptive Statistics & Correlation
X Session 9: Testing for Differences between Groups and Predictive Relationships
X Session 10: Analysis of Variance
X Session 11: Review Session - Differences between Groups and Analysis of Variance
X Session 12: Midterm EXAM 2
X Session 13: Regression, and Log-linear Analysis
❏ Session 14: Summary of statistical analysis techniques with Regression, Log-linear Analysis, Factor
and Cluster Analysis
FINAL PROJECT: 30% OF FINAL GRADE
§ Final Project take home data analysis
§ Due at the date and time the registrar has scheduled the final exam for this course.
§ 18th December
§ A dataset will be posted in Brightspace and your project is to read the data, do any needed data
manipulation, conduct several analyses, and interpret and describe the results.
§ Final project will consist of several statistical problems based on the statistical methods covered in
this course.
§ To receive maximum points, you need to have:
§ Accurate analysis
§ Appropriate tables/graphs and reports
§ Correct wording when reporting results
§ Appropriate interpretation of the results
§ Adequate formatting of the document
§ Report should be submitted as a PDF document with 1-inch margins
§ Submission time extensions cannot be granted.
§ Completing all of PARTS 1-5 is required. Part 6 is optional.
FINAL PROJECT / EXAM
§ Nightclub Price Fairness FINAL exam.sav
§ Check all variables and examine data.
§ Test hypotheses and use 7 step approach where applicable.
§ Part 1 – Descriptives - 60 points
§ Part 2 – Factor analysis - 60 points
§ Part 3 – Hypothesis testing 1 - 60 points
§ Part 4 – Hypothesis testing 2 - Analysis of Variance - 60 points
§ Part 5 – Hypothesis testing 3 - Regression - 60 points
§ Part 6 – Clustering - optional
LEARNING OBJECTIVES
§Overview
§Linear Regression
§Log-Linear Analysis
§Factor Analysis
§Cluster Analysis
LINEAR & LOGISTIC
REGRESSION
REGRESSION ANALYSIS
§ A way of mathematically sorting out which of many independent variables has an impact on a
dependent variable.
§ It answers the questions:
§ Which factors matter most?
§ Which factors can we ignore?
§ How do the factors interact with each other?
§ And, perhaps most importantly, how certain are we about all of these factors?
§ Regression analysis can:
§ Indicate if independent variables have a significant relationship with a dependent variable.
§ Indicate the relative strength of different independent variables’ effects on a dependent variable.
§ Make predictions.
§ Y = α + βX + e
§ Alpha (α – Y intercept) and beta (β - slope)
§ Multiple regression analysis allows one dependent variable to be explained by more than one
independent variable
§ Yi = b0 + b1X1 + b2X2 + b3X3 + … + bnXn + ei
§ The coefficient of determination, R2 (R-squared), reflects the proportion of variance in the dependent variable explained by the regression line
§ F-Test (regression): statistical test that compares the variation explained by the regression with the variation left unexplained by the regression
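The quantities on this slide can be worked through for the simple case Y = α + βX + e. The sketch below estimates α and β by ordinary least squares and then derives R2 and the regression F ratio; the data points and the function name are hypothetical, chosen only so the arithmetic is easy to check.

```python
import statistics

def simple_ols(x, y):
    """Return (alpha, beta, r_squared, f_ratio) for Y = alpha + beta * X + e."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx                  # slope
    alpha = mean_y - beta * mean_x    # Y intercept
    ss_residual = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total
    # One predictor: explained variation has 1 df, residual has n - 2 df
    f_ratio = (ss_total - ss_residual) / (ss_residual / (len(x) - 2))
    return alpha, beta, r_squared, f_ratio

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha, beta, r_squared, f_ratio = simple_ols(x, y)
print(alpha, beta, r_squared, f_ratio)
```

For these values the fitted line is Y = 2.2 + 0.6X, which explains 60% of the variance in Y. A perfect fit would make the residual sum of squares zero, so the F ratio is undefined in that case.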
STEPS IN LINEAR REGRESSION
Step 1: Recognize research question and formulate hypotheses
Step 2: Determine how many variables you are comparing and choose significance
Step 3: Data requirements and initial analysis
Step 4: Check Assumptions
Step 5: Model Estimation
Step 6: Interpretation and validation
Step 7: Report Results
LOGISTIC REGRESSION
§Used for explanation and prediction of why a consumer either
responded or did not respond to some marketing effort
§Logistic regression is a multivariate technique
§ Prediction of a categorical, dichotomous dependent variable, usually coded 0 = failure
and 1 = success,
§ Using metric and/or less than interval independent variables
§Purpose
§ Find the independent variables most highly related to the likelihood of an observation
being successful
§ Classifying respondents into success and failure categories
Adapted from Babin & Zikmund © 2016 Cengage Learning
PREDICTION IN LOGISTIC REGRESSION
§A logistic regression’s ability to predict success or failure can be
assessed using several statistics
§Logistic regression equations are not estimated using OLS
because of the statistical distribution created by using logits as
dependent variables
§Instead, logistic regression relies on maximum likelihood
estimation
§The goal is to create parameters that maximize the likelihood of
some event
§As a consequence, most logistic regression results will include a
likelihood value
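Maximum likelihood estimation can be illustrated with a very small example. The sketch below fits a one-predictor logistic regression by plain gradient ascent on the log-likelihood. This is a teaching simplification and an assumption of this sketch: statistical packages typically use faster iterative algorithms, and the data, learning rate and step count here are hypothetical.

```python
import math

def sigmoid(z):
    """Convert a logit into a probability of success."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(x, y, b0, b1):
    """Log of the likelihood of the observed 0/1 outcomes given b0 and b1."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def fit_logistic(x, y, learning_rate=0.05, steps=5000):
    """Move b0 and b1 uphill on the log-likelihood, starting from zero."""
    b0 = b1 = 0.0
    for _ in range(steps):
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += learning_rate * sum(errors)
        b1 += learning_rate * sum(e * xi for e, xi in zip(errors, x))
    return b0, b1

# Hypothetical data: x is some marketing exposure, y is responded (1) or not (0)
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]

ll_null = log_likelihood(x, y, 0.0, 0.0)
b0, b1 = fit_logistic(x, y)
ll_fit = log_likelihood(x, y, b0, b1)
print(b0, b1, ll_null, ll_fit)
```

The fitted parameters give a higher log-likelihood than the starting values, which is what "maximizing the likelihood" means in practice.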
THE LIKELIHOOD VALUE
§The likelihood value provides a way of indicating the overall
significance or predictive capabilities of logistic regression
because –2 times the log of the likelihood value (abbreviated as
–2LL) takes on the value of zero when every observation is
correctly classified as a success or failure
§Most logistic regression software will produce a cross-classification matrix and
chi-square test demonstrating how many observations are correctly classified
§ If this is significant, it provides some evidence of the predictive power of
the model
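A short computation shows how –2LL and the cross-classification matrix are obtained from predicted probabilities. The outcomes, probabilities and the 0.5 cutoff below are hypothetical, and the function names are assumptions of this sketch.

```python
import math

def neg2_log_likelihood(y, p):
    """-2LL for 0/1 outcomes y and predicted success probabilities p.

    Probabilities must lie strictly between 0 and 1.
    """
    return -2 * sum(math.log(pi) if yi == 1 else math.log(1 - pi)
                    for yi, pi in zip(y, p))

def classification_table(y, p, cutoff=0.5):
    """Counts keyed by (actual, predicted), predicting success when p >= cutoff."""
    table = {(actual, predicted): 0 for actual in (0, 1) for predicted in (0, 1)}
    for yi, pi in zip(y, p):
        table[(yi, 1 if pi >= cutoff else 0)] += 1
    return table

# Hypothetical outcomes and model probabilities
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.6, 0.9]

n2ll = neg2_log_likelihood(y, p)
table = classification_table(y, p)
near_perfect = neg2_log_likelihood([0, 1], [1e-12, 1 - 1e-12])
print(n2ll, table, near_perfect)
```

As the predicted probabilities approach 0 for failures and 1 for successes, –2LL approaches zero, which matches the statement on the slide.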
PSEUDO R2
§In logistic regression, we can employ an entropy (pseudo) R2
that we can interpret just like the R2 in multiple regression
§Pseudo R2 is an assessment of logistic regression predictive
power that functions like the R2 in multiple regression
§Values of 0 mean no predictive power and a value of 1 would
mean perfect predictive power
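Several pseudo R2 measures exist, and the slide does not say which one is meant. As one common example, the sketch below computes McFadden's version, which compares the log-likelihood of the fitted model with that of a null model that predicts the overall success rate for everyone. The data are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y given success probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))

def mcfadden_r2(y, p_model):
    """McFadden's pseudo R2: 1 - LL(model) / LL(null)."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1 - log_likelihood(y, p_model) / ll_null

# Hypothetical outcomes and model probabilities
y = [0, 0, 1, 1]
p_model = [0.1, 0.4, 0.6, 0.9]

r2 = mcfadden_r2(y, p_model)
r2_null = mcfadden_r2(y, [0.5, 0.5, 0.5, 0.5])
print(round(r2, 4), r2_null)
```

A model that does no better than the base rate scores 0, and the value moves toward 1 as predictions improve.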
INTERPRETING SIGNIFICANT INDEPENDENT
VARIABLES
§We can interpret the independent variables’ strength of
relationship to the logit values representing the dependent
variable using the Wald Statistic
§The statistical output should include a list of Wald statistic
results for each independent variable
§ The statistical significance of each can be interpreted just as in multiple
regression
§ Because the actual parameter estimates express the relationship
between an independent variable and logits, they can be perplexing to
interpret
§ In particular, a negative value would indicate a variable that decreases the odds of
success
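The Wald statistic for a single coefficient is commonly computed as the squared ratio of the estimate to its standard error and compared with a chi-square distribution with 1 degree of freedom. The sketch below assumes that form; the coefficient and standard error are hypothetical values, not output from any dataset in this course.

```python
def wald_statistic(coefficient, standard_error):
    """Wald chi-square for one coefficient: (b / SE) squared."""
    return (coefficient / standard_error) ** 2

# Critical value of chi-square with 1 df at alpha = 0.05
CHI_SQUARE_CRITICAL_1DF = 3.841

# Hypothetical logistic regression output for one predictor
b, se = 0.8, 0.3
wald = wald_statistic(b, se)
significant = wald > CHI_SQUARE_CRITICAL_1DF
direction = "increases" if b > 0 else "decreases"
print(round(wald, 3), significant, direction)
```

The sign of b gives the direction of the effect on the logit, and the Wald statistic indicates whether that effect differs significantly from zero.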
INTERPRETING SIGNIFICANT INDEPENDENT
VARIABLES
§Logistic regression analysis often includes the exponential
logistic coefficient, which is nothing more than the antilog of
the raw parameter estimate
§Also realize that the software may arbitrarily pick one category
as “success” and it may not match the user’s definition
§When this occurs, the signs of coefficients can be reversed
§Thus, one way to eliminate doubt about the positive or
negative effect of a predictor on the dependent variable is to
do an independent samples t-test of the independent variable
using the dependent variable as a group variable
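The antilog of a raw coefficient and the effect of swapping the "success" category can be shown in a few lines. The coefficient below is a hypothetical value; the point of the sketch is only that recoding the outcome flips the sign of the coefficient and turns the odds ratio into its reciprocal.

```python
import math

def odds_ratio(coefficient):
    """Exponentiated logistic coefficient (antilog of the raw estimate)."""
    return math.exp(coefficient)

# Hypothetical raw coefficient when 1 = "responded"
b_success = 0.7
# The same relationship when the software treats "did not respond" as 1
b_recoded = -b_success

or_success = odds_ratio(b_success)
or_recoded = odds_ratio(b_recoded)
print(round(or_success, 3), round(or_recoded, 3))
```

An odds ratio above 1 means a one-unit increase in the predictor multiplies the odds of the category coded as success by that amount; below 1 means the odds shrink. Checking which category the software coded as 1 removes the ambiguity.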
KEY TERMS
§Logistic regression
§Logit
§Pseudo R2
§Wald statistic
INTERDEPENDENCE TECHNIQUES
§ Interdependence techniques
§ make no distinction between independent and dependent variables and seek to identify the underlying
structure of a data set
§ Factor Analysis: analysis of variables into dimensions
§ Dimensions: underlying meanings that help provide structure to common observations
§ Factor: an underlying dimension that helps understand the nature of the variables in the data set
§ Parsimony: simple solution that maintains sufficient meaning to represent reality
§ identifies a reduced number of factors from a larger number of variables
§ Factor Loading: the correlation between a variable and a factor
§ Communality: measure of the percentage of a variable’s variation that is explained by the factors
§ Cluster analysis: arrangement of respondents into groups called segments
§ Segment: a number of observations sharing characteristics associated with
some outcome of interest
§ 3 broad approaches for Factor Analysis
§ Data reduction – reduce the number of pieces of information needed to interpret data
§ Exploratory Factor Analysis (EFA) – uncertainty regarding the number of factors and which variables belong to which factors
§ Confirmatory Factor Analysis (CFA)
§ Principal Components Analysis (PCA)
§ similar to factor analysis; identifies components (vs factors) using total variance: common variance, specific variance, and error variance
FACTOR ANALYSIS / DATA REDUCTION TECHNIQUE
§ Variate: Xi = L1F1 + L2F2 + L3F3 + … + LkFk
§ Xi: observed relationship for the ith variable
§ Lk: factor loading of a variable on factor k
§ Factor Analysis: Xi = L1F1 + … + LfFf + εi
§f
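The factor model above, together with the definitions of factor loading and communality, can be illustrated with a small loading matrix. The loadings and factor scores below are hypothetical, and the communality formula (sum of squared loadings) assumes uncorrelated factors, as in an orthogonal rotation.

```python
# Hypothetical loadings of three survey items on two factors
loadings = {
    "Q1": [0.8, 0.1],
    "Q2": [0.7, 0.2],
    "Q3": [0.1, 0.9],
}

def communality(item_loadings):
    """Share of an item's variance explained by the factors (orthogonal case)."""
    return sum(loading ** 2 for loading in item_loadings)

def sum_squared_loadings(all_loadings, factor_index):
    """Variance accounted for by one factor across all items."""
    return sum(row[factor_index] ** 2 for row in all_loadings.values())

def variate(item_loadings, factor_scores, error=0.0):
    """Xi = L1*F1 + ... + Lf*Ff + error for one item and one respondent."""
    return sum(l * f for l, f in zip(item_loadings, factor_scores)) + error

communalities = {item: communality(row) for item, row in loadings.items()}
factor1_ssl = sum_squared_loadings(loadings, 0)
factor2_ssl = sum_squared_loadings(loadings, 1)
q1_value = variate(loadings["Q1"], [1.0, -0.5])
print(communalities, factor1_ssl, factor2_ssl, q1_value)
```

Items Q1 and Q2 load mainly on the first factor and Q3 on the second, so two factors summarize three variables, which is the data reduction idea described above.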