STAT 250 Data Analysis Assignment 3
You may not upload this file to any online homework help sites. Please see our course
syllabus for honor code rules. Thank you.
Your solutions document should include the following items. Points will be deducted if the
following are not included.
1. Type your Name and STAT 250 with your correct section number (e.g. STAT 250-xxx)
right justified and then Data Analysis Assignment #3 centered on the top of page 1
below your name to begin your solutions document.
2. Number your pages across your entire solutions document.
3. Your solutions document should include the ANSWERS ONLY with each answer
labeled by its corresponding number and subpart. Keep the answers in order.
4. Generate all requested graphs and tables using StatKey or Rguroo, where stated.
5. Upload your solutions document onto Blackboard as a pdf file using the link provided by
your instructor. It is your responsibility to upload a readable file.
6. You may not work with other individuals on this assignment. It is an honor code
violation if you do.
Please note: all StatKey or Rguroo Instructions provided in the parts of the problems will be
presented in italics.
Elements of good technical writing:
Use complete and coherent sentences to answer the questions.
Graphs must be appropriately titled and should refer to the context of the question.
Graphical displays must include labels with units if appropriate for each axis.
Units should always be included when referring to numerical values.
When making a comparison you must use comparative language, such as “greater than”, “less
than”, or “about the same as.”
Ensure that all graphs and tables appear on one page and are not split across two pages.
Type all mathematical calculations when directed to compute an answer ‘by-hand.’
Pictures of actual handwritten work are only accepted when specifically asked.
When writing mathematical expressions into your solutions document you may use either an
equation editor or common shortcuts. For example, √𝑥 can be written as sqrt(x), 𝑝̂ can be written
as p-hat, and 𝑥̅ can be written as x-bar.
Investigation 1: Appropriateness of Inference: Price of a Haircut
For the following scenario, answer the questions below. Please note, do not conduct inference
in this problem; just answer each question.
A random sample of 24 Mason students was collected and each student was asked, among other
things, the total cost of their last hair service (including cuts, styling, etc.). The researcher does
not have any information about the population from which the sample was collected. The data
set is called HairPrice.
a) If we attempt to conduct statistical inference using the collected sample, what is the
parameter of interest? Use the correct symbol and describe the parameter in context in
one sentence.
b) Check the specific conditions necessary to consider conducting inference using theory-based
methods with the t-distribution. There are three to consider: (1) Was a random
sample collected; (2) Is the population where the sample comes from normal; and (3) Is
the sample size greater than or equal to 30? Answer each of these questions in one
sentence.
c) What could be checked using the sample data if our sample size is less than 30? Answer
this question in one sentence.
d) Depending on your answer to part (a), construct one or two frequency histograms in
Rguroo. Remember to properly title and label the graph(s). Copy and paste this graph
(or these graphs) into your document.
e) Describe the shape of the histogram(s) in one sentence.
f) Depending on your answer to part (a), construct one or two horizontal boxplots in
Rguroo. Remember to properly title and label the graph(s). Copy and paste this graph
(or these graphs) into your document.
g) Does the boxplot (or do the boxplots) show any outliers? Answer this question in one
sentence and identify any outliers if they are present.
h) Considering your answers to parts (e) and (g), is theory-based inference appropriate in
this case? If you respond “yes,” provide a reason for your response. If you respond “no,”
state the reason why not and present another possibility if the researcher still wanted to
conduct statistical inference. Use one or two complete sentences in your response.
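For reference, the outlier check that a boxplot performs can be sketched in a few lines of Python. This is a minimal illustration, not part of the required Rguroo work: it assumes the common 1.5 × IQR fence rule, the function name iqr_outliers is made up for this sketch, and the prices are hypothetical values rather than the HairPrice data. Quartile conventions differ between software packages, so fences computed by Rguroo may differ slightly.

```python
import statistics

def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    # "inclusive" is one of several quartile conventions; other software may differ.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]

# Hypothetical haircut prices in dollars (NOT the HairPrice data set).
prices = [15, 18, 20, 20, 22, 25, 25, 30, 35, 40, 45, 150]
lower, upper, outliers = iqr_outliers(prices)
print(f"fences: ({lower:.2f}, {upper:.2f}); outliers: {outliers}")
```

Any value outside the two fences would be drawn as an individual point on the boxplot.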
Investigation 2: Earnings among Voters
A political scientist wondered if there is a significant difference between the proportion of
Democrats and Republicans earning over $100,000. To obtain the data, she used the National
Election Pool. The NEP is a consortium of major news networks (ABC, CBS, CNN, and NBC)
that pools together resources to gather voting and exit poll data from a random sample of voters.
On Election Day, November 8, 2022, exit poll data showed that among a random sample of 774
voters, 408 were registered as Republican and 366 were registered as Democrat. Of the 408
Republican voters, 216 earn over $100,000 a year. Of the 366 Democrat voters, 162 earn over
$100,000 a year. Use α = 0.05.
a) Define the population parameter using context and symbols in one complete sentence.
b) State the hypotheses using the political scientist’s claim.
c) Check the specific conditions necessary to consider conducting inference using theory-based
(or distribution-based) methods. There are two to consider: (1) Was the data
collected randomly from the population; and (2) Are there at least ten successes and
failures in each group? Answer each of these questions in one sentence and show that
condition (2) is true or false using calculation to obtain the failures (note the successes
are given).
d) Calculate and label the two sample proportions separately and round the values to four
decimal places. Next, calculate the difference between these sample proportions by
subtracting (Republican – Democrat). Type all of these calculations and label each of
them.
e) Calculate the pooled proportion estimate needed in the calculation of the standard error of
the test statistic. Type this calculation and round to four decimal places.
f) Calculate the test statistic value using your proportions obtained in parts (d) and (e) and
type your work. Round your test statistic value to three decimal places.
g) Obtain your p-value using your test statistic calculated in part (f) in StatKey using
Theoretical Distributions → Normal. Copy and paste the image of the standard normal
distribution and type the value of the p-value below the image.
h) Verify your test statistic and p-value using Rguroo. Go to Analytics → Analysis →
Proportion Inference → Two Populations. See the image below to fill in each box
correctly. Then, click the Test of Hypothesis tab and choose Large Sample z under
method. Finally, correctly set your alternative hypothesis and significance level and click
Preview. Copy and paste only the output and table displayed under the title “Two
Population Proportion Test of Hypothesis”
i) Based on your p-value, make a decision to either reject or not reject the null hypothesis.
j) Draw a conclusion based on your decision from part (i) by answering the initial question
posed by the political scientist in one or two complete sentences.
k) Create a randomization distribution for this problem. In StatKey, go to the right pane
labeled ‘Randomization Hypothesis Tests’ and click Test for Difference in Proportions.
Edit the data in ‘Edit Data’ and click ‘Generate 1000 Samples’ ten times to produce
10,000 samples. Screenshot your distribution and paste it in your solutions document.
l) Calculate the p-value from your randomization distribution using your observed statistic
calculated in part (d) (the numerator of your test statistic). First, click the ‘Right Tail’
button and enter the value of your observed statistic in the blue box below the x-axis.
Next, click the ‘Left Tail’ button and enter the negative value of your observed statistic in
the blue box below the x-axis (to the left of zero). Then, if necessary, readjust your
bottom blue box to the right of zero to correctly display the value of the observed
statistic. Finally, add the values of the two blue boxes above their corresponding red x’s
to obtain the p-value. Screenshot your image of this work and paste it in your document.
m) Is the conclusion you can draw from the simulation-based method the same as the
theory-based conclusion you made?
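The two approaches used in this investigation, the pooled two-proportion z-test and the randomization test, can be sketched in Python as follows. This is an illustrative sketch only: the function names are invented here, the counts are hypothetical (not the exit poll data), and the randomization routine simply shuffles group labels and adds both tails, which is assumed to mirror what StatKey does rather than taken from its documentation.

```python
import math
import random
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (difference, pooled, z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)              # pooled proportion under H0: p1 = p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, pooled, z, p_value

def randomization_p_value(x1, n1, x2, n2, reps=10_000, seed=0):
    """Two-sided randomization p-value obtained by reshuffling the group labels."""
    rng = random.Random(seed)
    outcomes = [1] * (x1 + x2) + [0] * (n1 + n2 - x1 - x2)
    observed = abs(x1 / n1 - x2 / n2)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        s1 = sum(outcomes[:n1])                 # successes landing in group 1
        diff = s1 / n1 - (x1 + x2 - s1) / n2
        if abs(diff) >= observed - 1e-12:       # small tolerance for float rounding
            extreme += 1
    return extreme / reps

# Hypothetical counts (NOT the exit poll data): 60 of 100 versus 45 of 100.
diff, pooled, z, p_theory = two_proportion_z_test(60, 100, 45, 100)
p_random = randomization_p_value(60, 100, 45, 100)
print(f"diff={diff:.4f}, pooled={pooled:.4f}, z={z:.3f}, p={p_theory:.4f}")
print(f"randomization p-value: {p_random:.4f}")
```

The two p-values will generally be close but not identical, because the randomization distribution is discrete and depends on the random shuffles.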
Investigation 3.1: Diet Comparison (Independent)
A doctor of internal medicine was interested in comparing two methods of dieting (intermittent
fasting and the Keto diet). She designed an experiment that began by selecting a random sample
of 50 adults with BMIs in the “overweight” or “obese” range from a large list of volunteers.
Next, she randomly assigned 25 adults into the intermittent fasting group and the remaining 25
entered the Keto diet group. Each adult was monitored weekly to make sure they safely and
correctly followed the dieting method. The data was collected after 12 weeks. Each adult’s BMI
reduction was recorded and can be found in the data set Diet1. For example, if an adult reduced
their BMI from 32.2 to 28.1, the reduction is recorded as 32.2 – 28.1 = 4.1. Negative results
would indicate a BMI increase over the 12 weeks. We will subtract Fasting – Keto and use the
5% significance level. Assume the Central Limit Theorem conditions hold.
a) Define the population parameter in the context of this question in one sentence.
b) State the hypotheses you would use to test the claim that a difference exists in the mean
BMI reduction for fasting and Keto using correct notation.
c) In this investigation, we will obtain a 95% confidence interval to make the hypothesis test
decision. In Rguroo, go to Analytics → Analysis → Mean Inference → One & Two
Populations. In the Dataset dropdown, select Diet1. In the Variable 1 dropdown, select
Fasting. In the Variable 2 dropdown, select Keto. Below the variables, select the fourth tab
Population 1-2 and note that the Confidence Interval tab is selected. Then, select
t-statistic under Method. Leave the Assumptions box on the right as is and click Preview.
Provide either a screenshot or a copy of your output table and state the confidence
interval.
d) Make a decision using the confidence interval by checking to see if the null value is
captured by the confidence interval. State the decision and explanation in one sentence.
e) Draw a conclusion about the claim using one or two sentences in context of the problem.
f) Verify your decision and conclusion by obtaining a test statistic and p-value in Rguroo.
Follow the directions presented in part (c). Rather than staying on the Confidence
Interval tab, select the Test of Hypothesis tab. In this tab, keep the significance level at
0.05 and update the alternative hypothesis. Select t-statistic under Method. Leave the
Assumptions box on the right as is. Click Preview. Provide either a screenshot or a copy
of your output table and verify your decision by comparing your p-value to the
significance level in one complete sentence.
g) Provide at least one confounding variable that may have had an effect on this study’s
results in one sentence.
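The logic of parts (c) and (d), building an interval for the difference in means and checking whether it captures the null value, can be sketched in Python as below. This is a rough sketch under stated assumptions: it uses the unpooled standard error, the critical value t_star must be supplied by the caller (the Python standard library has no t-distribution), and the data are hypothetical rather than taken from Diet1. Rguroo's degrees of freedom, and therefore its interval, may differ from this conservative choice.

```python
import math
import statistics

def two_sample_t_interval(sample1, sample2, t_star):
    """Unpooled interval for mu1 - mu2; t_star is a caller-supplied critical value."""
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    se = math.sqrt(statistics.variance(sample1) / len(sample1)
                   + statistics.variance(sample2) / len(sample2))
    return diff - t_star * se, diff + t_star * se

def captures(interval, null_value=0.0):
    """True when the null value lies inside the interval (do not reject H0)."""
    lower, upper = interval
    return lower <= null_value <= upper

# Hypothetical BMI reductions (NOT the Diet1 data).
fasting = [4.1, 3.2, 5.0, 2.8, 3.9, 4.4]
keto = [3.0, 2.5, 3.8, 2.1, 3.3, 2.9]

# Assumption: conservative df = min(n1, n2) - 1 = 5, so t* = 2.571 for 95% confidence.
interval = two_sample_t_interval(fasting, keto, t_star=2.571)
print(f"95% CI: ({interval[0]:.3f}, {interval[1]:.3f}); captures 0: {captures(interval)}")
```

If the interval captures 0, the data are consistent with no difference in mean BMI reduction at the 5% level; if it does not, the null hypothesis would be rejected.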
Investigation 3.2: Diet Comparison (Paired)
The doctor continued the research study after obtaining additional information from the
participants. During the 12-week study, each of the participants tracked the amount of exercise
they completed each week. Using this additional information, the doctor paired the adult who
exercised the least in the fasting group with the adult who exercised the least in the Keto group.
She continued to pair the adults until the adult who exercised the most in the fasting group was
paired with the adult who exercised the most in the Keto group. The data set is called Diet2.
The data set includes columns for the minutes of exercise, the paired data and the differences.
Again, assume the Central Limit Theorem conditions hold.
a) Define the population parameter for this investigation in context in one sentence.
b) State the null and alternative hypotheses to test the claim that a difference exists in the
mean BMI reduction between Fasting and Keto. Please consider the new data design.
c) Again, we will obtain a 95% confidence interval to make the hypothesis test decision. In
Rguroo, go to Analytics → Analysis → Mean Inference → One & Two Populations. In
the Dataset dropdown, select Diet2. In the Variable 1 dropdown, select Fasting. In the
Variable 2 dropdown, select Keto. Under the Summary tab, check the box to the left of Paired
Data. Next, select the fourth tab Population 1-2 and note that the Confidence Interval tab
is selected. Then, select t-statistic under Method. Leave the Assumptions box on the
right as is and click Preview. Provide either a screenshot or a copy of your output table
and state the confidence interval.
d) Construct the 95% confidence interval again by treating the column of differences as if it
were data from one sample. Go to Analytics → Analysis → Mean Inference → One
Populations. In the Dataset dropdown, select Diet2. In the Variable dropdown, select
Differences. Then, select t-statistic under Method. Leave the Assumptions box on the
right as is and click Preview. Provide either a screenshot or a copy of your output table
and state the confidence interval.
e) Make a decision using the confidence interval by checking to see if the null value is
captured by the confidence interval. State the decision and explanation in one sentence.
f) Draw a conclusion about the claim using one or two sentences in context of the problem.
g) Comment on the differences between the standard errors calculated in 3.1(c) and 3.2(c)
and the decisions made in each test. Why do you believe these standard errors differ?
h) Can we generalize these results to a larger group (i.e. a population)? Answer this question
in one sentence and please provide a reason for your answer.
i) Can we determine if a cause and effect relationship exists between the variables? Answer
this question in one sentence and please provide a reason for your answer.
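The two standard errors compared in part (g) come from different formulas, which the following Python sketch makes explicit. It is only an illustration: the function names are invented for this example and the values are hypothetical (not the Diet2 data), chosen so that paired observations move together.

```python
import math
import statistics

def independent_se(sample1, sample2):
    """Standard error of the difference in means for two independent samples."""
    return math.sqrt(statistics.variance(sample1) / len(sample1)
                     + statistics.variance(sample2) / len(sample2))

def paired_se(sample1, sample2):
    """Standard error of the mean difference, treating the differences as one sample."""
    diffs = [a - b for a, b in zip(sample1, sample2)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))

# Hypothetical BMI reductions (NOT the Diet2 data), listed in matched-pair order.
fasting = [1.0, 2.1, 3.2, 4.0, 5.1, 6.3]
keto = [0.6, 1.5, 2.9, 3.4, 4.8, 5.6]

se_ind = independent_se(fasting, keto)
se_pair = paired_se(fasting, keto)
print(f"independent SE: {se_ind:.3f}, paired SE: {se_pair:.3f}")
```

When the paired values are strongly related, most of the person-to-person variation cancels in the differences, so the paired standard error is much smaller than the independent one.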
Investigation 4: Predicting a 5K Race Time
A running coach was interested in predicting a runner’s 5K (3.1 miles) finishing time (in
minutes). Two variables the coach considered were the amount of oxygen runners could utilize
during training (known as VO2 max (ml/kg/min)) and a runner’s age (in years). The data were
collected from a random sample of Garmin watch users who ran a popular 5K (e.g. a
Thanksgiving Turkey Trot) so each runner ran the same course. A runner’s 5K finishing time
(Time), their VO2 Max (measured on their Garmin watch), and their Age were collected. The
data set is called RunningTime. We will use Rguroo and StatKey for this investigation.
a) Make two separate scatterplots where each scatterplot will present one of the explanatory
variables graphed with the response variable Time. Go to Plots → Create Plot →
Scatterplot. In the Dataset dropdown, select RunningTime. Change the predictor
(explanatory) and response variables accordingly. Properly title and label your graph and
axes. Copy and paste your scatterplots into your solutions document.
b) Interpret the scatterplot of VO2 Max and Time using trend, strength, and shape (form) in
one complete sentence.
c) Interpret the scatterplot of Age and Time using trend, strength, and shape (form) in one
complete sentence.
d) Provide both correlation coefficients. Go to Analytics → Analysis → Linear Regression
→ Simple Regression. In the Dataset dropdown, select RunningTime. Change the
predictor (explanatory) and response variables accordingly. Click Preview. You will
need to complete this twice, once for each explanatory variable. The correlation is
presented as “Pearson Correlation Coefficient (r).” Please state both correlation values in
your solutions.
e) Use StatKey to create a bootstrap distribution of correlations. From the main page, select
CI for Slope, Correlation. Edit the data by copying only two columns (explanatory and
response) into the box (repeat for each explanatory). Then, generate 10,000 samples and
use each standard error to produce 2SE confidence intervals to estimate the population
correlation. Present each confidence interval and comment on whether 0 is captured in
each interval.
f) Which of the two explanatory variables would be the better predictor of Time? Base your
answer on the scatterplots, the correlation coefficients and their confidence intervals.
State your answer in one or two complete sentences including an explanation for your
variable choice.
g) To find the fitted line for the explanatory variable VO2 Max and the response variable Time,
run a simple linear regression analysis. You may use the same output as in part (d) but
look at “Equation of Least Squares Line” to help you state the fitted line equation in your
solutions document.
h) Produce the fitted line plot for VO2 Max and Time and copy it into your solutions
document. Scroll down the page in your output from part (d) and copy and paste the
graph labelled “Response Versus Numerical Predictor.”
i) Interpret the slope of the regression line for VO2 Max and Time in context of the problem.
j) Would it be meaningful to interpret the y-intercept for VO2 Max and Time? Explain why
or why not in one sentence.
k) Provide r² for VO2 Max and Time and explain what this value means in context of the
problem. Again, refer to the output from part (d), but look at “Coefficient of
Determination (R-Squared).”
l) Test whether the slope is significant using theory-based inference (assuming all
conditions hold). Go to Analytics → Analysis → Linear Regression → Simple
Regression. In the Dataset dropdown, select RunningTime. Change the predictor
(explanatory) and response variables accordingly. Under the tab “Test of Association,”
check Slope under Alternative Hypothesis and leave Not Zero selected. Under
“Methods” choose Theoretical t-statistic. Keep the Significance Level at 0.05. State the
hypotheses, show work to obtain the t-test statistic using the output, and use the p-value
provided in the output to make your decision. Finally, draw a conclusion in one complete
sentence.
m) If a randomly selected runner had a VO2 Max of 53, predict their 5K finishing time. Use
the regression equation from part (g) and show all work and calculations.
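The quantities requested in this investigation (correlation, least squares line, r², a prediction, and a bootstrap 2SE interval for the correlation) can be sketched in Python as follows. This is a minimal sketch, not the Rguroo or StatKey implementation: the function names are invented here, the data are hypothetical values rather than the RunningTime data set, and the bootstrap uses fewer resamples than the 10,000 requested above to keep the example fast.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y-hat = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def bootstrap_se_of_r(x, y, reps=2000, seed=0):
    """Standard deviation of r across bootstrap resamples of the (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:   # skip degenerate resamples
            rs.append(pearson_r(xs, ys))
    mean_r = sum(rs) / len(rs)
    return math.sqrt(sum((r - mean_r) ** 2 for r in rs) / (len(rs) - 1))

# Hypothetical runners (NOT the RunningTime data): VO2 max and 5K time in minutes.
vo2 = [38, 42, 45, 48, 50, 53, 56, 60]
time = [34.0, 31.5, 29.8, 28.0, 27.1, 25.5, 24.2, 22.0]

r = pearson_r(vo2, time)
intercept, slope = least_squares(vo2, time)
predicted = intercept + slope * 47                  # predicted time at VO2 max = 47
se_r = bootstrap_se_of_r(vo2, time)
ci = (r - 2 * se_r, r + 2 * se_r)                   # 2SE interval for the correlation
print(f"r={r:.3f}, r^2={r * r:.3f}, line: Time-hat = {intercept:.2f} + ({slope:.3f})(VO2)")
print(f"prediction at 47: {predicted:.2f} min; 2SE interval for r: ({ci[0]:.3f}, {ci[1]:.3f})")
```

A prediction of this kind is only trustworthy for explanatory values inside the range of the observed data.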