Homework #4: Chapters 4.12-4.14, Due 10/3/22 by Midnight PT
Question 1:
Question 2:
In the library on a university campus, there is a sign in the elevator that indicates a weight limit of 2500
pounds. Assume the distribution of weights of students, faculty, and staff on campus is right-skewed, with a
mean of 150 pounds and a standard deviation of 27 pounds. A random sample of 16 persons from the
campus is selected.
a. Describe the sampling distribution of the sample mean weight.
b. What is the probability that the average weight of the 16 people in the sample is less than 160
pounds?
c. Suppose the sample of 16 people is placed in the library elevator. What is the probability that the
total weight of the 16 persons on the elevator will exceed the weight limit of 2500 pounds?
Question 3:
Suppose individuals with a certain gene have a 0.70 probability of eventually contracting a certain
disease. Using the normal approximation to the Binomial, answer the following questions:
a. If 100 individuals with the gene participate in a lifetime study, what is the distribution of the random
variable, X, describing the number of individuals who will contract the disease?
b. Suppose that in this study you found that 78 of the individuals contracted the disease.
Does this seem too high? Justify your answer by finding the probability that at least 78 individuals
contract the disease.
Question 4:
Question 5: MINITAB PROBLEM
The IQ scores of residents of a certain city follow a bell-shaped curve with a mean of 100 and a variance of 225.
(a) Using Minitab, generate a random sample of size 100 from this population (do not copy and paste
the whole raw data to your submission). Draw a histogram of your sample values and calculate the
mean, standard deviation, and variance.
(c) Using the Empirical Rule, estimate the percentage of your sample values that fall within 1, 2, and 3
standard deviations of your sample mean.
(d) Draw a Normal Probability Plot of your sample values and determine whether your sample
distribution follows a normal distribution.
Stat 350A: Chapter 4.12
Part 1: Sampling Distributions
• Population: Entire collection of items or individuals you wish to study
• Sample: A subset of the population that has been selected to study or measure
• Statistics: Measurements made on sample data (Ȳ, s, π̂)
• Parameters: Measurements made on population data (μ, σ, π)
• A point estimate of a population parameter is a sample statistic that represents a feasible value
of the parameter of interest.
• An unbiased estimator is a sample statistic whose mean value is equal to the value of the
population parameter being estimated.
• The sample mean Ȳ is an unbiased estimator of the population mean μ, but Ȳ varies from sample
to sample (sampling variation).
EXAMPLE 1: Suppose we wish to estimate the mean height of Stat 350A students. The following
samples of size 2 were collected.
Random Sample | Height 1 | Height 2 | ȳ
1             | 74”      | 76”      |
2             | 65”      | 69”      |
3             | 66”      | 68”      |
4             | 64”      | 70”      |
5             | 63”      | 73”      |
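To make the sampling variation concrete, here is a short Python sketch (standard library only) that computes ȳ for each sample of size 2 in the table above. The dictionary layout is just one convenient way to hold the data.

```python
# Heights (inches) from the five random samples of size 2 in Example 1.
samples = {
    1: (74, 76),
    2: (65, 69),
    3: (66, 68),
    4: (64, 70),
    5: (63, 73),
}

# Sample mean of each sample: y-bar = (height 1 + height 2) / 2.
sample_means = {k: (h1 + h2) / 2 for k, (h1, h2) in samples.items()}

for k, ybar in sample_means.items():
    print(f"Sample {k}: y-bar = {ybar:.1f} in")
```

Each sample gives a different ȳ even though all five come from the same class, which is the sample-to-sample variation described in Part 1.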
EXAMPLE 2: Tossing a die
(A) Tossing a single die 10,000 times.
(B) Tossing a pair of dice 10,000 times and calculating the average of each pair.
(C) Tossing twenty dice 10,000 times and calculating the average of the twenty dice on each toss.
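The dice experiment can be imitated in Python as a rough stand-in for the in-class demonstration. The sketch below assumes fair dice, uses the standard-library `random` and `statistics` modules, and fixes an arbitrary seed so the run is reproducible. For each number of dice it reports the mean and standard deviation of the 10,000 averages next to σ/√n, where σ ≈ 1.708 is the standard deviation of a single fair die.

```python
import random
import statistics

def simulate_dice_averages(num_dice, num_tosses=10_000, seed=350):
    """Toss `num_dice` fair dice `num_tosses` times; return the average of each toss.

    The seed is arbitrary and only makes the run reproducible.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(num_dice))
        for _ in range(num_tosses)
    ]

# Population of a single fair die: mean 3.5, standard deviation sqrt(35/12) ≈ 1.708.
sigma = (35 / 12) ** 0.5

results = {}
for n in (1, 2, 20):  # parts (A), (B), (C)
    averages = simulate_dice_averages(n)
    results[n] = (statistics.fmean(averages), statistics.stdev(averages))
    print(f"n = {n:2d}: mean of averages = {results[n][0]:.3f}, "
          f"SD of averages = {results[n][1]:.3f}, sigma/sqrt(n) = {sigma / n ** 0.5:.3f}")
```

Plotting a histogram of the averages for each case should show the shape moving from flat (one die) toward bell-shaped (twenty dice), while the spread shrinks roughly like σ/√n.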
Part 2: The Central Limit Theorem and Sampling Distribution of the Sample Mean
• The Central Limit Theorem (CLT): When a random sample of size n is drawn from any population (normal or not) with
mean μ and finite standard deviation σ, the sample mean Ȳ has a sampling distribution that is approximately normal
as long as n is large enough (rule of thumb: n > 30).
• Assumptions and Conditions:
◦1) The data values must be sampled randomly.
◦2) The sampled values must be independent of one another.
◦3) If sampling without replacement, the sample size should be less than 10% of the population size.
• The Sampling Distribution Model for a Sample Mean Ȳ
◦The mean of the sample averages is $\mu_{\bar{Y}} = \mu$
◦The standard deviation of the sample averages is $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$
◦If the population is normal, then the sampling distribution of the sample mean is normal:
$\bar{Y} \sim N(\mu, \sigma^2/n)$
◦If the population is non-normal, then the sampling distribution of the sample mean is approximately
normal according to the CLT as long as n is large enough: $\bar{Y} \sim AN(\mu, \sigma^2/n)$
◦The Z-score formula for the sample mean is defined as
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$$
EXAMPLE: Suppose the weights of men are normally distributed with a mean of 173 lbs. and
variance of 900.
(A) What is the probability that a randomly selected man weighs more than 200 lbs.?
(B) What is the probability that the mean weight of 9 randomly selected men is more than 200 lbs.?
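A minimal sketch of parts (A) and (B) in Python, using `statistics.NormalDist` (Python 3.8+) in place of a Z table. The only difference between the two parts is the standard deviation used: σ = 30 for one man, and σ/√n = 10 for the mean of 9 men.

```python
from math import sqrt
from statistics import NormalDist

mu = 173            # population mean weight (lbs)
sigma = sqrt(900)   # population SD = sqrt(variance) = 30 lbs

# (A) One man: X ~ N(173, 30^2), so P(X > 200) = 1 - Phi((200 - 173) / 30).
p_single = 1 - NormalDist(mu, sigma).cdf(200)

# (B) Mean of n = 9 men: Y-bar ~ N(173, 30^2 / 9), i.e. standard error 30 / 3 = 10.
n = 9
se = sigma / sqrt(n)
p_mean = 1 - NormalDist(mu, se).cdf(200)

print(f"(A) P(X > 200)     = {p_single:.4f}")   # z = 0.9
print(f"(B) P(Y-bar > 200) = {p_mean:.4f}")     # z = 2.7
```

Part (B) is far less likely than part (A) because averages of 9 men vary much less than individual weights.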
Part 3: More Examples
EXAMPLE 1: The times of the finishers in a 10km run are normally distributed with a mean of 61
minutes and a variance of 81. A random sample of 30 runners is selected.
(A) Describe the sampling distribution of the average 10km finishing times for this sample.
(B) Find the probability that the average time of the above sample will be more than 65 minutes.
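For part (A), because the population is normal, Ȳ is exactly normal with mean 61 and standard deviation 9/√30. The sketch below (Python 3.8+, `statistics.NormalDist`) builds that distribution and evaluates part (B).

```python
from math import sqrt
from statistics import NormalDist

mu, variance, n = 61, 81, 30
se = sqrt(variance) / sqrt(n)        # 9 / sqrt(30) ≈ 1.643 minutes

# (A) The population is normal, so Y-bar ~ N(61, 81/30) exactly.
sampling_dist = NormalDist(mu, se)

# (B) P(Y-bar > 65)
z = (65 - mu) / se
p = 1 - sampling_dist.cdf(65)
print(f"SE = {se:.3f}, z = {z:.2f}, P(Y-bar > 65) = {p:.4f}")
```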
EXAMPLE 2: A rental car company has noticed that the distribution of the number of miles
customers put on rental cars per day is right-skewed. The distribution has a mean of 60 miles and a
standard deviation of 25 miles. A random sample of 120 rental cars is selected.
(A) Describe the sampling distribution of the average number of miles per day for this sample.
(B) What is the probability that the mean number of miles driven per day for this sample is less than
54?
(C) What is the probability that the total number of miles driven per day for this sample exceeds
7400?
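Here the population is right-skewed, so the normal model for Ȳ rests on the CLT (n = 120 > 30). The sketch below, again using Python's `statistics.NormalDist`, handles part (C) by turning the total into a mean, 7400/120 ≈ 61.67 miles, and cross-checks it against the equivalent model for the total, T ~ AN(nμ, nσ²).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 60, 25, 120
se = sigma / sqrt(n)                  # ≈ 2.282 miles

# (A) The population is right-skewed, but n = 120 > 30, so by the CLT
#     Y-bar is approximately N(60, 25^2 / 120).
ybar_dist = NormalDist(mu, se)

# (B) P(Y-bar < 54)
p_b = ybar_dist.cdf(54)

# (C) Total T = n * Y-bar, so T > 7400 is the same event as Y-bar > 7400 / 120.
p_c = 1 - ybar_dist.cdf(7400 / n)

# Equivalent check: T is approximately N(n*mu, n*sigma^2).
p_c_total = 1 - NormalDist(n * mu, sqrt(n) * sigma).cdf(7400)

print(f"(B) {p_b:.4f}   (C) {p_c:.4f}   (C, via total) {p_c_total:.4f}")
```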
Stat 350A: Chapter 4.13
Part 1: Normal Approximation to Binomial
• Let Y ~ Bin(n, π). When the number of trials n is large, calculating Binomial probabilities by hand can be
tedious. For Binomial experiments with a large number of trials, we can instead use the Normal distribution
to approximate these Binomial probabilities.
• The Normal approximation to the Binomial distribution is an application of the Central Limit
Theorem. Why?
‣ a) Let X ~ Bin(1, π) = Bernoulli(π); then μ = π and σ = √(π(1-π)). Suppose we run n independent
Bernoulli trials X1, X2, …, Xn.
‣ b) By the CLT, X̄ = (X1 + X2 + … + Xn)/n ~ AN(π, π(1-π)/n) if n is large enough.
‣ c) Let Y = X1 + X2 + … + Xn = nX̄; then Y ~ AN(nπ, nπ(1-π)).
• Assumptions:
◦1) Random sample.
◦2) Trials are independent.
◦3) If sampling without replacement, n < 10% of N
◦4) Rule of Thumb for sufficiently large sample size: nπ > 5 and n(1-π) > 5
• Approximating Probabilities for Binomial Random Variables
◦To estimate probabilities for a binomial random variable, Z-scores can be used:
◦P(Y < y) ≈ P(Z < z), where
$$z = \frac{y - n\pi}{\sqrt{n\pi(1-\pi)}}$$
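To see how close the approximation is, the sketch below compares the exact Binomial CDF with the Normal approximation from the formula above. It assumes Python 3.8+ (`math.comb`, `statistics.NormalDist`), and the function names are illustrative. It approximates P(Y ≤ y); for an integer-valued Y, P(Y < y) = P(Y ≤ y − 1), so it is worth being explicit about which of the two is being approximated.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(y, n, p):
    """Exact P(Y <= y) for Y ~ Bin(n, p), summing the Binomial pmf."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(y + 1))

def normal_approx_cdf(y, n, p):
    """Approximate P(Y <= y) by P(Z <= z) with z = (y - n*p) / sqrt(n*p*(1-p)).

    No continuity correction is applied here.
    """
    z = (y - n * p) / sqrt(n * p * (1 - p))
    return NormalDist().cdf(z)

# Compare the two for a moderately large experiment (values chosen for illustration).
n, p = 100, 0.5
for y in (45, 50, 55):
    print(f"y = {y}: exact {binom_cdf(y, n, p):.4f}, "
          f"Normal approx {normal_approx_cdf(y, n, p):.4f}")
```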
Part 2: Examples
EXAMPLE 1: Eighty percent of all patrons at a local restaurant request water with their meal.
(a) Suppose we sample 7 customers. What is the probability that at least 5 of the 7 customers
selected will request water with their meal?
(b) Suppose we now take a larger random sample of 119 customers. What is the probability that at
most 100 of this sample will request water with their meal?
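A sketch of both parts in Python (3.8+, standard library). Part (a) uses the exact Binomial because n(1 − π) = 1.4 is below 5. Part (b) uses the Normal approximation without continuity correction, since nπ = 95.2 and n(1 − π) = 23.8 both exceed 20 (per the Chapter 4.13 summary rules), and prints the exact value alongside for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist

p = 0.80

# (a) n = 7 is small (n*(1-p) = 1.4 < 5), so use the exact Binomial: P(Y >= 5).
n = 7
p_a = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(5, n + 1))

# (b) n = 119: n*p = 95.2 and n*(1-p) = 23.8 are both > 20,
#     so the Normal approximation is used without a continuity correction.
n = 119
mean, sd = n * p, sqrt(n * p * (1 - p))
p_b_approx = NormalDist(mean, sd).cdf(100)
p_b_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, 101))

print(f"(a) P(Y >= 5)   = {p_a:.4f}")
print(f"(b) P(Y <= 100) ≈ {p_b_approx:.4f} (Normal), exact {p_b_exact:.4f}")
```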
EXAMPLE 2: Many toothpaste commercials claim that 3 out of 4 dentists recommend their brand of
toothpaste. A random survey of 400 dentists is taken. Assuming the commercials are correct, what
is the probability that at least 320 dentists from this sample will recommend Brand X toothpaste?
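A sketch for the toothpaste example under the stated assumption that π = 0.75 (Python 3.8+, standard library). With nπ = 300 and n(1 − π) = 100, no continuity correction is applied; the exact Binomial tail is computed as well, as a check.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 400, 0.75
mean, sd = n * p, sqrt(n * p * (1 - p))     # 300 and sqrt(75) ≈ 8.660

# n*p = 300 and n*(1-p) = 100 are both well above 20: plain Normal approximation.
p_approx = 1 - NormalDist(mean, sd).cdf(320)

# Exact Binomial tail P(Y >= 320) for comparison.
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(320, n + 1))

print(f"P(Y >= 320) ≈ {p_approx:.4f} (Normal), exact {p_exact:.4f}")
```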
Part 3: Continuity Correction
EXAMPLE 1: Suppose 30% of the population have 20/20 vision. What is the probability that exactly 5
people in a sample of 20 have 20/20 vision?
EXAMPLE 2: Can we use the normal approximation for the previous problem?
EXAMPLE 3: Referring to the two examples above, what is the probability that at least 8 of the 20 have
20/20 vision? Use both the Binomial distribution and a Normal approximation.
EXAMPLE 4: Referring to the examples above, what is the probability that 5 to 7 of the 20 people
have 20/20 vision? Use both the Binomial distribution and a Normal approximation.
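The four 20/20-vision examples can be checked together. In this sketch (Python 3.8+, standard library), each Normal probability uses the continuity correction, widening the integer range by 0.5 on each side, because nπ = 6 and n(1 − π) = 14 both fall between 5 and 20.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.30
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))   # mean 6, SD sqrt(4.2) ≈ 2.049

def pmf(k):
    """Exact Binomial probability P(Y = k) for Y ~ Bin(20, 0.3)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example 1: P(Y = 5) exactly, and with continuity correction P(4.5 < Y < 5.5).
exact_eq5 = pmf(5)
approx_eq5 = approx.cdf(5.5) - approx.cdf(4.5)

# Example 2: n*p = 6 and n*(1-p) = 14 are both > 5, so the approximation may be used,
# but both are below 20, so the summary rules call for a continuity correction.

# Example 3: P(Y >= 8) exactly, and with continuity correction P(Y > 7.5).
exact_ge8 = sum(pmf(k) for k in range(8, n + 1))
approx_ge8 = 1 - approx.cdf(7.5)

# Example 4: P(5 <= Y <= 7) exactly, and with continuity correction P(4.5 < Y < 7.5).
exact_5to7 = sum(pmf(k) for k in range(5, 8))
approx_5to7 = approx.cdf(7.5) - approx.cdf(4.5)

print(f"P(Y = 5):       exact {exact_eq5:.4f}, Normal {approx_eq5:.4f}")
print(f"P(Y >= 8):      exact {exact_ge8:.4f}, Normal {approx_ge8:.4f}")
print(f"P(5 <= Y <= 7): exact {exact_5to7:.4f}, Normal {approx_5to7:.4f}")
```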
• Summary:
◦1) If nπ < 5 or n(1 - π) < 5, Normal approximation cannot be used.
◦2) If nπ > 20 and n(1 - π) > 20, Normal approximation with no continuity correction can be used.
◦3) If 5 < nπ < 20 and 5 < n(1 - π) < 20, Normal approximation with continuity correction can
be used.
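The three summary rules can be encoded as a small helper. This sketch follows the rules as written; they do not say what to do when exactly one of nπ and n(1 − π) exceeds 20 (for example n = 50, π = 0.3) or when a value equals 5 or 20, so the function assumes a continuity correction in those cases. The function name is hypothetical.

```python
def normal_approx_advice(n, p):
    """Apply the summary rules for approximating Bin(n, p) by a Normal distribution.

    Assumption: boundary and mixed cases not covered by rules 1) to 3)
    are treated as needing a continuity correction.
    """
    a, b = n * p, n * (1 - p)
    if a < 5 or b < 5:                    # rule 1
        return "exact Binomial (Normal approximation not appropriate)"
    if a > 20 and b > 20:                 # rule 2
        return "Normal approximation, no continuity correction"
    return "Normal approximation with continuity correction"   # rule 3 (and assumed cases)

for n, p in [(7, 0.80), (119, 0.80), (400, 0.75), (20, 0.30)]:
    print(f"n = {n}, pi = {p}: {normal_approx_advice(n, p)}")
```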
Stat 350A: Chapter 4.14
Part 1: The Empirical Rule
• The Empirical Rule states that for a normal or approximately normal distribution, the percentage of
observations within a given number of standard deviations of the mean can be approximated. Also referred
to as the 68-95-99.7% rule, it states:
◦68% of the observations are within one standard deviation of the mean.
◦95% of the observations are within two standard deviations of the mean.
◦99.7% of the observations are within three standard deviations of the mean.
• We can use the Empirical rule to help us assess whether a distribution follows a normal or
approximately normal distribution.
◦1) Calculate the sample mean, Ȳ, and sample standard deviation, s, of the distribution.
◦2) Calculate what percentage of the observations fall within 1, 2, and 3 standard deviations of
the mean.
◦3) Compare these percentages from 2) to 68%, 95%, and 99.7%. If they are close, the data are
consistent with a normal or approximately normal distribution.
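Steps 1) to 3) can be sketched as a small Python function (standard library only; the name `empirical_rule_check` is illustrative). It counts an observation as within k standard deviations when |y − ȳ| ≤ ks, uses the sample standard deviation s (divisor n − 1), and returns percentages to compare with 68%, 95%, and 99.7%.

```python
import statistics

REFERENCE = {1: 68.0, 2: 95.0, 3: 99.7}   # Empirical Rule percentages

def empirical_rule_check(data):
    """Percentage of observations within 1, 2, and 3 sample SDs of the sample mean."""
    ybar = statistics.fmean(data)          # step 1: sample mean
    s = statistics.stdev(data)             # step 1: sample SD (divisor n - 1)
    n = len(data)
    return {                               # step 2: percentages within k SDs
        k: 100 * sum(abs(y - ybar) <= k * s for y in data) / n
        for k in (1, 2, 3)
    }

# Step 3: compare side by side (tiny made-up data set, for illustration only).
observed = empirical_rule_check([1, 2, 3, 4, 5])
for k in (1, 2, 3):
    print(f"within {k} SD: {observed[k]:.1f}% vs about {REFERENCE[k]}%")
```

With only five values the percentages are coarse; the comparison becomes meaningful with samples of a reasonable size.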
MINITAB EXAMPLE: Let us take a sample of 100 observations from a normal population with a
mean of 10 and variance of 4.
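The course performs this example in Minitab; the following is only a Python stand-in for readers without Minitab, not a description of Minitab's commands. It draws 100 values from N(10, 4) (standard deviation 2) with `NormalDist.samples` (Python 3.8+) using an arbitrary seed, then reports ȳ, s, s², and the Empirical Rule percentages. A different seed gives a different sample and slightly different numbers.

```python
import statistics
from statistics import NormalDist

# Population: mean 10, variance 4, so standard deviation 2.
population = NormalDist(mu=10, sigma=2)
sample = population.samples(100, seed=2022)   # seed is arbitrary, for reproducibility

ybar = statistics.fmean(sample)
s = statistics.stdev(sample)
print(f"mean = {ybar:.3f}, SD = {s:.3f}, variance = {s**2:.3f}")

# Percentage of the sample within 1, 2, and 3 sample SDs of the sample mean.
within = {}
for k, expected in ((1, 68), (2, 95), (3, 99.7)):
    within[k] = 100 * sum(abs(y - ybar) <= k * s for y in sample) / len(sample)
    print(f"within {k} SD: {within[k]:.0f}% (Empirical Rule: about {expected}%)")
```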
Part 3: Histogram and Normal Probability Plot
• Normal probability plots give a visual way to determine whether a distribution is normal or
approximately normal. A normal probability plot is a scatterplot of the sorted data versus the normal scores.
• Normal scores are the expected values of the ordered observations in a sample of size n from
the standard normal distribution, N(0, 1). They show where one would expect the data to fall if
sampling from a standard normal distribution.
• If the distribution is normal, the plotted points will lie close to a line. Systematic deviations from
the line indicate a non-normal distribution.
MINITAB EXAMPLE: Suppose we have a normal distribution with a mean of 3 and variance of 0.25.
(A) Take a random sample of 10 measurements from this distribution and draw a NPP.
(B) Take a random sample of 100 measurements from this distribution and draw a NPP.
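Minitab draws the normal probability plot directly. As a numeric companion, the sketch below computes approximate normal scores with Blom's formula, Φ⁻¹((i − 3/8)/(n + 1/4)), a standard approximation to the expected order statistics that may differ slightly from Minitab's own scores, and then the correlation between the sorted sample and those scores. A correlation close to 1 means the points lie close to a line. Seeds and function names are illustrative; Python 3.8+ is assumed.

```python
import statistics
from math import sqrt
from statistics import NormalDist

def normal_scores(n):
    """Approximate expected standard-normal order statistics (Blom's approximation)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def npp_correlation(data):
    """Correlation between the sorted data and their normal scores.

    Values near 1 mean the normal probability plot is close to a straight line.
    """
    y = sorted(data)
    x = normal_scores(len(y))
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Population from the example: mean 3, variance 0.25, so SD 0.5.
population = NormalDist(3, 0.5)
for size in (10, 100):                             # parts (A) and (B)
    sample = population.samples(size, seed=size)   # arbitrary seeds
    print(f"n = {size:3d}: NPP correlation = {npp_correlation(sample):.3f}")
```

With only 10 points the plot (and the correlation) is noisier, so departures from the line are harder to judge than with 100 points.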
An Introduction to
Statistical Methods
& Data Analysis
Copyright 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Seventh Edition
R. Lyman Ott
Michael Longnecker
Texas A&M University
Australia • Brazil • Mexico • Singapore • United Kingdom • United States
This is an electronic version of the print textbook. Due to electronic rights restrictions, some third party content may be suppressed. Editorial
review has deemed that any suppressed content does not materially affect the overall learning experience. The publisher reserves the right to
remove content from this title at any time if subsequent rights restrictions require it. For valuable information on pricing, previous
editions, changes to current editions, and alternate formats, please visit www.cengage.com/highered to search by
ISBN#, author, title, or keyword for materials in your areas of interest.
Important Notice: Media content referenced within the product description or the product text may not be available in the eBook version.
An Introduction to Statistical Methods and
Data Analysis, Seventh Edition
R. Lyman Ott, Michael Longnecker

Senior Product Team Manager: Richard Stratton
Content Developer: Andrew Coppola
Associate Content Developer: Spencer Arritt
Product Assistant: Kathryn Schrumpf
Marketing Manager: Julie Schuster
Content Project Manager: Cheryll Linthicum
Art Director: Vernon Boes
Manufacturing Planner: Sandee Milewski
Intellectual Property Analyst: Christina Ciaramella
Intellectual Property Project Manager: Farah Fard
Production Service and Compositor: Cenveo Publishing Services
Photo and Text Researcher: Lumina Datamatics, LTD
Copy Editor:
Illustrator: Macmillan Publishing Services/Cenveo Publishing Services
Text and Cover Designer: C. Miller
Cover Image: polygraphus/Getty Images

© 2016, 2010 Cengage Learning
WCN: 02-200-203
ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.
For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706.
For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions.
Further permissions questions can be e-mailed to permissionrequest@cengage.com
Library of Congress Control Number: 2015938496
ISBN: 978-1-305-26947-7
Cengage Learning
20 Channel Center Street
Boston, MA 02210
USA
Cengage Learning is a leading provider of customized learning solutions with employees residing in nearly 40 different countries and sales in more than 125 countries around the world. Find your local representative at www.cengage.com
Cengage Learning products are represented in Canada by Nelson Education, Ltd.
To learn more about Cengage Learning Solutions, visit www.cengage.com
Purchase any of our products at your local college store or at our preferred online store www.cengagebrain.com
Printed in the United States of America
Print Number: 01 Print Year: 2015
CONTENTS
Preface

PART 1 Introduction
CHAPTER 1 Statistics and the Scientific Method
1.1 Introduction
1.2 Why Study Statistics?
1.3 Some Current Applications of Statistics
1.4 A Note to the Student
1.5 Summary
1.6 Exercises

PART 2 Collecting Data
CHAPTER 2 Using Surveys and Experimental Studies to Gather Data
2.1 Introduction and Abstract of Research Study
2.2 Observational Studies
2.3 Sampling Designs for Surveys
2.4 Experimental Studies
2.5 Designs for Experimental Studies
2.6 Research Study: Exit Polls Versus Election Results
2.7 Summary
2.8 Exercises

PART 3 Summarizing Data
CHAPTER 3 Data Description
3.1 Introduction and Abstract of Research Study
3.2 Calculators, Computers, and Software Systems
3.3 Describing Data on a Single Variable: Graphical Methods
3.4 Describing Data on a Single Variable: Measures of Central Tendency
3.5 Describing Data on a Single Variable: Measures of Variability
3.6 The Boxplot
3.7 Summarizing Data from More Than One Variable: Graphs and Correlation
3.8 Research Study: Controlling for Student Background in the Assessment of Teaching
3.9 R Instructions
3.10 Summary and Key Formulas
3.11 Exercises

CHAPTER 4 Probability and Probability Distributions
4.1 Introduction and Abstract of Research Study
4.2 Finding the Probability of an Event
4.3 Basic Event Relations and Probability Laws
4.4 Conditional Probability and Independence
4.5 Bayes’ Formula
4.6 Variables: Discrete and Continuous
4.7 Probability Distributions for Discrete Random Variables
4.8 Two Discrete Random Variables: The Binomial and the Poisson
4.9 Probability Distributions for Continuous Random Variables
4.10 A Continuous Probability Distribution: The Normal Distribution
4.11 Random Sampling
4.12 Sampling Distributions
4.13 Normal Approximation to the Binomial
4.14 Evaluating Whether or Not a Population Distribution Is Normal
4.15 Research Study: Inferences About Performance-Enhancing Drugs Among Athletes
4.16 R Instructions
4.17 Summary and Key Formulas
4.18 Exercises

PART 4 Analyzing the Data, Interpreting the Analyses, and Communicating the Results
CHAPTER 5 Inferences About Population Central Values
5.1 Introduction and Abstract of Research Study
5.2 Estimation of μ
5.3 Choosing the Sample Size for Estimating μ
5.4 A Statistical Test for μ
5.5 Choosing the Sample Size for Testing μ
5.6 The Level of Significance of a Statistical Test
5.7 Inferences About μ for a Normal Population, σ Unknown
5.8 Inferences About μ When the Population Is Nonnormal and n Is Small: Bootstrap Methods
5.9 Inferences About the Median
5.10 Research Study: Percentage of Calories from Fat
5.11 Summary and Key Formulas
5.12 Exercises

CHAPTER 6 Inferences Comparing Two Population Central Values
6.1 Introduction and Abstract of Research Study
6.2 Inferences About μ1 − μ2: Independent Samples
6.3 A Nonparametric Alternative: The Wilcoxon Rank Sum Test
6.4 Inferences About μ1 − μ2: Paired Data
6.5 A Nonparametric Alternative: The Wilcoxon Signed-Rank Test
6.6 Choosing Sample Sizes for Inferences About μ1 − μ2
6.7 Research Study: Effects of an Oil Spill on Plant Growth
6.8 Summary and Key Formulas
6.9 Exercises

CHAPTER 7 Inferences About Population Variances
7.1 Introduction and Abstract of Research Study
7.2 Estimation and Tests for a Population Variance
7.3 Estimation and Tests for Comparing Two Population Variances
7.4 Tests for Comparing t > 2 Population Variances
7.5 Research Study: Evaluation of Methods for Detecting E. coli
7.6 Summary and Key Formulas
7.7 Exercises

CHAPTER 8 Inferences About More Than Two Population Central Values
8.1 Introduction and Abstract of Research Study
8.2 A Statistical Test About More Than Two Population Means: An Analysis of Variance
8.3 The Model for Observations in a Completely Randomized Design
8.4 Checking on the AOV Conditions
8.5 An Alternative Analysis: Transformations of the Data
8.6 A Nonparametric Alternative: The Kruskal–Wallis Test
8.7 Research Study: Effect of Timing on the Treatment of Port-Wine Stains with Lasers
8.8 Summary and Key Formulas
8.9 Exercises

CHAPTER 9 Multiple Comparisons
9.1 Introduction and Abstract of Research Study
9.2 Linear Contrasts
9.3 Which Error Rate Is Controlled?
9.4 Scheffé’s S Method
9.5 Tukey’s W Procedure
9.6 Dunnett’s Procedure: Comparison of Treatments to a Control
9.7 A Nonparametric Multiple-Comparison Procedure
9.8 Research Study: Are Interviewers’ Decisions Affected by Different Handicap Types?
9.9 Summary and Key Formulas
9.10 Exercises

CHAPTER 10 Categorical Data
10.1 Introduction and Abstract of Research Study
10.2 Inferences About a Population Proportion π
10.3 Inferences About the Difference Between Two Population Proportions, π1 − π2
10.4 Inferences About Several Proportions: Chi-Square Goodness-of-Fit Test
10.5 Contingency Tables: Tests for Independence and Homogeneity
10.6 Measuring Strength of Relation
10.7 Odds and Odds Ratios
10.8 Combining Sets of 2 × 2 Contingency Tables
10.9 Research Study: Does Gender Bias Exist in the Selection of Students for Vocational Education?
10.10 Summary and Key Formulas
10.11 Exercises

CHAPTER 11 Linear Regression and Correlation
11.1 Introduction and Abstract of Research Study
11.2 Estimating Model Parameters
11.3 Inferences About Regression Parameters
11.4 Predicting New y-Values Using Regression
11.5 Examining Lack of Fit in Linear Regression
11.6 Correlation
11.7 Research Study: Two Methods for Detecting E. coli
11.8 Summary and Key Formulas
11.9 Exercises

CHAPTER 12 Multiple Regression and the General Linear Model
12.1 Introduction and Abstract of Research Study
12.2 The General Linear Model
12.3 Estimating Multiple Regression Coefficients
12.4 Inferences in Multiple Regression
12.5 Testing a Subset of Regression Coefficients
12.6 Forecasting Using Multiple Regression
12.7 Comparing the Slopes of Several Regression Lines
12.8 Logistic Regression
12.9 Some Multiple Regression Theory (Optional)
12.10 Research Study: Evaluation of the Performance of an Electric Drill
12.11 Summary and Key Formulas
12.12 Exercises

CHAPTER 13 Further Regression Topics
13.1 Introduction and Abstract of Research Study
13.2 Selecting the Variables (Step 1)
13.3 Formulating the Model (Step 2)
13.4 Checking Model Assumptions (Step 3)
13.5 Research Study: Construction Costs for Nuclear Power Plants
13.6 Summary and Key Formulas
13.7 Exercises

CHAPTER 14 Analysis of Variance for Completely Randomized Designs
14.1 Introduction and Abstract of Research Study
14.2 Completely Randomized Design with a Single Factor
14.3 Factorial Treatment Structure
14.4 Factorial Treatment Structures with an Unequal Number of Replications
14.5 Estimation of Treatment Differences and Comparisons of Treatment Means
14.6 Determining the Number of Replications
14.7 Research Study: Development of a Low-Fat Processed Meat
14.8 Summary and Key Formulas
14.9 Exercises

CHAPTER 15 Analysis of Variance for Blocked Designs
15.1 Introduction and Abstract of Research Study
15.2 Randomized Complete Block Design
15.3 Latin Square Design
15.4 Factorial Treatment Structure in a Randomized Complete Block Design
15.5 A Nonparametric Alternative—Friedman’s Test
15.6 Research Study: Control of Leatherjackets
15.7 Summary and Key Formulas
15.8 Exercises

CHAPTER 16 The Analysis of Covariance
16.1 Introduction and Abstract of Research Study
16.2 A Completely Randomized Design with One Covariate
16.3 The Extrapolation Problem
16.4 Multiple Covariates and More Complicated Designs
16.5 Research Study: Evaluation of Cool-Season Grasses for Putting Greens
16.6 Summary
16.7 Exercises

CHAPTER 17 Analysis of Variance for Some Fixed-, Random-, and Mixed-Effects Models
17.1 Introduction and Abstract of Research Study
17.2 A One-Factor Experiment with Random Treatment Effects
17.3 Extensions of Random-Effects Models
17.4 Mixed-Effects Models
17.5 Rules for Obtaining Expected Mean Squares
17.6 Nested Factors
17.7 Research Study: Factors Affecting Pressure Drops Across Expansion Joints
17.8 Summary
17.9 Exercises

CHAPTER 18 Split-Plot, Repeated Measures, and Crossover Designs
18.1 Introduction and Abstract of Research Study
18.2 Split-Plot Designed Experiments
18.3 Single-Factor Experiments with Repeated Measures
18.4 Two-Factor Experiments with Repeated Measures on One of the Factors
18.5 Crossover Designs
18.6 Research Study: Effects of an Oil Spill on Plant Growth
18.7 Summary
18.8 Exercises

CHAPTER 19 Analysis of Variance for Some Unbalanced Designs
19.1 Introduction and Abstract of Research Study
19.2 A Randomized Block Design with One or More Missing Observations
19.3 A Latin Square Design with Missing Data
19.4 Balanced Incomplete Block (BIB) Designs
19.5 Research Study: Evaluation of the Consistency of Property Assessors
19.6 Summary and Key Formulas
19.7 Exercises

Appendix: Statistical Tables
Answers to Selected Exercises
References
Index
PREFACE
Intended Audience
An Introduction to Statistical Methods and Data Analysis, Seventh Edition, provides
a broad overview of statistical methods for advanced undergraduate and graduate
students from a variety of disciplines. This book is intended to prepare students to
solve problems encountered in research projects, to make decisions based on data
in general settings both within and beyond the university setting, and finally to
become critical readers of statistical analyses in research papers and in news reports.
The book presumes that the students have a minimal mathematical background
(high school algebra) and no prior course work in statistics. The first 11 chapters
of the textbook present the material typically covered in an introductory statistics
course. However, this book provides research studies and examples that connect
the statistical concepts to data analysis problems that are often encountered in
undergraduate capstone courses. The remaining chapters of the book cover regression modeling and design of experiments. We develop and illustrate the statistical
techniques and thought processes needed to design a research study or experiment
and then analyze the data collected using an intuitive and proven four-step approach.
This should be especially helpful to graduate students conducting their MS thesis
and PhD dissertation research.
Major Features of Textbook
Learning from Data
In this text, we approach the study of statistics by considering a four-step process
by which we can learn from data:
1. Defining the Problem
2. Collecting the Data
3. Summarizing the Data
4. Analyzing the Data, Interpreting the Analyses, and Communicating
the Results
Case Studies
In order to demonstrate the relevance and critical nature of statistics in solving real-world problems, we introduce the major topic of each chapter using a case study.
The case studies were selected from many sources to illustrate the broad applicability of statistical methodology. The four-step learning from data process is illustrated through the case studies. This approach will hopefully assist in overcoming
the natural initial perception held by many people that statistics is just another
“math course.” The introduction of major topics through the use of case studies
provides a focus on the central nature of applied statistics in a wide variety of
research and business-related studies. These case studies will hopefully provide the
reader with an enthusiasm for the broad applicability of statistics and the statistical
thought process that the authors have found and used through their many years
of teaching, consulting, and R & D management. The following research studies
illustrate the types of studies we have used throughout the text.
• Exit Polls Versus Election Results: A study of why the exit polls
from 9 of 11 states in the 2004 presidential election predicted John
Kerry as the winner when in fact President Bush won 6 of the 11
states.
• Evaluation of the Consistency of Property Assessors: A study to
determine if county property assessors differ systematically in their
determination of property values.
• Effect of Timing of the Treatment of Port-Wine Stains with Lasers:
A prospective study that investigated whether treatment at a younger
age would yield better results than treatment at an older age.
• Controlling for Student Background in the Assessment of Teaching:
An examination of data used to support possible improvements to
the No Child Left Behind program while maintaining the important
concepts of performance standards and accountability.
Each of the research studies includes a discussion of the whys and hows of the
study. We illustrate the use of the four-step learning from data process with each
case study. A discussion of sample size determination, graphical displays of the
data, and a summary of the necessary ingredients for a complete report of the statistical findings of the study are provided with many of the case studies.
Examples and Exercises
We have further enhanced the practical nature of statistics by using examples and
exercises from journal articles, newspapers, and the authors’ many consulting
experiences. These will provide the students with further evidence of the practical
usages of statistics in solving problems that are relevant to their everyday lives.
Many new exercises and examples have been included in this edition of the book.
The number and variety of exercises will be a great asset to both the instructor and
students in their study of statistics.
Topics Covered
This book can be used for either a one-semester or a two-semester course. Chapters
1 through 11 would constitute a one-semester course. The topics covered would
include
Chapter 1—Statistics and the scientific method
Chapter 2—Using surveys and experimental studies to gather data
Chapters 3 & 4—Summarizing data and probability distributions
Chapters 5–7—Analyzing data: inferences about central values and
variances
Chapters 8 & 9—One-way analysis of variance and multiple
comparisons
Chapter 10—Analyzing data involving proportions
Chapter 11—Linear regression and correlation
The second semester of a two-semester course would then include model building
and inferences in multiple regression analysis, logistic regression, design of experiments, and analysis of variance:
Chapters 11–13—Regression methods and model building: multiple regression and the general linear model, logistic regression, and building
regression models with diagnostics
Chapters 14–19—Design of experiments and analysis of variance: design
concepts, analysis of variance for standard designs, analysis of covariance, random and mixed effects models, split-plot designs, repeated
measures designs, crossover designs, and unbalanced designs
Emphasis on Interpretation, not Computation
In the book are examples and exercises that allow the student to study how to
calculate the value of statistical estimators and test statistics using the definitional
form of the procedure. After the student becomes comfortable with the aspects of
the data the statistical procedure is reflecting, we then emphasize the use of computer software in making computations in the analysis of larger data sets. We provide
output from three major statistical packages: SAS, Minitab, and SPSS. We find that
this approach provides the student with the experience of computing the value of the
procedure using the definition; hence, the student learns the basics behind each procedure. In most situations beyond the statistics course, the student should be using
computer software in making the computations for both expedience and quality of
calculation. In many exercises and examples, the use of the computer allows for more
time to emphasize the interpretation of the results of the computations without having to expend enormous amounts of time and effort in the actual computations.
In numerous examples and exercises, the importance of the following aspects
of hypothesis testing is demonstrated:
1. The statement of the research hypothesis through the summarization
of the researcher’s goals into a statement about population
parameters.
2. The selection of the most appropriate test statistic, including sample
size computations for many procedures.
3. The necessity of considering both Type I and Type II error
rates (α and β) when discussing the results of a statistical test of
hypotheses.
4. The importance of considering both the statistical significance and
the practical significance of a test result. Thus, we illustrate the
importance of estimating effect sizes and the construction of confidence intervals for population parameters.
5. The statement of the results of the statistical test in nonstatistical
jargon that goes beyond the statement “reject H₀” or “fail to
reject H₀.”
New to the Seventh Edition
● There are instructions on the use of R code. R is a free software package
that can be downloaded from http://lib.stat.cmu.edu/R/CRAN.
Click your choice of platform (Linux, MacOS X, or Windows) for the
precompiled binary distribution. Note the FAQs link to the left for
additional information. Follow the instructions for installing the base
system software (which is all you will need).
● New examples illustrate the breadth of applications of statistics to real-world problems.
● An alternative to the standard deviation, MAD, is provided as a measure of dispersion in a population/sample.
● The use of bootstrapping in obtaining confidence intervals and p-values is discussed.
● Instructions are included on how to use R code to obtain percentiles and probabilities from the following distributions: normal, binomial, Poisson, chi-squared, F, and t.
● A nonparametric alternative to the Pearson correlation coefficient, Spearman’s rank correlation, is provided.
● The binomial test for small-sample tests of proportions is presented.
● The McNemar test for paired count data has been added.
● The Akaike information criterion and Bayesian information criterion for variable selection are discussed.
Additional Features Retained from Previous Editions
● Many practical applications of statistical methods and data analysis from agriculture, business, economics, education, engineering, medicine, law, political science, psychology, environmental studies, and sociology have been included.
● The seventh edition contains over 1,000 exercises, nearly 400 of which are new.
● Computer output from Minitab, SAS, and SPSS is provided in numerous examples. The use of computers greatly facilitates the use of more sophisticated graphical illustrations of statistical results.
● Attention is paid to the underlying assumptions. Graphical procedures and test procedures are provided to determine if assumptions have been violated. Furthermore, in many settings, we provide alternative procedures when the conditions are not met.
● The first chapter provides a discussion of “What Is Statistics?” We provide a discussion of why students should study statistics along with a discussion of several major studies that illustrate the use of statistics in the solution of real-life problems.
Ancillaries
● Student Solutions Manual (ISBN-10: 1-305-26948-9; ISBN-13: 978-1-305-26948-4), containing select worked solutions for problems in the textbook.
● A Companion Website at www.cengage.com/statistics/ott, containing downloadable data sets for Excel, Minitab, SAS, SPSS, and others, plus additional resources for students and faculty.
Acknowledgments
There are many people who have made valuable, constructive suggestions for
the development of the original manuscript and during the preparation of the
subsequent editions. We are very appreciative of the insightful and constructive
comments from the following reviewers:
Naveen Bansal, Marquette University
Kameryn Denaro, San Diego State University
Mary Gray, American University
Craig Leth-Steensen, Carleton University
Jing Qian, University of Massachusetts
Mark Riggs, Abilene Christian University
Elaine Spiller, Marquette University
We also appreciate the preparation assistance received from Molly Taylor
and Jay Campbell, and the scheduling of the revisions by Mary Tindle, the Senior
Project Manager at Cenveo Publisher Services, who made sure that the book
was completed in a timely manner. The authors of the solutions manual, Soma
Roy, California Polytechnic State University, and John Draper, The Ohio State
University, provided me with excellent input, which resulted in an improved set of
exercises for the seventh edition. The person who assisted me to the greatest degree
in the preparation of the seventh edition was Sherry Goldbecker, the copy editor.
Sherry not only corrected my many grammatical errors but also rephrased many sentences, which made for more straightforward explanations of statistical concepts. The students who use this book in their statistics classes will be
most appreciative of Sherry’s many contributions.
PART 1
Introduction
Chapter 1
Statistics and the Scientific Method
CHAPTER 1
Statistics and the Scientific Method
1.1 Introduction
1.2 Why Study Statistics?
1.3 Some Current Applications of Statistics
1.4 A Note to the Student
1.5 Summary
1.6 Exercises

1.1 Introduction
Statistics is the science of designing studies or experiments, collecting data, and
modeling/analyzing data for the purpose of decision making and scientific discovery when the available information is both limited and variable. That is, statistics is
the science of Learning from Data.
Almost everyone, including social scientists, medical researchers, superintendents of public schools, corporate executives, market researchers, engineers,
government employees, and consumers, deals with data. These data could be in the
form of quarterly sales figures, percent increase in juvenile crime, contamination
levels in water samples, survival rates for patients undergoing medical therapy,
census figures, or information that helps determine which brand of car to purchase.
In this text, we approach the study of statistics by considering the four-step process
in Learning from Data: (1) defining the problem, (2) collecting the data, (3) summarizing the data, and (4) analyzing the data, interpreting the analyses, and communicating the results. Through the use of these four steps in Learning from Data,
our study of statistics closely parallels the Scientific Method, which is a set of principles and procedures used by successful scientists in their pursuit of knowledge.
The method involves the formulation of research goals, the design of observational
studies and/or experiments, the collection of data, the modeling/analysis of the
data in the context of research goals, and the testing of hypotheses. The conclusion
of these steps is often the formulation of new research goals for another study.
These steps are illustrated in the schematic given in Figure 1.1.
This book is divided into sections corresponding to the four-step process in
Learning from Data. The relationship among these steps and the chapters of the
book is shown in Table 1.1. As you can see from this table, much time is spent discussing how to analyze data using the basic methods presented in Chapters 5–19.
FIGURE 1.1 Scientific Method Schematic: Formulate research goal (research hypotheses, models) → Design study (sample size, variables, experimental units, sampling mechanism) → Collect data (data management) → Draw inferences (graphs, estimation, hypothesis testing, model assessment) → Make decisions (written conclusions, oral presentations) → Formulate new research goals (new models, new hypotheses)

TABLE 1.1 Organization of the text

The Four-Step Process                 Chapters
1 Defining the Problem                1 Statistics and the Scientific Method
2 Collecting the Data                 2 Using Surveys and Experimental Studies to Gather Data
3 Summarizing the Data                3 Data Description
                                      4 Probability and Probability Distributions
4 Analyzing the Data,                 5 Inferences about Population Central Values
  Interpreting the Analyses,          6 Inferences Comparing Two Population Central Values
  and Communicating the Results       7 Inferences about Population Variances
                                      8 Inferences about More Than Two Population Central Values
                                      9 Multiple Comparisons
                                      10 Categorical Data
                                      11 Linear Regression and Correlation
                                      12 Multiple Regression and the General Linear Model
                                      13 Further Regression Topics
                                      14 Analysis of Variance for Completely Randomized Designs
                                      15 Analysis of Variance for Blocked Designs
                                      16 The Analysis of Covariance
                                      17 Analysis of Variance for Some Fixed-, Random-, and Mixed-Effects Models
                                      18 Split-Plot, Repeated Measures, and Crossover Designs
                                      19 Analysis of Variance for Some Unbalanced Designs
However, you must remember that for each data set requiring analysis, someone
has defined the problem to be examined (Step 1), developed a plan for collecting
data to address the problem (Step 2), and summarized the data and prepared the
data for analysis (Step 3). Then following the analysis of the data, the results of the
analysis must be interpreted and communicated either verbally or in written form
to the intended audience (Step 4).
All four steps are important in Learning from Data; in fact, unless the problem to be addressed is clearly defined and the data collection carried out properly,
the interpretation of the results of the analyses may convey misleading information because the analyses were based on a data set that did not address the problem
or that was incomplete and contained improper information. Throughout the text,
we will try to keep you focused on the bigger picture of Learning from Data
through the four-step process. Most chapters will end with a summary section
that emphasizes how the material of the chapter fits into the study of statistics—
Learning from Data.
To illustrate some of the above concepts, we will consider four situations
in which the four steps in Learning from Data could assist in solving a real-world
problem.
1. Problem: Inspection of ground beef in a large beef-processing facility.
A beef-processing plant produces approximately half a million packages of ground beef per week. The government inspects packages
for possible improper labeling of the packages with respect to the
percent fat in the meat. The inspectors must open the ground beef
package in order to determine the fat content of the ground beef.
The inspection of every package would be prohibitively costly and
time consuming. An alternative approach is to select 250 packages
for inspection from the daily production of 100,000 packages. The
fraction of packages with improper labeling in the sample of 250
packages would then be used to estimate the fraction of packages
improperly labeled in the complete day’s production. If this fraction
exceeds a set specification, action is then taken against the meat
processor. In later chapters, a procedure will be formulated to determine how well the sample fraction of improperly labeled packages
approximates the fraction of improperly labeled packages for the
whole day’s output.
2. Problem: Is there a relationship between quitting smoking and
gaining weight? To investigate the claim that people who quit
smoking often experience a subsequent weight gain, researchers
selected a random sample of 400 participants who had successfully
participated in programs to quit smoking. The individuals were
weighed at the beginning of the program and again 1 year later.
The average change in weight of the participants was an increase of
5 pounds. The investigators concluded that there was evidence that
the claim was valid. We will develop techniques in later chapters to
assess when changes are truly significant changes and not changes
due to random chance.
3. Problem: What effect does nitrogen fertilizer have on wheat production?
For a study of the effects of nitrogen fertilizer on wheat production,
a total of 15 fields was available to the researcher. She randomly
assigned three fields to each of the five nitrogen rates under investigation. The same variety of wheat was planted in all 15 fields. The
fields were cultivated in the same manner until harvest, and the
number of pounds of wheat per acre was then recorded for each of
the 15 fields. The experimenter wanted to determine the optimal
level of nitrogen to apply to any wheat field, but, of course, she could
run experiments on only a limited number of fields. After
determining the amount of nitrogen that yielded the largest production of wheat in the study fields, the experimenter then concluded
that similar results would hold for wheat fields possessing characteristics similar to those of the study fields. Is the experimenter
justified in reaching this conclusion?
4. Problem: Determining public opinion toward a question, issue,
product, or candidate. Similar applications of statistics are brought
to mind by the frequent use of the New York Times/CBS News,
Washington Post/ABC News, Wall Street Journal/NBC News, Harris,
Gallup/Newsweek, and CNN/Time polls. How can these pollsters
determine the opinions of more than 195 million Americans who are
of voting age? They certainly do not contact every potential voter in
the United States. Rather, they sample the opinions of a small number of potential voters, perhaps as few as 1,500, to estimate the reaction of every person of voting age in the country. The amazing result
of this process is that if the selection of the voters is done in an unbiased way and voters are asked unambiguous, nonleading questions,
the fraction of those persons contacted who hold a particular opinion
will closely match the fraction in the total population holding that
opinion at a particular time. We will supply convincing supportive
evidence of this assertion in subsequent chapters.
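The beef-inspection and polling problems rest on the same idea: the fraction observed in a random sample is used to estimate the fraction in the whole population. The Python sketch below is not from the text; it is a quick numerical preview under stated assumptions. The 4% mislabeling rate is invented for illustration, sampling is assumed to be simple random sampling without replacement, and the helper names `sample_fraction` and `standard_error` are hypothetical. The formula sqrt(p(1 - p)/n) is the usual large-sample approximation to the standard error of a sample proportion, with an optional finite population correction.

```python
import math
import random
import statistics


def sample_fraction(population, n, rng):
    """Draw a simple random sample of n items without replacement and
    return the fraction of sampled items that are flagged (True)."""
    sample = rng.sample(population, n)
    return sum(sample) / n


def standard_error(p, n, N=None):
    """Approximate standard error of a sample fraction, sqrt(p(1 - p)/n).
    If the population size N is given, apply the finite population correction."""
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Hypothetical day's production: 100,000 packages, of which an ASSUMED 4%
# are improperly labeled (True = mislabeled). The 4% is illustrative only.
N, n = 100_000, 250
true_p = 0.04
n_bad = round(N * true_p)
packages = [True] * n_bad + [False] * (N - n_bad)

# Repeat the inspector's sampling plan many times to see how much the
# sample fraction varies from one sample of 250 packages to the next.
rng = random.Random(2016)
estimates = [sample_fraction(packages, n, rng) for _ in range(2_000)]

mean_est = statistics.mean(estimates)
sd_est = statistics.stdev(estimates)
print(f"average sample fraction:    {mean_est:.4f} (true fraction {true_p})")
print(f"spread of sample fractions: {sd_est:.4f}")
print(f"formula standard error:     {standard_error(true_p, n, N):.4f}")

# The same formula for a poll of 1,500 voters, using p = 0.5 (the worst case).
poll_se = standard_error(0.5, 1_500)
print(f"poll of 1,500: standard error {poll_se:.4f}, 95% margin about {1.96 * poll_se:.3f}")
```

Under these assumptions, the spread of the 2,000 simulated sample fractions should come out close to the formula value of about 0.012. For a poll of 1,500, the worst-case standard error is 0.5/√1500 ≈ 0.013, which gives a 95% margin of roughly 2.5 percentage points; this is consistent with the claim above that a small but unbiased sample can closely match the opinion of the whole voting-age population.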
These problems illustrate the four-step process in Learning from Data.
First, there was a problem or question to be addressed. Next, for each problem a study or experiment was proposed to collect meaningful data to solve the
problem. The government meat inspection agency had to decide both how many
packages to inspect per day and how to select the sample of packages from the
total daily output in order to obtain a valid prediction. The polling groups had to
decide how many voters to sample and how to select these individuals in order
to obtain information that is representative of the population of all voters. Similarly, it was necessary to carefully plan how many participants in the weight-gain
study were needed and how they were to be selected from the list of all such
participants. Furthermore, what variables did the researchers have to measure
on each participant? Was it necessary to know each participant’s age, sex, physical fitness, and other health-related variables, or was weight the only important
variable? The results of the study may not be relevant to the general population
if many of the participants in the study had a particular health condition. In the
wheat experiment, it was important to measure both the soil characteristics of
the fields and the environmental conditions, such as temperature and rainfall, to
obtain results that could be generalized to fields not included in the study. The
design of a study or experiment is crucial to obtaining results that can be generalized beyond the study.
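In the wheat study, the researcher randomly assigned three fields to each of five nitrogen rates. The text does not say how the randomization was carried out; one common approach, sketched below in Python as an assumption, is to shuffle the 15 field labels once and hand out consecutive groups of three, one group per rate. The rate labels N1 to N5 and the function name `randomize_fields` are hypothetical.

```python
import random


def randomize_fields(field_ids, treatments, reps, seed=None):
    """Randomly assign exactly `reps` fields to each treatment level.

    The fields are shuffled once, then split into consecutive groups of
    size `reps`, one group per treatment (a completely randomized design).
    """
    if len(field_ids) != len(treatments) * reps:
        raise ValueError("need exactly len(treatments) * reps fields")
    rng = random.Random(seed)
    shuffled = list(field_ids)
    rng.shuffle(shuffled)
    return {
        t: sorted(shuffled[i * reps:(i + 1) * reps])
        for i, t in enumerate(treatments)
    }


fields = list(range(1, 16))             # the 15 available fields, numbered 1-15
rates = ["N1", "N2", "N3", "N4", "N5"]  # hypothetical labels for the five nitrogen rates
plan = randomize_fields(fields, rates, reps=3, seed=7)
for rate, assigned in plan.items():
    print(rate, assigned)
```

Because every field has the same chance of receiving each rate, differences in soil or drainage among the fields are not systematically tied to any one nitrogen level. Randomization does not remove the need, stressed above, to record the soil and environmental characteristics of the fields before generalizing the results.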
Finally, having collected, summarized, and analyzed the data, it is important
to report the results in unambiguous terms to interested people. For the meat
inspection example, the government inspection agency and the personnel in the
beef-processing plant would need to know the distribution of fat content in the
daily production of ground beef. Based on this distribution, the agency could then
impose fines or take other remedial actions against the production facility. Also,
knowledge of this distribution would enable company production personnel to
make adjustments to the process in order to obtain acceptable fat content in their
ground beef packages. Therefore, the results of the statistical analyses cannot
be presented in ambiguous terms; decisions must be made from a well-defined
knowledge base. The results of the weight-gain study would be of vital interest to
physicians who have patients participating in the smoking-cessation program. If
a significant increase in weight was recorded for those individuals who had quit
smoking, physicians would have to recommend diets so that the former smokers
would not go from one health problem (smoking) to another (elevated blood
pressure due to being overweight). It is crucial that a careful description of the
participants (that is, age, sex, and other health-related information) be included
in the report. In the wheat study, the experiment would provide farmers with
information that would allow them to economically select the optimum amount of
nitrogen required for their fields. Therefore, the report must contain information
concerning the amount of moisture and types of soils present on the study fields.
Otherwise, the conclusions about optimal wheat production may not pertain to
farmers growing wheat under considerably different conditions.
FIGURE 1.2 Population and sample: the population is the set of all measurements; the sample is a set of measurements selected from the population.
To infer validly that the results of a study are applicable to a larger group
than just the participants in the study, we must carefully define the population
(see Definition 1.1) to which inferences are sought and design a study in which the
sample (see Definition 1.2) has been appropriately selected from the designated
population. We will discuss these issues in Chapter 2.
DEFINITION 1.1
A population is the set of all measurements of interest to the sample collector.
(See Figure 1.2.)
DEFINITION 1.2
A sample is any subset of measurements selected from the population.
(See Figure 1.2.)
1.2 Why Study Statistics?
We can think of many reasons for taking an introductory course in statistics. One
reason is that you need to know how to evaluate published numerical facts. Every
person is exposed to manufacturers’ claims for products; to the results of sociological, consumer, and political polls; and to the published results of scientific
research. Many of these results are inferences based on sampling. Some inferences are valid; others are invalid. Some are based on samples of adequate size;
others are not. Yet all these published results bear the ring of truth. Some people (particularly statisticians) say that statistics can be made to support almost
anything. Others say it is easy to lie with statistics. Both statements are true. It
is easy, purposely or unwittingly, to distort the truth by using statistics when
presenting the results of sampling to the uninformed. It is thus crucial that you
become an informed and critical reader of data-based reports and articles.
A second reason for studying statistics is that your profession or employment
may require you to interpret the results of sampling (surveys or experimentation)
or to employ statistical methods of analysis to make inferences in your work. For
example, practicing physicians receive large amounts of advertising describing
the benefits of new drugs. These advertisements frequently display the numerical
results of experiments that compare a new drug with an older one. Do such data
really imply that the new drug is more effective, or is the observed difference in
results due simply to random variation in the experimental measurements?
Recent trends in the conduct of court trials indicate an increasing use of
probability and statistical inference in evaluating the quality of evidence. The use
of statistics in the social, biological, and physical sciences is essential because all
these sciences make use of observations of natural phenomena, through sample
surveys or experimentation, to develop and test new theories. Statistical methods
are employed in business when sample data are used to forecast sales and profit.
In addition, they are used in engineering and manufacturing to monitor product
quality. The sampling of accounts is a useful tool to assist accountants in conducting audits. Thus, statistics plays an important role in almost all areas of science,
business, and industry; persons employed in these areas need to know the basic
concepts, strengths, and limitations of statistics.
The article “What Educated Citizens Should Know About Statistics and Probability,” by J. Utts (2003), contains a number of statistical ideas that need to be
understood by users of statistical methodology in order to avoid confusion in the
use of their research findings. Misunderstandings of statistical results can lead to
major errors by government policymakers, medical workers, and consumers of this
information. The article selected a number of topics for discussion. We will summarize some of the findings in the article. A complete discussion of all these topics
will be given throughout the book.
1. One of the most frequent misinterpretations of statistical findings
occurs when a statistically significant relationship is established between
two variables and it is then concluded that a change in the explanatory variable causes a change in the response variable. As will be
discussed in the book, this conclusion can be reached only under
very restrictive constraints on the experimental setting. Utts examined a recent Newsweek article discussing the relationship between
the strength of religious beliefs and physical healing. Utts’ article
discussed the problems in reaching the conclusion that the stronger
a patient’s religious beliefs, the more likely the patient would be
cured of his or her ailment. Utts showed that there are numerous
other factors involved in a patient’s health and the conclusion that
religious beliefs cause a cure cannot be validly reached.
2. A common confusion in many studies is the difference between
(statistically) significant findings in a study and (practically) significant findings. This problem often occurs when large data sets are
involved in a study or experiment. This type of problem will be discussed in detail throughout the book. We will use a number of examples that will illustrate how this type of confusion can be avoided by
researchers when reporting the findings of their experimental results.
Utts’ article illustrated this problem with a discussion of a study that
found a statistically significant difference in the average heights of
military recruits born in the spring and in the fall. There were 507,125
recruits in the study and the difference in average height was about
1/4 inch. So, even though there may be a difference in the actual average heights of recruits born in the spring and in the fall, the difference is so
small (1/4 inch) that it is of no practical importance.
3. The size of the sample also may be a determining factor in studies
in which statistical significance is not found. A study may not have
selected a sample size large enough to discover a difference between
the several populations under study. In many government-sponsored
studies, the researchers do not receive funding unless they are able
to demonstrate that the sample sizes selected for their study are of
an appropriate size to detect specified differences in populations if
in fact they exist. Methods to determine appropriate sample sizes
will be provided in the chapters on hypothesis testing and experimental design.
4. Surveys are ubiquitous, especially during the years in which national
elections are held. In fact, market surveys are nearly as widespread
as political polls. There are many sources of bias that can creep
into the most reliable of surveys. The manner in which people are
selected for inclusion in the survey, the way in which questions are
phrased, and even the manner in which questions are posed to the
subject may affect the conclusions obtained from the survey. We will
discuss these issues in Chapter 2.
5. Many students find the topic of probability to be very confusing. One
of these confusions involves conditional probability, in which the probability of an event is computed under the condition that a
second event is known to have occurred. For example, a new diagnostic test for the pathogen Escherichia coli in meat is proposed to
the U.S. Department of Agriculture (USDA). The USDA evaluates
the test and determines that the test has both a low false positive rate
and a low false negative rate. That is, it is very unlikely that the test
will declare the meat contains E. coli when in fact it does not contain
E. coli. Also, it is very unlikely that the test will declare the meat does
not contain E. coli when in fact it does contain E. coli. Although the
diagnostic test has a very low false positive rate and a very low false
negative rate, the probability that E. coli is in fact present in the meat
when the test yields a positive test result is very low for those situations in which a particular strain of E. coli occurs very infrequently.
In Chapter 4, we will demonstrate how this probability can be computed in order to provide a true assessment of the performance of a
diagnostic test.
6. Another concept that is often misunderstood is the role of the degree
of variability in interpreting what is a “normal” occurrence of some
naturally occurring event. Utts’ article provided the following example. A company was having an odor problem with its wastewater
treatment plant. It attributed the problem to “abnormal” rainfall during the period in which the odor problem was occurring. A company
official stated that the facility experienced 170% to 180% of its
“normal” rainfall during this period, which resulted in the water in
the holding ponds taking longer to exit for irrigation. Thus, there was
more time for the pond to develop an odor. The company official did
not point out that yearly rainfall in this region is extremely variable.
In fact, the historical range for rainfall is between 6.1 and 37.4 inches
with a median rainfall of 16.7 inches. The rainfall for the year of the
odor problem was 29.7 inches, which was well within the “normal”
range for rainfall. There was a confusion between the terms “average” and “normal” rainfall. The concept of natural variability is crucial to correct interpretation of statistical results. In this example, the
company official should have evaluated the percentile for an annual
rainfall of 29.7 inches in order to demonstrate the abnormality of
such a rainfall. We will discuss the ideas of data summaries and percentiles in Chapter 3.
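The arithmetic behind item 6 can be checked with a short Python sketch. It assumes the official's "normal" rainfall refers to the historical median of 16.7 inches (the text does not say which baseline the company used), and it defines a hypothetical helper, `percentile_rank`, for the calculation the official should have made. Because the full historical rainfall record is not given in the text, the helper is shown only as a function to be applied to such a record.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence

MEDIAN_RAINFALL = 16.7             # historical median (inches); assumed to be the official's "normal"
RANGE_LOW, RANGE_HIGH = 6.1, 37.4  # historical range of annual rainfall (inches), from the text
ODOR_YEAR = 29.7                   # rainfall in the year of the odor problem (inches)


def percent_of_median(value: float, median: float) -> float:
    """Express one year's rainfall as a percentage of the median rainfall."""
    return 100.0 * value / median


def percentile_rank(history: Sequence[float], value: float) -> float:
    """Hypothetical helper: percentage of historical years with rainfall below
    `value`, counting ties as half (one common percentile-rank convention)."""
    ordered = sorted(history)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


if __name__ == "__main__":
    pct = percent_of_median(ODOR_YEAR, MEDIAN_RAINFALL)
    print(f"{ODOR_YEAR} in. is {pct:.0f}% of the median")
    print("within historical range:", RANGE_LOW <= ODOR_YEAR <= RANGE_HIGH)
```

Under this assumption, 29.7 inches is about 178% of the median, consistent with the official's 170% to 180% figure, yet it still lies inside the historical range of 6.1 to 37.4 inches. Only `percentile_rank` applied to the actual yearly record would show how unusual such a year is.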
The types of problems expressed above and in Utts’ article represent common
and important misunderstandings that can occur when researchers use statistics in
interpreting the results of their studies. We will attempt throughout the book to discuss possible misinterpretations of statistical results and how to avoid them in your
data analyses. More importantly, we want the reader of this book to become a discriminating reader of statistical findings, the results of surveys, and project reports.
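The diagnostic-test example in item 5 follows from Bayes' rule. The Python sketch below computes the probability that E. coli is present given a positive test, using the test's false positive rate, its false negative rate, and the prevalence of the strain. The specific values (both error rates 1%, strain present in 1 of every 1,000 samples) are illustrative assumptions, not figures from the USDA evaluation described above.

```python
def prob_present_given_positive(prevalence: float,
                                false_positive_rate: float,
                                false_negative_rate: float) -> float:
    """Bayes' rule: P(pathogen present | positive test)."""
    sensitivity = 1.0 - false_negative_rate  # P(positive | present)
    # Total probability of a positive result, from contaminated and clean meat.
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1.0 - prevalence))
    return sensitivity * prevalence / p_positive


if __name__ == "__main__":
    # Illustrative (assumed) values: both error rates 1%, strain present
    # in 1 of every 1,000 samples.
    p = prob_present_given_positive(prevalence=0.001,
                                    false_positive_rate=0.01,
                                    false_negative_rate=0.01)
    print(f"P(E. coli present | positive) = {p:.3f}")
```

Under these assumptions only about 9 of every 100 positive results come from meat that actually contains the strain, even though both error rates are low; if the prevalence were 0.1 instead, the same test would give a probability above 0.9.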
1.3 Some Current Applications of Statistics
Defining the Problem: Obtaining Information from Massive Data Sets
Data mining is defined to be a process by which useful information is obtained
from large sets of data. Data mining uses statistical techniques to discover patterns
and trends that are present in a large data set. In most data sets, important patterns
would not be discovered by using traditional data exploration techniques, either because
the relationships among the many variables in the data set are too
complex or because the data sets are so large that they mask the relationships.
The patterns and trends discovered in the analysis of the data are defined
as data mining models. These models can be applied to many different situations,
such as:
●● Forecasting: Estimating future sales, predicting demands on a power
grid, or estimating server downtime
●● Assessing risk: Choosing the rates for insurance premiums, selecting
best customers for a new sales campaign, determining which medical
therapy is most appropriate given the physiological characteristics of
the patient
●● Identifying sequences: Determining customer preferences in online
purchases, predicting weather events
●● Grouping: Placing customers or events into clusters of related items,
analyzing and predicting relationships between demographic characteristics and purchasing patterns, identifying fraud in credit card
purchases
A new medical procedure referred to as gene editing has the potential to
assist thousands of people suffering from many different diseases. An article in the
Houston Chronicle (2013) describes how data mining techniques are used to
explore massive genomic databases to interpret millions of bits of data in a person’s DNA. This information is then used to identify a single defective gene,
which is cut out and replaced by splicing in a correction. This area of research is referred to as
biomedical informatics and is based on the premise that the human body is a data
bank of incredible depth and complexity. It is predicted that by 2015, the average
hospital will have approximately 450 terabytes of patient data consisting of large,
complex images from CT scans, MRIs, and other imaging techniques. However,
only a small fraction of the current medical data has been analyzed, thus opening
huge opportunities for persons trained in data mining. In a case described in the
article, a 7-year-old boy tormented by scabs, blisters, and scars was given a new
lease on life when data mining techniques were used to discover a single faulty letter in his
genome.
Defining the Problem: Determining the Effectiveness of a New Drug Product
The development and testing of the Salk vaccine for protection against poliomyelitis (polio) provide an excellent example of how statistics can be used in solving
practical problems. Most parents and children growing up before 1954 can recall
the panic brought on by the outbreak of polio cases during the summer months.
Although relatively few children fell victim to the disease each year, the pattern
of outbreak of polio was unpredictable and caused great concern because of the
possibility of paralysis or death. The fact that very few of today’s youth have even
heard of polio demonstrates the great success of the vaccine and the testing program that preceded its release on the market.
It is standard practice in establishing the effectiveness of a particular drug product to conduct an experiment (often called a clinical trial) with human participants.
For some clinical trials, assignments of participants are made at random, with half
receiving the drug product and the other half receiving a solution or tablet that does
not contain the medication (called a placebo). One statistical problem concerns the
determination of the total number of participants to be included in the clinical trial.
This problem was particularly important in the testing of the Salk vaccine because
data from previous years suggested that the incidence rate for polio might be less
than 50 cases for every 100,000 children. Hence, a large number of participants had
to be included in the clinical trial in order to detect a difference in the incidence rates
for those treated with the vaccine and those receiving the placebo.
With the assistance of statisticians, it was decided that a total of 400,000
children should be included in the Salk clinical trial begun in 1954, with half of them
randomly assigned the vaccine and the remaining children assigned the placebo. No
other clinical trial had ever been attempted on such a large group of participants.
Through a public school inoculation program, the 400,000 participants were treated
and then observed over the summer to determine the number of children contracting polio. Although fewer than 200 cases of polio were reported for the 400,000
participants in the clinical trial, more than three times as many cases appeared in
the group receiving the placebo. These results, together with some statistical calculations, were sufficient to indicate the effectiveness of the Salk polio vaccine.
However, these conclusions would not have been possible if the statisticians and
scientists had not planned for and conducted such a large clinical trial.
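The sample-size reasoning in the Salk example can be sketched numerically. The Python code below assumes an incidence rate of 50 cases per 100,000 unvaccinated children (the upper figure quoted in the text) and two equal-sized groups. It computes the expected number of placebo-group cases for several trial sizes, and a pooled two-proportion z statistic. The case counts passed to `two_proportion_z` (150 placebo versus 45 vaccinated, out of 200,000 children each) are hypothetical, chosen only to match the text's description of fewer than 200 total cases with more than three times as many in the placebo group; they are not the actual trial results.

```python
from math import sqrt

# Assumed incidence in unvaccinated children (text: "less than 50 cases for every 100,000").
INCIDENCE = 50 / 100_000


def expected_cases(group_size: int, rate: float = INCIDENCE) -> float:
    """Expected number of polio cases in a group of the given size."""
    return group_size * rate


def two_proportion_z(cases_a: int, n_a: int, cases_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for H0: equal incidence rates."""
    p_a, p_b = cases_a / n_a, cases_b / n_b
    pooled = (cases_a + cases_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


if __name__ == "__main__":
    for per_group in (2_000, 20_000, 200_000):
        print(per_group, "children per group ->",
              expected_cases(per_group), "expected placebo cases")
    # Hypothetical counts (not the real trial data): placebo vs. vaccine.
    z = two_proportion_z(150, 200_000, 45, 200_000)
    print(f"z = {z:.1f}")
```

With 2,000 children per group only about one placebo case would be expected, far too few to distinguish the groups; with 200,000 per group about 100 would be expected, and a threefold difference of the hypothetical size above produces a z statistic of roughly 7.5.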
The development of the Salk vaccine is not an isolated example of the use
of statistics in the testing and development of drug products. In recent years,
the U.S. Food and Drug Administration (FDA) has placed stringent requirements
on pharmaceutical firms wanting to establish the effectiveness of proposed new
drug products. Thus, statistics has played an important role in the development
and testing of birth control pills, rubella vaccines, chemotherapeutic agents in the
treatment of cancer, and many other preparations.
Defining the Problem: Improving the Reliability of Evidence in Criminal Investigations
The National Academy of Sciences released a report (National Research Council,
2009) in which one of the more important findings was the need for applying statistical methods in the design of studies used to evaluate inferences from evidence
gathered by forensic technicians. The following statement is central to the report:
“Over the last two decades, advances in some forensic science disciplines, especially the use of DNA technology, have demonstrated that some areas of forensic science have great additional potential to help law enforcement identify
criminals. . . . Those advances, however, also have revealed that, in some cases,
substantive information and testimony based on faulty forensic science analyses may have contributed to wrongful convictions of innocent people. This fact
has demonstrated the potential danger of giving undue weight to evidence and
testimony derived from imperfect testing and analysis.”
There are many sources of error that may affect the accuracy of conclusions inferred
from the crime scene evidence and presented to a jury by a forensic investigator.
Statistics can play a role in improving forensic analyses. Statistical principles can
be used to identify sources of variation and quantify the size of the impact that
these sources of variation can have on the conclusions reached by the forensic
investigator.
An illustration of the impact of an inappropriately designed study and
statistical analysis on the conclusions reached from the evidence obtained at
a crime scene can be found in Spiegelman et al. (2007). They demonstrate that
the evidence used by the FBI crime lab to support the claim that there was not
a second assassin of President John F. Kennedy was based on a faulty analysis
of the data and an overstatement of the results of a method of forensic testing
called Comparative Bullet Lead Analysis (CBLA). This method applies a chemical analysis to link a bullet found at a crime scene to the gun that had discharged
the bullet. Based on evidence from chemical analyses of the recovered bullet fragments, the 1979 U.S. House Select Committee on Assassinations concluded that all
the bullets striking President Kennedy were fired from Lee Oswald’s rifle. A new
analysis of the bullets using more appropriate statistical analyses demonstrated
that the evidence presented in 1979 was overstated. A case is presented for a new
analysis of the assassination bullet fragments, which may shed light on whether the
five bullet fragments found in the Kennedy assassination are derived from three or
more bullets and not just two bullets, as was presented as the definitive evidence
that Oswald was the sole shooter in the assassination of President Kennedy.
Defining the Problem: Estimating Bowhead Whale Population Size
Raftery and Zeh (1998) discuss the estimation of the population size and rate of
increase in bowhead whales, Balaena mysticetus. The importance of such a study
derives from the fact that bowheads were the first species of great whale for
which commercial whaling was stopped; thus, their status indicates the recovery
prospects of other great whales. Also, the International Whaling Commission
uses these estimates to determine the aboriginal subsistence whaling quota for
Alaskan Eskimos. To obtain the necessary data, researchers conducted a visual
and acoustic census off Point Barrow, Alaska. The researchers then applied statistical models and estimation techniques to the data obtained in the census to
determine whether the bowhead population had increased or decreased since
commercial whaling was stopped. The statistical estimates showed that the
bowhead population was increasing at a healthy rate, indicating that stocks of
great whales that have been decimated by commercial hunting can recover after
hunting is discontinued.
Defining the Problem: Ozone Exposure and Population Density
Ambient ozone pollution in urban areas is one of the nation’s most pervasive environmental problems. Whereas the decreasing stratospheric ozone layer may lead
to an increased incidence of skin cancer, high ambient ozone intensity has been shown
to cause damage to the human respiratory system as well as to agricultural crops
and trees. The Houston, Texas, area has ozone concentrations that exceed the National Ambient Air Quality
Standard and are rated second only to those of Los Angeles. Carroll et al. (1997) describe how to analyze the hourly ozone measurements collected in Houston from 1980 to 1993 by 9 to 12 monitoring stations.
Besides the ozone level, each station recorded three meteorological variables:
temperature, wind speed, and wind direction.
The statistical aspect of the project had three major goals:
1. Provide information (and/or tools to obtain such information)
about the amount and pattern of missing data as well as about the
quality of the ozone and the meteorological measurements.
2. Build a model of ozone intensity to predict the ozone concentration
at any given location within Houston at any given time between 1980
and 1993.
3. Apply this model to estimate exposure indices that account for
either a long-term exposure or a short-term high-concentration
exposure; also, relate census information to different exposure
indices to achieve population exposure indices.
The spatial–temporal model the researchers built provided estimates demonstrating that the highest ozone levels occurred at locations with relatively small
populations of young children. Also, the model estimated that the exposure of
young children to ozone decreased by approximately 20% from 1980 to 1993. An
examination of the distribution of population exposure had several policy implications. In particular, it was concluded that the current placement of monitors
is not ideal if one is concerned with assessing population exposure. This project
involved all four components of Learning from Data: planning where the monitoring stations should be placed within the city, how often the data should be
collected, and what variables should be recorded; conducting spatial–temporal
graphing of the data; creating spatial–temporal models of the ozone data, meteorological data, and demographic data; and, finally, writing a report that could
assist local and federal officials in formulating policy with respect to decreasing
ozone levels.
Defining the Problem: Assessing Public Opinion
Public opinion, consumer preference, and election polls are commonly used to
assess the opinions or preferences of a segment of the public regarding issues,
products, or candidates of interest. We, the American public, are exposed to the
results of these polls daily in newspapers, in magazines, on the internet, on the
radio, and on television. For example, the results of polls related to the following
subjects were printed in local newspapers:
●● Public confidence in the potential for job growth in the coming year
●● Reactions of Texas residents to the state legislature’s failure to expand
Medicaid coverage
●● Voters’ preferences for tea party candidates in the fall congressional
elections
●● Attitudes toward increasing the gasoline tax in order to increase
funding for road construction and maintenance
●● Product preference polls related to specific products (Toyota vs. Ford,
DirecTV vs. Comcast, Dell vs. Apple, Subway vs. McDonald’s)
●● Public opinion on a national immigration policy
A number of questions can be raised about polls. Suppose we consider a poll
on the public’s opinion on a proposed income tax increase in the state of Michigan.
What was the population of interest to the pollster? Was the pollster interested in
all residents of Michigan or just those citizens who currently pay income taxes?
Was the sample in fact selected from this population? If the population of interest
was all persons currently paying income taxes, did the pollster make sure that all
the individuals sampled were current taxpayers? What questions were asked and
how were the questions phrased? Was each person asked the same question? Were
the questions phrased in such a manner as to bias the responses? Can we believe
the results of these polls? Do these results “represent” how the general public
currently feels about the issues raised in the polls?
Opinion and preference polls are an important, visible application of statistics for the consumer. We will discuss this topic in more detail in Chapters 2 and
10. We hope that after studying this material you will have a better understanding
of how to interpret the results of these polls.
1.4 A Note to the Student
We think with words and concepts. A study of the discipline of statistics requires
us to memorize new terms and concepts (as does the study of a foreign language).
Commit these definitions, theorems, and concepts to memory.
Also, focus on the broader concept of making sense of data. Do not let details
obscure these broader characteristics of the subject. The teaching objective of this
text is to identify and amplify these broader concepts of statistics.
1.5 Summary
The discipline of statistics and those who apply the tools of that discipline deal
with Learning from Data. Medical researchers, social scientists, accountants,
agronomists, consumers, government leaders, and professional statisticians are all
involved with data collection, data summarization, data analysis, and the effective
communication of the results of data analysis.
1.6 Exercises
1.1 Introduction
Bio.
1.1 Hansen (2006) describes a study to assess the migration and survival of salmon released
from fish farms located in Norway. The mingling of escaped farmed salmon with wild salmon
raises several concerns. First, the assessment of the abundance of wild salmon stocks will be
biased if large numbers of farmed salmon are present. Second, potential interbreeding between farmed and wild salmon may result in a reduction in the health of the wild stocks.
Third, diseases present in farmed salmon may be transferred to wild salmon. Two batches of
farmed salmon were tagged and released in two locations, one batch of 1,996 fish in northern
Norway and a second batch of 2,499 fish in southern Norway. The researchers recorded the
time and location at which the fish were captured by either commercial fishermen or anglers
in fresh water. Two of the most important pieces of information to be determined by the
study were the distance from the point of the fish’s release to the point of its capture and the
length of time it took for the fish to be captured.
a. Identify the population that is of interest to the researchers.
b. Describe the sample.
c. What characteristics of the population are of interest to the researchers?
d. If the sample measurements are used to make inferences about the population
characteristics, why is a measure of reliability of the inferences important?
Env.
1.2 During 2012, Texas had listed on FracFocus, an industry fracking disclosure site, nearly
6,000 oil and gas wells in which the fracking methodology was used to extract natural gas.
Fontenot et al. (2013) report on a study of 100 private water wells in or near the Barnett Shale
in Texas. There were 91 private wells located within 5 km of an active gas well using fracking, 4
private wells with no gas wells located within a 14 km radius, and 5 wells outside of the Barnett
Shale with no gas well located within a 60 km radius. They found that there were elevated levels
of potential contaminants such as arsenic and selenium in the 91 wells closest to natural gas
extraction sites compared to the 9 wells that were at least 14 km away from an active gas well
using the fracking technique to extract natural gas.
a. Identify the population that is of interest to the researchers.
b. Describe the sample.
c. What characteristics of the population are of interest to the researchers?
d. If the sample measurements are used to make inferences about the population
characteristics, why is a measure of reliability of the inferences important?
Soc.
1.3 In 2014, Congress cut $8.7 billion from the Supplemental Nutrition Assistance Program
(SNAP), more commonly referred to as food stamps. The rationale for the decrease is that
providing assistance to people will result in the next generation of citizens being more dependent on the government for support. Hoynes (2012) describes a study to evaluate this claim. The
study examines 60,782 families over the period from 1968 to 2009, which is subsequent to the
introduction of the Food Stamp Program in 1961. This study examines the impact of a positive and policy-driven change in economic resources available in utero and during childhood
on the economic health of individuals in adulthood. The study assembled data linking family
background in early childhood to adult health and economic outcomes. The study concluded
that the Food Stamp Program has effects decades after initial exposure. Specifically, access
to food stamps in childhood leads to a significant reduction in the incidence of metabolic
syndrome (obesity, high blood pressure, and diabetes) and, for women, an increase in economic self-sufficiency. Overall, the results suggest substantial internal and external benefits
of SNAP.
a. Identify the population that is of interest to the researchers.
b. Describe the sample.
c. What characteristics of the population are of interest to the researchers?
d. If the sample measurements are used to make inferences about the population
characteristics, why is a measure of reliability of the inferences important?
Med.
1.4 Of all sports, football accounts for the highest incidence of concussion in the United States
due to the large number of athletes participating and the nature of the sport. While there is general agreement that concussion incidence can be reduced by making rule changes and teaching
proper tackling technique, there remains debate as to whether helmet design may also reduce the
incidence of concussion. Rowson et al. (2014) report on a retrospective analysis of head impact
data collected between 2005 and 2010 from eight collegiate football teams. Concussion rates for
players wearing two types of helmets, Riddell VSR4 and Riddell Revolution, were compared. A
total of 1,281,444 head impacts were recorded, from which 64 concussions were diagnosed. The
relative risk of sustaining a concussion in a Revolution helmet compared with a VSR4 helmet
was 46.1%. This study illustrates that differences in the ability to reduce concussion risk exist
between helmet models in football. Although helmet design may never prevent all concussions
from occurring in football, evidence illustrates that it can reduce the incidence of this injury.
a. Identify the population that is of interest to the researchers.
b. Describe the sample.
c. What characteristics of the population are of interest to the researchers?
d. If the sample measurements are used to make inferences about the population
characteristics, why is a measure of reliability of the inferences important?
Pol. Sci.
1.5 During the 2004 senatorial campaign in a large southwestern state, illegal immigration was
a major issue. One of the candidates argued that illegal immigrants made use of educational
and social services without having to pay property taxes. The other candidate pointed out that
the cost of new homes in their state was 20–30% less than the national average due to the low
wages received by the large number of illegal immigrants working on new home construction. A
random sample of 5,500 registered voters was asked the question, “Are illegal immigrants generally a benefit or a liability to the state’s economy?” The results were as follows: 3,500 people
responded “liability,” 1,500 people responded “benefit,” and 500 people responded “uncertain.”
a. What is the population of interest?
b. What is the population from which the sample was selected?
c. Does the sample adequately represent the population?
d. If a second random sample of 5,000 registered voters was selected, would the
results be nearly the same as the results obtained from the initial sample of
5,500 voters? Explain your answer.
Edu.
1.6 An American history professor at a major university was interested in knowing the history
literacy of college freshmen. In particular, he wanted to find what proportion of college freshmen
at the university knew which country controlled the original 13 colonies prior to the American
Revolution. The professor sent a questionnaire to all freshman students enrolled in HIST 101 and
received responses from 318 students out of the 7,500 students who were sent the questionnaire.
One of the questions was “What country controlled the original 13 colonies prior to the American
Revolution?”
a. What is the population of interest to the professor?
b. What is the sampled population?
c. Is there a major difference in the two populations? Explain your answer.
d. Suppose that several lectures on the American Revolution had been given in
HIST 101 prior to the students receiving the questionnaire. What possible source
of bias has the professor introduced into the study relative to the population of
interest?
PART 2
Collecting Data
Chapter 2 Using Surveys and Experimental Studies to Gather Data
CHAPTER 2
Using Surveys and Experimental Studies to Gather Data
2.1 Introduction and Abstract of Research Study
2.2 Observational Studies
2.3 Sampling Designs for Surveys
2.4 Experimental Studies
2.5 Designs for Experimental Studies
2.6 Research Study: Exit Polls Versus Election Results
2.7 Summary
2.8 Exercises

2.1 Introduction and Abstract of Research Study
As mentioned in Chapter 1, the first step in Learning from Data is to define the
problem. The design of the data collection process is the crucial step in intelligent data gathering. The process takes a conscious, concerted effort focused on the
following steps:
● Specifying the objective of the study, survey, or experiment
● Identifying the variable(s) of interest
● Choosing an appropriate design for the survey or experimental study
● Collecting the data
To specify the objective of the study, you must understand the problem being
addressed. For example, the transportation department in a large city wants to
assess the public’s perception of the city’s bus system in order to increase the use
of buses within the city. Thus, the department needs to identify which aspects of
the bus system influence whether or not a person will ride the bus. The objective
of the study is to identify factors that the transportation department can alter to
increase the number of people using the bus system.
To identify the variables of interest, you must examine the objective of the
study. For the bus system,...