Unit 4 Measures of Dispersion: 1 Question (Use a Data Set to Calculate the Standard Deviation)

Take the second ten cases (64, 56, 17, 38, 94, 78, 101, 71, 63, 65) from Exercise 2 (p. 92) as your data set and calculate the standard deviation. Show your work step by step, as in In-Class Exercise 4.3.
Unit 4 Measures of Dispersion
A. Introduction
1. Unit 3 discussed measures of central tendency. In Unit 4 we examine measures of
dispersion.
2. Measures of dispersion (variation or variability) answer the question—how typical is the
average value?
3. Range, variance, and standard deviation are the three commonly used measures of
dispersion. The following table presents statistics on students’ performance in an exam.
The count is 30 (N = 30), which means that 30 students took the exam. The table also
reports the range (44.00), standard deviation (9.78002), and variance (95.64888). How
are these values calculated, and how should they be interpreted? The following sections
provide guidelines.
B. Range
1. The range is the difference between the maximum value (H) and the minimum value (L) in
the data set: Range = H – L
2. In the above table, the minimum value (L) is 56 and the maximum value (H) is 100. As
such, the range is 44 (100 – 56). It is that simple!
3. Because the range uses only the two most extreme values, it can be misleading when the
data contain an extremely high or low value. In such cases, the variance or standard
deviation is usually a better measure.
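As a quick illustration of Range = H – L, here is a minimal Python sketch. It uses the first ten cases from Exercise 2 (the data in In-Class Exercise 4.3); the extra value 365 is purely hypothetical, added only to show how one extreme score inflates the range.

```python
def value_range(values):
    """Range = H - L: the highest value minus the lowest value."""
    return max(values) - min(values)


# First ten cases from Exercise 2 (the data used in In-Class Exercise 4.3).
first_ten = [70, 35, 86, 81, 63, 71, 58, 53, 99, 85]
print(value_range(first_ten))  # 99 - 35 = 64

# Hypothetical: add a single extreme value (365) to the same data.
with_outlier = first_ten + [365]
print(value_range(with_outlier))  # 365 - 35 = 330
```

One unusual case changes the range from 64 to 330, even though the other ten values are unchanged.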
C. Variance
1. Variance is the expectation of the squared deviation of a random variable from its mean.
Informally, it measures how far a set of (random) numbers is spread out from its
average value.
2. Because the computation of variance is based on the mean, it requires interval-level or ratio-level
data. The following are the formulas of variance for a population and a sample:
(1) Population: σ² = ∑f(x – μ)²/N
(2) Sample: s² = ∑f(x – x-bar)²/(N – 1)
3. The main difference between the population and sample formulas is the denominator.
The population formula divides by “N,” while the sample formula divides by “N – 1,”
the degrees of freedom, a concept that will be introduced in Unit 6. Because that concept
has not yet been covered, the textbook uses a “compromise” formula: s² = ∑f(x – x-bar)²/N
In this unit and its homework, we will use this compromise formula to calculate variance.
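Since the only difference between the formulas is the denominator, a short Python sketch can make it concrete. The function names below are illustrative, not from the textbook. The sketch computes the sum of squares ∑f(x – x-bar)² from (x, f) pairs and then divides by N (the compromise formula used in this unit) or by N – 1 (the sample formula), using the values of Table 5.1 from In-Class Exercise 4.1.

```python
def sum_of_squares(table):
    """Return (sum of f(x - x_bar)^2, N) for a list of (x, f) pairs."""
    n = sum(f for _, f in table)
    x_bar = sum(f * x for x, f in table) / n
    ss = sum(f * (x - x_bar) ** 2 for x, f in table)
    return ss, n


def variance_divide_by_n(table):
    """Compromise formula used in this unit: s^2 = sum f(x - x_bar)^2 / N."""
    ss, n = sum_of_squares(table)
    return ss / n


def variance_divide_by_n_minus_1(table):
    """Sample formula: s^2 = sum f(x - x_bar)^2 / (N - 1)."""
    ss, n = sum_of_squares(table)
    return ss / (n - 1)


# Table 5.1: x values 6, 5, 4, 3, 2, each with f = 1.
table_5_1 = [(6, 1), (5, 1), (4, 1), (3, 1), (2, 1)]
print(variance_divide_by_n(table_5_1))          # 10 / 5 = 2.0
print(variance_divide_by_n_minus_1(table_5_1))  # 10 / 4 = 2.5
```

With only five cases the two results differ noticeably; as N grows, dividing by N or by N – 1 makes less and less difference.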
D. Standard Deviation
1. The standard deviation is a measure of the amount of variation or dispersion of a set of
values. A low standard deviation indicates that the values tend to be close to the mean of
the set, while a high standard deviation indicates that the values are spread out over a
wider range.
2. In short, the standard deviation is the positive square root of the variance. The
formulas of the standard deviation for a population and a sample, respectively, are:
(1) Population: σ = √σ²
(2) Sample: s = √s²
3. Let’s take another look at the statistics on students’ performance in the exam, in
which the standard deviation is 9.78002 and the variance is 95.64888. If you use a
calculator to take the positive square root of the variance (95.64888), you will get the
standard deviation, 9.78002 (rounded). Is this a coincidence? No. This will always be the
case, because the standard deviation is defined as the square root of the variance.
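The relationship σ = √σ² (or s = √s²) can be checked directly. This short Python sketch takes the square root of the exam variance reported above. It then compares two small data sets that are hypothetical, invented only for illustration: both have a mean of 50, but their values are spread out very differently.

```python
import math


def sd_from_variance(variance):
    """The standard deviation is the positive square root of the variance."""
    return math.sqrt(variance)


# Variance reported for the 30 exam scores in the table above.
print(round(sd_from_variance(95.64888), 5))  # 9.78002


def variance_n(values):
    """Divide-by-N (compromise) variance for a list of raw scores."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Hypothetical data sets, both with a mean of 50.
tight = [49, 50, 51]
spread = [10, 50, 90]
print(round(sd_from_variance(variance_n(tight)), 2))   # 0.82
print(round(sd_from_variance(variance_n(spread)), 2))  # 32.66
```

The values in `tight` sit close to the mean, so the standard deviation is small. The values in `spread` are far from it, so the standard deviation is large.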
In-Class Exercise 4.1: The following table presents a step-by-step calculation layout that fits
the way data are organized in SPSS (Statistical Package for the Social Sciences). The table
has five columns, and you learned the first three columns in Unit 3. The fourth column shows
the deviation (dispersion) of each measurement (x) from the mean (x-bar). The fifth column
shows the frequency multiplied by the squared deviation, f(x – x-bar)². How can we find the
variance and standard deviation?
Table 5.1: How to calculate the variance and standard deviation (when every f is 1)
x    f    fx    (x – x-bar)      f(x – x-bar)²
6    1    6     (6 – 4) = 2      4
5    1    5     (5 – 4) = 1      1
4    1    4     (4 – 4) = 0      0
3    1    3     (3 – 4) = –1     1
2    1    2     (2 – 4) = –2     4
N = 5    ∑fX = 20    ∑f(X – X-bar) = 0    ∑f(X – X-bar)² = 10
First, find the mean (you learned how in Unit 3), which is needed for the fourth column.
Mean: X-bar = ∑fX / N = 20 / 5 = 4. Notice that the deviations of the individual
measurements from the mean sum to zero. To continue the calculation, square each
deviation and multiply it by its frequency to fill in the fifth column. Adding the values
in the fifth column gives 10 (4 + 1 + 0 + 1 + 4 = 10).
Now you can calculate the variance:
Variance: s² = ∑f(x – x-bar)²/N = 10 / 5 = 2
After you obtain the variance, take its positive square root to obtain the standard
deviation:
Standard deviation: s = √s² = √2 = 1.4142135624 = 1.4 (rounded to the nearest tenth)
In-Class Exercise 4.2: The following table presents a step-by-step calculation layout that fits
the way data are organized in SPSS (Statistical Package for the Social Sciences), this time
for the case in which not every frequency is 1. The table has six columns, and you learned
the first three columns in Unit 3. The fourth column shows the deviation (dispersion) of each
measurement (x) from the mean (x-bar). The fifth column shows the squared deviation,
(x – x-bar)². The sixth column shows the frequency multiplied by the squared deviation,
f(x – x-bar)². How can we find the variance and standard deviation?
Table 5.2: How to calculate the variance and standard deviation (when not every f is 1)
x     f     fx    (x – x-bar)          (x – x-bar)²    f(x – x-bar)²
10    1     10    (10 – 4.9) = 5.1     26.01           26.01
9     2     18    (9 – 4.9) = 4.1      16.81           33.62
8     3     24    (8 – 4.9) = 3.1      9.61            28.83
7     1     7     (7 – 4.9) = 2.1      4.41            4.41
6     2     12    (6 – 4.9) = 1.1      1.21            2.42
5     12    60    (5 – 4.9) = 0.1      0.01            0.12
4     1     4     (4 – 4.9) = –0.9     0.81            0.81
3     1     3     (3 – 4.9) = –1.9     3.61            3.61
2     3     6     (2 – 4.9) = –2.9     8.41            25.23
1     2     2     (1 – 4.9) = –3.9     15.21           30.42
0     2     0     (0 – 4.9) = –4.9     24.01           48.02
N = 30    ∑fX = 146    ∑f(X – X-bar) = –1.0    ∑f(X – X-bar)² = 203.5
(∑f(X – X-bar) should be 0; it is –1.0 (146 – 30 × 4.9) because the mean was rounded to
the nearest tenth.)
Mean: X-bar = ∑fX / N = 146 / 30 = 4.8666667 = 4.9 (rounded to the nearest tenth)
Variance: s² = ∑f(x – x-bar)²/N = 203.5 / 30 = 6.7833333333
Standard deviation: s = √s² = √6.7833333333 = 2.6044833141 = 2.6 (rounded to the
nearest tenth)
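When the frequencies differ, each deviation is weighted by f. The Python sketch below reproduces Table 5.2 and rounds the mean to the nearest tenth, exactly as the text does. The last lines are added only for comparison: they show that the unrounded mean (146 / 30) gives a slightly smaller sum of squares.

```python
import math

# Table 5.2 as (x, f) pairs.
table_5_2 = [(10, 1), (9, 2), (8, 3), (7, 1), (6, 2), (5, 12),
             (4, 1), (3, 1), (2, 3), (1, 2), (0, 2)]

n = sum(f for _, f in table_5_2)             # N = 30
sum_fx = sum(f * x for x, f in table_5_2)    # sum fX = 146
mean = round(sum_fx / n, 1)                  # 4.8666... -> 4.9, as in the text

sum_f_dev = sum(f * (x - mean) for x, f in table_5_2)
ss = sum(f * (x - mean) ** 2 for x, f in table_5_2)
variance = ss / n
sd = math.sqrt(variance)

print(n, sum_fx, mean)      # 30 146 4.9
print(round(sum_f_dev, 1))  # -1.0, that is, 146 - 30 * 4.9
print(round(ss, 2))         # 203.5
print(round(variance, 4))   # 6.7833
print(round(sd, 1))         # 2.6

# Comparison only: the unrounded mean gives a slightly smaller sum of squares.
exact_mean = sum_fx / n
ss_exact = sum(f * (x - exact_mean) ** 2 for x, f in table_5_2)
print(round(ss_exact, 4))   # 203.4667
```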
In-Class Exercise 4.3: Take the first ten cases from Exercise 2 (p. 92) as your data set and
calculate the standard deviation. Show your work step by step.
The values of the first ten cases are: 70, 35, 86, 81, 63, 71, 58, 53, 99, 85
Step 1: Rank the first ten values (x) from lowest to highest; they form the first column of the table.
Step 2: Make the table below (the mean of 70 used in the fourth column is calculated in Step 3):
x     f    fx    (x – x-bar)       f(x – x-bar)²
35    1    35    35 – 70 = –35     1225
53    1    53    53 – 70 = –17     289
58    1    58    58 – 70 = –12     144
63    1    63    63 – 70 = –7      49
70    1    70    70 – 70 = 0       0
71    1    71    71 – 70 = 1       1
81    1    81    81 – 70 = 11      121
85    1    85    85 – 70 = 15      225
86    1    86    86 – 70 = 16      256
99    1    99    99 – 70 = 29      841
N = 10    ∑fX = 701    ∑f(X – X-bar) = 1    ∑f(X – X-bar)² = 3151
(∑f(X – X-bar) should be 0; it is 1 because the mean was rounded to the nearest integer.)
Step 3: Calculate the mean:
X-bar = ∑fX / N = 701 / 10 = 70.1 = 70 (rounded to the nearest integer)
Step 4: Complete the column (x – x-bar). Its sum is 1. (Note: the sum of the deviations would
be zero, but the mean was rounded in Step 3.)
Step 5: Complete the column f(x – x-bar)². Its sum is 3151.
Step 6: Calculate the variance:
s² = ∑f(x – x-bar)²/N = 3151 / 10 = 315.1
Step 7: Calculate the standard deviation:
s = √s² = √315.1 = 17.75105631 = 17.8 (rounded to the nearest tenth)
Chapter Five
■Calculating Measures of Dispersion with Excel
The measures of dispersion can be calculated with Excel also. Standard Deviation,
Variance, and Range are calculated using the same steps found in chapter 4 and
displayed in Table 4.3. Summary statistics in Excel provide measures of central
tendency and dispersion as well. For this reason, we will not repeat steps here and
refer you back to the previous chapter.
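Outside Excel, Python's standard-library `statistics` module offers comparable functions. This is an alternative tool, not one the textbook uses. `statistics.pvariance` and `statistics.pstdev` divide by N, like Excel's VAR.P and STDEV.P and like this unit's compromise formula. `statistics.variance` and `statistics.stdev` divide by N – 1, like VAR.S and STDEV.S. These functions use the unrounded mean, so their results can differ slightly from hand calculations that round the mean first.

```python
import statistics

# First ten cases from Exercise 2 (In-Class Exercise 4.3).
first_ten = [70, 35, 86, 81, 63, 71, 58, 53, 99, 85]

print(statistics.mean(first_ten))              # 70.1
print(max(first_ten) - min(first_ten))         # range: 64
print(statistics.pvariance(first_ten))         # divide by N: 315.09
print(round(statistics.pstdev(first_ten), 2))  # 17.75
print(statistics.variance(first_ten))          # divide by N - 1: 350.1
print(round(statistics.stdev(first_ten), 2))   # 18.71
```

The hand calculation with the mean rounded to 70 gave 315.1. The exact mean of 70.1 gives 315.09.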
Summary
The deviation score (the distance between the mean of the distribution and each
score in it) helps determine the spread in scores. In this chapter, we calculate
deviation scores, the variance, and the standard deviation. Further analysis of the
variance is the heart of statistical explanation in criminal justice.
Unlike the index of dispersion and the range, the variance and standard deviation
are calculated for every score in the distribution. For this reason, they are much more
useful measures of dispersion.
These measures are used later when we attempt to determine the source of
variability. Such an analysis can help us determine the factors that contribute to crime or
the success or failure of a program or policy. We discuss how these measures are used
in the next chapter.
KEY TERMS
Index of Dispersion: used to analyze dispersion of nominal level data and varies
between 0 and 1
Range (H-L): used to analyze dispersion of ordinal or higher data and is the distance
between the highest and lowest scores in a distribution
Variance: the mean of the squared deviations from the mean
Standard Deviation: the square root of the variance
Sum of Squares (Σfx²): the endpoint of the process involved in calculating the
variance and standard deviation. First, determine the distance between each score and
the mean of the distribution. Then square these deviation scores, multiply them by
f, and sum these scores to arrive at the sum of squares.
EXERCISES
1. Return to the data for questions 1 and 2 in chapter 3 and calculate the variance and
standard deviation.
2. A court administrator wants to examine burglary case disposition times in his city.
A random sample of 50 burglary cases disposed of during the previous year is
drawn. The numbers that follow represent the number of days needed for each case:
70 35 86 81 63 71 58 53 99 85 64 56 17 38 94 78 101
71 63 65 58 49 88 70 51 61 80 67 53 74 73 29 64 48
98 78 67 65 76 59 50 65 98 91 66 64 69 86 63 74
Measures of Dispersion 93
a. Construct a frequency distribution from these data. Start with the highest score
(X) and proceed down to the lowest score. The f column will indicate the
frequency or how often the score occurred.
b. Calculate the mean, variance, and standard deviation.
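For part a, a frequency distribution ordered from the highest score down can be built with `collections.Counter` from Python's standard library. This minimal sketch uses the 50 disposition times listed in Exercise 2. It prints the X and f columns and leaves part b for you to work by hand.

```python
from collections import Counter

# The 50 burglary case disposition times (days) from Exercise 2.
days = [70, 35, 86, 81, 63, 71, 58, 53, 99, 85, 64, 56, 17, 38, 94, 78, 101,
        71, 63, 65, 58, 49, 88, 70, 51, 61, 80, 67, 53, 74, 73, 29, 64, 48,
        98, 78, 67, 65, 76, 59, 50, 65, 98, 91, 66, 64, 69, 86, 63, 74]

counts = Counter(days)  # maps each score X to its frequency f
# Sort scores from highest to lowest, as the exercise asks.
distribution = sorted(counts.items(), reverse=True)

print("X    f")
for x, f in distribution:
    print(f"{x:<4} {f}")
print("N =", sum(f for _, f in distribution))  # N = 50
```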
3. In chapter 1, question 9 presented data from the New York City Police
Department on Index Crimes for the period 1990-2005. The following number of
murders was reported:
1990: 2262
1995: 1181
1998: 629
2001: 649
2005: 540
a. Construct a frequency distribution from these data. Start with the highest score (X)
and proceed down to the lowest score. The f column will indicate the frequency or
how often the score occurred.
b. Calculate the mean, variance, and standard deviation.
c. Construct a line graph using these data. Write a paragraph about the mean and the
pattern demonstrated in the graph.
DATA ANALYSIS
1. Using the State Data Set I and SPSS or Excel, calculate the mean, variance, and
standard deviation for the following index crimes for all 50 states in 1997 (Variable
Names: murder, rape, robbery, assault, and burglary). What can you say about
your results?
2. Using the State Data Set I and SPSS or Excel, calculate the mean, variance, and
standard deviation for the number of Handgun Homicides (Variable Name:
hghomi) in all 50 states. What can you say about your results?
3. With the State Data Set I, use SPSS or Excel to get a frequency distribution and
measures of dispersion (range, variance, and standard deviation) for the following
variables and interpret the results:
a. Number of Juveniles Arrested (under 18), 1997-Variable Name juvarr
b. Number of Adults Arrested, 1997-Variable Name adultarr
c. Number of Juveniles Arrested Violent Crimes (under 18), 1997-Variable Name
juvarrv
d. Number of Juveniles Arrested Property Crimes (under 18), 1997-Variable
Name juvarrp
e. Number of Adults Arrested Property Crimes, 1997-Variable Name adultarrp
4. Using the State Data Set II and SPSS or Excel, calculate the mean, variance, and
standard deviation for the following index crimes for all 50 states in 2003 (Variable
Names: murder, rape, robbery, assault, and burglary). What can you say about
your results?