Ch. 1 Data and Statistics
1.1 APPLICATIONS IN BUSINESS AND ECONOMICS
Today's global business and economic environment requires access to vast amounts of statistical
information. Managers and decision makers must understand that information and know how to use it
effectively. Public accounting firms use sampling procedures when conducting audits for their clients.
Financial analysts use a variety of statistical information to guide their investment recommendations.
Electronic scanners at retail checkout counters collect data that can be used for marketing research.
Manufacturers need such data and their statistical summaries, and production relies on statistical
quality monitoring and control. Economists forecast the future state of the economy using statistical
methods such as regression and time series analysis.
1.2 DATA
Data are the facts and figures collected about the subject of a study. All the data collected in a
particular study are referred to as the data set for that study. The following table is a data set for
25 major companies in the S&P 500.
Table 1.1 DATA SET FOR 25 MAJOR S&P 500 COMPANIES (from Business Week 2005)
Company                  Exchange  Ticker  BusinessWeek Rank  Share Price ($)  Earnings per Share ($)
Abbott Laboratories      N         ABT     90                 46               2.02
Altria Group             N         AG      148                66               4.57
Apollo Group             NQ        APOL    174                74               0.90
Bank of New York         N         BK      305                30               1.85
Bristol-Myers Squibb     N         BMY     346                26               1.21
Cincinnati Financial     NQ        CINF    161                45               2.73
Comcast                  NQ        CMCSA   296                32               0.43
Deere                    N         DE      36                 71               5.77
eBay                     NQ        EBAY    19                 43               0.57
Federated Dept. Stores   N         FD      353                56               3.86
Hasbro                   N         HAS     373                21               0.96
IBM                      N         IBM     216                93               4.94
International Paper      N         IP      370                37               0.98
Knight-Ridder            N         KRI     397                66               4.13
Manor Care               N         HCR     285                34               1.90
Medtronic                N         MDT     53                 52               1.79
National Semiconductor   N         NSM     155                20               1.03
Novellus Systems         NQ        NVLS    386                30               1.06
Pitney Bowes             N         PBI     339                46               2.05
Pulte Homes              N         PHM     12                 78               6.67
SBC Communications       N         SBC     371                24               1.52
St. Paul Travelers       N         STA     264                38               1.53
Teradyne                 N         TER     412                15               0.84
United Rental Inc        N         URI     5                  91               8.94
Wells Fargo              N         WFC     159                59               4.09
An observation is the set of measurements obtained for one element of a data set. The data set in
Table 1.1 contains 25 elements (observations) and five variables: Exchange (where the stock is traded),
Ticker Symbol (the abbreviation used in the stock market), Business Week Rank (a measure of company
strength), Share Price, and Earnings per Share. The set of measurements for the first observation
(Abbott Laboratories) is N, ABT, 90, 46, and 2.02. Share Price and Earnings per Share are quantitative
variables, while Exchange and Ticker Symbol are qualitative variables.
Data collection requires a scale of measurement: nominal (a name or label), ordinal (an order or rank),
interval (differences between values are meaningful), or ratio (ratios of values are meaningful).
Cross-sectional data are collected at one epoch, and time series data are collected repeatedly over a
certain time span. Time series data are often illustrated by a graph. The complete collection of
individuals, items, or data under consideration in a statistical study is referred to as the population.
Some qualitative data can be encoded numerically for digital processing. Blood type, for example, may be
recorded as 1, 2, 3, and 4. However, no order is associated with these codes, and arithmetic operations
on them are not meaningful. Below are a few more examples of qualitative variables.
Qualitative variable   Nominal level
Blood type             A, B, O, AB
Marital Status         single, married, divorced
Type of crime          misdemeanor, felony
Color of road sign     red, white, blue, brown, green
Religion               Christian, Buddhist, Muslim, ...
1.3 DATA SOURCES
Data can be obtained from existing sources or from new surveys. Various industry associations and
organizations as well as government agencies collect and keep important data sets. The internet
continues to grow as an important source of data and statistical information.
Table 1.3 Data Available From Selected Government Agencies
Government Agency                 Data Available
Census Bureau                     population data, number of households, ...
Federal Reserve Board             data on the money supply, installment credit, ...
Office of Management and Budget   data on revenue, expenditure, ...
Department of Commerce            data on business activity, ...
Bureau of Labor Statistics        consumer spending, hourly earnings, ...
Beware of the possibility of data errors!
1.4 DESCRIPTIVE STATISTICS
Data are often summarized and presented in tables, graphs, or numerical measures. Such summaries, and
the study of them, are called descriptive statistics.
[Ch. 1 Homework]
[1] A mini sound system typically includes an AM/FM tuner, a CD player, a tape deck, and a USB socket
in a book-sized box with two speakers. The data in Table 1.7 show the retail price, sound quality, and
other features of 10 such systems (Consumer Reports Buying Guide).
How many elements does this data set contain?
How many variables are in the data set?
Which of the variables are quantitative and which are qualitative?
Compute the average price.
What percentage of the mini systems have an FM tuning rating of very good or excellent?
What percentage of the mini systems include both a CD player and a USB socket?
Table 1.7 Sample of 10 Mini Sound Systems
Brand and Model   Price ($)  Sound Quality  FM Tuning  CD Player  Tape Deck  USB Socket
Audioquest A800   250        Good           Fair       1          0          1
JVC SD1000        190        Good           Very Good  1          1          2
Panasonic PM11    170        Very Good      Excellent  1          0          1
RCA R1283         180        Fair           Very Good  0          1          2
Sharp BA2600      150        Good           Poor       1          0          1
Magnat ML100      190        Very Good      Good       1          0          1
Sony NX5000       280        Good           Very Good  1          0          2
LG X200           160        Good           Very Good  1          1          1
Samsung Q9        170        Very Good      Good       1          0          2
Hansen Prince7    270        Very Good      Excellent  1          0          2
[2] The annual earnings of an auto company are illustrated in the following graph.
Fig. 1.8 Annual Earnings of Volkswagen
Figure 1.8 provides a bar graph summarizing the earnings of Volkswagen for the years 1997 to 2005
(Business Week, December 26, 2005). Here are a few questions about the graph.
Are the data qualitative or quantitative?
Are the data cross-sectional or time series?
What is the variable of interest?
Comment on the trend in Volkswagen's earnings over time. The Business Week article (December 26,
2005) estimated the earnings for 2006 at $0.6 billion. Does Fig. 1.8 indicate whether this estimate
appears reasonable?
[3] Table 1.1 shows data for 25 major S&P 500 companies. Which one would you regard as the best
company to invest in at the given epoch, and why?
[4] The midterm marks of 10 students in a large class are 70, 73, 78, 75, 63, 80, 72, 88, 74, 77. Which
of the following statements are correct, and which should be challenged as being too generalized?
a. The average midterm mark for the sample of 10 is 75.
b. The average midterm mark of the class is 75.
c. An estimate of the average midterm mark of the class is 75.
d. The average midterm mark of the class is between 72 and 78.
e. If five other students were included in the sample, their marks would be between 63 and 88.
Ch. 2 Descriptive Statistics: Tables and Graphs
2.1 SUMMARIZING QUALITATIVE DATA
Tabular and graphical methods are commonly used to summarize both qualitative and quantitative data.
A frequency distribution is a tabular summary of data showing the number (frequency) of items in each
of several non-overlapping classes. The relative frequency of a class is the fraction of the items
belonging to that class:
$\text{relative frequency} = \dfrac{\text{frequency of the class}}{n}$, where $n$ is the number of observations.
The following table is an example summarizing soft drink purchases. (Note: in science and engineering
the technical term "frequency f" is strictly defined as the inverse of the period T, i.e., f = 1/T.)
Table 2.3 Relative Frequency of Soft Drink Purchases
Soft Drink     Frequency   Relative Frequency
Coke Classic   19          0.38
Diet Coke      8           0.16
Dr. Pepper     5           0.10
Pepsi          13          0.26
Sprite         5           0.10
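As a minimal sketch of the relative frequency formula, the Python snippet below recomputes the last column of Table 2.3 from the frequency column. The variable names are illustrative choices and are not part of the original notes.

```python
# Frequencies copied from Table 2.3
frequency = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi": 13,
    "Sprite": 5,
}

n = sum(frequency.values())  # number of observations

# relative frequency = frequency of the class / n
relative_frequency = {drink: f / n for drink, f in frequency.items()}

for drink, rel in relative_frequency.items():
    print(f"{drink:<13}{frequency[drink]:>4}{rel:>7.2f}")
```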
A bar graph depicts qualitative data summarized in a frequency or relative frequency distribution. The
following graphs are based on the data in Table 2.3 (drawn using EXCEL).
[Figure: bar graph "Soft Drink Consumption"; bars for Pepsi, Dr. Pepper, Diet Coke, Coke Classic, and Sprite; frequency axis from 0 to 20]
A pie chart is another graphical device for presenting the relative frequency (or percent frequency)
of qualitative data. The following two equivalent pie charts are again based on the data in Table 2.3.
The sector for Coke Classic spans 0.38(360) = 136.8 degrees, that for Pepsi spans 0.26(360) = 93.6
degrees, and so on.
[Figure: two pie charts, "Soft Drink Consumption" and "Pie Chart of Soft Drink Purchases"; sectors for Coke Classic, Pepsi, Diet Coke, Sprite, and Dr. Pepper]
The sum of the relative frequencies is always unity (1.00). Classes with a frequency of 5% or less may
be grouped into an aggregate class called "others."
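The sector angles of the pie chart follow directly from the relative frequencies. The short Python sketch below, with illustrative variable names, multiplies each relative frequency from Table 2.3 by 360 degrees; because the relative frequencies sum to one, the angles sum to a full circle.

```python
# Relative frequencies copied from Table 2.3
relative_frequency = {
    "Coke Classic": 0.38,
    "Diet Coke": 0.16,
    "Dr. Pepper": 0.10,
    "Pepsi": 0.26,
    "Sprite": 0.10,
}

# sector angle (degrees) = relative frequency x 360
sector_angle = {drink: rel * 360 for drink, rel in relative_frequency.items()}

for drink, angle in sector_angle.items():
    print(f"{drink:<13}{angle:>7.1f} degrees")

print(f"total: {sum(sector_angle.values()):.1f} degrees")
```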
2.2 SUMMARIZING QUANTITATIVE DATA
Three steps are needed to define the classes for quantitative data: 1) determine the number of classes,
2) determine the width of each class, and 3) determine the class limits. As an example, the quantitative
data in Table 2.4 can be summarized as in Table 2.5.
Table 2.4 Audit Times
12   14   19   18   15
15   18   17   20   27
22   23   22   21   33
28   14   18   16   13
Table 2.5 Frequency Distribution of Audit Times
audit times (days)   frequency
10~14                4
15~19                8
20~24                5
25~29                2
30~34                1
Total                20
Table 2.4 shows the time in days to complete year-end audits for 20 clients of a small accounting firm.
Table 2.5 shows the frequency distribution of the audit times in Table 2.4. Here, the class width is
5 days and the class midpoints are 12, 17, 22, 27, and 32.
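The three steps for defining classes can be sketched in Python as follows. The sketch assumes the choices used in Table 2.5 (five classes, width 5 days, first lower limit 10); the variable names are illustrative.

```python
# Audit times in days, copied from Table 2.4
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

number_of_classes = 5   # step 1: number of classes
class_width = 5         # step 2: width of each class
lowest_limit = 10       # step 3: lower limit of the first class

# Count how many observations fall into each class
frequency = [0] * number_of_classes
for t in audit_times:
    k = (t - lowest_limit) // class_width  # index of the class containing t
    frequency[k] += 1

for k, f in enumerate(frequency):
    lower = lowest_limit + k * class_width
    upper = lower + class_width - 1
    midpoint = (lower + upper) / 2
    print(f"{lower}~{upper}  midpoint {midpoint:g}  frequency {f}")
```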
Figure 2.4 Histogram for the Audit Time Data
The histogram above is based on Table 2.5. The cumulative frequency distribution is another tabular
summary of quantitative data. Rather than showing the frequency of each class, it gives the number of
data values less than or equal to the upper limit of each class. See the next table.
Table 2.7 Cumulative Frequency, Cumulative Relative Frequency,
and Cumulative Percent Frequency Distribution for the Audit Data
audit times   cumulative frequency   cumulative relative frequency   cumulative percent frequency
L.E. 14       4                      0.20                            20
L.E. 19       12                     0.60                            60
L.E. 24       17                     0.85                            85
L.E. 29       19                     0.95                            95
L.E. 34       20                     1.00                            100
L.E. = Less than or Equal
For example, the cumulative frequency of the class L.E. 14 is 4, that of the class L.E. 19 is
4 + 8 = 12, and so on.
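The running sums behind a cumulative frequency distribution can be sketched in Python with itertools.accumulate. The class frequencies are those of the audit time data (4, 8, 5, 2, 1); the variable names are illustrative.

```python
from itertools import accumulate

upper_limit = [14, 19, 24, 29, 34]   # upper class limits in days
frequency = [4, 8, 5, 2, 1]          # class frequencies of the audit data

# cumulative frequency = running sum of the class frequencies
cumulative = list(accumulate(frequency))
n = cumulative[-1]                   # total number of observations

for limit, c in zip(upper_limit, cumulative):
    # cumulative frequency, cumulative relative frequency, cumulative percent frequency
    print(f"L.E. {limit}: {c:>2}  {c / n:.2f}  {100 * c / n:.0f}")
```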
A graph of a cumulative distribution, called an ogive, shows data values on the horizontal axis and the
cumulative frequency (or cumulative relative frequency) on the vertical axis. Fig. 2.6 illustrates the
ogive for the cumulative frequency data in Table 2.7.
Fig. 2.6 Ogive for the Audit Time Data
2.3 EXPLORATORY DATA ANALYSIS: The Stem and Leaf Display
In a stem-and-leaf display each value is divided into a stem and a leaf, and the leaves for each stem
are shown on a separate row. Unlike a frequency distribution, the stem-and-leaf display preserves the
information on individual observations. The following are the results of an aptitude test.
Table 2.8 Number of Correct Answers on Aptitude Test
112   73  126   82   92  115   95   84   68  100
 72   92  128  104  108   76  141  119   98   85
 69   76  118  132   96   91   81  113  115   94
 97   86  127  134  100  102   80   98  107  106
107   73  124   83   92   81  106   75   95  119
 6 | 8 9
 7 | 2 3 3 5 6 6
 8 | 0 1 1 2 3 4 5 6
 9 | 1 2 2 2 4 5 5 6 7 8 8
10 | 0 0 2 4 6 6 7 7 8
11 | 2 3 5 5 8 9 9
12 | 4 6 7 8
13 | 2 4
14 | 1
The corresponding stem-and-leaf display is given above. The stem unit is 10 and the leaf unit is 1, so
the row "6 | 8 9" represents the values 68 and 69.
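A stem-and-leaf display of this kind can be built mechanically. The Python sketch below, an illustration rather than the method used to draw the original display, splits each score of Table 2.8 into a stem (tens) and a leaf (units) with divmod and prints the sorted leaves for each stem.

```python
from collections import defaultdict

# Number of correct answers, copied from Table 2.8
scores = [112, 73, 126, 82, 92, 115, 95, 84, 68, 100,
          72, 92, 128, 104, 108, 76, 141, 119, 98, 85,
          69, 76, 118, 132, 96, 91, 81, 113, 115, 94,
          97, 86, 127, 134, 100, 102, 80, 98, 107, 106,
          107, 73, 124, 83, 92, 81, 106, 75, 95, 119]

leaves = defaultdict(list)
for value in sorted(scores):
    stem, leaf = divmod(value, 10)   # stem unit = 10, leaf unit = 1
    leaves[stem].append(leaf)

for stem in sorted(leaves):
    print(f"{stem:>2} | " + " ".join(str(leaf) for leaf in leaves[stem]))
```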
2.4 CROSS TABULATIONS AND SCATTER DIAGRAMS
Bar graphs and the other presentations so far summarize data for one variable. Cross tabulation and the
scatter diagram are two methods for showing the relationship between two variables. A cross tabulation
is a tabular summary of data for two variables. The following example cross-tabulates quality rating
and meal price for 300 restaurants in Los Angeles. The raw data are only partly shown. The quality
classes are "excellent," "very good," and "good," while meal price varies from 10 to 50 dollars.
Table 2.9 Meal Price & Quality
Index Number   Quality Rating   Meal Price ($)
1              Good             18
2              Very Good        22
3              Good             28
4              Excellent        38
5              Very Good        33
...            ...              ...
297            Good             16
298            Good             15
299            Very Good        38
300            Very Good        31
Table 2.10 Cross Tabulation of Price & Quality (L.A. Restaurant)
                 meal price ($)
quality rating   10-19   20-29   30-39   40-49   total
Excellent        2       14      28      22      66
Very Good        34      64      46      6       150
Good             42      40      2       0       84
total            78      118     76      28      300
Cross tabulation is widely used to examine the relationship between two variables. In some cases a
conclusion drawn from an aggregated cross tabulation is reversed when the un-aggregated data are
examined. This is called Simpson's paradox.
A scatter diagram is a graphical representation of the relationship between two quantitative variables,
and a trend line is a line that approximates that relationship. The following figure and table give a
sample set of two-dimensional data.
Fig. 2.8 Scatter Diagram & Trend-line for given X-Y data
x      y
-22    22
-33    49
2      8
29     -16
-13    10
21     -28
-13    27
-23    35
14     -5
3      -3
-37    48
34     -29
9      -18
-33    31
20     -16
-3     14
-15    18
12     17
-20    -11
-7     -22
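One common way to obtain a trend line is a least-squares fit, which is what a linear trend line in a spreadsheet computes. The Python sketch below applies it to the X-Y data listed with Fig. 2.8. It assumes that the x and y values are paired in the order listed and that the trend line in the figure is a least-squares line; neither is stated explicitly in the notes.

```python
# X-Y data listed with Fig. 2.8 (assumed to be paired by position)
x = [-22, -33, 2, 29, -13, 21, -13, -23, 14, 3,
     -37, 34, 9, -33, 20, -3, -15, 12, -20, -7]
y = [22, 49, 8, -16, 10, -28, 27, 35, -5, -3,
     48, -29, -18, 31, -16, 14, 18, 17, -11, -22]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept of the line y = intercept + slope * x
sum_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sum_xx = sum((a - x_bar) ** 2 for a in x)
slope = sum_xy / sum_xx
intercept = y_bar - slope * x_bar

print(f"trend line: y = {intercept:.2f} + ({slope:.2f}) x")
```

A negative slope indicates that y tends to decrease as x increases.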
[Ch.2 Homework]
[1] Based on the data (SoftDrink), make a bar graph and a pie chart of soft drink purchase frequency
(use the EXCEL built-in function COUNTIF). There are 7 kinds of soft drink: Coke Classic, Mountain Dew,
Diet Coke, Pepsi, Gatorade, Dr. Pepper, and Sprite.
[2] Based on the data (Scatter), make a scatter diagram with the P-Q trend line (use Scatter in Chart,
Add Trend Line, etc.). The Scatter data set is slightly altered from the one shown in Fig. 2.8.
[3] Make another "Cross Tabulation of Price & Quality of Restaurant" using the modified data
(Restaurant), which now include more observations and one more quality rating, "Poor." Use Pivot Table
and associated tools. Set the meal price ranges at $5 intervals: 5-9, 10-14, 15-19, ... You may add a
relevant comment on the cross tabulation.
Ch. 3 Descriptive Statistics: Numerical Measures II
3.4 FIVE NUMBER SUMMARY AND BOX PLOT
The five-number summary briefly describes the distribution of a data set. The five numbers are:
1) the smallest value, 2) the first quartile Q1, 3) the median Q2, 4) the third quartile Q3, and
5) the largest value.
A box plot is a graphical summary of data based on the five-number summary.
Fig. 3.5 Box plot of the salary data (dotted line is called whisker)
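As an illustration, a five-number summary can be computed in Python with the standard statistics module. The salary data behind Fig. 3.5 are not reproduced in these notes, so the sketch uses the audit times of Table 2.4 as stand-in data. Quartile conventions differ between textbooks and software; the "exclusive" method used here is one common choice and is an assumption, not necessarily the convention behind the figure.

```python
import statistics

# Stand-in data: audit times in days from Table 2.4
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

# Quartiles by the "exclusive" method (an assumed convention)
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
print("smallest, Q1, median, Q3, largest:", five_number_summary)
```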
3.5 ASSOCIATION BETWEEN TWO VARIABLES
Covariance is a measure of association between two variables. Given a data set of $n$ paired
observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ of two variables $x_i$ and $y_i$, the
sample covariance is defined as
$s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
A set of two-variable data is shown below, together with the deviations, their squares, and their
products.
Table 3.8: Data for TV commercials $x_i$ and Sales $y_i$
$x_i$   $x_i - \bar{x}$   $y_i$   $y_i - \bar{y}$   $(x_i - \bar{x})^2$   $(y_i - \bar{y})^2$   $(x_i - \bar{x})(y_i - \bar{y})$
2       -1                50      -1                1                     1                     1
5       2                 57      6                 4                     36                    12
1       -2                41      -10               4                     100                   20
3       0                 54      3                 0                     9                     0
4       1                 54      3                 1                     9                     3
1       -2                38      -13               4                     169                   26
5       2                 63      12                4                     144                   24
3       0                 48      -3                0                     9                     0
4       1                 59      8                 1                     64                    8
2       -1                46      -5                1                     25                    5
mean 3                    mean 51                   total 20              total 566             total 99
The variance of $x_i$ is $s_x^2 = 20/9$, and the variance of $y_i$ is $s_y^2 = 566/9$.
The covariance of $x_i$ and $y_i$ is $s_{xy} = 99/9 = 11$.
This two-variable data set comes from audio equipment sales: $x_i$ is the number of weekend TV
commercials and $y_i$ is the sales during the following week. The graph of this data set is shown in
Fig. 3.7.
Fig. 3.7 Relation between Audio TV Commercials and Sales
Fig. 3.9 Two-dim data and covariance
The three graphs in Fig. 3.9 illustrate, from top to bottom, 1) positive covariance, 2) zero
covariance, and 3) negative covariance. Since the covariance of audio TV commercials and sales is
positive ($s_{xy} = 11$), one can presume that TV commercials are associated with higher sales, at
least for audio.
The correlation coefficient is defined as
$r_{xy} = \dfrac{s_{xy}}{s_x s_y}$
For the above TV commercial and sales data, the correlation coefficient is
$r_{xy} = \dfrac{11}{\sqrt{20/9}\,\sqrt{566/9}} = 0.93$
This implies quite a strong positive linear association, since $r_{xy} = 1$ for perfect positive
correlation.
$r_{xy} = +1$ : perfect positive linear relationship
$r_{xy} = -1$ : perfect negative linear relationship
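The covariance and correlation calculations for the TV commercial and sales data can be checked with a few lines of Python. This is a minimal sketch that follows the sample formulas (divisor n - 1) directly; the variable names are illustrative.

```python
from math import sqrt

# TV commercials x_i and sales y_i, copied from Table 3.8
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and sample standard deviations (divisor n - 1)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
s_x = sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
s_y = sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))

# Correlation coefficient
r_xy = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy:.2f}, r_xy = {r_xy:.2f}")
```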
3.6 WEIGHTED MEAN AND GROUPED DATA
If the observations are not of equal importance, each can be given a weight when computing the mean:
weighted mean: $\bar{x} = \dfrac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \dfrac{\sum w_i x_i}{\sum w_i}$
As an example, consider the following sample of five purchases (cost: price per 1 kg in dollars).
purchase   cost ($/kg)   mass (kg)
1          6.05          1200
2          6.80          500
3          5.60          2750
4          5.95          1000
5          6.55          800
The simple average cost per 1 kg is $6.19, whereas the weighted mean cost is $5.96.
simple mean cost: (6.05 + 6.80 + 5.60 + 5.95 + 6.55)/5 = 6.19
total purchase amount: 1200 + 500 + 2750 + 1000 + 800 = 6250 (kg)
weighted mean cost:
(1200 × 6.05 + 500 × 6.80 + 2750 × 5.60 + 1000 × 5.95 + 800 × 6.55)/6250 = 5.96
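The same comparison can be sketched in Python, using the purchased mass as the weight of each cost. The variable names are illustrative.

```python
# Cost per kg (dollars) and purchased mass (kg) of the five purchases
cost = [6.05, 6.80, 5.60, 5.95, 6.55]
mass = [1200, 500, 2750, 1000, 800]

# Simple mean: every purchase counts equally
simple_mean = sum(cost) / len(cost)

# Weighted mean: each cost is weighted by the mass purchased at that cost
weighted_mean = sum(w * c for w, c in zip(mass, cost)) / sum(mass)

print(f"simple mean cost   = {simple_mean:.2f}")
print(f"weighted mean cost = {weighted_mean:.2f}")
```

The weighted mean is lower than the simple mean because the largest purchase (2750 kg) was made at the lowest cost.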
For grouped data, the mean can be computed as
$\bar{x} = \dfrac{f_1 M_1 + f_2 M_2 + \cdots}{f_1 + f_2 + \cdots} = \dfrac{\sum f_i M_i}{n}$
where $f_i$ is the frequency of class i and $M_i$ is the midpoint of class i ($n = \sum f_i$ is the sum
of the frequencies).
The variance of grouped data is computed as
$s^2 = \dfrac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
As an example, we calculate the mean and variance of the audit data in two ways (Table 2.4: ungrouped
data; Table 2.5: grouped data). Using the two formulae above for the grouped data, we find a mean of
19 and a standard deviation of 5.48. For the ungrouped data of the 20 clients, we find a mean of 19.25
and a standard deviation of 5.44. The grouped results only approximate the ungrouped ones, but for a
very large data set the grouped form is far more convenient to work with.
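The grouped and ungrouped calculations for the audit data can be compared with the following Python sketch. It applies the grouped-data formulas to the class midpoints and frequencies, and uses the standard statistics module for the raw data; the variable names are illustrative.

```python
import statistics
from math import sqrt

# Grouped data: class midpoints M_i and frequencies f_i of the audit times
midpoint = [12, 17, 22, 27, 32]
frequency = [4, 8, 5, 2, 1]

n = sum(frequency)
grouped_mean = sum(f * m for f, m in zip(frequency, midpoint)) / n
grouped_variance = sum(f * (m - grouped_mean) ** 2
                       for f, m in zip(frequency, midpoint)) / (n - 1)
grouped_std = sqrt(grouped_variance)

# Ungrouped data: the 20 individual audit times of Table 2.4
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
raw_mean = statistics.mean(audit_times)
raw_std = statistics.stdev(audit_times)   # sample standard deviation

print(f"grouped:   mean = {grouped_mean:.2f}, s = {grouped_std:.2f}")
print(f"ungrouped: mean = {raw_mean:.2f}, s = {raw_std:.2f}")
```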
Table 2.5 Frequency Distribution of Audit Times (extended)
audit days   $M_i$   $f_i$   $f_i M_i$   $M_i - \bar{x}$   $(M_i - \bar{x})^2$   $f_i (M_i - \bar{x})^2$
10-14        12      4       48          -7                49                    196
15-19        17      8       136         -2                4                     32
20-24        22      5       110         3                 9                     45
25-29        27      2       54          8                 64                    128
30-34        32      1       32          13                169                   169
sum =>               20      380                                                 570
$\bar{x} = \dfrac{\sum f_i M_i}{n} = \dfrac{380}{20} = 19$
$\sum f_i (M_i - \bar{x})^2 = 570$, $s^2 = \dfrac{\sum f_i (M_i - \bar{x})^2}{n - 1} = \dfrac{570}{19} = 30$, $s = \sqrt{30} = 5.477$
[Ch. 3(II) Homework]
[1] Automobiles traveling on a highway were checked for speed by a police radar system. The table below
shows the speed data acquired during 20 minutes (total: 689 automobiles checked). Make a bar graph.
What is the mean speed? Find the variance and the standard deviation.
speed (km/h)   freq.
65-70          5
70-75          16
75-80          29
80-85          57
85-90          79
90-95          105
95-100         113
100-105        112
105-110        93
110-115        52
115-120        23
120-125        4
125-130        1
[2] Road & Track provided a sample of tire prices (in dollars) and load-carrying capacities (in kg) of
tires (table below). Develop a scatter diagram for the data (tire price on the x-axis). Calculate the
correlation coefficient, and add a short comment about the relationship between tire price and
load-carrying capacity.
price ($)   capacity (kg)
65          851
72          1045
75          1115
77          1223
78          1257
81          1375
83          1388
87          1534
95          1670
[3] The motion picture industry is a competitive business. More than 50 studios produce a total of 300
to 400 new motion pictures each year, and the financial success of each motion picture varies
considerably. The opening weekend gross sales, the total gross sales, the number of theaters the movie
was shown in, and the number of weeks the motion picture was in the top 60 are common variables
used to measure the success of a motion picture. Based on the given dataset, write a short "Managerial
Report" using descriptive statistics, including the relationships between total gross sales and the
other variables. (Use the EXCEL built-in function CORREL.)
(head part of the dataset)
Motion Picture                        Opening Gross ($million)  Total Gross ($million)  Number of Theaters  Weeks in Top 60
Coach Carter                          29.17                     67.25                   2,574               16
Ladies in Lavender                    0.15                      6.65                    119                 22
Batman Begins                         48.75                     205.28                  3,858               18
Unleashed                             10.90                     24.47                   1,962               8
Pretty Persuasion                     0.06                      0.23                    24                  4
Fever Pitch                           12.40                     42.01                   3,275               14
Harry Potter and the Goblet of Fire   102.69                    287.18                  3,858               13
…                                     …                         …                       …                   …
[4] Verify for yourself the mean and standard deviation of the ungrouped and grouped audit data of
Ch. 2. Show your calculation procedures.