
CHAPTER 1

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. A statistician is
an expert with at least a master’s degree in mathematics or statistics, while a data analyst is anyone who works
with data. Descriptive statistics is the collection, organization, presentation, and summary of data with charts
or numerical summaries. Inferential statistics refers to generalizing from a sample to a population, estimating
unknown parameters, drawing conclusions, and making decisions. Statistics is used in all branches of business.
Statistical challenges include imperfect data, practical constraints, and ethical dilemmas. Effective technical
report writing requires attention to style, grammar, organization, and proper use of tables and graphs. Business
data analysts must learn to write a good executive summary and learn the 3 Ps for oral presentations: pace,
planning, and practice. Statistical tools are used to test theories against empirical data. Pitfalls include
nonrandom samples, incorrect sample size, and lack of causal links. The field of statistics is relatively new and
continues to grow as mathematical frontiers expand.
1. Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. TRUE
2. Inferential statistics refers to generalizing from a sample to a population, estimating unknown
parameters, drawing conclusions, and making decisions. TRUE
3. Descriptive statistics refers to the collection, organization, presentation, and summary of data. TRUE
4. Using graphs and data to give authority to poor data is an example of statistical generalization. FALSE
5. A strong correlation between A and B would suggest that B must be caused by A. FALSE
6. A statistical test may be significant yet have no practical importance. TRUE
7. To protect professional integrity, any data analyst must know and follow accepted procedures, maintain
data integrity, carry out accurate calculations, report procedures faithfully, protect confidential
information, cite sources, and acknowledge sources of financial support. TRUE
8. In preparing an oral statistical presentation, the 3 P's refer to Pace, Planning, and Performance. FALSE

CHAPTER 2
A data set is an array with n rows and m columns. Data sets may be univariate (one variable), bivariate (two
variables), or multivariate (three or more variables). There are two basic data types: attribute data (categories
that are described by labels) and numerical data (meaningful numbers). Numerical data are discrete if the values are
integers or can be counted, and continuous if any interval can contain more data values. Nominal measurements
are names, ordinal measurements are ranks, interval measurements have meaningful distances between data
values, and ratio measurements have meaningful ratios and a zero reference point. Time series data are
observations measured at n different points in time or over sequential time intervals, while cross-sectional data
are observations among n entities such as individuals, firms, or geographic regions. Among probability
samples, simple random samples pick items from a list using random numbers, systematic samples take every
kth item, cluster samples select geographic regions, and stratified samples take into account known population
proportions. Nonprobability samples include convenience or judgment samples, gaining time but sacrificing
randomness. Survey design requires attention to question wording and scale definitions. Survey techniques
(mail, telephone, interview, Web, direct observation) depend on time, budget, and the nature of the questions
and are subject to various sources of error.
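The three list-based probability sampling methods described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical sampling frame of 100 ID numbers and a hypothetical 60/40 split into two strata; the function names are illustrative only, and cluster sampling is omitted because it depends on how the geographic regions are defined.

```python
import random

def simple_random_sample(frame, n, rng):
    # Every subset of n items is equally likely; drawn without replacement.
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    # Random starting point among the first k items, then every kth item after it.
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(strata, n, rng):
    # strata maps a stratum name to its list of items. Each stratum receives a
    # share of n proportional to its population share (simple rounding, so the
    # total can differ from n slightly when the shares are not whole numbers).
    total = sum(len(items) for items in strata.values())
    sample = []
    for items in strata.values():
        n_h = round(n * len(items) / total)
        sample.extend(rng.sample(items, n_h))
    return sample

rng = random.Random(42)          # fixed seed so the output is reproducible
frame = list(range(1, 101))      # hypothetical sampling frame of 100 ID numbers
print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))             # every 10th ID
strata = {"urban": frame[:60], "rural": frame[60:]}  # hypothetical 60/40 split
print(stratified_sample(strata, 10, rng))            # 6 urban + 4 rural
```

Fixing the seed only makes the example repeatable; in a real study the random start and selections would not be fixed in advance.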
1. Attribute data have values that are described by words rather than numbers. TRUE
2. Numerical data can be either discrete or continuous. TRUE
3. The number of checks processed at a bank in a day is an example of attribute data. FALSE
4. The weight of a bag of dog food is an example of discrete data. FALSE
5. Nominal data refer to data that can be categorized and ordered. FALSE
6. Temperature measured in degrees Fahrenheit is an example of interval data. TRUE
7. Ordinal data are data that can be ranked. TRUE
8. Generally researchers would prefer sample data rather than census data in describing some population of
interest. FALSE
9. Sample bias is a result of non-randomness in a sample. TRUE
10. A sampling frame is used to help identify the target population in a statistical study. TRUE
11. Internet surveys posted on popular websites such as MSN.com rely on convenience sampling. TRUE
12. Analysis of stock market prices during the Great Depression would require the use of time series data. TRUE
CHAPTER 3
For a set of observations on a single numerical variable, a dot plot displays the individual data values, while a
frequency distribution classifies the data into classes called bins, and a histogram displays the frequency of each bin.
The number of bins and their limits are matters left to your judgment, though Sturges’s Rule offers advice on
the number of bins. The line chart shows values of one or more time series variables plotted against time. A log
scale is sometimes used in time series charts when data vary by orders of magnitude. The bar chart shows a
numerical data value for each category of an attribute. However, a bar chart can also be used for a time series.
A scatter plot can reveal the association (or lack of association) between two variables X and Y. The pie chart
(showing a numerical data value for each category of an attribute if the data values are parts of a whole) is
common but should be used with caution. Sometimes a simple table is the best visual display. Creating effective
visual displays is an acquired skill. Excel offers a wide range of charts from which to choose. Deceptive graphs
are found frequently in both media and business presentations, and the consumer should be aware of common
errors.
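A frequency distribution with equal-width bins, sized by Sturges's Rule, can be sketched as follows. This assumes the form k = 1 + log2(n) rounded up to a whole number; the common form 1 + 3.3 log10(n) is nearly the same because 3.3 is approximately 1/log10(2). The data are a hypothetical sample of 32 values, and the bin count remains a judgment call, as the summary notes.

```python
import math

def sturges_bins(n):
    # Sturges's Rule: k = 1 + log2(n), rounded up here to a whole number of bins.
    return math.ceil(1 + math.log2(n))

def frequency_distribution(data, k):
    # Split [min, max] into k equal-width bins and count the values in each.
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        i = int((x - lo) / width) if width > 0 else 0
        counts[min(i, k - 1)] += 1   # the maximum value goes in the last bin
    edges = [lo + j * width for j in range(k + 1)]
    return edges, counts

data = list(range(1, 33))      # hypothetical sample of n = 32 values
k = sturges_bins(len(data))    # 1 + log2(32) = 6 bins
edges, counts = frequency_distribution(data, k)
print(k, counts)
```

In practice, analysts usually round the bin limits to "nice" numbers rather than using the raw min-to-max edges computed here.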
1. The Pareto chart is used to display the "vital few" causes of problems. TRUE
2. Dot plots are similar to histograms with many bins (classes). TRUE
3. Sturges' Rule is not an ironclad requirement, but merely a suggestion. TRUE
4. Pie charts can be useful in describing attribute data. TRUE
5. Scatter plots are widely used in business, education, and science. TRUE
6. A data set with two values that are tied for the highest number of occurrences is called bimodal. TRUE
7. Frequency histograms must have equal bin widths in order to avoid visual distortion of the data. TRUE
8. A scatter plot is useful in visualizing trends in time series data. FALSE
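Quiz item 1 refers to the Pareto chart. The table behind such a chart can be sketched by sorting categories from most to least frequent and accumulating their percentage of the total, which exposes the "vital few." The defect tallies below are hypothetical.

```python
def pareto_table(counts):
    # counts maps each category to its frequency.
    # Returns (category, frequency, cumulative percent) rows in descending order.
    total = sum(counts.values())
    rows, cumulative = [], 0
    for category, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += freq
        rows.append((category, freq, 100 * cumulative / total))
    return rows

defects = {"scratch": 12, "dent": 45, "misaligned": 30, "other": 5, "paint": 8}  # hypothetical
for category, freq, cum_pct in pareto_table(defects):
    print(f"{category:<11}{freq:>4}{cum_pct:>8.1f}%")
```

Here the two largest categories account for 75% of all defects, which is the pattern a Pareto chart is meant to highlight.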
CHAPTER 4
The mean and median describe a sample’s central tendency and also indicate skewness. The mode is useful for
discrete data with a small range. The trimmed mean eliminates extreme values. The geometric mean mitigates
high extremes but fails when zeros or negative values are present. The midrange is easy to calculate but is
sensitive to extremes. Dispersion is typically measured by the standard deviation while relative dispersion is
given by the coefficient of variation for nonnegative data. Standardized data reveal outliers or unusual data
values, and the Empirical Rule offers a comparison with a normal distribution. In measuring dispersion, the
mean absolute deviation or MAD is easy to understand, but lacks nice mathematical properties. Quartiles are
meaningful even for fairly small data sets, while percentiles are used only for large data sets. Box plots show
the quartiles and data range. We can estimate many common descriptive statistics from grouped data. Sample
coefficients of skewness and kurtosis allow more precise inferences about the shape of the population being
sampled instead of relying on histograms.
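Most of the measures of center and dispersion above can be computed with Python's standard statistics module. This sketch uses the seven values from quiz item 3; the helper functions for the trimmed mean, midrange, and mean absolute deviation are illustrative, and the quartiles use Python's default "exclusive" method, which may differ slightly from the quartile method a given textbook or spreadsheet uses.

```python
import statistics as st

data = [10, 5, 2, 6, 3, 4, 20]   # data set from quiz item 3

def trimmed_mean(values, k):
    # Drop the k smallest and k largest values, then average what remains.
    ordered = sorted(values)
    return st.mean(ordered[k:len(ordered) - k])

def midrange(values):
    # Average of the two extremes, so it is very sensitive to outliers.
    return (min(values) + max(values)) / 2

def mean_absolute_deviation(values):
    m = st.mean(values)
    return st.mean([abs(x - m) for x in values])

mean = st.mean(data)                  # 50/7, pulled upward by the value 20
median = st.median(data)              # 5
sd = st.stdev(data)                   # sample standard deviation (n - 1 divisor)
cv = 100 * sd / mean                  # coefficient of variation, in percent
gm = st.geometric_mean(data)          # requires strictly positive values
z = [(x - mean) / sd for x in data]   # standardized values
quartiles = st.quantiles(data, n=4)   # Python's default 'exclusive' method

print(round(mean, 3), median, trimmed_mean(data, 1), midrange(data))
print(round(sd, 3), round(cv, 1), round(gm, 3), quartiles)
```

The mean exceeds the median because of the high value 20, consistent with right skew, while the one-from-each-end trimmed mean (5.6) sits much closer to the median.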
1. The midrange is very sensitive to outliers. TRUE
2. A trimmed mean may be preferable to a mean when a data set has some extreme values. TRUE
3. Given the data set 10, 5, 2, 6, 3, 4, 20, the median value is 5. TRUE
4. When data are right-skewed, we expect the median to be greater than the mean. FALSE
5. If there are 19 data values, the median will have 10 values above it and 9 below it because n is odd.
FALSE
6. If the standard deviations of two samples are the same, so will be their coefficients of variation. FALSE
7. A certain Health Maintenance Organization (HMO) examined the number of office visits by its
members in the last year. These data would probably be skewed to the left due to low outliers. FALSE
8. Skewness and kurtosis are both measures of a distribution's dispersion. FALSE
9. The coefficient of variation is useful to compare data sets with dissimilar units of measurement. TRUE
10. Typically, outliers are any data values which fall beyond ±2 standard deviations of the mean. FALSE
11. When applying the Empirical Rule to a distribution of grades, if a student scored one standard deviation
below the mean she would be at the 25th percentile of the distribution. FALSE
12. A leptokurtic distribution is more sharply peaked, with heavier tails, than a normal distribution. TRUE
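The Empirical Rule percentages and quiz items 10 and 11 can be checked against the standard normal distribution with statistics.NormalDist. The classify helper is hypothetical and encodes a common rule of thumb (beyond ±2 standard deviations is unusual, beyond ±3 is an outlier); cutoffs vary between sources.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution: mean 0, standard deviation 1

def share_within(k):
    # Proportion of a normal population within ±k standard deviations of the mean.
    return Z.cdf(k) - Z.cdf(-k)

def classify(z):
    # Rule-of-thumb labels for a standardized value.
    if abs(z) > 3:
        return "outlier"
    if abs(z) > 2:
        return "unusual"
    return "typical"

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))   # 0.6827, 0.9545, 0.9973

print(round(Z.cdf(-1), 4))        # 0.1587: z = -1 is near the 16th percentile
print(round(Z.inv_cdf(0.25), 3))  # -0.674: the 25th percentile in z units
```

A student one standard deviation below the mean is therefore near the 16th percentile, not the 25th, if grades are approximately normal.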
CHAPTER 5
The sample space for a random experiment describes all possible outcomes. Simple events in a discrete sample
space can be enumerated, while outcomes of a continuous sample space can only be described by a rule. An
empirical probability is based on relative frequencies, a classical probability can be deduced from the nature of
the experiment, and a subjective probability is based on judgment. An event’s complement is every outcome
except the event. The odds are the ratio of an event’s probability to the probability of its complement. The
union of two events is all outcomes in either or both, while the intersection is only those outcomes in both.
Mutually exclusive events cannot both occur, and collectively exhaustive events cover all possibilities.
Dichotomous or polytomous events are mutually exclusive and collectively exhaustive. The conditional
probability of an event is its probability given that another event has occurred. Two events are independent if
the conditional probability of one is the same as its unconditional probability. The joint probability of
independent events is the product of their probabilities. A contingency table is a cross-tabulation of frequencies
for two variables with categorical outcomes and can be used to calculate probabilities. A tree visualizes events
in a sequential diagram. Bayes’s Theorem shows how to revise a prior probability to obtain a conditional or
posterior probability when another event’s occurrence is known. The number of arrangements of sampled items
drawn from a population is found with the formula for permutations (if order is important) or combinations (if
order does not matter).
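Bayes's Theorem and the counting formulas reduce to short calculations. In this sketch, bayes_posterior is a hypothetical helper that uses the two-event form P(A | B) = P(B | A)P(A) / [P(B | A)P(A) + P(B | A')P(A')], and the screening numbers (1% prevalence, 95% detection rate, 10% false-positive rate) are made up for illustration. math.perm and math.comb require Python 3.8 or later.

```python
import math

def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    # Revise the prior P(A) after observing B, using the law of total probability
    # for the denominator P(B).
    joint_a = p_b_given_a * prior
    joint_not_a = p_b_given_not_a * (1 - prior)
    return joint_a / (joint_a + joint_not_a)

# Hypothetical screening test: 1% prevalence, 95% detection, 10% false positives
print(round(bayes_posterior(0.01, 0.95, 0.10), 4))   # 0.0876

# Arrangements of r = 3 items drawn from n = 8
print(math.perm(8, 3))   # order matters: 336
print(math.comb(8, 3))   # order does not matter: 56
```

Even with a fairly accurate test, the posterior probability stays below 9% because the prior is so small.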
1. The sum of all the probabilities of simple events in a sample space equals one. TRUE
2. The probability of an event will always be a value greater than zero, but less than one. FALSE
3. The union of two events A and B is the event consisting of all outcomes in the sample space that are
contained in both event A and event B. FALSE
4. The general law of addition for probabilities says P(A ∪ B) = P(A) + P(B) − P(A ∩ B). TRUE
5. Two events A and B are independent only if P(A | B) is the same as P(A). TRUE
6. For any event A, the probability of A is 0 ≤ P(A) ≤ 1. TRUE
7. If events A and B are mutually exclusive, the joint probability of the events is zero. TRUE
8. The probability of A and its complement (A') will always sum to one. TRUE
9. If P(A) = 0.50 and P(B) = 0.30 and P(A ∩ B) = 0.15, then A and B are independent events. TRUE
10. When two or more events can occur at the same time, they are said to be mutually exclusive. FALSE
11. The probability of events A or B occurring can be found by summing the probabilities of the individual
events. FALSE
12. A contingency table is a cross-tabulation of frequencies for two variables with categorical outcomes,
and can be used to calculate probabilities. TRUE
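Several of the rules in these review items can be checked numerically. This sketch applies the general law of addition, the definition of conditional probability, and the product test for independence to the values in item 9, then reads a conditional probability from a small contingency table whose counts are hypothetical. A tolerance is used because of floating-point rounding.

```python
def union(p_a, p_b, p_ab):
    # General law of addition: P(A or B) = P(A) + P(B) - P(A and B)
    return p_a + p_b - p_ab

def conditional(p_ab, p_b):
    # P(A | B) = P(A and B) / P(B)
    return p_ab / p_b

def independent(p_a, p_b, p_ab, tol=1e-9):
    # Independent when P(A and B) = P(A) * P(B), equivalently P(A | B) = P(A)
    return abs(p_ab - p_a * p_b) < tol

p_a, p_b, p_ab = 0.50, 0.30, 0.15   # values from item 9
print(union(p_a, p_b, p_ab), conditional(p_ab, p_b), independent(p_a, p_b, p_ab))

# Hypothetical 2x2 contingency table of counts (rows A, A'; columns B, B')
table = {("A", "B"): 30, ("A", "B'"): 20, ("A'", "B"): 10, ("A'", "B'"): 40}
col_b = table[("A", "B")] + table[("A'", "B")]   # 40 observations in column B
print(table[("A", "B")] / col_b)                 # P(A | B) = 30/40
```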
CHAPTER 6
A random variable assigns a numerical value to each outcome in the sample space of a stochastic process. A
discrete random variable has a countable number of distinct values. Probabilities in a discrete probability
distribution must be between zero and one, and must sum to one. The expected value is the mean of the
distribution, measuring central tendency, and its variance is a measure of dispersion. A known distribution is
described by its parameters, which imply its probability distribution function (PDF) and its cumulative
distribution function (CDF).
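The expected value and variance of a discrete distribution are probability-weighted sums. This sketch stores a distribution as a hypothetical dictionary from each value to its probability and uses the fair die from quiz item 3 below, where μ = 3.5 and the standard deviation is about 1.708, not 2.5.

```python
import math

def is_valid_distribution(dist):
    # Each probability must lie in [0, 1] and all of them must sum to one.
    probs = dist.values()
    return all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)

def expected_value(dist):
    # E(X) = sum of x * P(x)
    return sum(x * p for x, p in dist.items())

def variance(dist):
    # Var(X) = sum of (x - mu)^2 * P(x)
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

die = {x: 1 / 6 for x in range(1, 7)}   # fair die: discrete uniform on 1..6
mu = expected_value(die)
sigma = math.sqrt(variance(die))
print(is_valid_distribution(die), round(mu, 3), round(sigma, 3))   # True 3.5 1.708
```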
As summarized in Table 6.14, the uniform distribution has two parameters (a, b) that define its range a ≤ X ≤ b.
The Bernoulli distribution has one parameter (π, the probability of success) and two outcomes (0 or 1). The
binomial distribution has two parameters (n, π). It describes the sum of n independent Bernoulli random
experiments with constant probability of success. It may be skewed left (π > .50) or right (π < .50) or symmetric
(π = .50) but becomes less skewed as n increases. The Poisson distribution has one parameter (λ, the mean
arrival rate). It describes arrivals of independent events per unit of time or space. It is always right-skewed,
becoming less so as λ increases. The hypergeometric distribution has three parameters (N, n, s). It is like a
binomial, except that sampling of n items is without replacement from a finite population of N items containing
s successes. The geometric distribution is a one-parameter model (π, the probability of success) that describes
the number of trials until the first success. Figure 6.33 shows the relationships among these five discrete
models.
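The probability functions of the binomial, Poisson, hypergeometric, and geometric models can be written directly from their parameters. This is a minimal sketch using the parameter names in the summary (n, π, λ, N, s); the function names are illustrative, and the Bernoulli case is simply the binomial with n = 1. Inputs are assumed valid (for example, 0 ≤ x ≤ n for the hypergeometric).

```python
import math

def binomial_pmf(x, n, pi):
    # P(X = x) for x successes in n independent trials with success probability pi
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

def poisson_pmf(x, lam):
    # P(X = x) arrivals per unit of time or space, with mean arrival rate lam
    return math.exp(-lam) * lam ** x / math.factorial(x)

def hypergeometric_pmf(x, N, n, s):
    # x successes in a sample of n drawn without replacement from N items containing s successes
    return math.comb(s, x) * math.comb(N - s, n - x) / math.comb(N, n)

def geometric_pmf(x, pi):
    # Probability that the first success occurs on trial x (x = 1, 2, ...)
    return pi * (1 - pi) ** (x - 1)

print(binomial_pmf(5, 10, 0.5))         # 252/1024
print(poisson_pmf(0, 2.0))              # e^-2
print(hypergeometric_pmf(1, 10, 3, 4))  # 4 * 15 / 120
print(geometric_pmf(3, 0.2))            # 0.2 * 0.8^2
```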
1. A discrete random variable has a countable number of distinct values. TRUE
2. The expected value of a discrete random variable, E(X), is the sum of all X values weighted by their
respective probabilities. TRUE
3. The outcomes from the roll of one die can be described as a discrete uniform distribution with μ = 3.5
and σ = 2.5. FALSE
4. When π = 0.7, the discrete binomial distribution is negatively skewed. TRUE
5. The hypergeometric distribution assumes that the probability of a success remains the same from one
trial to the next. FALSE
6. Although the shape of the Poisson distribution is positively skewed, it becomes more nearly
symmetrical if its mean becomes larger. TRUE
7. The Poisson distribution describes the number of occurrences within a randomly chosen unit of time or
space. TRUE
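Items 4 and 6 can be checked with the standard skewness formulas: (1 − 2π)/√(nπ(1 − π)) for the binomial and 1/√λ for the Poisson. A negative value indicates left skew, zero indicates symmetry, and values shrinking toward zero indicate a more nearly symmetric shape. The parameter values below are chosen only for illustration.

```python
import math

def binomial_skewness(n, pi):
    # Negative when pi > .50, zero when pi = .50, positive when pi < .50
    return (1 - 2 * pi) / math.sqrt(n * pi * (1 - pi))

def poisson_skewness(lam):
    # Always positive, shrinking toward zero as the mean lam grows
    return 1 / math.sqrt(lam)

print(binomial_skewness(10, 0.7))    # negative: left-skewed
print(binomial_skewness(100, 0.7))   # closer to zero as n increases
for lam in (1, 4, 16):
    print(lam, poisson_skewness(lam))  # 1.0, 0.5, 0.25
```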
