The term inference refers to a key concept in statistics in which we draw a
conclusion from available evidence.
The purpose of descriptive statistics is to summarize or display data so we can quickly obtain an overview. Inferential statistics allows us to make claims or conclusions about a population based on a sample of data from that population. A population represents all possible outcomes or measurements of interest. A sample is a subset of a population.

We use the term population in statistics to represent all possible measurements or outcomes that are of interest to us in a particular study. The term sample refers to a portion of the population that is representative of the population from which it was selected.

Data is simply defined as the value assigned to a specific observation or measurement. Data that is used to describe something of interest about a population is called a parameter. For instance, let's say that the population of interest is my wife's three-year-old preschool class and my measurement of interest is how many times the little urchins use the bathroom in a day. If we average the number of trips per child, this figure would be considered a parameter because the entire population was measured. However, if we want to make a statement about the average number of bathroom trips per day per three-year-old in the country, then Debbie's class could be our sample. We can consider the average that we observe from her class a statistic if we assume it could be used to estimate all three-year-olds in the country.

Data that describes a characteristic about a population is known as a parameter. Data that describes a characteristic about a sample is known as a statistic. Information is data that is transformed into useful facts that can be used for a specific purpose, such as making a decision.

We classify the sources of data into two broad categories: primary and secondary. You can obtain primary data in many ways, such as direct observation, surveys, and experiments.
Direct observation: Focus groups are a direct observational technique where the subjects are aware that data is being collected. Businesses use focus groups to gather information in a group setting controlled by a moderator. The subjects are usually paid for their time and are asked to comment on specific topics.

Experiments: This method is more direct than observation because the subjects will participate in an experiment designed to determine the effectiveness of a treatment. An example of a treatment could be the use of a new medical drug. Two groups would be established. The first is the experimental group who receive the new drug, and the second is the control group who think they are getting the new drug but are in fact getting no medication. The reactions from each group are measured and compared to determine whether the new drug was effective. The benefit of experiments is that they allow the statistician to control factors that could influence the results, such as gender, age, and education of the participants. The concern about collecting data through experiments is that the response of the subjects might be influenced by the fact that they are participating in a study. The design of experiments for a statistical study is a very complex topic and goes beyond the scope of this book.

Surveys: This technique of data collection involves directly asking the subject a series of questions. The questionnaire needs to be carefully designed to avoid any bias or confusion for those participating. Concerns also exist about the influence the survey will have on the participant's responses. Research has shown that the manner in which the questions are asked can affect the responses a person provides on a questionnaire. A question posed in a positive tone will tend to invoke a more positive response and vice versa. A good strategy is to test your questionnaire with a small group of people before releasing it to the general public.

Another way to classify data is by one of two types: quantitative or qualitative.
Types of measurement scales: A nominal level of measurement deals strictly with qualitative data. Observations are simply assigned to predetermined categories. One example is gender of the respondent, with the categories being male and female. This data type does not allow us to perform any mathematical operations, such as adding or multiplying. We also cannot rank-order this list in any way from highest to lowest. This type is considered the lowest level of data and, as a result, is the most restrictive when choosing a statistical technique to use for the analysis.

You can use numbers at the nominal level of measurement. Even in this case, the rules of the nominal scale still remain. An example would be zip codes or telephone numbers, which can't be added or placed in a meaningful order of greater than or less than. Even though the data appears to be numbers, it's handled just like qualitative data.

On the food chain of data, ordinal is the next level up. It has all the properties of nominal data with the added feature that we can rank-order the values from highest to lowest. An example is if you were to have a lawnmower race. Let's say the finishing order was Scott, Tom, and Bob. We still can't perform mathematical operations on this data, but we can say that Scott's lawnmower was faster than Bob's. However, we cannot say how much faster. Ordinal data does not allow us to make measurements between the categories and to say, for instance, that Scott's lawnmower is twice as good as Bob's (it's not). Ordinal data can be either qualitative or quantitative. An example of quantitative data is rating movies with 1, 2, 3, or 4 stars. However, we still may not claim that a 4-star movie is 4 times as good as a 1-star movie.

Moving up the scale of data, we find ourselves at the interval level, which is strictly quantitative data. Now we can get to work with the mathematical operations of addition and subtraction when comparing values. For this data, we can measure the difference between the different categories with actual numbers and also provide meaningful information. Temperature measurement in degrees Fahrenheit is a common example here. For instance, 70 degrees is 5 degrees warmer than 65 degrees. However, multiplication and division can't be performed on this data. Why not? Simply because we cannot argue that 100 degrees is twice as warm as 50 degrees.

The king of data types is the ratio level. Now we can perform all four mathematical operations to compare values with absolutely no feelings of guilt. Examples of this type of data are age, weight, height, and salary. Ratio data has all the features of interval data with the added benefit of a true 0 point. The term true zero point means that a 0 data value indicates the absence of the object being measured. For instance, 0 salary indicates the absence of any salary.

The distinction between interval and ratio data is a fine line. To help identify the proper scale, use the "twice as much" rule. If the phrase "twice as much" accurately describes the relationship between two values that differ by a multiple of 2, then the data can be considered ratio level. Interval data does not have a true 0 point. For example, 0 degrees Fahrenheit does not represent the absence of temperature, even though it may feel like it.

Frequency Distributions

A frequency distribution is simply a table that organizes the number of data values into intervals. The intervals in a frequency distribution are officially known as classes, and the number of observations in each class is known as the class frequency.

Constructing a frequency distribution:
- form classes of equal size.
- make classes mutually exclusive, or in other words, prevent classes from overlapping.
- try to have no fewer than 5 classes and no more than 15 classes.
- avoid open-ended classes, if possible (for instance, a highest class of 15-over).
- include all data values from the original table in a class. In other words, the classes should be exhaustive.
Relative Frequency Distribution

Rather than display the number of observations in each class, this method calculates the percentage of observations in each class by dividing the frequency of each class by the total number of observations.

Cumulative Frequency Distribution

Cumulative frequency distributions indicate the percentage of observations that are less than or equal to the current class. It totals the percentages of each class as you move down the column. John used his phone 8 times or less on 84 percent of the days in the month.

Graphing a Frequency Distribution: the Histogram

A histogram is simply a bar graph showing the number of observations in each class as the height of each bar.
- the first thing we need to do is open Excel to a blank sheet and enter our data in Column A starting in Cell A1.
- next enter the upper limits to each class in Column B starting in Cell B1.
- go to the Tools menu at the top of the Excel window and select Data Analysis.
- the Chart Wizard allows me more control over the final appearance.

Statistical Flower Power: the Stem and Leaf Display

The major benefit of this approach is that all the original data points are visible on the display.
The stem in the display is the first column of numbers, which represents the first digit of the golf scores. The leaf in the display is the second digit of the golf scores, with 1 digit for each score. Because there were 5 scores in the 70s, there are 5 digits to the right of 7. Here, the stem labeled 7 (5) stores all the scores between 75 and 79. The stem 8 (0) stores all the scores between 80 and 84.

Charting a Frequency Distribution

Bar Charts

Bar charts are a useful graphical tool when you are plotting individual data values next to each other. The histogram that we visited earlier in the chapter is actually a special type of bar chart that plots frequencies rather than actual data values. How do I choose between a pie chart and a bar chart? If your objective is to compare the relative size of each class to one another, use a pie chart. Bar charts are more useful when you want to highlight the actual data values.

Line Charts

Line charts are used to help identify patterns between two sets of data. Line charts prove very useful when you are interested in exploring patterns between two different types of data. They are also helpful when you have many data points and want to show all of them on one graph. Because the line connecting the data points seems to have an overall upward trend, my suspicions hold true. It seems the more showers our waterlogged darlings take, the higher the utility bill.

Measures of Central Tendency

There exist two broad categories of descriptive statistics that are commonly used. The first, measures of central tendency, describes the center point of our data set with a single value. It's a valuable tool to help us summarize many pieces of data with one number. The second category, measures of dispersion, describes how far individual data values have strayed from the mean.

The mean or average is the most common measure of central tendency and is calculated by adding all the values in our data set and then dividing this result by the number of observations.
A weighted mean allows you to assign more weight to certain values and less weight to others.

Mean of Grouped Data from a Frequency Distribution - example:
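As a rough sketch of both ideas, the Python below computes a weighted mean and then reuses it for grouped data, treating each class midpoint as the value and each class frequency as the weight. The function names and all numbers are hypothetical; in particular, the classes are not the cell phone table these notes refer to.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def grouped_mean(classes):
    """Approximate mean of grouped data.

    classes is a list of (lower_limit, upper_limit, frequency) tuples. Each
    class is represented by its midpoint, weighted by its frequency.
    """
    midpoints = [(low + high) / 2 for low, high, _ in classes]
    frequencies = [freq for _, _, freq in classes]
    return weighted_mean(midpoints, frequencies)


# Hypothetical exam scores where the last exam counts twice as much.
scores = [80, 90, 70]
weights = [1, 1, 2]
print(weighted_mean(scores, weights))  # (80 + 90 + 140) / 4 = 77.5

# Hypothetical frequency distribution: (lower, upper, frequency).
classes = [(0, 2, 4), (3, 5, 10), (6, 8, 6)]
print(grouped_mean(classes))
```

Because every observation in a class is assumed to sit at the midpoint, the grouped result is an estimate and will generally differ a little from the mean of the raw data.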
According to the mean of this frequency distribution, John averages 4.6 calls per day on his cell phone. The mean of a frequency distribution where data is grouped into classes is only an approximation to the mean of the original data set from which it was derived. This is true because we make the assumption that the original data values are at the midpoint of each class, which is not necessarily the case. The true mean of the 30 original data values in the cell phone example is only 4.5 calls per day rather than 4.6.

The median is the value in the data set for which half the observations are higher and half the observations are lower. We find the median by arranging the data values in ascending order and identifying the halfway point. When there is an even number of data points, the median will be the average of the two center points. Using our example with the video games, we rearrange our data set in ascending order:

3 4 4 4 5 6 7 7 9 17

Because we have an even number of data points (10), the median is the average of the two center points. In this case, that will be the values 5 and 6, resulting in a median of 5.5 hours of video games per week. Notice that there are four data values to the left (3, 4, 4, and 4) of these center points and four data values to the right (7, 7, 9, and 17).

The mode is simply the observation in the data set that occurs the most frequently.

If you think all the data in your data set is relevant, then the mean is your best choice. This measurement is affected by both the number and magnitude of your values. However, very small or very large values can have a significant impact on the mean, especially if the size of the sample is small. If this is a concern, perhaps you should consider using the median. The median is not as sensitive to a very large or small value. Consider the following data set from the original video game example:

3 4 4 4 5 6 7 7 9 17

The number 17 is rather large when compared to the rest of the data.
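The three measures can be checked on the video game data with a short Python sketch. The hand-written median function is only there to mirror the "sort, then find the halfway point" description; the variable names are illustrative, and the mode uses Python's standard statistics module.

```python
import statistics

# Hours of video games per week, from the example above.
hours = [3, 4, 4, 4, 5, 6, 7, 7, 9, 17]


def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


mean_hours = sum(hours) / len(hours)
median_hours = median(hours)
mode_hours = statistics.mode(hours)  # most frequent observation

print(mean_hours, median_hours, mode_hours)
```

Dropping the 17 and rerunning the sketch is a quick way to see how much more the mean moves than the median.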
The mean of this sample was 6.6, whereas the median was 5.5. If you think 17 is not a typical value that you would expect in this data set, the median would be your best choice for central tendency.

The poor lonely mode has limited applications. It is primarily used to describe data at the nominal scale, that is, data that is grouped in descriptive categories such as gender. If 60 percent of our survey respondents were male, then the mode of our data would be male.

From Data Analysis - Descriptive Statistics: mean, median, mode.

Measures of Dispersion

Range is the simplest measure of dispersion and is calculated by finding the difference between the highest value and the lowest value in the data set.

7 9 8 11 4 - range = 11 - 4 = 7

However, the limitation is that it only relies on two data points to describe the variation in the sample. No other values between the highest and lowest points are part of the range calculation.

Variance summarizes the squared deviation of each data value from the mean. The variance is a measure of dispersion that describes the relative distance between the data points in the set and the mean of the data set. This measure is widely used in inferential statistics.
The first step in calculating the variance is to determine the mean of the data set. The rest of the calculations can be facilitated by the following table. The final sample variance calculation becomes this: s² = 26.8 / (5 - 1) = 6.7.

Using the Raw Score Method

The raw score method is a more efficient way to calculate the variance of a data set.

s² = (the sum of each data value after it has been squared - the square of the sum of all the data values divided by n) / (n - 1)

The Variance of a Population

Standard deviation is simply the square root of the variance. Just as with the variance, there is a standard deviation for both the sample and population. To calculate the standard deviation, you must first calculate the variance and then take the square root of the result. The standard deviation is actually a more useful measure than the variance because the standard deviation is in the units of the original data set.

Calculating the Standard Deviation of Grouped Data

The Empirical Rule: Working with Standard Deviation

The values of many large data sets tend to cluster around the mean or median so that the data distribution in the histogram resembles a bell-shaped, symmetrical curve. When this is the case, the empirical rule tells us that approximately 68 percent of the data values will be within one standard deviation from the mean. For example, suppose that the average exam score for my large statistics class is 88 points and the standard deviation is 4.0 points and that the distribution of grades is bell-shaped around the mean. Because one standard deviation above the mean would be 92 (88 + 4) and one standard deviation below the mean would be 84 (88 - 4), the empirical rule tells me that approximately 68 percent of the exam scores will fall between 84 and 92 points.
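The dispersion measures in this section can be reproduced with a brief Python sketch using the five-value data set from the range example (7, 9, 8, 11, 4). It computes the sample variance both from squared deviations and by the raw score method, then the interval one standard deviation around the exam mean. The helper name within_k_std is an illustrative assumption, not something from the notes.

```python
import math

data = [7, 9, 8, 11, 4]
n = len(data)
data_mean = sum(data) / n

data_range = max(data) - min(data)

# Definition method: squared deviations from the mean, divided by n - 1.
squared_deviations = sum((x - data_mean) ** 2 for x in data)
sample_variance = squared_deviations / (n - 1)

# Raw score method: needs only the sum of squares and the square of the sum.
raw_score_variance = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Standard deviation is the square root of the variance.
sample_std = math.sqrt(sample_variance)


def within_k_std(center, std, k):
    """Return the (low, high) interval k standard deviations around center."""
    return (center - k * std, center + k * std)


print(data_range, round(sample_variance, 2), round(raw_score_variance, 2))
print(round(sample_std, 2))
print(within_k_std(88, 4.0, 1))  # exam example: one standard deviation
```

Both variance calculations agree up to floating-point rounding, which is why the sketch rounds before printing.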
According to the empirical rule, if a distribution follows a bell shape (a symmetrical curve centered around the mean), we would expect approximately 68, 95, and 99.7 percent of the values to fall within one, two, and three standard deviations around the mean respectively. In general, we can use the following equation to express the range of values within k standard deviations around the mean: μ ± kσ.

Chebyshev's Theorem

Chebyshev's theorem is a mathematical rule similar to the empirical rule except that it applies to any distribution rather than just bell-shaped, symmetrical distributions. Chebyshev's theorem states that for any number k greater than 1, at least (1 - 1/k²) × 100 percent of the values will fall within k standard deviations from the mean. Using this equation, we can state the following:
- at least 75 percent of the data values will fall within two standard deviations from the mean by setting k = 2 into Chebyshev's equation.
- at least 88.9 percent of the data values will fall within three standard deviations from the mean by setting k = 3 into the equation.
- at least 93.75 percent of the data values will fall within four standard deviations from the mean by setting k = 4 into the equation.

Example: This table supports Chebyshev's theorem, which predicts that at least 75 percent of the values will fall within two standard deviations from the mean. From the data set, we can observe that 95 percent actually fall between 20.3 and 49.1 home runs (38 out of 40). The same explanation holds true for three and four standard deviations around the mean.

Measures of Relative Position

Measures of relative position describe the percentage of the data below a certain point.

Quartiles divide the data set into four equal segments after it has been arranged in ascending order. Approximately 25 percent of the data points will fall below the first quartile, Q1. Approximately 50 percent of the data points will fall below the second quartile, Q2. And, you guessed it, 75
percent should fall below the third quartile, Q3.

Step 1: Arrange your data in ascending order.
Step 2: Find the median of the data set. This is Q2.
Step 3: Find the median of the lower half of the data set (in parentheses). This is Q1.
Step 4: Find the median of the upper half of the data set (in parentheses). This is Q3.

Interquartile range - the IQR measures the spread of the center half of our data set. It is simply the difference between the third and first quartiles, as follows: IQR = Q3 - Q1.

The interquartile range is used to identify outliers, which are the black sheep of our data set. These are extreme values whose accuracy is questioned and can cause unwanted distortions in statistical results. Any values that are more than Q3 + 1.5 IQR or less than Q1 - 1.5 IQR should be discarded.

Example: 10 42 45 46 51 52 58 73

Since there are eight data values, Q1 will be the median of the first four values (the midpoint between the second and third values). Q1 = (42 + 45) / 2 = 43.5. Likewise, Q3 will be the median of the last four values (the midpoint between the sixth and seventh values). Q3 = (52 + 58) / 2 = 55. IQR = Q3 - Q1 = 55 - 43.5 = 11.5. Any values greater than Q3 + 1.5 IQR = 72.25 or less than Q1 - 1.5 IQR = 26.25 should be considered outliers; therefore the values 10 and 73 would be outliers in this data set.

The values for variance and standard deviation reported by Excel are for a sample. If your data set represents a population, you need to recalculate the results using N in the denominator rather than n - 1.

Probability topics

Experiment. The process of measuring or observing an activity for the purpose of collecting data. An example is rolling a pair of dice.

Outcome. A particular result of an experiment. An example is rolling a pair of threes with the dice.

Sample space. All the possible outcomes of the experiment. The sample space for our experiment is the numbers {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12}.
Statistics people like to put {} around the sample space values.

Event. One or more outcomes that are of interest for the experiment and which are a subset of the sample space. An example is rolling a total of 2, 3, 4, or 5 with two dice.

Classical Probability refers to a situation when we know the number of possible outcomes of the event of interest and can calculate the probability of that event with the following equation:

P[A] = Number of possible outcomes in which Event A occurs / Total number of possible outcomes in the sample space.

Empirical Probability - used when we don't know enough about the underlying process to determine the number of outcomes associated with an event. This type of probability observes the number of occurrences of an event through an experiment and calculates the probability from a relative frequency distribution.

P[A] = Frequency in which Event A occurs / Total number of observations.

One example of empirical probability is to answer the age-old question "What is the probability that John will get out of bed in the morning for school after his first wake-up call?" Based on these observations, if Event A = John getting out of bed on the first wake-up call, then P[A] = 0.15. Using the previous table, we can also examine the probability of other events. Let's say Event B = John requiring more than 2 wake-up calls to get out of bed; then P[B] = 0.40 + 0.25 = 0.65.

If I choose to run another 20-day experiment of John's waking behavior, I would most likely see different results than those in the previous table. However, if I were to observe 100 days of this data, the relative frequencies would approach the true or classical probabilities of the underlying process. This pattern is known as the law of large numbers. The law of large numbers states that when an experiment is conducted a large number of times, the empirical probabilities of the process will converge to the classical probabilities.
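A small Python sketch can illustrate classical probability, empirical probability, and the law of large numbers with the two-dice event above (a total of 2, 3, 4, or 5). One assumption to flag: the classical count is taken over the 36 equally likely ordered rolls rather than over the 11 totals, because the totals are not equally likely. The simulation, its seed, and the function name are illustrative choices, not part of the notes.

```python
import random
from fractions import Fraction
from itertools import product

# Every equally likely outcome of rolling two dice: 36 ordered pairs.
rolls = list(product(range(1, 7), repeat=2))
sample_space = sorted({a + b for a, b in rolls})  # the totals 2 through 12

# Event: rolling a total of 2, 3, 4, or 5.
event = {2, 3, 4, 5}
favorable = [roll for roll in rolls if sum(roll) in event]
classical = Fraction(len(favorable), len(rolls))


def empirical_probability(trials, seed=0):
    """Relative frequency of the event over simulated rolls (seeded for repeatability)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total in event:
            hits += 1
    return hits / trials


print(classical)  # 10 favorable rolls out of 36, i.e. 5/18
for trials in (20, 100_000):
    print(trials, empirical_probability(trials))
```

With only 20 trials the relative frequency can land noticeably away from 5/18; with many more trials it settles close to it, which is the behavior the law of large numbers describes.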
Subjective probability

We use subjective probability when classical and empirical probabilities are not available. Under these circumstances, we rely on experience and intuition to estimate the probabilities.

Basic Properties of Probability - one event

If P[A] = 1, then Event A must occur with certainty.
If P[A] = 0, then Event A will not occur with certainty.
The probability of Event A must be between 0 and 1.
The sum of all the probabilities for the events in the sample space must be equal to 1.
The complement to Event A is defined as all the outcomes in the sample space that are not part of Event A and is denoted as A'. Using this definition, we can state the following: P[A] + P[A'] = 1 or P[A] = 1 - P[A'].

The Intersection of Events

Example: Now that my children are older and living away from home, I cherish those moments when the phone rings and I see one of their numbers appear on my caller ID. Experience has taught me that I can categorize these calls as either crisis, involving such things as a computer, a car, an ATM card, or a cell phone; or noncrisis, when they call just to see if I'm alive and well enough to help with their next crisis. The following table, called a contingency table, categorizes the last 50 phone calls by child and type of call. Contingency tables show the actual or relative frequency of two types of data at the same time. In this case, the data types are child and type of call.

Event A = the next phone call will come from Christin.
Event B = the next phone call will involve a crisis.

P[A] = 20/50 = 0.4

What about the probability that the next phone call will come from Christin and will involve a crisis? This event is known as the intersection of Events A and B and is described by A∩B. The number of phone calls from our contingency table that meet both criteria is 14, so:

P[A and B] = P[A∩B] = 14/50 = 0.28

A contingency table indicates the number of observations that are classified according to two variables.
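One way to picture a contingency table in code is as a dictionary of counts keyed by both variables. In the Python sketch below, the 20 calls from Christin and the 14 that were crises come from the example; the split of the remaining 30 calls, the "other" label, and the helper function are assumptions made only so the table adds up to 50.

```python
# Counts of the last 50 calls, keyed by (child, type of call).
# The two "other" rows are an assumed split, not figures quoted in the example.
calls = {
    ("Christin", "crisis"): 14,
    ("Christin", "noncrisis"): 6,
    ("other", "crisis"): 14,
    ("other", "noncrisis"): 16,
}
total = sum(calls.values())


def probability(predicate):
    """Relative frequency of the cells for which predicate(child, kind) is true."""
    matching = sum(n for (child, kind), n in calls.items() if predicate(child, kind))
    return matching / total


# Event A: the call comes from Christin.
p_a = probability(lambda child, kind: child == "Christin")

# Intersection of A and B: from Christin and a crisis.
p_a_and_b = probability(lambda child, kind: child == "Christin" and kind == "crisis")

print(p_a, p_a_and_b)
```

Counting cells that satisfy a condition keeps the code close to how the probabilities are read off the table by hand.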
The intersection of Events A and B represents the number of instances where Events A and B occur at the same time (that is, the same phone call is both from Christin and a crisis). The probability of the intersection of two events is known as a joint probability.

The union of Events A and B represents the number of instances where either Event A or B occurs (that is, the number of calls that were either from Christin or were a crisis).

P[A or B] = P[A∪B] = 34/50 = 0.68

Classical probability requires knowledge of the underlying process in order to count the number of possible outcomes of the event of interest. Empirical probability relies on historical data from a frequency distribution to calculate the likelihood that an event will occur.

Conditional Probability

We define conditional probability as the probability of Event A knowing that Event B has already occurred.

Example: the following table shows the outcomes of our last 20 matches, along with the type of warm-up before we started keeping score. Without any additional information, the simple probability of each of these events is as follows:

P[A] = 9/20 = 0.45, P[B] = 13/20 = 0.65, P[A'] = 11/20 = 0.55, P[B'] = 7/20 = 0.35

Simple or prior probabilities are always based on the total number of observations. In the previous example, it is 20 matches.

Knowing this piece of information, what is the probability that Debbie will win the match? This is the conditional probability of Event A given that Event B has occurred. Looking at the previous table, we can see that Event B has occurred 13 times.
Because Debbie has won 4 of those matches (A), the probability of A given B is calculated as follows:

P[A|B] = 4/13 = 0.31

We can also calculate the probability that Debbie will win given that Event B has not occurred:

P[A|B'] = 5/7 = 0.71

Conditional probabilities are also known as posterior probabilities. Conditional probabilities are very useful for determining the probabilities of compound events as you will see in the following sections.

Independent versus Dependent Events

Events A and B are said to be independent of each other if the occurrence of Event B has no effect on the probability of Event A. Using conditional probability, Events A and B are independent of one another if:

P[A|B] = P[A]

If Events A and B are not independent of one another, then they are said to be dependent events.
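The match counts from the example are enough to check the conditional probabilities and the independence condition in a few lines of Python. Exact fractions are used so the comparison P[A|B] = P[A] is not blurred by rounding; the variable names are illustrative, and what Event B stands for is left as the notes leave it.

```python
from fractions import Fraction

# Counts from the 20-match example: A = Debbie wins, B = Event B occurred.
wins_with_b = 4        # A and B
matches_with_b = 13    # B
wins_without_b = 5     # A and B'
matches_without_b = 7  # B'
total_matches = matches_with_b + matches_without_b

# Simple (prior) probability of A uses all 20 matches.
p_a = Fraction(wins_with_b + wins_without_b, total_matches)

# Conditional probabilities use only the matches where the condition holds.
p_a_given_b = Fraction(wins_with_b, matches_with_b)
p_a_given_not_b = Fraction(wins_without_b, matches_without_b)

# Independence would require P[A|B] to equal P[A].
independent = p_a_given_b == p_a

print(float(p_a), round(float(p_a_given_b), 2), round(float(p_a_given_not_b), 2))
print("independent" if independent else "dependent")
```

With sample counts, an exact match between P[A|B] and P[A] is rare even for events that really are unrelated, so in practice the size of the gap matters more than strict equality.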