
AB1202: Statistical & Quantitative Methods

Lecture 1: Introduction & Data Presentation
2013-2014, S2
Dr Michael Li

Outline
Course Briefing
Introduction
  What is Statistics
  Data & Data Sources
  Populations and Samples
  Examples
Tabular & Graphical Methods of Data Presentation
  Frequency Distribution
  Histograms & Pareto Charts
  Stem Plots & Scatter Plots

Course Information
COURSE INSTRUCTORS
Name                 Office       Phone       Email
Dr. Michael Li       S3-B1A-19    67904659    zfli@ntu.edu.sg
Dr. Chen Shaoxiang   S3-B2A-30    67906143    aschen@ntu.edu.sg

COURSE ASSESSMENT
Components                       Marks
Coursework                       40%
Final Examination (Open-book)    60%
Total                            100%

Coursework Components            Marks
Class Participation              20%
Case Study (Group)               30%
Two In-Class Quizzes             50%
Sub-Total                        100%

COURSE DELIVERY
12 lectures + 12 tutorials (please pay attention to MI, the mobility initiative)
Two in-class quizzes: during Tutorial 7 (week 9, after recess) and Tutorial 11 (week 13) respectively
Statistical software knowledge (required): SPSS (a very powerful and useful statistics package), Excel (add-on for statistical analysis), TreePlan (decision trees)

Course Coverage
Making Sense of Data and Summarizing Data
Concept of Probability
Bayes' Theorem
Random Variables & Probability Distributions
  Binomial, Uniform, Normal, Covariance (Appendix B)
Decision Analysis
Sampling Distributions
Statistical Inference: Confidence Intervals & Hypothesis Testing
Design of Experiment & Analysis of Variance
Regression Models
  Simple & Multiple Regressions

Required textbook
Bruce L. Bowerman, Richard T. O'Connell and Emily S. Murphree. Business Statistics in Practice, Sixth Edition. McGraw-Hill/Irwin, 2012.

What Is Statistics?
1. Collecting Data
   e.g., Survey
2. Presenting Data
   e.g., Charts & Tables
3. Characterizing Data
   e.g., Average

Data Analysis
Why? Decision Making

Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.

Basic Concepts
Data: facts and figures from which conclusions can be drawn
Data set: the data that are collected for a particular study
  Elements: may be people, objects, events, or other entities
Variable: any characteristic of an element
Measurement: a way to assign a value of a variable to the element
  Quantitative: the possible measurements of the values of a variable are numbers that represent quantities
  Qualitative: the possible measurements fall into several categories
Cross-sectional data: data collected at the same or approximately the same point in time
  Example: mobile phone bills of employees at a bank during a particular month
Time series data: data collected over different time periods
  Most economic data are time series data, e.g., inflation, unemployment rate, CPI, exchange rate, etc.
  Periodic (monthly, quarterly, or yearly) corporate sales figures are also time series data

Cross-Sectional Data: SG Example

Source: Singapore Population 2012 (Department of Statistics)

A moment of pondering: Any insights from the data? Any impact on you?

Time Series Data: SG Example

Source: Singapore Population 2012
Live Births refer to all live births occurring within Singapore and its territorial waters.
Total Fertility Rate refers to the average number of live births each female would have during her reproductive years.

Data Sources
Existing sources (secondary): data already gathered by public or private sources
  Library
  Government
  Data collection agency
  Internet

Experimental and observational studies (primary): data that we collect ourselves for a specific purpose
  Response variable: the main variable of interest, e.g., salary
  Factors: other variables related to the response variable, e.g., education, experience, etc.

Data Sources from NTU Library
NTU Library Business Databases (some examples):
Compustat Global
  Currency, statement, balance sheet, flow of funds, and supplemental data items for listed global companies from 1989 onwards

Business Monitor International
  Country risks and business environment

Datamonitor 360
  Intelligence on companies, industries, products and countries, etc.

Global Market Information Database (GMID)
  Business intelligence on countries, consumers and industries

International Financial Statistics (IMF)
  Statistics on exchange rates, international reserves, banking, balance of payments, government finances, prices, etc. for most countries in the world

Singapore Government Data Sources
Statistics Singapore
  Economic data, sector-level data, demographic data, household survey data, national census data

Housing Development Board (HDB)
  Resale flat prices

Urban Redevelopment Authority (URA)
  Private residential transactions

Land Transport Authority (LTA) Onemotoring
  Vehicle population, COE prices, real-time traffic, etc.

Singapore Tourism Board (STB)
  Annual, quarterly and monthly tourism statistics

Key Concepts: Populations and Samples
Population: The set of all elements about which we wish to draw conclusions (people, objects or events)

Census: An examination of the entire population of measurements

Sample: A selected subset of the units of a population

Statistical Methods

Descriptive Statistics: the science of describing the important aspects of a set of measurements

Inferential Statistics: the science of using a sample of measurements to make generalizations about the important aspects of a population of measurements

Example 1: Estimating Cell Phone Costs (p. 8)
A bank wishes to decide whether to hire a cellular management service to choose its employees' calling plans.
  Over 10,000 employees, on different types of calling plans

The cellular service company suggests studying the calling patterns of mobile users on 500-minute-per-month plans
  Purpose: to see whether cellular costs can be substantially reduced
  The bank has 2,136 employees on a variety of 500-minute-per-month plans, with different basic monthly rates, different coverage charges, and different additional charges for long-distance calls and roaming.

Cell Phone Costs (cont.)
Selecting a random sample (from 2,136 employees)
  A random sample of 100 employees on 500-minute plans
  Key observation: many overages and underages

Data file: Lect01Data.xlsx, Worksheet: CellUse

Excel function: COUNTIF(range, criteria)
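
The COUNTIF idea above can be sketched in Python as a rough equivalent. The minutes below are hypothetical illustration values, not the contents of the CellUse worksheet, and the name PLAN_MINUTES is introduced here only for the sketch.

```python
# Hypothetical monthly minutes used by ten employees (NOT the CellUse data).
minutes_used = [75, 485, 37, 547, 753, 93, 897, 694, 797, 477]

PLAN_MINUTES = 500  # plan allowance taken from the 500-minute plans in the example

# Rough equivalent of Excel's COUNTIF(range, ">500"): count values meeting a criterion.
overages = sum(m > PLAN_MINUTES for m in minutes_used)
# Rough equivalent of COUNTIF(range, "<500").
underages = sum(m < PLAN_MINUTES for m in minutes_used)

print(f"Overages: {overages} of {len(minutes_used)}")
print(f"Underages: {underages} of {len(minutes_used)}")
```

Summing a generator of booleans works because Python treats True as 1, so the count is simply the number of values that satisfy the criterion.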

Example 2: Rating a New Design
A branding company is studying whether changes should be made to the bottle design for a popular soft drink.
  Respondents are shoppers from a large shopping mall on a particular Saturday
  Exposed to the new bottle design and asked to rate it:
    Five items with a 7-point Likert scale (survey instrument)
    A composite score is the sum of all five items
    Rule of thumb: a score of 25 is the smallest score for a success
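
As a minimal sketch of the scoring rule described above, the composite score can be computed by summing the five item ratings and comparing the total with 25. The ratings are hypothetical, and the function name composite_score is an assumption made for illustration.

```python
# Hypothetical responses: each respondent rates five items on a 1-7 Likert scale.
ratings = [
    [6, 7, 5, 6, 7],
    [4, 5, 5, 4, 6],
    [7, 7, 6, 7, 7],
]

SUCCESS_THRESHOLD = 25  # rule of thumb quoted on the slide


def composite_score(items):
    """Sum five 7-point item ratings into one composite score."""
    if len(items) != 5 or not all(1 <= r <= 7 for r in items):
        raise ValueError("expected five ratings between 1 and 7")
    return sum(items)


scores = [composite_score(r) for r in ratings]
successes = sum(score >= SUCCESS_THRESHOLD for score in scores)

print(scores)
print(f"{successes} of {len(scores)} composite scores are at least {SUCCESS_THRESHOLD}")
```

With five items rated 1 to 7, composite scores range from 5 to 35, so a threshold of 25 corresponds to an average item rating of 5.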

Rating a New Design (cont.)
Sampling method: interception method
  Not a completely random sample, but can generate an approximately random sample (how?)
  A sample size of 60
Worksheet: Design

Key observation: 57 of 60 (i.e., 95%) composite scores are at least 25

Example 3: Estimating Car Gas Mileage
Study of a tax credit offered by the federal government to automakers for improving the fuel economy of gasoline-powered midsize cars
An automaker has introduced a new model and wishes to demonstrate that it qualifies for the tax credit
US EPA Fuel Economy:
  http://www.epa.gov/fueleconomy/
  Market average: 26 miles per gallon (mpg) (year 2009)
  Tax incentive goal: an improvement of 5 mpg, i.e., at least 31 mpg

Estimating Car Gas Mileage (cont.)
An approximately random sample of 50 cars
  One car from each of 50 consecutive production shifts
  Each selected car is subject to an EPA test
    A 7.5-mile city driving trip & a 10-mile highway driving trip
    A combined mileage for the car

Mileages vary from 29.8 mpg to 33.3 mpg
38 out of 50 (76%) of the mileages are at least 31 mpg.

Data Presentation Techniques
Graphically Summarizing Qualitative Data
  Frequency distribution, bar chart, pie chart, Pareto chart

Graphically Summarizing Quantitative Data
  Frequency distribution, histograms, ogives

Stem-and-Leaf Displays
Crosstabulation Tables
Scatter Plots

Frequency Distribution for Qualitative Data
With qualitative data, names identify the different categories
These data can be summarized using a frequency distribution
Frequency distribution:
  A table that summarizes the number of items in each of several non-overlapping classes

Example 2.1: 2006 Jeep Purchasing Patterns
Table 2.1 lists all 251 vehicles sold in 2006 by the Jeep dealers
  It does not reveal much useful information

A frequency distribution is a useful summary
  Simply count the number of times each model appears in Table 2.1

Worksheet: JeepSales

Relative Frequency and Percent Frequency
Relative frequency summarizes the proportion of items in each class
  For each class, divide the frequency of the class by the total number of observations
  Multiply by 100 to obtain the percent frequency

Worksheet: JeepSales
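
The two steps above (divide by the total, then multiply by 100) can be sketched as follows. The sales list is a small hypothetical sample, not the 251 sales in Table 2.1, and the function name frequency_table is introduced only for this illustration.

```python
from collections import Counter

# Hypothetical list of models sold (NOT the 251 vehicles in Table 2.1).
sales = ["Liberty", "Wrangler", "Liberty", "Commander", "Liberty",
         "Wrangler", "Grand Cherokee", "Liberty", "Wrangler", "Commander"]


def frequency_table(items):
    """Map each class to (frequency, relative frequency, percent frequency)."""
    counts = Counter(items)          # frequency: how often each class appears
    n = len(items)                   # total number of observations
    return {cls: (f, f / n, 100 * f / n) for cls, f in counts.items()}


table = frequency_table(sales)
for model, (freq, rel, pct) in sorted(table.items()):
    print(f"{model:15s} {freq:3d} {rel:6.2f} {pct:6.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.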

Bar Charts and Pie Charts
Bar chart: A vertical or horizontal rectangle represents the frequency for each category
  Height can be frequency, relative frequency, or percent frequency

Pie chart: A circle divided into slices where the size of each slice represents its relative frequency or percent frequency
Using Excel to draw bar charts and pie charts: easy

Excel Bar and Pie Chart of the Jeep Sales Data

Worksheet: JeepSales

Pareto Chart
Pareto chart: A bar chart having the different kinds of defects listed on the horizontal scale
  Bar height represents the frequency of occurrence
  Bars are arranged in decreasing height from left to right
  Sometimes augmented by plotting a cumulative percentage point for each bar

Worksheet: Labels
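
A minimal sketch of the table behind a Pareto chart: sort the categories by decreasing frequency and attach a cumulative percentage to each bar. The defect names and counts are hypothetical (the Labels worksheet data are not reproduced here), and pareto_table is a name assumed for illustration.

```python
# Hypothetical defect counts (NOT the data in the Labels worksheet).
defect_counts = {
    "Crooked label": 12,
    "Missing label": 30,
    "Smudged print": 5,
    "Wrong label": 3,
}


def pareto_table(counts):
    """Return (category, frequency, cumulative percent) rows, largest bar first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, frequency in ordered:
        running += frequency                      # running total of defects so far
        rows.append((category, frequency, 100 * running / total))
    return rows


rows = pareto_table(defect_counts)
for category, frequency, cum_pct in rows:
    print(f"{category:15s} {frequency:3d} {cum_pct:6.1f}%")
```

The rows give the bar order from left to right; the third column supplies the optional cumulative percentage points.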

Graphically Summarizing Quantitative Data
Often need to summarize and describe the shape of the distribution
One way is to group the measurements into classes of a frequency distribution
  Classify and count
  The frequency distribution is a table

Then display the data in the form of a histogram
  The histogram is a picture of the frequency distribution

Constructing a Frequency Distribution
Steps in making a frequency distribution:
  1. Find the number of classes
  2. Find the class length
  3. Form non-overlapping classes of equal width
  4. Tally and count
  5. Graph the histogram

Example 2.2: Payment time
A sample of 65 observations, min = 10 days, max = 29 days

Number of Classes & Class Length
Number of Classes
  Group all of the n data into K classes
  K is the smallest whole number for which 2^K > n (a guide only)
  In Example 2.2, n = 65
    For K = 6, 2^6 = 64, < n
    For K = 7, 2^7 = 128, > n
    So use K = 7 classes

Class length
  Find the length of each class as the largest measurement minus the smallest, divided by the number of classes found earlier (K)
  For Example 2.2, (29 - 10)/7 = 2.7143
    Because payments are measured in days, round up to three days
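
The two calculations on this slide can be sketched in Python. This assumes the guide is read as "smallest K with 2^K > n", which matches the K = 6 and K = 7 checks, and that the class length is rounded up to a whole day; the function names are introduced only for illustration.

```python
import math


def number_of_classes(n):
    """Smallest whole number K with 2**K > n (a guide only)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_length(smallest, largest, k):
    """Return the raw class length and the length rounded up to a whole unit."""
    raw = (largest - smallest) / k
    return raw, math.ceil(raw)


# Values quoted for Example 2.2: n = 65, smallest = 10 days, largest = 29 days.
n, smallest, largest = 65, 10, 29
k = number_of_classes(n)
raw, length = class_length(smallest, largest, k)

# Class boundaries: start at the smallest value and step by the class length.
boundaries = [smallest + i * length for i in range(k + 1)]

print(k, round(raw, 4), length)   # 7 2.7143 3
print(boundaries)                 # [10, 13, 16, 19, 22, 25, 28, 31]
```

The resulting boundaries give seven classes of three days each (10 to under 13, 13 to under 16, and so on up to 28 to under 31), which cover every payment time from 10 to 29 days.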

Histogram Using Excel

[Figure: Excel histogram; horizontal axis: class, vertical axis: frequency (0 to 25)]
Class      Frequency
10 < 13    3
13 < 16    14
16 < 19    23
19 < 22    12
22 < 25    8
25 < 28    4
28 < 31    1

Histogram Using SPSS

SPSS data file: Lect01PaymentTime.sav
Note: Most statistical software generates histograms automatically, so there is no unique histogram; any version is acceptable so long as the graph shows the data pattern.

Histograms: Three General Cases
Symmetrical: The right and left tails of the histogram appear to be mirror images of each other

Skewed to the right: The right tail of the histogram is longer than the left tail

Skewed to the left: The left tail of the histogram is longer than the right tail

Cumulative Distributions
Another way to summarize a distribution is to construct a cumulative distribution
To do this, use the same number of classes, class lengths, and class boundaries used for the frequency distribution
Rather than a count, we record the number of measurements that are less than the upper boundary of that class, in other words, a running total.
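
The running total described above can be sketched with itertools.accumulate. The class frequencies are the ones read off the Excel histogram slide (3, 14, 23, 12, 8, 4, 1); pairing them with the upper boundaries 13 to 31 is an assumption based on that slide's class labels.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies as read from the histogram slide.
upper_boundaries = [13, 16, 19, 22, 25, 28, 31]
frequencies = [3, 14, 23, 12, 8, 4, 1]

# Cumulative frequency: number of measurements below each upper boundary.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# Cumulative relative and percent frequencies, as used for an ogive.
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for upper, c, pct in zip(upper_boundaries, cumulative, cumulative_percent):
    print(f"less than {upper}: {c:3d} ({pct:5.1f}%)")
```

Plotting each cumulative value above its upper class boundary and joining the points with line segments gives the ogive.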

Ogive
Ogive: A graph of a cumulative distribution
  Plot a point above each upper class boundary at a height equal to the cumulative frequency
  Connect the points with line segments
  Can also be drawn using
    Cumulative relative frequencies
    Cumulative percent frequencies

Worksheet: PayTime

Stem-and-Leaf Displays
Purpose is to see the overall pattern of the data, by grouping the data into classes
  the variation from class to class
  the amount of data in each class
  the distribution of the data within each class

Best for small to moderately sized data distributions

Car Mileage Example
The stem-and-leaf display:
  29 | 8                         (29 + 0.8 = 29.8)
  30 | 13455677888
  31 | 0012334444455667778899
  32 | 01112334455778
  33 | 03                        (33 + 0.3 = 33.3)

Looking at the stem-and-leaf display, the distribution appears almost symmetrical
  The upper portion (29, 30, 31) is almost a mirror image of the lower portion of the display (31, 32, 33)
  But not exactly a mirror reflection

SPSS data file: Lect01GasMiles.sav

Constructing a Stem-and-Leaf Display
There are no rules that dictate the number of stem values
  Can split the stems as needed
  Use SPSS (Excel cannot generate stem plots)

SPSS data file: Lect01PaymentTime.sav
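
As a rough sketch of how such a display is built, the function below uses the whole-number part of each value as the stem and the tenths digit as the leaf. It assumes positive values recorded to one decimal place, does not split stems, and uses a handful of mileages in the 29.8 to 33.3 range chosen for illustration; stem_and_leaf is a hypothetical name.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group positive one-decimal values by stem (whole part) and leaf (tenths digit)."""
    rows = defaultdict(list)
    for value in values:
        tenths = round(value * 10)          # e.g. 29.8 -> 298, avoids float truncation
        stem, leaf = divmod(tenths, 10)     # 298 -> stem 29, leaf 8
        rows[stem].append(leaf)
    return {stem: sorted(rows[stem]) for stem in sorted(rows)}


# A few illustrative mileages (assumed values, not the full sample of 50 cars).
mileages = [30.1, 29.8, 31.0, 33.3, 30.3, 31.0, 32.5, 33.0]

display = stem_and_leaf(mileages)
for stem, leaves in display.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Because the choice of stems is a judgment call, a different stem unit or split stems would produce a different but equally valid display of the same data.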

Stem-and-Leaf Display Using SPSS

Stem-and-leaf display for Payment Time data

Stem-and-leaf display for Car Mileage data

Note: Stem-and-leaf displays are NOT unique!

Crosstabulation Tables

Classifies data on two dimensions
  Rows classify according to one dimension
  Columns classify according to a second dimension

Requires three variables
  1. The row variable
  2. The column variable
  3. The variable counted in the cells

SPSS can easily create crosstabulation tables
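
A minimal sketch of the counting behind a crosstabulation table: each record supplies a row value and a column value, and each cell holds the number of records with that combination. The records below are hypothetical and the function name crosstab is introduced only for illustration.

```python
from collections import Counter

# Hypothetical (row value, column value) records, e.g. (fund type, satisfaction level).
records = [
    ("Bond", "High"), ("Stock", "High"), ("Stock", "Medium"),
    ("Bond", "Medium"), ("Stock", "High"), ("Bond", "Low"),
]


def crosstab(pairs):
    """Return row labels, column labels, and a table of cell counts."""
    counts = Counter(pairs)                              # count each (row, column) pair
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


row_labels, col_labels, table = crosstab(records)

print(" " * 8 + "".join(f"{c:>8s}" for c in col_labels))
for label, cells in zip(row_labels, table):
    print(f"{label:8s}" + "".join(f"{cell:8d}" for cell in cells))
```

The cell counts always sum to the number of records, which makes a convenient check on the table.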

Example 2.5: Investor Satisfaction
The raw data: fund type & satisfaction level

Investor Satisfaction: Crosstabulation
A crosstabulation table of fund type vs. satisfaction level

Crosstabulations Using SPSS
Analyze > Descriptive Statistics > Crosstabs

SPSS data file: Lect01Invest.sav

Scatter Plots
Used to study relationships between two variables
  Place one variable on the x-axis
  Place a second variable on the y-axis
  Place a dot at each pair of coordinates

Software
  Excel: easy & simple
  SPSS: easy & sophisticated!

Types of Relationships
Linear: A straight-line relationship between the two variables
  Positive: When one variable goes up, the other variable goes up
  Negative: When one variable goes up, the other variable goes down

No Linear Relationship: There is no coordinated linear movement between the two variables

Scatter Plots Using Excel
Worksheet: SalesPlot

End of Lecture 1

NEXT LECTURE: CHAPTER 3 DESCRIPTIVE STATISTICS
