Chapter 1a

Statistics
Statistics is the science that studies the collection and interpretation of numerical data. Statistics divides the study of data into three parts: the generation or production of data; the organisation and analysis of data (data analysis); and the drawing of conclusions from the data (statistical inference).
Business Applications of Statistics

Accounting: Accountants use statistical sampling procedures when conducting audits for their clients.
Finance: Financial managers use a variety of statistical information, including price-earnings ratios, to guide their investment recommendations.
Marketing: Point-of-sale scanners at retail checkout counters are used to collect data for market research.
Production: Statistical quality control charts are used to monitor the output of a production process.
Economics: Economists use statistical information in making economic forecasts.





1 Data Collection and Presentation

Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation. Together, the data collected in a study are referred to as the data set for the study. The elements are the entities on which data are collected. A variable is a characteristic of interest for the elements. The set of measurements collected for a particular element is called an observation.




Data Collection
An observational study observes individuals and collects information on them but does not attempt to influence the data collected. Example: A health insurance company wants to find out if it should market its health insurance plans differently to men and women. A market researcher interviews men and women and asks them to list the features of the plan that they consider essential.
An experiment deliberately influences events and investigates the effects of the intervention. Example: A study was carried out on a group of students to investigate the effects of repeated exposure to an advertising message. The students were asked to watch a 40-minute television programme that included ads for a particular brand of fast food. Some students were given a programme to watch with 1 ad, others watched a programme with 3 ads and others watched a programme with 5 ads.
Qualitative and Quantitative Data

The statistical analysis that is appropriate for a variable depends on whether the data are qualitative or quantitative. Qualitative data are labels or names used to identify an attribute of each element. Qualitative data are usually non-numeric e.g. gender or social class. Quantitative data indicate how much or how many. Quantitative data are numeric, e.g. age or height.




Quantitative data may be discrete or continuous. Discrete data: values change by whole numbers or steps e.g. family size. Continuous data: values can be any (decimal) number in a given range e.g. height.
Cross-Sectional and Time Series Data

Cross-sectional data are collected at the same or approximately the same point in time. Example: data detailing the number of planning permissions issued in June 2011 in each of the counties of Ireland. Time series data are collected over several time periods. Example: data detailing the number of planning permissions issued in county Limerick, each month from January 2005 to December 2010.



Scales of Data Measurement
As well as categorising data into certain types, we can also categorise data by levels or scales of measurement. Qualitative data are usually measured at a nominal or ordinal scale of measurement.
Nominal: the data give the person or object a label or tell us what category a person or object falls into e.g. the colour of a car.
Ordinal: the data have all the properties of nominal data but we also get more information, since the order of the data is meaningful e.g. do you think this module is very easy, easy, challenging or very challenging?
Whatever you answer is still just a word i.e. non-numeric, but it gives more information than the colour of a car. All students can be ranked depending on what they think of the module.
Quantitative data are usually measured at an interval or ratio scale of measurement.
Interval: the data are numeric and have order in the same way as ordinal data have order. We can also measure the difference between two observations e.g. we can measure the difference between two temperatures of 10° and 20°, namely 10 degrees. We cannot measure the difference between two students where one thinks the module is easy and the other finds it challenging i.e. we will not get a meaningful number.
Ratio: the data are numeric and have all the properties of interval data. The ratio of the data is also meaningful e.g. we can say things like "I'm half her age" or "twice her weight". The data have a meaningful and unique zero point, for example in measuring age the zero point is when you were born. An interval scale is based on measurements from an arbitrary zero. For example, 0° Celsius and 0° Fahrenheit are two different assignments of zero temperature.
Classify the following by data type and scale of measurement:
Sector of business (e.g. manufacturing or service): qualitative, nominal
Cigarette consumption (e.g. light, moderate, heavy): qualitative, ordinal
Number of visits to the doctor per year: quantitative, ratio, discrete
Weight: quantitative, ratio, continuous
Income: quantitative, ratio, continuous
Longitude: quantitative, interval, continuous
Sampling
We often read headlines in newspapers saying things like "61% of people are satisfied with the government's performance". How can the newspaper make such a statement when they haven't asked everyone in the country their opinion of the government? The newspaper has taken a representative subset of the population and assumed that what happens for that subset is what happens for the whole population.
Is this assumption valid? How do you select a representative subset? What mistakes can you make selecting this subset and what can be done to correct these mistakes? Without understanding the concepts behind selecting a subset of the population i.e. sampling, we can make serious errors in our conclusions about the population.
Definitions
First, we need to define some terms. These will be illustrated using the example of a pre-election poll on which political party is going to win the election.
Population: the entire group of objects/subjects about which information is wanted. For our example, the population is all adults on the electoral register.
Sample: any subset of a population e.g. a representative subset of individuals from the electoral register.
Unit: any individual member of the population e.g. an individual on the electoral register.
Sampling frame: a list of the individuals in the population e.g. the electoral register.
Variable: something whose value we can measure for each person; its value will differ from person to person e.g. the political party a person says they will vote for.
Parameter: this represents some value (e.g. an average value or a percentage) that we are interested in calculating for the population; for example, the percentage of adults on the electoral register who will vote for a particular political party or the average age of the voters. We will never find out these values unless we ask everyone in the population. The value of a parameter is fixed but usually unknown. We estimate it by using information from a sample.
Statistic: this represents some value (e.g. an average value or a percentage) that we are interested in calculating for the sample; for example, the percentage of a representative sample of adults who will vote for a particular political party or the average age of this representative sample. We can find out these values since we can ask everyone in the sample, but if we took a different sample of people we might get a different value. The value of a statistic is not fixed but is known. We estimate the value of the parameter by using the value of the statistic.
Example 1: A telesales company in Cork uses a device that dials residential phone numbers in the city at random. Of the first 100 numbers dialled, 7 are unlisted. 10% of all Cork residential phones are unlisted. The 10% is a parameter (it refers to the whole population). The 7% is a statistic (it refers to the sample of 100 numbers dialled).
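The difference between a fixed parameter and a varying statistic in Example 1 can be illustrated with a small simulation. This is only a sketch: the function name `sample_percent_unlisted`, the seed and the choice of five repeated samples are assumptions made for illustration, not part of the example.

```python
import random

def sample_percent_unlisted(n_dialed, true_rate, rng):
    """Dial n_dialed numbers at random; return the percentage that are unlisted."""
    unlisted = sum(1 for _ in range(n_dialed) if rng.random() < true_rate)
    return 100 * unlisted / n_dialed

PARAMETER = 0.10            # 10% of all Cork residential phones are unlisted
rng = random.Random(1)      # arbitrary seed, only so the run is repeatable

# Each batch of 100 dialled numbers is a different sample,
# so each gives its own value of the statistic.
sample_percentages = [sample_percent_unlisted(100, PARAMETER, rng) for _ in range(5)]
print("Parameter (fixed):", 100 * PARAMETER)
print("Statistics (vary from sample to sample):", sample_percentages)
```

The parameter stays at 10% throughout, while the percentage unlisted changes from one sample of 100 to the next.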
Example 2: A politician is interested in whether her constituents are in favour of the household charge. Her staff reports that letters on the issue have been received from 100 constituents and that 80 of these are opposed to the charge. Identify the population, variable measured and the sample.
Population: all voters in her constituency.
Variable measured: whether a constituent is in favour of or opposed to the charge (summarised as the percentage of constituents opposed).
Sample: the 100 constituents who wrote letters.
Non-probability Sampling Methods

Sampling methods are the different ways of selecting a subset of individuals from the population. We can select a group simply because it's easy for us to contact these people and they are willing to answer our questions. This method of sampling is appropriately called convenience sampling. The sample is identified primarily by convenience e.g. volunteer panels for consumer research. The advantage is relatively easy sample selection and data collection but it is impossible to evaluate the goodness of the sample in terms of its representativeness of the population.
We can also use an expert to pick the people he or she considers most representative of the population. This is called judgement sampling. The quality of the sample depends on the judgement of the person selecting it. Quota sampling is another method of sampling widely used in opinion polling and market research. Interviewers are each given a quota of subjects of a specified type to survey; for example, an interviewer might be told to find 20 adult men and 20 adult women, 10 teenage girls and 10 teenage boys to interview about their television viewing.
It is clear that these methods are open to mistakes being made. Let's go back to the example of the pre-election poll on who is going to win the election. If I pick a group of people to answer this question simply because it's convenient for me, my sample may include all people from a particular background and exclude everyone from a different background. Can I be certain that the conclusions for these people can be applied to the whole population?
Bias in Sampling
If there is a tendency for a certain group of the population to be omitted from the sample, or if people who refuse to co-operate form a group which is, in some way, different to the population, we have what is called a biased sample. We often use the word bias, but its definition in the context of sampling is a systematic tendency to overestimate or underestimate the population parameter of interest. For the election example, if a sample consists solely of people from an area with high unemployment, I may underestimate the level of support for a particular political party.
How can we eliminate bias? Bias can be eliminated by taking a random sample. This is a sample where everyone in the population has the same chance of getting into the sample, and the fact that one individual has got into the sample does not affect the chances of another individual getting into the sample i.e. everyone has an independent and equal chance of being included in the sample. This method of sampling is called simple random sampling.
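As a rough sketch of simple random sampling, the snippet below draws a sample from a sampling frame using Python's standard `random` module. The frame of 1,000 made-up voter labels, the function name `simple_random_sample` and the seed are hypothetical and are used only for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct units so that every unit in the frame has the same chance."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame standing in for an electoral register.
frame = [f"voter_{i}" for i in range(1, 1001)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Note that `random.sample` draws without replacement, so no individual can appear in the sample twice.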
Probability Sampling Methods

All methods of sampling based on taking a random sample are called probability sampling methods, since we know the chance (or probability) of someone getting into our sample i.e. it is the same for everyone. Convenience, judgement and quota sampling methods are called non-probability sampling methods, since we don't know what the chances are for an individual getting into our sample i.e. it is not the same for everyone.
Many methods of analysis make the assumption that the sample from which the data are collected is a random sample. There are many different types of probability sampling. These include:
Stratified random sampling: Stratified sampling techniques are generally used when the population is heterogeneous (dissimilar) and where certain homogeneous (similar) sub-populations can be isolated. These sub-populations are called strata. The population is divided into strata, such that each unit in the population belongs to one and only one stratum. The basis for forming the strata could be age, gender, industry type etc. A simple random sample is taken from each stratum.
Cluster sampling: the population is divided into separate groups of units called clusters. Each unit belongs to one and only one cluster. A simple random sample of clusters is selected from a list of all clusters. All units within each chosen cluster are included in the sample. A cluster could be a housing estate or other well-defined area. Cluster sampling is typically used when a researcher cannot get a complete list of the members of a population they wish to study, but can get a complete list of groups or clusters of the population. It is also used when a random sample would produce a list of subjects so widely scattered that surveying them would prove to be far too expensive, for example, surveying people who live in all the different counties in Ireland.
Systematic sampling: for example, if a sample of size 50 is required from a population with 5000 units, we would include in the sample one unit for every 5000/50 = 100 units in the population. One of the first 100 units of the population is selected at random from a list of all members of the population. Other sample units are found by starting with the first unit and then selecting every 100th unit that follows in the population list. In effect, the sample of 50 is identified by moving systematically through the population and identifying every 100th unit after the first randomly selected unit.
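The three probability sampling methods described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the odd/even strata, the 20 made-up housing estates and the equal sample size per stratum are all hypothetical choices, not something prescribed by the notes.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum units from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # each unit lands in exactly one stratum
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and include every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(units, n, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    k = len(units) // n            # e.g. 5000 // 50 = 100
    start = rng.randrange(k)       # random position within the first k units
    return units[start::k][:n]

rng = random.Random(0)             # arbitrary seed for repeatability
population = list(range(1, 5001))  # 5000 units, as in the systematic example

sys_sample = systematic_sample(population, 50, rng)
strat_sample = stratified_sample(population, lambda u: "odd" if u % 2 else "even", 5, rng)
estates = {f"estate_{c}": list(range(c * 10, c * 10 + 10)) for c in range(20)}
clus_sample = cluster_sample(estates, 3, rng)

print(len(sys_sample), len(strat_sample), len(clus_sample))
```

In the systematic sample, consecutive selected units are exactly 100 places apart in the population list, matching the 5000/50 calculation in the text.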
Use of random number tables

Suppose we want to choose 4 people randomly from 50. We can use tables of random numbers (or computer-generated random numbers) to do this. Here is a list taken from such a table:
80428 45440 50112 20876 27228 34292 10329 15881
Take the last two digits of each:
28 40 12 76 28 92 29 81
Eliminate repetitions and numbers not between 1 and 50:
28 40 12 29
These are our four numbers.
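The same procedure can be written as a short Python function. This is a sketch only; the function name `pick_from_table` and its parameters are hypothetical, and it assumes a population of at most 99 so that two digits are enough.

```python
def pick_from_table(table_numbers, how_many, population_size):
    """Select labels using the last two digits of each random-table entry."""
    chosen = []
    for number in table_numbers:
        candidate = number % 100                      # last two digits
        in_range = 1 <= candidate <= population_size  # drop numbers outside 1..N
        if in_range and candidate not in chosen:      # drop repetitions
            chosen.append(candidate)
        if len(chosen) == how_many:
            break
    return chosen

table = [80428, 45440, 50112, 20876, 27228, 34292, 10329, 15881]
print(pick_from_table(table, 4, 50))  # [28, 40, 12, 29]
```

If the table entries run out before enough labels are found, the function simply returns the labels found so far, and more table entries would be needed.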


Lack of precision in sampling

Lack of precision is where the result of sampling is not repeatable. Every time we take a sample and ask the question of interest, we get a different answer. How can we make conclusions for the population if we get different answers from each sample? How can we correct this problem? If we increase the sample size, we also increase the repeatability or precision of our results.




A large random sample will include many people with lots of different characteristics whereas a small sample will not have the same range of people or characteristics. Thus, the results from one small sample may differ considerably from the results of another small sample depending on the range of people and characteristics it includes.
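A small simulation can illustrate how precision improves with sample size. This is a sketch under assumptions: the true satisfaction rate of 61% is borrowed from the newspaper headline example, while the sample sizes (25 and 2500), the 200 repetitions, the seed and the function name are arbitrary choices for illustration.

```python
import random
import statistics

def spread_of_sample_percentages(sample_size, true_rate, n_samples, rng):
    """Take n_samples random samples and return the standard deviation
    of the sample percentages (a smaller spread means more repeatable results)."""
    percentages = []
    for _ in range(n_samples):
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
        percentages.append(100 * hits / sample_size)
    return statistics.pstdev(percentages)

rng = random.Random(2024)   # arbitrary seed for repeatability
small = spread_of_sample_percentages(25, 0.61, 200, rng)
large = spread_of_sample_percentages(2500, 0.61, 200, rng)
print("Spread with samples of 25:  ", round(small, 2))
print("Spread with samples of 2500:", round(large, 2))
```

The percentages from the larger samples cluster much more tightly, which is what is meant by greater repeatability or precision.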
Errors in sampling
Sampling errors (related to the act of selecting the sample) e.g. a biased sampling method; a sampling frame which differs from the population.
Non-sampling errors (not related to the act of selecting the sample) e.g. missing data, response errors, processing errors, wording of questions.
Questionnaires
Questionnaires are often used to collect data. Some of the issues involved in using questionnaires are:
How do you contact the individual: mail, telephone, interview?
How is the issue of anonymity (i.e. subjects are anonymous even to the person carrying out the survey) versus confidentiality (i.e. all data about individuals are known to the person carrying out the survey but kept confidential) handled?
What is the response rate?
If there is a high non-response rate, are there differences between those who do or do not respond i.e. volunteer bias? We should try to assess differences between responders and non-responders and try to get as high a response rate as possible e.g. use second or third mailings to those who don't respond to the first.
Are there missing data: was the questionnaire too long, complicated, misunderstood?
Analysis of questionnaire data: how will the data be processed? How long will it take to input data?
