Statistics
Statistics is the science that studies the collection and interpretation of numerical data. Statistics divides the study of data into three parts:
Generation or production of data
Organisation and analysis of data (Data Analysis)
Drawing conclusions from the data (Statistical Inference)
Data Collection
An observational study observes individuals and collects information on them but does not attempt to influence the data collected. Example: A health insurance company wants to find out if it should market its health insurance plans differently to men and women. A market researcher interviews men and women and asks each of them to list the features of the plan that he or she considers essential.
An experiment deliberately influences events and investigates the effects of the intervention. Example: A study was carried out on a group of students to investigate the effects of repeated exposure to an advertising message. The students were asked to watch a 40-minute television programme that included ads for a particular brand of fast food. Some students were given a programme to watch with 1 ad, others watched a programme with 3 ads and others watched a programme with 5 ads.
Quantitative data may be discrete or continuous. Discrete data: values change by whole numbers or steps e.g. family size. Continuous data: values can be any (decimal) number in a given range e.g. height.
As well as categorising data into certain types, we can also categorise data by levels or scales of measurement. Qualitative data is usually measured at a nominal or ordinal scale of measurement. Nominal: the data gives the person or object a label or tells us what category a person or object falls into e.g. the colour of a car.
Ordinal: the data has all the properties of nominal data but we also get more information, since the order of the data is meaningful e.g. do you think this module is very easy, easy, challenging or very challenging?
Whatever you answer is still just a word, i.e. non-numeric, but it gives more information than the colour of a car. All students can be ranked depending on what they think of the module.
Quantitative data is usually measured at an interval or ratio scale of measurement. Interval: the data is numeric and has order in the same way as ordinal data has order. We can also measure the difference between two observations e.g. we can measure the difference between two temperatures of 10° and 20°, namely 10 degrees. We cannot measure the difference between two students where one thinks the module is easy and the other finds it challenging, i.e. we will not get a meaningful number.
Ratio: the data is numeric and has all the properties of interval data. The ratio of the data is also meaningful e.g. we can say things like "I'm half her age" or "twice her weight". The data has a meaningful and unique zero point; for example, in measuring age the zero point is when you were born. By contrast, an interval scale is based on measurements from an arbitrary zero. For example, 0° Celsius and 0° Fahrenheit are two different assignments of zero temperature.
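The difference between the interval and ratio scales can be checked with a little arithmetic. The sketch below is only an illustration: it reuses the 10° and 20° temperatures from the text, assumes they are in Celsius, and uses two made-up ages. The ratio of two temperatures changes when the unit changes, because the zero is arbitrary; the ratio of two ages does not.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: temperature (the zero point is arbitrary).
c_low, c_high = 10.0, 20.0
f_low = celsius_to_fahrenheit(c_low)    # 50 degrees Fahrenheit
f_high = celsius_to_fahrenheit(c_high)  # 68 degrees Fahrenheit

ratio_celsius = c_high / c_low          # 2.0
ratio_fahrenheit = f_high / f_low       # 68 / 50 = 1.36, not 2.0

# Ratio scale: age (zero means "just born" in any unit).
# The two ages are illustrative values, not taken from the notes.
age_a_years, age_b_years = 20.0, 40.0
ratio_years = age_b_years / age_a_years
ratio_months = (age_b_years * 12) / (age_a_years * 12)

print("temperature ratio in Celsius:   ", ratio_celsius)
print("temperature ratio in Fahrenheit:", ratio_fahrenheit)
print("age ratio in years: ", ratio_years)
print("age ratio in months:", ratio_months)
```

Because the temperature ratio depends on the unit chosen, "twice as hot" is not a meaningful statement, whereas "twice her age" is.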
Classify the following by data type and scale of measurement:
Sector of business e.g. manufacturing or service: qualitative, nominal
Cigarette consumption e.g. light, moderate, heavy: qualitative, ordinal
Number of visits to the doctor per year: quantitative, ratio, discrete
Weight: quantitative, ratio, continuous
Income: quantitative, ratio, continuous
Longitude: quantitative, interval, continuous
Sampling
We often read headlines in newspapers saying things like "61% of people are satisfied with the government's performance". How can the newspaper make such a statement when it hasn't asked everyone in the country their opinion of the government? The newspaper has taken a representative subset of the population and assumed that what happens for that subset is what happens for the whole population.
Is this assumption valid? How do you select a representative subset? What mistakes can you make selecting this subset and what can be done to correct these mistakes? Without understanding the concepts behind selecting a subset of the population i.e. sampling, we can make serious errors in our conclusions about the population.
Definitions
First, we need to define some terms. These will be illustrated using the example of a pre-election poll on which political party is going to win the election. Population: the entire group of objects/subjects about which information is wanted. For our example, the population is all adults on the electoral register. Sample: any subset of a population e.g. a representative subset of individuals from the electoral register.
Unit: any individual member of the population e.g. an individual on the electoral register. Sampling frame: a list of the individuals in the population e.g. the electoral register. Variable: something whose value we can measure for each person - its value will differ from person to person e.g. the political party a person says they will vote for.
Parameter: this represents some value (e.g. an average value or a percentage) that we are interested in calculating for the population; for example, the percentage of adults on the electoral register who will vote for a particular political party or the average age of the voters. We will never find out these values unless we ask everyone in the population. The value of a parameter is fixed but usually unknown. We estimate it by using information from a sample.
Statistic: this represents some value (e.g. an average value or a percentage) that we are interested in calculating for the sample; for example, the percentage of a representative sample of adults who will vote for a particular political party or the average age of this representative sample. We can find out these values since we can ask everyone in the sample, but if we took a different sample of people we might get a different value. The value of a statistic is not fixed but is known. We estimate the value of the parameter by using the value of the statistic.
Example 1 A telesales company in Cork uses a device that dials residential phone numbers in the city at random. Of the first 100 numbers dialled, 7 are unlisted. 10% of all Cork residential phones are unlisted. The 10% is a parameter (refers to whole population). The 7% (7 out of 100) is a statistic (refers to sample).
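The contrast between a fixed parameter and a varying statistic in Example 1 can be simulated. The sketch below is an illustration only: the population size of 50,000 phones and the random seed are assumptions, not figures from the example; only the 10% unlisted rate and the sample size of 100 come from the text.

```python
import random

# Assumed figures for illustration (not from the example).
POPULATION_SIZE = 50_000   # hypothetical number of residential phones
N_UNLISTED = 5_000         # 10% of the population, matching the example
SAMPLE_SIZE = 100          # numbers dialled in each sample

rng = random.Random(1)     # fixed seed so the draws are reproducible

# True marks an unlisted phone, False a listed one.
population = [True] * N_UNLISTED + [False] * (POPULATION_SIZE - N_UNLISTED)

# The parameter is computed from the whole population and never changes.
parameter = sum(population) / len(population)

def sample_proportion(units, n, rng):
    """Draw n units at random without replacement; return the proportion unlisted."""
    sample = rng.sample(units, n)
    return sum(sample) / n

# Each new sample gives its own value of the statistic.
statistics = [sample_proportion(population, SAMPLE_SIZE, rng) for _ in range(5)]

print("parameter: ", parameter)
print("statistics:", statistics)
```

Each run of `sample_proportion` plays the role of one batch of 100 dialled numbers: the parameter stays at 10%, while the sample proportions scatter around it, just as the 7% in the example does.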
Example 2 A politician is interested in whether her constituents are in favour of the household charge. Her staff reports that letters on the issue have been received from 100 constituents and that 80 of these are opposed to the charge. Identify the population, variable measured and the sample. Population: all voters in her constituency. Variable measured: whether a constituent is in favour of or opposed to the charge (summarised as the percentage opposed). Sample: the 100 constituents who wrote letters.
In convenience sampling, units are selected primarily by convenience e.g. volunteer panels. The advantage is relatively easy sample selection and data collection, but it is impossible to evaluate the goodness of the sample in terms of its representativeness of the population.
We can also use an expert to pick the people he or she considers most representative of the population. This is called judgement sampling. The quality of the sample depends on the judgement of the person selecting it. Quota sampling is another method of sampling widely used in opinion polling and market research. Interviewers are each given a quota of subjects of a specified type to survey; for example, an interviewer might be told to find 20 adult men and 20 adult women, 10 teenage girls and 10 teenage boys to interview about their television viewing.
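The mechanics of quota sampling can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any polling company works: the quotas are those from the example above, while the stream of passers-by, the `fill_quotas` helper and the random seed are hypothetical.

```python
import random

# Quotas from the example in the text.
QUOTAS = {"adult man": 20, "adult woman": 20, "teenage girl": 10, "teenage boy": 10}

def fill_quotas(people, quotas):
    """Accept people in the order they are met until every quota is full.

    people is an iterable of (person_id, group) pairs.
    Returns the selected people and the quotas still unfilled.
    """
    remaining = dict(quotas)
    selected = []
    for person_id, group in people:
        if remaining.get(group, 0) > 0:
            selected.append((person_id, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota has been filled
    return selected, remaining

# Hypothetical stream of 1000 passers-by; group membership is made up.
rng = random.Random(0)
groups = list(QUOTAS)
passers_by = [(i, rng.choice(groups)) for i in range(1000)]

sample, unfilled = fill_quotas(passers_by, QUOTAS)
print("people interviewed:", len(sample))
print("quotas still open: ", unfilled)
```

Note that the interviewer simply takes whoever comes along first within each group. Nothing in the procedure selects people at random from the population, so the sample matches the population on the quota groups but not necessarily on anything else.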
It is clear that these methods are open to mistakes being made. Let's go back to the example of the pre-election poll on who is going to win the election. If I pick a group of people to answer this question simply because it's convenient for me, my sample may consist entirely of people from a particular background and exclude everyone from a different background. Can I be certain that the conclusions for these people can be applied to the whole population?
Bias in Sampling
If there is a tendency for a certain group of the population to be omitted from the sample, or if the people who refuse to co-operate form a group which is, in some way, different from the rest of the population, we have what is called a biased sample. We often use the word bias, but its definition in the context of sampling is a systematic tendency to overestimate or underestimate the population parameter of interest. For the election example, if a sample consists solely of people from an area with high unemployment, I may underestimate the level of support for a particular political party.
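The election example can be illustrated with a small simulation. This is only a sketch: the population sizes, the support rates (30% in the high-unemployment area, 50% elsewhere) and all variable names are assumptions made for illustration, not figures from the text.

```python
import random

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population: 1 = supports the party, 0 = does not.
# Assumed support rates (not from the text): 30% in the high-unemployment
# area and 50% everywhere else.
high_unemployment_area = [int(random.random() < 0.30) for _ in range(2000)]
rest_of_population = [int(random.random() < 0.50) for _ in range(8000)]
population = high_unemployment_area + rest_of_population


def mean(values):
    """Arithmetic mean; for 0/1 data this is the proportion of 1s."""
    return sum(values) / len(values)


true_support = mean(population)

# Biased scheme: every sample is drawn only from the high-unemployment area.
biased_estimates = [
    mean(random.sample(high_unemployment_area, 200)) for _ in range(500)
]

# Comparison scheme: every sample is drawn from the whole population.
random_estimates = [mean(random.sample(population, 200)) for _ in range(500)]

print(f"true support:                 {true_support:.3f}")
print(f"average estimate, one area:   {mean(biased_estimates):.3f}")
print(f"average estimate, everyone:   {mean(random_estimates):.3f}")
```

Under these assumptions the samples taken from the single area are too low on average, however many times the survey is repeated. That is what "systematic" means in the definition of bias: the error does not average out.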
How can we eliminate bias? Bias can be eliminated by taking a random sample. This is a sample where everyone in the population has the same chance of getting into the sample, and the fact that one individual has got into the sample does not affect the chances of another individual getting into the sample, i.e. everyone has an independent and equal chance of being included in the sample. This method of sampling is called simple random sampling.
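As a rough check of the "same chance" property, the sketch below draws many simple random samples with Python's standard `random.sample` and records how often each unit is chosen. The population of ten labelled units, the sample size of 3 and the number of repeats are all assumed for illustration.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the illustration is repeatable

population = list(range(1, 11))  # hypothetical units labelled 1..10
sample_size = 3
repeats = 20000

# Draw many simple random samples and count how often each unit appears.
counts = Counter()
for _ in range(repeats):
    counts.update(random.sample(population, sample_size))

# Share of samples that included each unit.
inclusion_rates = {unit: counts[unit] / repeats for unit in population}

for unit, rate in inclusion_rates.items():
    print(f"unit {unit:2d}: included in {rate:.1%} of samples")
```

Each unit should be included in roughly 3 out of every 10 samples, since 3 of the 10 units are chosen each time.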
Many methods of analysis make the assumption that the sample from which the data are collected is a random sample. There are many different types of probability sampling. These include:
Stratified random sampling: Stratified sampling techniques are generally used when the population is heterogeneous (dissimilar) and where certain homogeneous (similar) sub-populations can be isolated. These sub-populations are called strata. The population is divided into strata, such that each unit in the population belongs to one and only one stratum. The basis for forming the strata could be age, gender, industry type, etc. A simple random sample is taken from each stratum.
Cluster sampling: The population is divided into separate groups of units called clusters. Each unit belongs to one and only one cluster. A simple random sample of clusters is selected from a list of all clusters. All units within each chosen cluster are included in the sample. A cluster could be a housing estate or other well-defined area. Cluster sampling is typically used when a researcher cannot get a complete list of the members of a population they wish to study, but can get a complete list of groups or clusters of the population. It is also used when a random sample would produce a list of subjects so widely scattered that surveying them would prove to be far too expensive, for example, surveying people who live in all the different counties in Ireland.
Systematic sampling: For example, if a sample of size 50 is required from a population with 5000 units, we would include in the sample one unit for every 5000/50 = 100 units in the population. One of the first 100 units of the population is selected at random from a list of all members of the population. Other sample units are found by starting with the first unit and then selecting every 100th unit that follows in the population list. In effect, the sample of 50 is identified by moving systematically through the population and identifying every 100th unit after the first randomly selected unit.
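The three probability sampling methods described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the population of 5000 records, the two strata "A" and "B", the 50 clusters of 100 units, the allocation of 30 and 20 units per stratum, and all function names are hypothetical.

```python
import random

random.seed(3)  # fixed seed so the illustration is repeatable

# Hypothetical population of 5000 units. Each unit belongs to exactly one
# stratum ("A" or "B") and exactly one cluster (0..49, 100 units each).
population = [
    {"id": i, "stratum": "A" if i % 5 < 3 else "B", "cluster": i // 100}
    for i in range(5000)
]


def stratified_sample(units, per_stratum):
    """Take a simple random sample of the given size from each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit["stratum"], []).append(unit)
    sample = []
    for name, members in strata.items():
        sample.extend(random.sample(members, per_stratum[name]))
    return sample


def cluster_sample(units, n_clusters):
    """Randomly choose whole clusters and keep every unit inside them."""
    cluster_ids = sorted({unit["cluster"] for unit in units})
    chosen = set(random.sample(cluster_ids, n_clusters))
    return [unit for unit in units if unit["cluster"] in chosen]


def systematic_sample(units, n):
    """Random start within the first interval, then every step-th unit.

    Assumes the population size is an exact multiple of n.
    """
    step = len(units) // n           # 5000 / 50 = 100
    start = random.randrange(step)   # one of the first `step` units
    return units[start::step]


strat = stratified_sample(population, {"A": 30, "B": 20})
clus = cluster_sample(population, 5)
syst = systematic_sample(population, 50)

print(len(strat), len(clus), len(syst))
print([unit["id"] for unit in syst[:5]])
```

Note the difference in what is randomised: stratified sampling draws units at random within every stratum, cluster sampling draws whole clusters at random, and systematic sampling randomises only the starting point.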
A large random sample will include many people with lots of different characteristics whereas a small sample will not have the same range of people or characteristics. Thus, the results from one small sample may differ considerably from the results of another small sample depending on the range of people and characteristics it includes.
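The effect of sample size can be seen by repeating the sampling many times and comparing how much the results vary. The sketch below assumes a hypothetical population of 10,000 people in which 40% have some characteristic; the sample sizes of 20 and 500 are likewise chosen only for illustration.

```python
import random
import statistics

random.seed(4)  # fixed seed so the illustration is repeatable

# Hypothetical population: 40% have the characteristic (1), 60% do not (0).
population = [1] * 4000 + [0] * 6000


def spread_of_estimates(sample_size, repeats=1000):
    """Standard deviation of the sample proportion across repeated samples."""
    estimates = [
        sum(random.sample(population, sample_size)) / sample_size
        for _ in range(repeats)
    ]
    return statistics.stdev(estimates)


spread_small = spread_of_estimates(20)
spread_large = spread_of_estimates(500)

print(f"spread of estimates, n = 20:  {spread_small:.3f}")
print(f"spread of estimates, n = 500: {spread_large:.3f}")
```

Under these assumptions the estimates from samples of 20 vary far more from one sample to the next than the estimates from samples of 500.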
Errors in sampling
Sampling errors (related to the act of selecting the sample), e.g. a biased sampling method; a sampling frame which differs from the population.
Non-sampling errors (not related to the act of selecting the sample), e.g. missing data, response errors, processing errors, wording of questions.
Questionnaires
Questionnaires are often used to collect data. Some of the issues involved in using questionnaires are: How do you contact the individual: mail, telephone, interview? How is the issue of anonymity (i.e. subjects are anonymous even to the person carrying out the survey) versus confidentiality (i.e. all data about individuals is known to the person carrying out the survey but kept confidential) handled? What is the response rate?
If there is a high non-response rate, are there differences between those who do and do not respond, i.e. volunteer bias? We should try to assess differences between responders and non-responders and try to get as high a response rate as possible, e.g. use second or third mailings to those who don't respond to the first. Are there missing data: was the questionnaire too long, complicated or misunderstood? Analysis of questionnaire data: how will the data be processed? How long will it take to input data?