Statistics is about collecting data, then analysing the data, and then interpreting the results, to find out about real-world phenomena. There is always variability in the data: we need to extract meaningful patterns, so that we can make definite decisions. Here are some areas of application, with some typical problems.
What is Statistics about?

We collect data, then analyse the data, and then interpret the results, to find out about real-world phenomena. There is always variability in the data: we need to extract meaningful patterns. Because of the variability, our conclusions cannot be certain. We need to quantify the uncertainty, so that we can make definite decisions and judge how likely it is that we are right.

Here are some areas of application, with some typical problems.

Agriculture: Which varieties of wheat make the best bread?
Manufacturing: Can we make a cheaper detergent that is just as effective as the current one?
Health: Does an aspirin a day protect against stroke? If so, are there any side-effects?
Education: What is the best way to teach young children mental arithmetic?
Biology: How does biodiversity affect the environment?
Social science: Do people live in better houses than they did 20 years ago?
Economics: How is the credit crunch affecting food prices?
Market research: What sort of advertising campaign is most effective?
Environmental studies: Are people who live near mobile-phone masts more likely to get cancer?
Meteorology: Is global warming a reality?
Psychology: Are shyness and loneliness related?

Before starting any of these investigations, we need to stop and ask: What do we want to investigate? What should we measure? How should we measure it?

Populations and Samples

When we carry out a statistical investigation we want to find out about a population.

Definition. A population is the collection of items under discussion. It may be finite or infinite; it may be real or hypothetical.

Sometimes, although we have a target population in mind, the study population that we can actually find out about may be different. We are interested in measuring one or more variables for the members of the population, but to record observations for everyone would be costly.
The government carries out such a census of the population every ten years, but also carries out regular surveys based on samples of a few thousand.

Definition. A sample is a subset of a population.

The sample should be chosen to be representative of the population, because we usually want to draw conclusions or inferences about the population based on the sample. Samples will vary, and the question of whether the data in the sample is compatible with hypotheses we may have about the population will be considered both in this course and in MTH5122 Statistical Methods.

For each member of the sample we will measure one (or more) random variables. We usually assume something about the distribution of the random variables. For example, if our data were counts of radioactive particles from a sample of radioactive sources manufactured to have the same mean, it would be reasonable to assume that $X_i \sim \mathrm{Poisson}(\lambda)$. This assumption is called a model, and $\lambda$ is called a parameter.

We will not concern ourselves much with the mechanics of how the sample is chosen, but the following examples give you some idea of the sorts of problems that arise:

(a) A city engineer wants to estimate the average weekly water consumption for single-family dwellings in the city. The population is single-family dwellings in the city. The variable we want to measure is water consumption. To collect a sample, if the dwellings have water meters, it might be best to get lists of dwellings and annual usage directly from the water company. If not, then the local authority should have lists of addresses which can be sampled from. Note that we should collect data through the year, as water consumption will be seasonal. Note also that, if there is no water meter, measuring how much water the household uses may be problematic.

(b) A political scientist wants to determine if a majority of voters favour an elected House of Lords. The population is voters in the UK. Electoral rolls provide a list of those eligible to vote.
What we want to measure is their opinion on this issue, using a neutral question. (It would be easy to bias the response by asking a leading question.) We could choose a sample using the electoral roll and then ask the question by post, on the telephone or face to face, but all these methods have problems of non-response and/or cost.

(c) A medical scientist wants to estimate the average length of time until the recurrence of a certain disease. The population is people who are suffering from this disease or have done so in the past. What we want to measure are the dates of the last bout of disease and the new bout of disease. We could take a sample of patients suffering from the disease now and follow them until they have another bout. This may be too slow if the disease doesn't recur often. Alternatively, we could use medical records of people who suffered from the disease in one or more hospitals, but records can be wrong and there may be biases introduced.

(d) An electrical engineer wants to determine if the average length of life of transistors of a certain type is greater than 5000 hours. The population is transistors of this type. We want to record the length of time to failure by putting a sample of transistors on test and recording when they fail. Note that for such experiments, where the items under test are very reliable, it may be necessary to use an accelerated test in which we subject the items to higher currents than usual, although this might introduce biases.

In other parts of the course we may not emphasize the underlying population or exactly how we collect a sample, but remember that these questions have had to be considered.

Three methods of collecting data

1) Take a sample from a population. Do we ask questions (in which case, how do we word the questionnaire?), or take objective measurements, such as blood pressure? This is called a survey. If the sample is the whole population, it is called a census.

2) Design an experiment.
This means that we apply different treatments to different experimental units, and then measure something to see if there is a difference between the treatments. How do we choose the treatments? How do we choose the experimental units? How do we decide who or what is given which treatment? (See MTH6116 Design of Experiments.)

3) If it is impractical or unethical to impose our choice of treatments, we may do an observational study. We might compare the effect of things that people can change themselves (for example, diet, or whether they go to the gym), or things that they cannot change (such as height, or place of birth).

Some practical examples

1. The BBC wants to know how many people watch each of its programmes.
Population = all people in the UK
Sample = panel of people, chosen to be representative.
Each member of the panel keeps a diary recording all their TV viewing for a week, then sends it to the BBC. The BBC uses this data to estimate the total number of people who watched each programme. This is a survey.

2. Health researchers want to know which lifestyle factors affect the chances of getting various diseases. The UK Biobank has recently recruited 500,000 volunteers. The UK Biobank people collect some information now (for example, "Do you drink full-cream milk?"), then follow each person's medical records until they die, recording which diseases they get. They will then be able to test hypotheses such as "If you cycle daily, you are less likely to have a stroke." This is an observational study.

3. A marine engineer wants to know if a new sort of paint protects pier supports from corrosion. He paints ten metal beams, and leaves a further ten beams unpainted (why?). He puts all the beams in a tank of sea water for three months, then he measures the amount of corrosion in each beam. This is an experiment.
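
The question "How do we decide who or what is given which treatment?" can be made concrete for the marine engineer's experiment. One common answer, which these notes do not spell out, is to let chance decide which ten of the twenty beams are painted. The following is a minimal sketch in Python under that assumption; the beam labels 1 to 20, the function name assign_treatments and the seed value are all hypothetical choices made for illustration.

```python
import random


def assign_treatments(units, n_treated, seed=None):
    """Randomly split `units` into a treated group and a control group.

    Hypothetical helper for illustration: it shuffles a copy of the units
    and takes the first `n_treated` of them as the treated group.
    """
    rng = random.Random(seed)      # separate generator, so a seed makes the split repeatable
    shuffled = list(units)         # copy, so the caller's collection is left unchanged
    rng.shuffle(shuffled)
    treated = sorted(shuffled[:n_treated])
    control = sorted(shuffled[n_treated:])
    return treated, control


# Assumed labelling: the twenty beams are numbered 1 to 20.
beams = range(1, 21)
painted, unpainted = assign_treatments(beams, n_treated=10, seed=2024)

print("painted:  ", painted)
print("unpainted:", unpainted)
```

Every beam lands in exactly one of the two groups, and the engineer has no influence over which beams are painted, so any systematic difference between the groups (for example, older beams happening to be the painted ones) can arise only by chance.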