
What is statistics?

1. INTRODUCTION

The presidency carries with it many duties, some more agreeable than others. The presidential address is, perhaps, the one occasion when the incumbent can speak personally rather than in a representative capacity and I welcome that opportunity: not simply, in these research rating days, because it yields one more publication in a reputable journal but because there is little opportunity to stand back and to reflect on what statistics is all about. I therefore thank you most warmly for electing me to this office and hope that this address may be a modest part repayment for the honour that you have done me. Having declared that I intend to speak for no-one but myself I hope that you will forgive me if I indulge from time to time in personal reminiscence. The need for brevity means that my remarks will have more in common with a cartoon drawing than with a full portrait. I have long admired the skills of the cartoonist, who with a few deft strokes can convey the essence of a situation, so perhaps that analogy sets a standard to aim for. In any event I can only hope to sketch in what I see to be the main features.

Looking back over half a century's presidential addresses I can find no occasion where the question of my title has been addressed. There have been several near misses but I think that this is a particularly appropriate moment to raise the matter. After more than 40 years of separate existence the Institute of Statisticians and the Society have merged and I am glad to acknowledge the efforts of my two predecessors which, along with those of the then chairman of the Institute, steered those tedious but necessary negotiations to a successful conclusion. The imminent realization of a hope, which goes back at least to the presidential address of 1949, that the Society might be worthily housed, is also a cause for satisfaction.
The formation of an International Committee, the initiatives of the Education Committee and the formation of several new Local Groups are further evidences of the vitality of our Society. With over 6000 Fellows and a sound financial position the future seems assured. These fundamental developments have, inevitably, turned the Society's attention inwards. Reviews of the way that we govern ourselves and the seemingly endless need to amend the Bye-laws are necessary but must not become an end in themselves. Nor must we be so concerned with professional status that we seem to be unwelcoming to those on the fringes of the subject from whom we have so much to learn. Perhaps the consideration of what statistics actually is will prove an antidote to self-absorption and set our current organizational concerns in context.

After a survey of some definitions and views on the significance of statistics I shall suggest a fourfold division of the territory. In conclusion I shall outline some implications for the role of the Society.

2. SOME DEFINITIONS

The question of my title is simple enough but there is no short answer. At the beginning of the presentation of the joint paper earlier this session by Grenander and Miller (1994), Michael Miller anticipated this address by asking "What is statistics?". Their answer is given in the opening sentences of that paper: "The object of statistics is information. The objective of statistics is the understanding of information contained in data." In my first undergraduate lecture in statistics, Egon Pearson defined statistics as "The study of the collective characters of populations". At about the same date Sir Maurice Kendall, in his inaugural lecture at the London School of Economics, spoke of it as (Kendall, 1950) "... The science of collectives and group properties". One of my favorite definitions comes from a psychologist, S. S. Stevens (of levels of measurement fame), who referred to statistics as (Stevens (1968), p. 854) "A straightforward discipline designed to amplify the power of common sense in the discernment of order amid complexity." Perhaps he was unconsciously echoing Laplace's dictum that, at bottom, the calculus of probability was but common sense reduced to calculation.

None of these, however, quite captures those two distinctive elements which came together so fruitfully about a century ago and are now so closely intertwined that to separate them would take the heart out of our subject. Let me explain what they are and why they are so central to my understanding of what statistics is.

I had the good fortune to be taught mathematics at school by a Fellow of this Society. With W. F. L. Dick, B. C. Brookes had written a deservedly popular book called Introduction to Statistical Method (Brookes and Dick, 1951). I, and several other sixth formers, spent one summer checking the answers to the exercises. The acknowledgement of that contribution in the preface may not count for much in today's research assessment terms but it is, at least, a personal landmark. Presumably, I showed more than average interest in statistics and so, as a further stimulus, I was given two things to look at. One was the Annual Abstract of Statistics; the other was a copy of a current issue of the Supplement to the Journal of the Royal Statistical Society, the precursor of Series B. The first I found fascinating, and still do. The second was totally baffling, though looking back I notice that it was full of contributions from household names among us who had cut their statistical teeth during the war. I can only record that this experience did not deter me and that the two strands represented by those publications still seem to me to constitute the essentials of statistics.

There can be no statistics without data and no statistics with data alone. Being a critical realist I believe that there is a real world out there and that the measurements we make on things in it are capable of telling us something about its true nature if only indirectly. Theorizing without data will get us nowhere. The fascination of statistical tables lies not in the numbers, of course, but in their pattern. (I am aware that fascination with tables is said to be a mark of an obsessive personality, but I can live with that!) Pattern brings us into the realm of mathematics and that was the only thing that I learnt from the Supplement. Mathematics, I deduced, was a tool for reasoning about the order which the tables revealed. This lifted the subject, so it seemed to me, above mere arithmetic and accounting (with a small "a" of course). The mathematical element centred on the theory of probability while the early social statisticians, the biometric school of Galton and Pearson and the world of official statistics, represented the concern with data. The two came together in modern statistics. Variability and uncertainty are two sides of the same coin and, together, are the hallmark of a statistical problem.
It is this which gives statistics right of entry to almost every sphere of human endeavour. In his presidential address to the Society in 1952 Sir Ronald Fisher, quoting an earlier address he had given to the International Biometric Society, said (Fisher, 1953) "I ventured to suggest that Statistical Science was the peculiar aspect of human progress which gives to the twentieth century its special character; and indeed members of my present audience will know from their own personal and professional experience that it is to the statistician that the present age turns for what is most essential in all its more important activities." Or, if you want it from someone less suspect of professional self-interest (Porter (1986), p. 3), "Not since the invention of calculus, if ever, has a new field of mathematics found so extensive a field of application." Or more succinctly, and going back to 1877 (E. J. Goncourt, quoted on p. 3 of Porter (1986)),

"Statistics is the premier inexact science."

3. A DISMAL SCIENCE?

But not everyone sees our subject in such exalted terms and we do well to pause and reflect on the poor image which statistics has in many quarters. Things have not changed very much in that respect since Major Greenwood, seconding the vote of thanks for David Heron's presidential address in 1947, said (Greenwood, 1948) "If the President of the Royal Statistical Society were to announce an address on the growing menace of communism or the increasing danger of fascism ... we should have two rows of reporters here ... not a single reporter from Fleet Street has thought it necessary to interrupt his tea. We do not interest, we are not news."

It is not only Fleet Street that disregards us (that would be bearable) but also some of our colleagues in other disciplines. The late Professor Gordon Rupp, speaking as an ecclesiastical historian, wrote (Rupp, quoted on p. 121 of Vickers (1993)) "I am sure we must welcome all the help we can get from sociologists ... I am sure too, they need always to be checked by the historian, lest they take to vain prophesying, at which point sociology would cease to be a science and would degenerate into mythology undergirded by statistics." He might not have taken much comfort from William Ogburn who, in 1930, proudly predicted that all sociologists would one day be statisticians (on p. 497 of Outhwaite and Bottomore (1993)). If sociology is near the bottom of the scientific pile where does all that leave statistics?

But much nearer home, there is a worrying tendency for our own kind to feel uneasy about being identified as statisticians. On the front page of RSS News, Agent Provocateur (1993) asked why do so many people, who, one supposes, do the very things that we do, prefer not to call themselves statisticians but operational researchers, software engineers, QA experts, forecasters or astrologers? Why indeed?
To the long familiar econometrics and biometrics we must now add infometrics, environmetrics, chemometrics, scientometrics, stylometrics and doubtless many more. The vogue for distinguishing data analysis as a separate activity seems to be seeking to avoid guilt by attaching qualifiers to their titles. The frantic search for degree titles in some universities which soften or obscure the hard edges of unvarnished statistics is symptomatic of the same unease.

Why is it, if our subject is so central to the intellectual enterprise, that we are so bashful in displaying our credentials? This is much debated among us and I merely underline a few important elements of the answer. One element we share with the whole scientific community. This is partly a cultural matter. It is strange how, in a society whose prosperity and opportunities to enjoy the full range of the arts depend on science, there is so little interest and curiosity about how the world actually works. I am constantly surprised at the ignorance which otherwise intelligent people show, and often proudly, about basic physical principles. The unscrupulous trade on this ignorance and life is impoverished by it. Perhaps Richard Gregory was not far from the mark when he said (Rassam, 1993) "Doing science is very different from the arts. Science is difficult. You need mathematics and statistics, which is dull like learning a language ... The arts intrinsically appeal to the human soul, but a lot of science doesn't and a lot of science is incredibly boring too."

But we labour under a double handicap. Science, at least, appears to deal in certainties and that is what people want. To some, indeed, it appears to be the only source of certainty and is exalted to the status of a religion though, like many others, it is not actually practiced in most cases! Statisticians, in contrast, never seem to be certain about anything. It can be claimed, with justice, that few of the great scientific discoveries of physical science required statistics. Statistics only comes into its own as we move into the shadowy world where information is incomplete, reproducibility is hard to come by and where conclusions must be hedged about by qualifications. In a world conditioned to expect truth to be packaged in simple certainties we have an uphill task. It is not only certainty that the public craves but also simplicity.
Whether or not the world is essentially simple there are clearly people unwilling to accept it on any other terms. In its extreme form this aversion to complexity forces everything into one of two categories: black or white, good or bad or, following W. S. Gilbert, liberal or conservative. This simple conception is built into the adversarial character of English law and parliamentary government in the British style but this tendency to polarize is quite foreign to the statistical temperament. When he was Home Secretary, Kenneth Clarke was interviewed (on the Radio 4 Today programme on May 4th, 1993) about the idea that this was too complicated. It might be difficult to administer but if such a simple formula is too complicated it speaks volumes about the gulf that we have to bridge.

Public ignorance goes much deeper. Even among the chattering classes there sometimes seems little understanding of what actually constitutes scientific knowledge. A public lecture was given at the London School of Economics earlier this year by a person of uncertain academic provenance on a subject which the media avidly report. One of the broadsheet newspapers contained the following remark about her work (Campbell, 1994): "Although many of its statistics will be repudiated in the usual row about how representative her sample is, many of its findings are, again, consistent with what we know." This failure of understanding leads naturally to the belief that every event has a simple direct cause which scientists can discover if only the government will provide enough money. The demand to know whether excess numbers of leukaemia cases in the vicinity of nuclear plants imply a causal link illustrates the point.

To some extent the fault lies in ourselves. Statisticians can be very dismissive. Technical terms like "inadmissible", "incoherent" and "inefficient" are often understood by our potential clients in their everyday senses so it is hardly surprising that they find us forbidding and look elsewhere for help.

But there is another side to the story. The gambling instinct is deeply embedded in the human psyche. Much of the excitement of life is derived from uncertainty. From games of chance to investment plans we are constantly engaged in choice and chance. Surely anyone with an ounce of curiosity should want to understand what lies behind this kaleidoscope of daily experience. Perhaps we are too unimaginative in motivating our educational efforts but let us leave the matter there and explore the question "What is statistics?" in greater depth.

First let us pause for breath and summarize the story so far. Statistics is concerned with understanding the real world through the information that we derive from classification and measurement. Its distinctive characteristic is that it deals with variability and uncertainty, which are everywhere. This gives it its fundamental role and is the reason why its tentacles reach into every corner of the scientific enterprise. And yet this role is scarcely recognized and sometimes denied.
What can the statistician or the Society do to give our subject its rightful place? I suggest, again, that the answer lies partly, perhaps mainly, in ourselves. Our view of statistics is too small. Let us seek to broaden it.

4. POPULATIONS

Statisticians recognize two types of error. I want to identify four types of statistics and to suggest that no view of statistics which excludes any one of them is complete. My categorization is no doubt idiosyncratic; the categories overlap and are certainly not exhaustive but the intention must be clear. It is to resist the fragmenting tendency that is evident in some quarters and to insist on that breadth and unity which the Society, at its best, has always stood for.

Statistics of type 1 is what one might call, harking back to earlier remarks, the Annual Abstract kind of statistics. Its main thrust is the collection and presentation of numerical data in a manner calculated to reveal their salient features. It is the statistics of censuses and of official publications and it was the belief that this was important which led to the foundation of the Society. Going further back it was closely associated with the origin of the term statistics itself. But type 1 statistics has a broader base. The early volumes of Biometrika show that the tabulation and graphical representation of large amounts of data was at the heart of biometry. The early social investigators like Florence Nightingale and Charles Booth were in no doubt of the necessity of good data and plenty of it. It might be described as the statistics of populations or of samples so large that the difference between sample and population can be disregarded.

This type of statistics is currently undergoing a revival. Methods of automatic data logging and large scale social surveys with many variables have created a demand for imaginative ways of displaying and summarizing very large data sets and good use has been made of colour and the other visual aids which modern computing equipment offers.

Type 1 statistics is not just another name for the data strand of the subject. The form of presentation used and the interpretation of it are intimately linked to modeling. If this causes surprise let me remind you of Keynes's remark to the effect that those who prided themselves on being practical were usually the unwitting slaves of some long dead economist. We all depend on models to interpret our everyday experiences. We interpret what we see in terms of mental models constructed on past experience and education. They are the constructs that we use to understand the pattern of our experiences. The terminology may be unfamiliar but the practice is deeply ingrained. Professional statisticians bring to the data a mental frame of reference which is informed by the mathematical models which are part of their thought world.
Just as the consultant physician brings a wealth of knowledge about how the human body works to the interpretation of the figures dutifully plotted on charts at the foot of the hospital bed so does the statistician. I have been involved over a long period with manpower modeling in both its practical and its theoretical aspects. Practice has led the way but theory has blossomed in its wake and has gone well beyond the point of direct practical application. But that need not mean that it is useless; far from it. It gives insight into the way that highly complicated stochastic systems behave. The researcher who has been immersed in this will have a richer conceptual framework in which to interpret what the data say.

In my first post with the National Coal Board I was given a job which nicely illustrates the point. In those days hydraulic roof support systems were coming into use. They were effective but expensive and great interest centred on their reliability. Dates of installation, dates of removal for repair and, when the person involved could be bothered, the reason for failure had been recorded for a very large number of items. My task was to see what could be learnt from all these data. The essence of the analysis, unaided by computers, was to construct dozens of frequency distributions and to draw their histograms. I then looked at them. Two things were rather striking. The distribution of the time to repair changed in shape as the number of repairs increased and the mean time to repair at first decreased and then stabilized. Having had a theoretical course in stochastic processes I recognized that I was dealing with the pooled output of several renewal processes, each arising from a different source of failure. The time to first failure would be some kind of extreme value distribution but, as the system aged, a steady state would be reached in which the interfailure time would have, approximately, an exponential distribution with constant mean. The theoretical contribution was not so much to the analysis itself but to the choice of the form of the informal analysis and the interpretation of what it showed.

Many students who have had a good grounding in statistical theory are disappointed and sometimes disillusioned when they are thrust into a type 1 statistical job. They should not be.
The effective collection and presentation of data for management, government or public consumption needs to be done by people who have a deep understanding of the processes which have generated the data. Since almost all such processes are stochastic, the value of a theoretical grounding in probability is obvious. It is the interplay of the data and the theory, however informal, which gives statistics of type 1 an honoured place in our subject field.
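
The renewal-process reasoning in the roof support example can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a reconstruction of the original Coal Board analysis: the Weibull lifetimes, the choice of ten failure sources per item, the shape parameter and the function names `pooled_gaps` and `summarize` are all hypothetical choices made for illustration. Each item is given several independent sources of failure, each behaving as a renewal process, and the item's failure record is their pooled output. The code then compares the first gap between failures with much later gaps.

```python
import heapq
import random
import statistics


def pooled_gaps(n_sources, n_events, shape, rng):
    """Gaps between successive failures of one item with several failure sources.

    Assumption for illustration: each source is a renewal process with
    Weibull(scale=1, shape) lifetimes. The item's failure record is the
    superposition (pooled output) of all its sources.
    """
    # Next pending failure time of every source, kept in a min-heap.
    heap = [rng.weibullvariate(1.0, shape) for _ in range(n_sources)]
    heapq.heapify(heap)
    gaps, last = [], 0.0
    for _ in range(n_events):
        t = heapq.heappop(heap)  # earliest pending failure across sources
        gaps.append(t - last)
        last = t
        # The failed source is repaired and starts a fresh lifetime.
        heapq.heappush(heap, t + rng.weibullvariate(1.0, shape))
    return gaps


def summarize(n_items=2000, n_sources=10, n_events=200, shape=2.0, seed=1):
    """Mean and coefficient of variation (CV) of the k-th gap across items."""
    rng = random.Random(seed)
    records = [pooled_gaps(n_sources, n_events, shape, rng) for _ in range(n_items)]
    summary = {}
    for k in (1, 2, 5, 20, 200):
        kth = [r[k - 1] for r in records]
        mean = statistics.fmean(kth)
        cv = statistics.pstdev(kth) / mean  # an exponential gap has CV = 1
        summary[k] = (mean, cv)
    return summary


if __name__ == "__main__":
    for k, (mean, cv) in summarize().items():
        print(f"gap {k:>3}: mean = {mean:.3f}, CV = {cv:.2f}")
```

Under these assumptions one would expect the mean gap to fall over the first few failures and then level off, and the coefficient of variation to move from well below 1 (the first gap is the minimum of several lifetimes) towards a value near 1, as for an approximately exponential interfailure time. That is the qualitative pattern described in the histograms above; the particular numbers depend entirely on the assumed lifetime distribution.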
