B. Com. (Hons.) 1st Year, Commerce
Paper IV: BUSINESS STATISTICS, Lessons 1-12
SCHOOL OF OPEN LEARNING (Campus of Open Learning), University of Delhi
Department of Commerce
Editor: Dr. K.L. Dahiya

Graduate Course, Paper IV: Business Statistics
Prepared by: Dr. K.L. Dahiya

CONTENTS
Lesson 1  Construction of Frequency Distribution and Graphical Presentation
Lesson 2  Measures of Central Tendency
Lesson 3  Measures of Dispersion
Lesson 4  Measures of Skewness and Kurtosis
Lesson 5  Simple Correlation
Lesson 6  Regression Analysis
Lesson 7  Index Numbers
Lesson 8  Analysis of Time Series
Lesson 9  …
Lesson 10  Theory of Probability
Lesson 11  Probability Distributions
Lesson 12  Statistical Decision Theory

SCHOOL OF OPEN LEARNING, UNIVERSITY OF DELHI, 5, Cavalry Lane, Delhi 110007
Academic Session 2012-2013 (52:00 copies)
© School of Open Learning, Cavalry Lane, Delhi-110 007
Published by: Executive Director, School of Open Learning
Printed at: Nutan Printers, F-89/12, Okhla Industrial Area, Phase-I, New Delhi-110020.

LESSON 1
CONSTRUCTION OF FREQUENCY DISTRIBUTION AND GRAPHICAL PRESENTATION

What is a frequency distribution: Collected and classified data are presented in the form of a frequency distribution. A frequency distribution is simply a table in which the data are grouped into classes on the basis of common characteristics and the number of cases that fall in each class is recorded. It shows the frequency of occurrence of the different values of a single variable. A frequency distribution is constructed to satisfy three objectives: (i) to facilitate the analysis of data, (ii) to estimate the frequencies of the unknown population distribution from the distribution of sample data, and (iii) to facilitate the computation of various statistical measures.

Normally, a frequency distribution can be of two types: 1. Univariate Frequency Distribution, 2. Bivariate Frequency Distribution. In this lesson, we shall discuss the univariate frequency distribution.
A univariate distribution incorporates the values of one variable only, whereas a bivariate frequency distribution incorporates the values of two variables. The univariate frequency distribution is further classified into three categories: (i) series of individual observations, (ii) discrete frequency distribution, and (iii) continuous frequency distribution.

A series of individual observations is a simple listing of the items of each observation. If the marks of 20 students of a class in Statistics are given individually, they form a series of individual observations.

Marks obtained in Statistics
Roll No.  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Marks     …   71  …   41  …   33  81  41  78  66  …   35  61  55  98  52  50  …   30  88

[Table: the same marks arranged in ascending order and in descending order]

Discrete frequency distribution: In a discrete series, the data are presented in such a way that exact measurements of units are indicated. In a discrete frequency distribution, we count the number of times each value of the variable occurs in the data. This is facilitated through the technique of tally bars. In the first column, we write all the values of the variable. In the second column, we put a vertical bar, called a tally bar, against the value for each of its occurrences. When a particular value has occurred four times, for the fifth occurrence we put a cross tally bar across the four tally bars to make a block of 5. Putting a cross tally bar at every fifth repetition makes it easy to count the number of occurrences of each value. After putting tally bars for all the values in the data, we count the number of times each value is repeated and write it against the corresponding value of the variable in the third column, entitled frequency. This type of representation of the data is called a discrete frequency distribution.

We are given the marks of 50 students:
…
We can construct a discrete frequency distribution from the above marks.

[Table: Marks of 50 Students, with columns Marks, Tally Bars and Frequency; Total 50]

Presenting the data as a discrete frequency distribution is better than merely arranging them in order, because an ordered listing does not condense the data as needed and is quite difficult to grasp and comprehend. This distribution is quite simple when the values of the variable are repeated; otherwise there will be hardly any condensation.

Continuous frequency distribution: If neither the identity of the units about which the information is collected nor the order in which the observations occur is relevant, then the first step of condensation is to classify the data into classes, by dividing the entire group of values of the variable into a suitable number of groups and then recording the number of observations in each group. Thus, if we divide the total range of values of the variable (marks of 50 students), i.e. 78 - 15 = 63, into groups of 10 each, we need 63/10 = 6.3, i.e. 7 groups, and the distribution of marks is displayed by the following frequency distribution:

Marks of 50 students
Marks (X)   Number of Students (f)
15-25       3
25-35       …
35-45       …
45-55       …
55-65       9
65-75       2
75-85       1
Total       50

The various groups into which the values of the variable are classified are known as classes, and the length of the class interval (10) is called the width or magnitude of the class. The two values specifying a class are called the class limits. The presentation of the data in continuous classes with the corresponding frequencies is known as a continuous frequency distribution.
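The tally counting and class grouping described above can be mimicked in a few lines of Python. This is a minimal sketch, not part of the original lesson: the marks list is hypothetical (the 50 marks of the example are not reproduced), the function names are illustrative, and the grouping assumes that each class includes its lower limit and excludes its upper limit, so a mark of 25 is counted in 25-35.

```python
from collections import Counter


def discrete_frequency(values):
    """Count how often each value occurs (the tally-bar step), sorted by value."""
    return sorted(Counter(values).items())


def continuous_frequency(values, start, width, n_classes):
    """Group values into n_classes classes of equal width beginning at `start`.

    Assumption: a value equal to a class's upper limit is counted in the next class.
    """
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class containing v
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} lies outside the chosen classes")
        counts[k] += 1
    return [((start + k * width, start + (k + 1) * width), counts[k])
            for k in range(n_classes)]


# Hypothetical marks, for illustration only.
marks = [15, 22, 25, 33, 33, 40, 41, 41, 47, 52, 58, 63, 70, 78]

for value, freq in discrete_frequency(marks):
    print(value, freq)

for (lower, upper), freq in continuous_frequency(marks, start=15, width=10, n_classes=7):
    print(f"{lower}-{upper}: {freq}")
```

With these hypothetical marks, the seven classes 15-25 to 75-85 receive 2, 3, 3, 2, 2, 1 and 1 marks, and the frequencies add up to the 14 items, as the total of any frequency distribution should.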
There are two methods of classifying the data according to class intervals: (i) the exclusive method and (ii) the inclusive method.

In the exclusive method, the class intervals are fixed in such a manner that the upper limit of one class becomes the lower limit of the following class. Moreover, an item equal to the upper limit of a class is excluded from that class and included in the next class. The following data are classified on this basis.

Income (Rs.)   No. of Persons
200-250        50
250-300        100
300-350        70
350-400        130
400-450        50
450-500        100
Total          500

It is clear from the example that the exclusive method ensures continuity of the data, inasmuch as the upper limit of one class is the lower limit of the next class. Therefore, 50 persons have incomes between 200 and 249.99, and a person whose income is 250 is included in the next class, 250-300.

According to the inclusive method, an item equal to the upper limit of a class is included in that class itself. The following table demonstrates this method.

Income (Rs.)   No. of Persons
200-249        50
250-299        100
300-349        70
350-399        130
400-449        50
450-499        100
Total          500

Hence in the class 200-249, we include persons whose income is between Rs. 200 and Rs. 249.

Principles for constructing frequency distributions: In spite of the great importance of classification in statistical analysis, no hard and fast rules can be laid down for it. A statistician uses his discretion in classifying a frequency distribution, and an appropriate classification of the data calls for sound experience, wisdom, skill and aptness. However, the following guidelines must be considered when constructing a frequency distribution:

1. Types of classes: The classes should be clearly defined and should not lead to any ambiguity. They should be exhaustive and mutually exclusive, so that any value of the variable corresponds to only one class.
2. Number of classes: The choice of the number of classes into which a given frequency distribution should be divided depends upon the following: (i) the total frequency, i.e. the total number of observations in the distribution; (ii) the nature of the data, i.e. the size or magnitude of the values of the variable; (iii) the desired accuracy; and (iv) convenience in computing the various descriptive measures of the frequency distribution, such as the mean and variance.

The number of classes should be neither too small nor too large. If the classes are few, the classification becomes very broad and rough, which might obscure some important features and characteristics of the data; the accuracy of the results decreases as the number of classes becomes smaller. On the other hand, too many classes will leave very few frequencies in each class. This gives an irregular pattern of frequencies across the classes and thus makes the frequency distribution irregular. Moreover, a large number of classes renders the distribution too unwieldy to handle, and the computational work for further processing of the data becomes tedious and time-consuming without any proportionate gain in the accuracy of the results. Hence a balance should be maintained between the loss of information in the first case and the irregularity of the frequency distribution in the second, to arrive at a compromise giving the optimum number of classes. Normally, the number of classes should not be less than 5 or more than 20. Prof. Sturges has given the formula

$k = 1 + 3.322 \log_{10} n$

where k refers to the number of classes and n is the total frequency, or number of observations. The value of k is rounded to the next higher integer:

If n = 100:    $k = 1 + 3.322 \log_{10} 100 = 1 + 6.644 = 7.644 \approx 8$
If n = 10,000: $k = 1 + 3.322 \log_{10} 10000 = 1 + 13.288 = 14.288 \approx 15$

However, this rule should be applied only when the number of observations is not very small.
Moreover, the number of class intervals should be such that they give a uniform and unimodal distribution, which means that the frequencies in the given classes increase and decrease steadily, with no sudden jumps. The number of classes should be an integer, preferably 5 or a multiple of 5 (10, 15, 20, 25, etc.), which is convenient for numerical computations.

3. Size of class intervals: Because the size of the class interval is inversely proportional to the number of classes in a given distribution, the choice of the size of the class interval also depends upon the sound subjective judgement of the statistician. An approximate value of the magnitude of the class interval, say i, can be calculated with the help of Sturges' rule:

$i = \frac{\text{Range}}{1 + 3.322 \log_{10} n}$

where i stands for the class magnitude or interval, Range is the difference between the largest and smallest values of the distribution, and n is the total number of observations. If we are given n = 400, largest item = 1300 and smallest item = 340, then

$i = \frac{1300 - 340}{1 + 3.322 \log_{10} 400} = \frac{960}{1 + 3.322 \times 2.6021} = \frac{960}{9.644} = 99.54 \approx 100$

Another rule of thumb for determining the size of the class interval is that the length of the class interval should not be greater than one-fourth of the estimated population standard deviation. If σ is the estimate of the population standard deviation, the length of the class interval is given by $i = \sigma / 4$.

Class intervals should be taken as 5 or multiples of 5, such as 10, 15 or 20, for easy computation of the various statistical measures of the frequency distribution. Class intervals should be so fixed that each class has a convenient mid-point around which all the observations in that class cluster; that is, the entire frequency of a class is assumed to be concentrated at its mid-value. This assumption holds only if the frequencies within each class are uniformly distributed over the class interval.
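Both rules of thumb above can be checked numerically. The Python sketch below codes Sturges' formula for the number of classes and for the class interval, plus the σ/4 rule; the function names are illustrative, and rounding k up with `math.ceil` is our reading of "rounded to the next higher integer".

```python
import math


def sturges_classes(n):
    """Sturges' rule: k = 1 + 3.322 * log10(n), rounded up to a whole number."""
    return math.ceil(1 + 3.322 * math.log10(n))


def sturges_interval(largest, smallest, n):
    """Approximate class interval: i = Range / (1 + 3.322 * log10(n))."""
    return (largest - smallest) / (1 + 3.322 * math.log10(n))


def interval_from_sd(sigma):
    """Rule of thumb from the text: the interval should not exceed sigma / 4."""
    return sigma / 4


print(sturges_classes(100))                        # 8
print(sturges_classes(10_000))                     # 15
print(round(sturges_interval(1300, 340, 400), 2))  # 99.54
```

In practice the computed interval is then rounded to a convenient figure (here 100), in line with the preference for multiples of 5 stated above.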
It is always desirable to take class intervals of equal or uniform magnitude throughout the frequency distribution.

4. Class boundaries: In a grouped frequency distribution, if there are gaps between the upper limit of any class and the lower limit of the succeeding class (as in the inclusive type of classification), the data need to be converted into a continuous distribution by applying a correction factor for continuity, which yields new classes of the exclusive type. The lower and upper limits of these new exclusive-type classes are called class boundaries. If d is the gap between the upper limit of any class and the lower limit of the succeeding class, the class boundaries for any class are given by

Upper class boundary = Upper class limit + d/2
Lower class boundary = Lower class limit - d/2

where d/2 is called the correction factor. Consider the following example:

Class limits   Class boundaries
20-24          (20 - 0.5, 24 + 0.5), i.e. 19.5-24.5
25-29          (25 - 0.5, 29 + 0.5), i.e. 24.5-29.5
30-34          (30 - 0.5, 34 + 0.5), i.e. 29.5-34.5
35-39          (35 - 0.5, 39 + 0.5), i.e. 34.5-39.5
40-44          (40 - 0.5, 44 + 0.5), i.e. 39.5-44.5

Here the correction factor = d/2 = (25 - 24)/2 = 0.5.

5. Mid-value or class mark: The mid-value or class mark is the value of the variable exactly at the middle of the class. The mid-value of any class is obtained by dividing the sum of the upper and lower class limits by 2:

Mid-value of class = ½ (Lower class limit + Upper class limit)

The class limits should be selected in such a manner that the observations in any class are evenly distributed throughout the class interval, so that the actual average of the observations in any class is very close to the mid-value of the class.

6. Open end classes: A classification is termed an open end classification if the lower limit of the first class, or the upper limit of the last class, or both, are not specified; classes in which one of the limits is missing are called open end classes.
For example, classes such as "marks less than 20" or "age below 60 years" are open end classes. As far as possible, open end classes should be avoided, because in such classes the mid-value cannot be obtained accurately. But if open end classes are inevitable, it is customary to estimate the class mark or mid-value of the first class with reference to the succeeding class. In other words, we assume that the magnitude of the first class is the same as that of the second class.

Example: Construct a frequency distribution from the following data by the inclusive method, taking 4 as the class interval:
…

Solution: The minimum value of the variable is 10, which is a very convenient figure for the lower limit of the first class, and the magnitude of the class interval is given to be 4. The classes for preparing the frequency distribution by the inclusive method will therefore be 10-13, 14-17, 18-21, 22-25, ..., 38-41.

Frequency Distribution
Class Interval   Frequency (f)
10-13            3
14-17            8
18-21            8
22-25            7
26-29            5
30-33            4
34-37            2
38-41            1

Example: Prepare a statistical table from the following weekly wages (Rs.) of 100 workers of Factory A:
…

Solution: The lowest value is 23 and the highest is 106, so the difference between the lowest and highest values is 83. If we take a class interval of 10, nine classes would be made.
The first class should be taken as 20-30 instead of 23-33, as per the guidelines of classification.

Frequency Distribution of the Wages of 100 Workers
Wages (Rs.)   Frequency (f)
20-30         …
30-40         …
40-50         18
50-60         …
60-70         …
70-80         …
80-90         …
90-100        …
100-110       …
Total         100

Graphs of Frequency Distributions
The guiding principles for the graphic representation of frequency distributions are precisely the same as for the diagrammatic and graphic representation of other types of data. The information contained in a frequency distribution can be shown in graphs, which reveal important characteristics and relationships that are not easily discernible on a simple examination of the frequency tables. The most commonly used graphs for charting a frequency distribution, for a general understanding of the details of the data, are:
1. Histogram
2. Frequency polygon
3. Smoothed frequency curves
4. Ogives or cumulative frequency curves

1. Histogram
The term "histogram" must not be confused with the term "historigram", which relates to time charts. A histogram is the best way of presenting a simple frequency distribution graphically. In statistical terms, a histogram is a graph that represents the class frequencies of a frequency distribution by vertical adjacent rectangles. While constructing a histogram, the variable is always taken on the X-axis and the corresponding frequencies on the Y-axis. Each class is represented by a distance on the scale that is proportional to its class interval. The width of each rectangle on the X-axis remains the same if the class intervals are uniform throughout; if they differ, the widths of the rectangles change proportionately. The Y-axis represents the frequency of each class, which constitutes the height of its rectangle. We thus get a series of rectangles, each having its class interval as its width and its frequency as its height.
The area of the histogram represents the total frequency.

The histogram should be clearly distinguished from a bar diagram. A bar diagram is one-dimensional: only the length of the bar matters, not its width. A histogram is two-dimensional: both the length and the width matter. However, a histogram can be misleading if the distribution has unequal class intervals and suitable adjustments in frequencies are not made. The technique of constructing a histogram is explained for (i) distributions having equal class intervals and (ii) distributions having unequal class intervals.

When class intervals are equal, take the frequency on the Y-axis and the variable on the X-axis and construct the rectangles. In such a case, the heights of the rectangles are proportional to the frequencies.

Example: Draw a histogram from the following data:
Classes   Frequency
0-10      …
10-20     …
20-30     …
30-40     …
40-50     16
50-60     10
60-70     8
70-80     6
80-90     3
90-100    1

Solution:
[Figure: Histogram; X-axis: classes 0 to 100; Y-axis: frequency]

When class intervals are unequal, the frequencies must be adjusted before constructing a histogram. We take the class with the lowest class interval and adjust the frequencies of the other classes accordingly. If one class interval is twice as wide as the lowest class interval, we divide the height of its rectangle by two; if it is three times as wide, we divide it by three, and so on. The heights will then be proportional to the ratios of the frequencies to the widths of the classes.

Example: Represent the following data on a histogram. The average monthly income of 1035 employees in a construction industry is given below:

Monthly Income (Rs.)   No. of Workers
600-700                25
700-800                100
800-900                150
900-1000               200
1000-1200              240
1200-1400              160
1400-1500              50
1500-1800              90
1800 or more           20

Solution:
[Figure: Histogram showing monthly incomes of workers; X-axis: monthly income (Rs.), 600 to 1800; Y-axis: number of workers]

When mid-points are given, we first ascertain the upper and lower limits of each class and then construct the histogram in the same manner.

Example: Draw a histogram of the following distribution:

Life of Electric Lamps (mid-point, hours)   Firm A   Firm B
1010                                        10       287
1030                                        130      105
1050                                        482      26
1070                                        360      230
1090                                        18       352

Solution: Since we are given the mid-points, we should first ascertain the class limits. To calculate the class limits of the various classes, take the difference of two consecutive mid-points and divide it by 2; then subtract this value from, and add it to, each mid-point to obtain the lower and upper class limits.

Life of Electric Lamps (hours)   Frequency, Firm A   Frequency, Firm B
1000-1020                        10                  287
1020-1040                        130                 105
1040-1060                        482                 26
1060-1080                        360                 230
1080-1100                        18                  352

[Figure: Histograms for Firm A and Firm B; X-axis: life of lamps (hours), 1000 to 1100; Y-axis: frequency]

2. Frequency Polygon
This is a graph of a frequency distribution which has more than four sides. It is particularly effective in comparing two or more frequency distributions. There are two ways of constructing a frequency polygon. (i) We may draw a histogram of the given data and then join, by straight lines, the mid-point of the upper horizontal side of each rectangle with those of the adjacent ones. The figure so formed is a frequency polygon. Both ends of the polygon should be extended to the base line so that the area under the frequency polygon equals the area under the histogram.
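The adjustments used so far in this lesson (scaling heights for unequal class intervals, recovering class limits from mid-points, the class-boundary correction factor d/2, and the vertices of a frequency polygon) can be sketched in Python. This is an illustrative sketch, not part of the original lesson: the function names are hypothetical, the open-end class "1800 or more" is left out because it has no upper limit, and closing the polygon at the mid-points of imaginary empty classes on either side is the usual convention, assumed here.

```python
def adjusted_heights(classes, freqs):
    """Scale each frequency to the narrowest class width, so rectangle heights
    are proportional to frequency / width (unequal class intervals)."""
    widths = [upper - lower for lower, upper in classes]
    base = min(widths)
    return [f * base / w for f, w in zip(freqs, widths)]


def limits_from_midpoints(mids):
    """Half the gap between consecutive mid-points gives each class's limits.
    Assumes equally spaced mid-points."""
    half = (mids[1] - mids[0]) / 2
    return [(m - half, m + half) for m in mids]


def class_boundaries(classes):
    """Inclusive-type classes -> class boundaries, using the correction factor d / 2."""
    d = classes[1][0] - classes[0][1]  # gap between consecutive classes
    return [(lower - d / 2, upper + d / 2) for lower, upper in classes]


def polygon_points(mids, freqs):
    """Frequency-polygon vertices: one point per class mid-point, plus zero-frequency
    points at the mid-points of the empty classes just beyond each end (assumed)."""
    step = mids[1] - mids[0]
    return ([(mids[0] - step, 0)] + list(zip(mids, freqs))
            + [(mids[-1] + step, 0)])


# Income data from the example, leaving out the open-end class "1800 or more".
income = [(600, 700), (700, 800), (800, 900), (900, 1000),
          (1000, 1200), (1200, 1400), (1400, 1500), (1500, 1800)]
workers = [25, 100, 150, 200, 240, 160, 50, 90]
print(adjusted_heights(income, workers))

lamp_mids = [1010, 1030, 1050, 1070, 1090]
print(limits_from_midpoints(lamp_mids))
print(polygon_points(lamp_mids, [10, 130, 482, 360, 18]))  # Firm A
print(class_boundaries([(20, 24), (25, 29), (30, 34)]))
```

For the income data, the 200-wide classes 1000-1200 and 1200-1400 get heights 120 and 80, and the 300-wide class 1500-1800 gets 30, which is exactly the halving and one-third rule described for unequal class intervals.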
[Figure: Frequency polygon; Y-axis: number of students (frequency)]

(ii) Another method of constructing a frequency polygon is to take the mid-points of the various class intervals, plot the frequency corresponding to each point, and join all these points by straight lines. Both methods yield the same figure.

A frequency polygon has an advantage over the histogram: the frequency polygons of several distributions can be drawn on the same axes, which makes comparison possible, whereas histograms cannot usefully be employed in the same way. To compare histograms, we draw them on separate graphs.

3. Smoothed Frequency Curve
A smoothed frequency curve can be drawn through the various points of the polygon. The curve is drawn freehand in such a manner that the area under the curve is approximately the same as that under the polygon. The object of drawing a smoothed curve is to eliminate, as far as possible, the accidental variations present in the original data. While smoothing, the top of the curve may rise above the highest point of the polygon, particularly when the magnitude of the class interval is large. The curve should look as regular as possible, and all sudden turns should be avoided. The extent of smoothing depends upon the nature of the data. To draw a smoothed frequency curve, it is necessary first to draw the polygon and then to smooth it. The following points must be kept in mind when smoothing a frequency graph:
(i) Only frequency distributions based on samples should be smoothed.
(ii) Only continuous series should be smoothed.
(iii) The total area under the curve should be equal to the area under the histogram or polygon.
The diagram given below illustrates the point:
[Figure: Histogram, frequency polygon and smoothed frequency curve]

4. Cumulative Frequency Curves or Ogives
So far we have discussed the charting of simple distributions, where each frequency refers to the class interval against which it is placed. Sometimes it becomes necessary to know the number of items whose values are greater or less than a certain amount.
We may, for example, be interested in knowing the number of students whose weight is less than, or more than, a given value. To get this information, it is necessary to change the form of the frequency distribution from a simple to a cumulative distribution. In a cumulative frequency distribution, the frequency of each class is made to include the frequencies of all the lower classes or of all the upper classes, depending upon the manner in which cumulation is done. The graph of such a distribution is called a cumulative frequency curve or an ogive. There are two methods of constructing ogives: (i) the less than method and (ii) the more than method. In the less than method, we start with the upper limit of each class and go on adding the frequencies; when these cumulative frequencies are plotted, we get a rising curve. In the more than method, we start with the lower limit of each class and subtract the frequency of each class in turn from the total frequency; when these cumulative frequencies are plotted, we get a declining curve. The following example illustrates both types of ogives.

Example: Draw ogives by both methods from the following data.

Distribution of weight of the students of a college (lbs.)
Weights        No. of Students
90.5-100.5     5
100.5-110.5    34
110.5-120.5    139
120.5-130.5    300
130.5-140.5    367
140.5-150.5    319
150.5-160.5    205
160.5-170.5    76
170.5-180.5    43
180.5-190.5    16
190.5-200.5    3
200.5-210.5    4
210.5-220.5    3
220.5-230.5    1

Solution: First, we find the cumulative frequencies of the given data by the less than method.

Less than (Weights)   Cumulative frequency
100.5                 5
110.5                 39
120.5                 178
130.5                 478
140.5                 845
150.5                 1164
160.5                 1369
170.5                 1445
180.5                 1488
190.5                 1504
200.5                 1507
210.5                 1511
220.5                 1514
230.5                 1515

Plot these cumulative frequencies against the weights on graph paper. The curve so formed is called an ogive.
[Figure: Ogive by less than method; Y-axis: cumulative frequency]

Now we calculate the cumulative frequencies of the given data by the more than method.
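As a check on the arithmetic, both cumulations can be reproduced with a short Python sketch. The function names are illustrative; the class frequencies are those of the weight example above.

```python
from itertools import accumulate


def less_than(freqs):
    """'Less than' cumulative frequencies: running totals from the first class."""
    return list(accumulate(freqs))


def more_than(freqs):
    """'More than' cumulative frequencies: start from the total and subtract
    the frequency of each earlier class in turn."""
    total = sum(freqs)
    running_before = [0] + less_than(freqs)[:-1]  # frequency below each class
    return [total - before for before in running_before]


weights = [5, 34, 139, 300, 367, 319, 205, 76, 43, 16, 3, 4, 3, 1]
print(less_than(weights))  # paired with the upper boundaries 100.5, 110.5, ...
print(more_than(weights))  # paired with the lower boundaries 90.5, 100.5, ...
```

The last "less than" value and the first "more than" value both equal the total frequency, 1515, which is a quick consistency check on either table.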
More than (Weights)   Cumulative frequency
90.5                  1515
100.5                 1510
110.5                 1476
120.5                 1337
130.5                 1037
140.5                 670
150.5                 351
160.5                 146
170.5                 70
180.5                 27
190.5                 11
200.5                 8
210.5                 4
220.5                 1

Plotting these cumulative frequencies on graph paper gives a declining curve, which is the cumulative frequency curve or ogive by the more than method.
[Figure: Ogive by more than method]

Although graphs are a powerful and effective medium for presenting statistical data, they are not, under all circumstances and for all purposes, complete substitutes for tabular and other forms of presentation. The specialist in this field recognizes not only the advantages but also the limitations of these techniques. He knows when to use these methods and when not to, and from his experience and expertise is able to select the most appropriate method for every purpose.

Example: Draw an ogive by the less than method and determine the number of companies earning profits between Rs. 45 crores and Rs. 75 crores:

Profits (Rs. crores)   No. of Companies
10-20                  8
20-30                  12
30-40                  20
40-50                  24
50-60                  15
60-70                  10
70-80                  7
80-90                  3
90-100                 1

Solution:
Ogive by less than method
Profit (Rs. crores)   No. of Companies
Less than 20          8
Less than 30          20
Less than 40          40
Less than 50          64
Less than 60          79
Less than 70          89
Less than 80          96
Less than 90          99
Less than 100         100

[Figure: Ogive by less than method; X-axis: profit (Rs. crores); Y-axis: no. of companies]

It is clear from the graph that the number of companies earning profits of less than Rs. 75 crores is 92 and the number earning profits of less than Rs. 45 crores is 51. Hence the number of companies earning profits between Rs. 45 crores and Rs. 75 crores is 92 - 51 = 41.

Example: The following distribution gives the weight in grams of mangoes of a given variety. If mangoes weighing less than 443 grams are considered unsuitable for the foreign market, what percentage of the total yield is suitable for it? Assume the given frequency distribution to be typical of the variety.

Weight (gms)   No. of mangoes
410-419        10
420-429        20
430-439        42
440-449        54
450-459        45
460-469        18
470-479        7

Draw an ogive of the "more than" type for the above data and deduce how many mangoes weigh more than 443 grams.

Solution: Mangoes weighing more than 443 gms are suitable for the foreign market, and they lie in the last four classes. Assuming the 54 mangoes of the class 440-449 are spread evenly over its 10 grams, the number of mangoes weighing between 444 and 449 grams would be (54/10) × 6 = 32.4.

Total number of mangoes weighing more than 443 gms = 32.4 + 45 + 18 + 7 = 102.4
Percentage of mangoes = (102.4 / 196) × 100 = 52.24

Therefore, about 52.24 per cent of the total yield is suitable for the foreign market.

Ogive by more than method
Weight more than (gms)   No. of Mangoes
410                      196
420                      186
430                      166
440                      124
450                      70
460                      25
470                      7

[Figure: Ogive by more than method]

From the graph it can be seen that about 103 mangoes weigh more than 443 gms and are suitable for the foreign market.

LESSON 2
MEASURES OF CENTRAL TENDENCY

What is Central Tendency:
One of the important objectives of statistics is to find numerical values that explain the inherent characteristics of a frequency distribution. The first of such measures is the average. Averages are measures which condense a huge, unwieldy set of numerical data into single numerical values that represent the entire distribution. The inherent inability of the human mind to remember a large body of numerical data compels us to seek a few constants that will describe the data. Averages give us the gist of, and a bird's-eye view of, a huge mass of unwieldy numerical data. Averages are the typical values around which the other items of the distribution congregate. This value lies between the two extreme observations of the distribution and gives us an idea of the concentration of the values in the central part of the distribution. Averages are therefore also called measures of central tendency.
Averages are also called measures of location, since they enable us to locate the position or place of the distribution in question. Averages are statistical constants which enable us to comprehend in a single value the significance of the whole. According to Croxton and Cowden, an average value is a single value within the range of the data that is used to represent all the values in the series. Since an average lies somewhere within the range of the data, it is sometimes called a measure of central value. An average, known as a measure of central tendency, is the most typical representative item of the group to which it belongs and is capable of revealing all the important characteristics of that group or distribution.

What are the Objects of Central Tendency:
The most important object of calculating an average, or measuring central tendency, is to determine a single figure which may be used to represent a whole series involving magnitudes of the same variable.
The second object is that, since an average represents the entire data, it facilitates comparison within one group or between groups of data. Thus, the performance of the members of a group can be compared with the average performance of different groups.
The third object is that an average helps in computing various other statistical measures, such as dispersion, skewness and kurtosis.

Essentials of a Good Average
Since an average represents the statistical data and is used for purposes of comparison, it must possess the following properties:
1. It must be rigidly defined and not left to the mere estimation of the observer. If the definition is rigid, the value of the average computed by different persons will be the same.
2. It must be based upon all the values given in the distribution. If it is not based on all the values, it might not be representative of the entire group of data.
3. It should be easily understood. The average should possess simple and obvious properties.
It should not be too abstract for the common people.
4. It should be capable of being calculated with reasonable care and rapidity.
5. It should be stable and unaffected by sampling fluctuations.
6. It should be capable of further algebraic manipulation.

Different methods of measuring central tendency provide us with different kinds of averages. The main types of averages in common use are:
1. Mean: (i) Arithmetic mean (ii) Weighted mean (iii) Geometric mean (iv) Harmonic mean
2. Median
3. Mode

Arithmetic Mean: The arithmetic mean of a series is the quotient obtained by dividing the sum of the values by the number of items. In algebraic language, if $X_1, X_2, \dots, X_N$ are the N values of a variate X, then the arithmetic mean $\bar{X}$ is defined by the formula

$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum X}{N}$

Example: The following are the monthly salaries (Rs.) of ten employees in an office. Calculate the mean salary of the employees: 250, 275, 265, 280, 400, 490, 670, 890, 1100, 1250.

Solution:
$\bar{X} = \frac{250 + 275 + 265 + 280 + 400 + 490 + 670 + 890 + 1100 + 1250}{10} = \frac{5870}{10} = 587$

Short-cut Method: The direct method is suitable where the number of items is moderate and the figures are small integers. But if the number of items is large and/or the values of the variate are big, adding together all the values may be a lengthy process. To overcome this difficulty, a short-cut method may be used. It is based on an important characteristic of the arithmetic mean: the algebraic sum of the deviations of a series of individual observations from their mean is always zero. The deviations of the values of the variate from an assumed mean A are computed, and their sum is divided by the number of items. The quotient obtained is added to the assumed mean to give the arithmetic mean:

$\bar{X} = A + \frac{\sum dx}{N}$, where $dx = X - A$

We can solve the previous example by the short-cut method.
Computation of Arithmetic Mean
Serial No. | Salary (Rs.) X | Deviation from assumed mean dx = X − A (A = 400)
1 | 250 | −150
2 | 275 | −125
3 | 265 | −135
4 | 280 | −120
5 | 400 | 0
6 | 490 | +90
7 | 670 | +270
8 | 890 | +490
9 | 1100 | +700
10 | 1250 | +850
N = 10 | | Σdx = +1870

Substituting the values in the formula, we get
$$\bar{X} = A + \frac{\sum dx}{N} = 400 + \frac{1870}{10} = 400 + 187 = 587$$
The mean salary is again Rs. 587.

Computation of Arithmetic Mean in a Discrete Series: In a discrete series, the arithmetic mean may be computed by both the direct and the short-cut methods. The formula according to the direct method is
$$\bar{X} = \frac{f_1X_1 + f_2X_2 + \cdots + f_nX_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum fX}{N}$$
where the variable values $X_1, X_2, \ldots, X_n$ have frequencies $f_1, f_2, \ldots, f_n$ and $N = \sum f$.

Example: The following table gives the distribution of 100 accidents during the seven days of the week in a given month. During that month there were 5 Fridays and 5 Saturdays and only four of each of the other days. Calculate the average number of accidents per day.
Days: Sun. | Mon. | Tue. | Wed. | Thur. | Fri. | Sat. | Total
No. of accidents: 20 | 22 | 10 | 9 | 11 | 8 | 20 | 100

Solution:
Calculation of Number of Accidents per Day
Day | No. of accidents (X) | No. of days in the month (f) | Total accidents (fX)
Sunday | 20 | 4 | 80
Monday | 22 | 4 | 88
Tuesday | 10 | 4 | 40
Wednesday | 9 | 4 | 36
Thursday | 11 | 4 | 44
Friday | 8 | 5 | 40
Saturday | 20 | 5 | 100
Total | 100 | N = 30 | ΣfX = 428
$$\bar{X} = \frac{\sum fX}{N} = \frac{428}{30} = 14.27$$
i.e. about 14 accidents per day.

The formula for the arithmetic mean according to the short-cut method is
$$\bar{X} = A + \frac{\sum f\,dx}{N}$$
where A is the assumed mean, dx = X − A and $N = \sum f$.
We can solve the previous example by the short-cut method as given below:
Calculation of Average Accidents per Day (A = 10)
Day | X | f | dx = X − A | f dx
Sunday | 20 | 4 | +10 | +40
Monday | 22 | 4 | +12 | +48
Tuesday | 10 | 4 | 0 | 0
Wednesday | 9 | 4 | −1 | −4
Thursday | 11 | 4 | +1 | +4
Friday | 8 | 5 | −2 | −10
Saturday | 20 | 5 | +10 | +50
Total | | N = 30 | | Σf dx = +128
$$\bar{X} = 10 + \frac{128}{30} = 10 + 4.27 = 14.27$$
i.e. about 14 accidents per day.

Calculation of Arithmetic Mean for a Continuous Series: The arithmetic mean can be computed by both the direct and the short-cut methods. In addition, a coding (step-deviation) method is also applied to simplify the calculations.
In any case, it is necessary to find the mid-values of the various classes in the frequency distribution before the arithmetic mean can be computed. Once the mid-points of the various classes are found, the process of calculating the arithmetic mean is the same as in the case of a discrete series. In the direct method, the formula used is
$$\bar{X} = \frac{\sum fm}{N}$$
where m = mid-points of the various classes and N = the total frequency.
In the short-cut method, the following formula is applied:
$$\bar{X} = A + \frac{\sum f\,dx}{N}$$
where dx = m − A and $N = \sum f$.
The short-cut method can be simplified further in practice; this is called the coding method. The deviations from the assumed mean are divided by a common factor to reduce their size. The sum of the products of these reduced deviations and the frequencies is divided by the total frequency, multiplied by the common factor, and added to the assumed mean. Symbolically,
$$\bar{X} = A + \frac{\sum f\,d'x}{N} \times i$$
where $d'x = \frac{m - A}{i}$ and i = common factor.

Example: Following is the frequency distribution of marks obtained by 50 students in a test of Statistics:
Marks | Number of Students
0–10 | 4
10–20 | 6
20–30 | 20
30–40 | 10
40–50 | 7
50–60 | 3
Calculate the arithmetic mean by (i) the direct method, (ii) the short-cut method, and (iii) the coding method.

Solution:
Calculation of Arithmetic Mean
Marks | f | m | fm | dx = m − A (A = 25) | f dx | d'x = dx/i (i = 10) | f d'x
0–10 | 4 | 5 | 20 | −20 | −80 | −2 | −8
10–20 | 6 | 15 | 90 | −10 | −60 | −1 | −6
20–30 | 20 | 25 | 500 | 0 | 0 | 0 | 0
30–40 | 10 | 35 | 350 | +10 | +100 | +1 | +10
40–50 | 7 | 45 | 315 | +20 | +140 | +2 | +14
50–60 | 3 | 55 | 165 | +30 | +90 | +3 | +9
Total | N = 50 | | Σfm = 1440 | | Σf dx = +190 | | Σf d'x = +19

Direct Method: $\bar{X} = \frac{\sum fm}{N} = \frac{1440}{50} = 28.8$ marks
Short-cut Method: $\bar{X} = A + \frac{\sum f\,dx}{N} = 25 + \frac{190}{50} = 25 + 3.8 = 28.8$ marks
Coding Method: $\bar{X} = A + \frac{\sum f\,d'x}{N} \times i = 25 + \frac{19}{50} \times 10 = 25 + 3.8 = 28.8$ marks
We can observe that the answer, average marks of 28.8, is identical by all three methods.

Mathematical Properties of the Arithmetic Mean
(i) The sum of the deviations of a given set of individual observations from the arithmetic mean is always zero. Symbolically, $\sum (X - \bar{X}) = 0$. It is due to this property that the arithmetic mean is characterised as the centre of gravity, i.e. the sum of the positive deviations from the mean is equal to the sum of the negative deviations.
(ii) The sum of the squares of the deviations of a set of observations is the minimum when the deviations are taken from the arithmetic average. Symbolically, $\sum (X - \bar{X})^2$ is smaller than $\sum (X - A)^2$ for any other value A.
We can verify the above properties with the help of the following data ($\bar{X} = 45/5 = 9$; assumed mean A = 10):
X | X − X̄ | (X − X̄)² | X − A | (X − A)²
3 | −6 | 36 | −7 | 49
5 | −4 | 16 | −5 | 25
10 | +1 | 1 | 0 | 0
12 | +3 | 9 | +2 | 4
15 | +6 | 36 | +5 | 25
ΣX = 45 | 0 | 98 | −5 | 103
The deviations from the mean add up to zero, and the sum of their squares (98) is smaller than the sum of the squared deviations from A = 10 (103).
(iii) If each value of a variable X is increased or decreased by a constant k, the arithmetic mean also increases or decreases by k; if each value is multiplied by k, the arithmetic mean is also multiplied by k.
(iv) If we are given the arithmetic means and the numbers of items of two or more groups, we can compute the combined average of these groups by applying the following formula:
$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$
where $\bar{X}_{12}$ refers to the combined average of the two groups, $\bar{X}_1$ to the arithmetic mean of the first group, $\bar{X}_2$ to the arithmetic mean of the second group, $N_1$ to the number of items in the first group, and $N_2$ to the number of items in the second group.
We can understand the property with the help of the following examples.

Example: The average marks of 25 male students in a section is 61 and the average marks of 35 female students in the same section is 58. Find the combined average marks of the 60 students.
Solution: We are given $\bar{X}_1 = 61$, $N_1 = 25$, $\bar{X}_2 = 58$ and $N_2 = 35$. Applying the formula,
$$\bar{X}_{12} = \frac{(25 \times 61) + (35 \times 58)}{25 + 35} = \frac{1525 + 2030}{60} = \frac{3555}{60} = 59.25 \text{ marks}$$

Example: The mean wage of 100 workers in a factory, running two shifts of 60 and 40 workers respectively, is Rs. 38. The mean wage of the 60 workers in the morning shift is Rs. 40. Find the mean wage of the 40 workers working in the evening shift.
Solution: We are given $\bar{X}_1 = 40$, $N_1 = 60$, $N_2 = 40$, $\bar{X}_{12} = 38$ and $N = 100$. Applying the formula,
$$38 = \frac{(60 \times 40) + 40\bar{X}_2}{100} \;\Rightarrow\; 3800 = 2400 + 40\bar{X}_2 \;\Rightarrow\; \bar{X}_2 = \frac{3800 - 2400}{40} = 35$$
The mean wage of the evening shift is Rs. 35.
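The direct, short-cut and coding methods, and the combined-mean formula of property (iv), can be checked with a short script. This is a minimal Python sketch, not part of the original lesson: the helper names mean_direct, mean_short_cut, mean_coding and combined_mean are hypothetical, and the data are the salary, marks and section examples worked above. Supplying frequencies to mean_direct also gives the weighted mean ΣWX/ΣW, with the weights used as frequencies.

```python
def mean_direct(values, freqs=None):
    """Direct method: sum(f * X) / N; raw data are treated as f = 1 for every item."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, values)) / n


def mean_short_cut(values, assumed_mean, freqs=None):
    """Short-cut method: A + sum(f * d) / N, with deviations d = X - A."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) for f, x in zip(freqs, values))
    return assumed_mean + sum_fd / n


def mean_coding(mid_points, freqs, assumed_mean, width):
    """Coding (step-deviation) method: A + (sum(f * d') / N) * i, with d' = (m - A) / i."""
    n = sum(freqs)
    sum_fd_step = sum(f * (m - assumed_mean) / width for f, m in zip(freqs, mid_points))
    return assumed_mean + sum_fd_step / n * width


def combined_mean(mean1, n1, mean2, n2):
    """Property (iv): combined mean of two groups, (N1*X1 + N2*X2) / (N1 + N2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


if __name__ == "__main__":
    salaries = [250, 275, 265, 280, 400, 490, 670, 890, 1100, 1250]
    print(mean_direct(salaries), mean_short_cut(salaries, 400))  # 587.0 587.0

    mids = [5, 15, 25, 35, 45, 55]        # class mid-points of the marks table
    students = [4, 6, 20, 10, 7, 3]       # class frequencies, N = 50
    print(round(mean_direct(mids, students), 2),
          round(mean_short_cut(mids, 25, students), 2),
          round(mean_coding(mids, students, 25, 10), 2))  # 28.8 28.8 28.8

    print(combined_mean(61, 25, 58, 35))  # 59.25
```

Because the short-cut and coding methods are algebraic rearrangements of the direct formula, all three return the same mean; rounding is applied only for display, since floating-point division can leave tiny residues.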
Example: The mean age of a combined group of men and women is 30 years. If the mean age of the group of men is 32 and that of the group of women is 27, find the percentage of men and women in the group.
Solution: Let us take the group of men as the first group and the women as the second group. Therefore $\bar{X}_1 = 32$ years, $\bar{X}_2 = 27$ years and $\bar{X}_{12} = 30$ years. In the problem we are not given the numbers of men and women, so we can assume $N_1 + N_2 = 100$ and therefore $N_1 = 100 - N_2$. Applying
$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$
and substituting $N_1 = 100 - N_2$:
$$30 = \frac{32(100 - N_2) + 27N_2}{100} \;\Rightarrow\; 3000 = 3200 - 5N_2 \;\Rightarrow\; 5N_2 = 200 \;\Rightarrow\; N_2 = 40\%$$
$$N_1 = 100 - N_2 = 100 - 40 = 60\%$$
Therefore, the percentage of men in the group is 60 and that of women is 40.

(v) Because $\bar{X} = \frac{\sum X}{N}$, it follows that $\sum X = N\bar{X}$. If we replace each item in the series by the mean, the sum of these substitutions will be equal to the sum of the individual items. This property is used to find aggregate values and corrected averages. We can understand the property with the help of an example.

Example: The mean of 100 observations is found to be 44. If, at the time of computation, two items were wrongly taken as 30 and 27 in place of 3 and 72, find the corrected average.
Solution: Since $\bar{X} = \frac{\sum X}{N}$, we have $\sum X = N\bar{X} = 100 \times 44 = 4400$.
Corrected ΣX = ΣX + correct items − wrong items = 4400 + 3 + 72 − 30 − 27 = 4418
Corrected average = Corrected ΣX / N = 4418/100 = 44.18

Calculation of Arithmetic Mean in the Case of Open-End Classes:
Open-end classes are those in which the lower limit of the first class and the upper limit of the last class are not defined. In these series we cannot calculate the mean unless we make an assumption about the unknown limits. The assumption depends upon the class-interval following the first class and preceding the last class.
For example:
Marks | No. of students
Below 15 | 4
15–30 | 6
30–45 | 2
45–60 | 8
Above 60 | 11
In this example, because all the defined class-intervals are the same (15), the assumption would be that the first and last classes also have a class-interval of 15. Hence the lower limit of the first class shall be zero and the upper limit of the last class shall be 75, so the first class would be 0–15 and the last class 60–75.
What happens in this case?
Marks | No. of students
Below 10 | 4
10–30 | 7
30–60 | 10
60–100 |
Above 100 |
In this problem the class-interval is 20 in the second class, 30 in the third, 40 in the fourth, and so on; the class-interval is increasing by 10. Therefore, the appropriate assumption in this case would be that the lower limit of the first class is zero and the upper limit of the last class is 150. In the case of other open-end distributions, the first class limit should be fixed on the basis of the succeeding class-interval and the last class limit on the basis of the preceding class-interval. If the class-intervals are of varying width, an effort should be made to avoid calculating the mean and mode; it is advisable to calculate the median.

Weighted Mean
In the computation of the arithmetic mean, we give equal importance to each item in the series. Raja Toy Shop sells Toy Cars at Rs. 3 each, Toy Locomotives at Rs. 5 each, Toy Aeroplanes at Rs. 7 each and Toy Double Deckers at Rs. 9 each. What shall be the average price of the toys sold? If the shop sells 4 toys, one of each kind,
$$\text{Mean price } \bar{X} = \frac{\sum X}{N} = \frac{3 + 5 + 7 + 9}{4} = \frac{24}{4} = \text{Rs. } 6$$
In this case the importance of each toy is equal, as one toy of each variety has been sold; while computing the arithmetic mean, this fact has been taken care of by including the price of each toy once only. But if the shop sells 100 toys (50 cars, 25 locomotives, 15 aeroplanes and 10 double deckers), the importance of the four toys to the dealer as a source of earning revenue is not equal. In fact, their respective importance is equal to the number of units of each toy sold, i.e.
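the importance of Toy car is 50; the importance of Locomotive is 25; the importance of Aeroplane is 15; and the importance of Double Decker is 10. It may be noted that 50, 25, 15 and 10 are the quantities of the various classes of toys sold. These quantities are called weights and are taken into account when computing the average price. A weight is represented by the symbol W, and ΣW represents the sum of the weights. The weighted arithmetic mean is
$$\bar{X}_w = \frac{W_1X_1 + W_2X_2 + W_3X_3 + W_4X_4}{W_1 + W_2 + W_3 + W_4} = \frac{\sum WX}{\sum W}$$
where $W_1, W_2, W_3, W_4$ represent the weights and $X_1, X_2, X_3, X_4$ represent the prices of the 4 varieties of toy. Substituting the values, we get
$$\bar{X}_w = \frac{(50 \times 3) + (25 \times 5) + (15 \times 7) + (10 \times 9)}{50 + 25 + 15 + 10} = \frac{150 + 125 + 105 + 90}{100} = \frac{470}{100} = \text{Rs. } 4.70$$
The following table illustrates the procedure of computing the weighted mean.
Weighted Mean of the Price of Toys Sold by the Raja Shop
Toy | Price (Rs.) X | Number sold W | Price × weight WX
Car | 3 | 50 | 150
Locomotive | 5 | 25 | 125
Aeroplane | 7 | 15 | 105
Double Decker | 9 | 10 | 90
Total | | ΣW = 100 | ΣWX = 470
Weighted mean = 470/100 = Rs. 4.70

Example: The following table gives the number of skilled and unskilled workers in two localities, along with their average hourly wages. Determine the average hourly wage in each locality. Also give reasons why the results show that the average hourly wage in Shyam Nagar exceeds the average hourly wage in Ram Nagar, even though in Shyam Nagar the average wage in each category of workers is lower.
Category | Ram Nagar: Number | Ram Nagar: Wage per hour (Rs.) | Shyam Nagar: Number | Shyam Nagar: Wage per hour (Rs.)
Skilled | 150 | 1.80 | 350 | 1.75
Unskilled | 850 | 1.30 | 650 | 1.25
Solution: It is required to compute the weighted mean, with the number of workers as weights.
Ram Nagar: ΣWX = (150 × 1.80) + (850 × 1.30) = 270 + 1105 = 1375; ΣW = 1000; weighted mean = 1375/1000 = Rs. 1.375 per hour.
Shyam Nagar: ΣWX = (350 × 1.75) + (650 × 1.25) = 612.5 + 812.5 = 1425; ΣW = 1000; weighted mean = 1425/1000 = Rs. 1.425 per hour.
It may be noted that the weights are more evenly assigned to the different categories of workers in Shyam Nagar than in Ram Nagar: 350 of Shyam Nagar's 1,000 workers are in the higher-paid skilled category, against only 150 in Ram Nagar. This larger weight on the higher wage raises Shyam Nagar's average even though each of its category wages is lower.

Geometric Mean: In general, if we have n numbers (none of them being zero), then the G.M. is defined as
$$\text{G.M.} = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n} = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$
In the case of a discrete series, if $x_1, x_2, \ldots, x_n$ occur $f_1, f_2, \ldots, f_n$ times respectively and N is the total frequency (i.e. $N = f_1 + f_2 + \cdots + f_n$), then
$$\text{G.M.} = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}}$$
For convenience, logarithms are used extensively to calculate the nth root. In terms of logarithms (AL denotes antilog),
$$\text{G.M.} = \text{AL}\left(\frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n}\right) = \text{AL}\left(\frac{\sum \log x}{n}\right)$$
In a discrete series, $\text{G.M.} = \text{AL}\left(\frac{\sum f\log x}{N}\right)$, and in the case of a continuous series, $\text{G.M.} = \text{AL}\left(\frac{\sum f\log m}{N}\right)$, where m is the mid-point of each class.
Example: Calculate the G.M.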
of the following data: 2, 4, 8.
Solution: $\text{G.M.} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$
In terms of logarithms, the question can be solved as follows: log 2 = 0.3010, log 4 = 0.6021 and log 8 = 0.9031. Applying the formula,
$$\text{G.M.} = \text{AL}\left(\frac{\sum \log x}{N}\right) = \text{AL}\left(\frac{0.3010 + 0.6021 + 0.9031}{3}\right) = \text{AL}\left(\frac{1.8062}{3}\right) = \text{AL}(0.6021) = 4$$

Example: For a discrete series with N = 40 and Σf log x = 36.128,
$$\text{G.M.} = \text{AL}\left(\frac{\sum f\log x}{N}\right) = \text{AL}\left(\frac{36.128}{40}\right) = \text{AL}(0.9032) = 8.002$$

Example: Calculate the G.M. from the following data:
Solution: Calculation of G.M.
Class | Mid-point (m) | f | log m | f log m
9.5–14.5 | 12 | 10 | 1.0792 | 10.7920
14.5–19.5 | 17 | 15 | 1.2304 | 18.4560
19.5–24.5 | 22 | 17 | 1.3424 | 22.8208
24.5–29.5 | 27 | 25 | 1.4314 | 35.7850
29.5–34.5 | 32 | 18 | 1.5051 | 27.0918
34.5–39.5 | 37 | 12 | 1.5682 | 18.8184
39.5–44.5 | 42 | 8 | 1.6232 | 12.9856
Total | | N = 105 | | Σf log m = 146.7496
$$\text{G.M.} = \text{AL}\left(\frac{\sum f\log m}{N}\right) = \text{AL}\left(\frac{146.7496}{105}\right) = \text{AL}(1.3976) = 24.98$$

Specific uses of G.M.: The geometric mean has certain specific uses, some of which are:
(i) It is used in the construction of index numbers.
(ii) It is helpful in finding compound rates of change, such as the rate of growth of population in a country.
(iii) It is suitable where the data are expressed in terms of rates, ratios and percentages.
(iv) It is quite useful in computing average rates of depreciation or appreciation.
(v) It is most suitable when large weights are to be assigned to small items and small weights to large items.

Example: The gross national product of a country was Rs. 1,000 crores 10 years earlier. It is Rs. 2,000 crores now. Calculate the rate of growth in G.N.P.
Solution: In this case the compound interest formula will be used for computing the average annual percentage increase:
$$P_n = P_0(1 + r)^n$$
where $P_n$ = principal sum (or any other variate) at the end of the period, $P_0$ = principal sum at the beginning of the period, r = rate of increase or decrease, and n = number of years. The formula can also be written in the following form:
$$r = \sqrt[n]{\frac{P_n}{P_0}} - 1$$
Substituting the given values, we have
$$r = \sqrt[10]{\frac{2000}{1000}} - 1 = \sqrt[10]{2} - 1 = 1.0718 - 1 = 0.0718 = 7.18\%$$
Hence, the rate of growth in GNP is 7.18%.
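The logarithmic G.M. formula and the compound-rate formula can be sketched in Python as below. This is an illustrative sketch, not the lesson's own procedure: the helper names geometric_mean and compound_growth_rate are hypothetical, and base-10 logarithms are used to mirror the AL (antilog) notation, so AL(y) becomes 10 ** y. Passing weights in place of frequencies gives a weighted G.M., AL(Σw log x / Σw).

```python
import math


def geometric_mean(values, freqs=None):
    """G.M. = AL(sum(f * log x) / N), using base-10 logs; AL(y) = 10 ** y."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("the G.M. is computed here only for positive values")
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for f, x in zip(freqs, values))
    return 10 ** (log_sum / n)


def compound_growth_rate(p0, pn, years):
    """Average rate r from Pn = P0 * (1 + r) ** n, i.e. r = (Pn / P0) ** (1 / n) - 1."""
    return (pn / p0) ** (1 / years) - 1


if __name__ == "__main__":
    print(round(geometric_mean([2, 4, 8]), 4))              # 4.0

    mids = [12, 17, 22, 27, 32, 37, 42]                    # class mid-points
    freqs = [10, 15, 17, 25, 18, 12, 8]                    # N = 105
    print(round(geometric_mean(mids, freqs), 2))            # 24.98

    print(round(compound_growth_rate(1000, 2000, 10) * 100, 2))  # 7.18 (% per year)
```

The sketch computes with full-precision logarithms rather than four-figure log tables, so its results can differ from the hand-worked figures in the last decimal place.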
Example: The price of a commodity increased by 5 per cent from 1998 to 1999, 8 per cent from 1999 to 2000 and 77 per cent from 2000 to 2001. The average increase from 1998 to 2001 is quoted as 26 per cent and not 30 per cent. Explain this statement and verify the arithmetic.
Solution: Taking $P_n$ as the price at the end of the period and $P_0$ as the price at the beginning, we substitute these values in the compound interest formula. Taking $P_0 = 100$, the price at the end of the period is $P_n = 100 \times 1.05 \times 1.08 \times 1.77 = 200.72$. Then
$$200.72 = 100(1 + r)^3 \;\Rightarrow\; 1 + r = \sqrt[3]{2.0072} = 1.260 \;\Rightarrow\; r = 0.260 = 26\%$$
Thus the average increase is not the arithmetic average (5 + 8 + 77)/3 = 30 per cent; it is 26 per cent, as found by the G.M.

Weighted G.M.: The weighted G.M. is calculated with the help of the following formula:
$$\text{G.M.}_w = \text{AL}\left(\frac{w_1\log x_1 + w_2\log x_2 + \cdots + w_n\log x_n}{w_1 + w_2 + \cdots + w_n}\right) = \text{AL}\left(\frac{\sum w\log x}{\sum w}\right)$$

Example: Find the weighted G.M. from the following data:
Group | Index number | Weight
Food | 352 | 48
Fuel | 220 | 10
Cloth | 230 | 8
House Rent | 160 | 12
Misc. | 190 | 15
Solution: Calculation of Weighted G.M.
Group | Index number (x) | Weight (w) | log x | w log x
Food | 352 | 48 | 2.5465 | 122.2320
Fuel | 220 | 10 | 2.3424 | 23.4240
Cloth | 230 | 8 | 2.3617 | 18.8936
House Rent | 160 | 12 | 2.2041 | 26.4492
Misc. | 190 | 15 | 2.2788 | 34.1820
Total | | Σw = 93 | | Σw log x = 225.1808
$$\text{G.M.}_w = \text{AL}\left(\frac{225.1808}{93}\right) = \text{AL}(2.4213) = 263.8$$

Example: A machine depreciates at the rate of 35.5% per annum in the first year, at the rate of 22.5% per annum in the second year, and at the rate of 9.5% per annum in the third year, each percentage being computed on the actual value. What is the average rate of depreciation?
Solution: The average rate of depreciation can be calculated by taking the G.M.
Year | X (values taking 100 as base) | log X
I | 100 − 35.5 = 64.5 | 1.8096
II | 100 − 22.5 = 77.5 | 1.8893
III | 100 − 9.5 = 90.5 | 1.9566
Total | | Σlog X = 5.6555
$$\text{G.M.} = \text{AL}\left(\frac{\sum \log X}{N}\right) = \text{AL}\left(\frac{5.6555}{3}\right) = \text{AL}(1.8852) = 76.77$$
Average rate of depreciation = 100 − 76.77 = 23.23%.

Example: The arithmetic mean and geometric mean of two values are 10 and 8 respectively. Find the values.
Solution: If the two values are taken as a and b, then
$$\frac{a + b}{2} = 10 \quad\text{and}\quad \sqrt{ab} = 8, \quad\text{so}\quad a + b = 20 \quad\text{and}\quad ab = 64$$
$$a - b = \sqrt{(a + b)^2 - 4ab} = \sqrt{20^2 - 4 \times 64} = \sqrt{400 - 256} = \sqrt{144} = 12$$
Now we have a + b = 20 ... (i) and a − b = 12 ... (ii). Solving for a and b, we get a = 16 and b = 4.

Harmonic Mean: The harmonic mean is defined as the reciprocal of the average of the reciprocals of all the items in a series. Symbolically,
$$\text{H.M.} = \frac{N}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_N}} = \frac{N}{\sum \frac{1}{x}}$$
In the case of a discrete series, $\text{H.M.} = \frac{N}{\sum f\left(\frac{1}{x}\right)}$, and in the case of a continuous series, $\text{H.M.} = \frac{N}{\sum f\left(\frac{1}{m}\right)}$, where m is the mid-point of each class.

Example: Calculate the harmonic mean from the following data: 5, 15, 25, 35 and 45.
Solution:
x | 1/x
5 | 0.200
15 | 0.067
25 | 0.040
35 | 0.029
45 | 0.022
Total | Σ(1/x) = 0.358
$$\text{H.M.} = \frac{N}{\sum \frac{1}{x}} = \frac{5}{0.358} = 14 \text{ (approx.)}$$

Example: From the following data compute the value of the harmonic mean:
x: 5 | 15 | 25 | 35 | 45
f: 5 | 15 | 10 | 15 | 5
Solution: Calculation of Harmonic Mean
x | f | 1/x | f(1/x)
5 | 5 | 0.200 | 1.000
15 | 15 | 0.067 | 1.005
25 | 10 | 0.040 | 0.400
35 | 15 | 0.029 | 0.435
45 | 5 | 0.022 | 0.110
Total | N = 50 | | Σf(1/x) = 2.950
$$\text{H.M.} = \frac{N}{\sum f\left(\frac{1}{x}\right)} = \frac{50}{2.950} = 17 \text{ (approx.)}$$

Example: Calculate the harmonic mean from the following distribution:
X: 0–10 | 10–20 | 20–30 | 30–40 | 40–50
f: 5 | 15 | 10 | 15 | 5
Solution: First of all, we find the mid-points of the various classes: 5, 15, 25, 35 and 45. Then we calculate the H.M. by applying the formula $\text{H.M.} = \frac{N}{\sum f\left(\frac{1}{m}\right)}$.
Calculation of Harmonic Mean
m (mid-point) | f | 1/m | f(1/m)
5 | 5 | 0.200 | 1.000
15 | 15 | 0.067 | 1.005
25 | 10 | 0.040 | 0.400
35 | 15 | 0.029 | 0.435
45 | 5 | 0.022 | 0.110
Total | N = 50 | | Σf(1/m) = 2.950
The answer will be 50/2.950 = 17 (approx.).

Application of Harmonic Mean to special cases: Like the geometric mean, the harmonic mean is also applicable to certain special types of problems. Some of them are:
(i) If, in averaging time rates, the distance is constant, then the H.M. is to be calculated.
Example: A man travels 480 km a day. On the first day he travels for 12 hours @ 40 km per hour, and on the second day for 10 hours @ 48 km per hour. On the third day he travels for 15 hours @ 32 km per hour. Find his average speed.
Solution: We shall use the harmonic mean:
$$\text{H.M.} = \frac{3}{\frac{1}{40} + \frac{1}{48} + \frac{1}{32}} = \frac{3}{\frac{12 + 10 + 15}{480}} = \frac{3 \times 480}{37} = \frac{1440}{37} = 38.9$$
i.e. about 39 km per hour. The arithmetic mean would be (40 + 48 + 32)/3 = 40 km per hour.
(ii) If, in averaging price data, the prices are expressed as "quantity per rupee".
Then the harmonic mean should be applied.
Example: A man purchased one kilo of cabbage from each of four places at the rate of 20 kg, 16 kg, 12 kg and 10 kg per rupee respectively. On average, how many kilos of cabbage has he purchased per rupee?
Solution:
$$\text{H.M.} = \frac{4}{\frac{1}{20} + \frac{1}{16} + \frac{1}{12} + \frac{1}{10}} = \frac{4}{\frac{12 + 15 + 20 + 24}{240}} = \frac{4 \times 240}{71} = 13.5 \text{ kg per rupee}$$

MEDIAN
The median is that value of the variable which divides the group into two equal parts, one part comprising all values greater than the median and the other all values less than the median. The median of a distribution may be defined as that value of the variable which exceeds, and is exceeded by, the same number of observations; it is the value such that the number of observations above it is equal to the number of observations below it. Thus, while the arithmetic mean is based on all the items of the distribution, the median is a positional average, that is, it depends upon the position occupied by a value in the frequency distribution.
When the items of a series are arranged in ascending or descending order of magnitude, the value of the middle item in the series is known as the median in the case of individual observations. Symbolically,
$$\text{Median} = \text{size of } \left(\frac{N+1}{2}\right)\text{th item}$$
If the number of items is even, there is no value exactly in the middle of the series. In such a situation the median is arbitrarily taken to be halfway between the two middle items. Symbolically,
$$\text{Median} = \frac{\text{size of } \frac{N}{2}\text{th item} + \text{size of } \left(\frac{N}{2} + 1\right)\text{th item}}{2}$$
Example: For a series of 8 items arranged in ascending order whose 4th and 5th items are 9 and 11,
$$\text{Median} = \frac{\text{size of 4th item} + \text{size of 5th item}}{2} = \frac{9 + 11}{2} = 10$$

Location of Median in a Discrete Series: In a discrete series, the median is computed in the following manner:
(i) Arrange the given variable data in ascending or descending order.
(ii) Find the cumulative frequencies.
(iii) Apply Median = size of $\left(\frac{N+1}{2}\right)$th item.
(iv) Locate the median: it is the value of the variable corresponding to the cumulative frequency that equals or first exceeds this size.
Example: Following are the numbers of rooms in the houses of a particular locality. Find the median of the data:
No. of rooms: 3 | 4 | 5 | 6 | 7 | 8
No. of houses: 38 | 654 | 311 | 42 | 12 | 2
Solution: Computation of Median
No. of rooms | No. of houses | Cumulative frequency
3 | 38 | 38
4 | 654 | 692
5 | 311 | 1003
6 | 42 | 1045
7 | 12 | 1057
8 | 2 | 1059
Median = size of (1059 + 1)/2 = 530th item. The 530th item lies in the cumulative frequency of 692, and the value corresponding to this is 4. Therefore, Median = 4 rooms.

Location of Median in a Continuous Series: In a continuous series, the median is computed in the following manner:
(i) Arrange the given variable data in ascending or descending order.
(ii) If an inclusive series is given, it must be converted into an exclusive series to find the real class intervals.
(iii) Find the cumulative frequencies.
(iv) Apply Median = size of $\frac{N}{2}$th item to ascertain the median class.
(v) Apply the formula of interpolation to ascertain the value of the median:
$$\text{Median} = l_1 + \frac{\frac{N}{2} - c.f.}{f} \times (l_2 - l_1)$$
where $l_1$ refers to the lower limit of the median class, $l_2$ to the upper limit of the median class, c.f. to the cumulative frequency of the class preceding the median class, and f to the frequency of the median class.
Example: The following table gives the distribution of marks secured by some students in an examination. Find the median marks.
Solution: Calculation of Median
Marks | No. of students (f) | c.f.
0–20 | 42 | 42
21–30 | 38 | 80
31–40 | 120 | 200
41–50 | 84 | 284
51–60 | 48 | 332
61–70 | 36 | 368
71–80 | 31 | 399
N = 399
Median = size of N/2 th item = 399/2 = 199.5th item, which lies in the 31–40 group; after conversion to exclusive classes, the median class is 30.5–40.5. Applying the formula of interpolation,
$$\text{Median} = 30.5 + \frac{199.5 - 80}{120} \times 10 = 30.5 + 9.96 = 40.46 \text{ marks}$$

Related Positional Measures: The median divides the series into two equal parts. Similarly, there are other measures which divide the series into a certain number of equal parts: the first quartile, the third quartile, deciles, percentiles, etc.
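The interpolation step for a continuous series can be written as one small function, and the same formula serves the quartiles, deciles and percentiles named here. The sketch below is illustrative and not from the lesson: positional_value is a hypothetical helper that takes exclusive class boundaries, class frequencies and a fraction (0.5 for the median, 0.25 for Q1, 0.75 for Q3, and so on), locates the class containing the (fraction × N)th item, and applies l1 + (fraction × N − c.f.)/f × (l2 − l1). It assumes the classes are already in ascending order. The lower boundary 0 used for the first marks class is an assumption that does not affect this median.

```python
def positional_value(boundaries, freqs, fraction):
    """Interpolated median/quartile/decile/percentile of a continuous series.

    boundaries: exclusive class boundaries l_0 < l_1 < ... < l_k for k classes
    freqs:      the k class frequencies
    fraction:   0.5 for the median, 0.25 for Q1, 0.75 for Q3, 0.2 for D2, ...
    """
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    target = fraction * sum(freqs)          # e.g. N/2 for the median
    cf_before = 0                           # c.f. of the classes preceding the current one
    for k, f in enumerate(freqs):
        if f > 0 and cf_before + f >= target:
            l1, l2 = boundaries[k], boundaries[k + 1]
            return l1 + (target - cf_before) / f * (l2 - l1)
        cf_before += f
    raise ValueError("could not locate the class (e.g. every frequency is zero)")


if __name__ == "__main__":
    # Marks example: inclusive classes 0-20, 21-30, ..., 71-80 converted to exclusive ones.
    bounds = [0, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
    students = [42, 38, 120, 84, 48, 36, 31]                   # N = 399
    print(round(positional_value(bounds, students, 0.5), 2))  # 40.46
```

Like the text, the sketch uses N/2 (not (N + 1)/2) to locate the median class of a continuous series; for quartiles it uses N/4 and 3N/4 in the same way.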
If the items are arranged in ascending or descending order of magnitude, $Q_1$ is that value which covers one-fourth of the total number of items, and $Q_3$ is that value which covers three-fourths of the total number of items. Similarly, if the total number of items is divided into ten equal parts, there shall be nine deciles, and if it is divided into a hundred equal parts, there shall be ninety-nine percentiles. Symbolically,
First quartile: $Q_1$ = size of $\left(\frac{N+1}{4}\right)$th item
Third quartile: $Q_3$ = size of $3\left(\frac{N+1}{4}\right)$th item
First decile: $D_1$ = size of $\left(\frac{N+1}{10}\right)$th item
Sixth decile: $D_6$ = size of $6\left(\frac{N+1}{10}\right)$th item
First percentile: $P_1$ = size of $\left(\frac{N+1}{100}\right)$th item
In general, rth percentile: $P_r$ = size of $r\left(\frac{N+1}{100}\right)$th item
Once the positions of the items are found, the formulae of interpolation are applied for ascertaining the values of $Q_1$, $Q_3$, $D_1$, $D_6$, $P_1$, etc.

Example: Calculate $Q_1$, $Q_3$, $D_2$ and $P_5$ from the following data:
Marks: Below 10 | 10–20 | 20–40 | 40–60 | 60–80 | Above 80
No. of Students: 8 | 10 | 22 | 25 | 10 | 5
Solution: Calculation of Positional Values
Marks | No. of Students (f) | c.f.
Below 10 | 8 | 8
10–20 | 10 | 18
20–40 | 22 | 40
40–60 | 25 | 65
60–80 | 10 | 75
Above 80 | 5 | 80
N = 80
$Q_1$ = size of N/4 th item = 80/4 = 20th item. Hence $Q_1$ lies in the class 20–40, where $l_1 = 20$, c.f. = 18, f = 22 and i = $l_2 - l_1$ = 20:
$$Q_1 = l_1 + \frac{\frac{N}{4} - c.f.}{f} \times i = 20 + \frac{20 - 18}{22} \times 20 = 20 + 1.82 = 21.82$$
$Q_3$ = size of 3N/4 th item = 60th item. Hence $Q_3$ lies in the class 40–60, where $l_1 = 40$, c.f. = 40, f = 25 and i = 20:
$$Q_3 = l_1 + \frac{\frac{3N}{4} - c.f.}{f} \times i = 40 + \frac{60 - 40}{25} \times 20 = 40 + 16 = 56$$
$D_2$ = size of 2N/10 th item = 16th item. Hence $D_2$ lies in the class 10–20, where $l_1 = 10$, c.f. = 8, f = 10 and i = 10:
$$D_2 = l_1 + \frac{\frac{2N}{10} - c.f.}{f} \times i = 10 + \frac{16 - 8}{10} \times 10 = 10 + 8 = 18$$
$P_5$ = size of 5N/100 th item = 4th item. Hence $P_5$ lies in the class 0–10, where $l_1 = 0$, c.f. = 0, f = 8 and i = 10:
$$P_5 = l_1 + \frac{\frac{5N}{100} - c.f.}{f} \times i = 0 + \frac{4 - 0}{8} \times 10 = 5$$

Calculation of Missing Frequencies:
Example: In the frequency distribution of 100 families given below, the numbers of families corresponding to the expenditure groups 20–40 and 60–80 are missing from the table. However, the median is known to be 50. Find the missing frequencies.
Expenditure: 0–20 | 20–40 | 40–60 | 60–80 | 80–100
No. of families: 14 | ? | 27 | ? | 15
Solution: We shall assume the missing frequencies for the classes 20–40 and 60–80 to be x and y respectively.
Expenditure (Rs.) | No. of families | c.f.
0–20 | 14 | 14
20–40 | x | 14 + x
40–60 | 27 | 41 + x
60–80 | y | 41 + x + y
80–100 | 15 | 56 + x + y
From the table, N = 56 + x + y = 100, so x + y = 100 − 56 = 44.
The median is given as 50, which lies in the class 40–60; this becomes the median class. Using the median formula,
$$\text{Median} = l_1 + \frac{\frac{N}{2} - c.f.}{f} \times i \;\Rightarrow\; 50 = 40 + \frac{50 - (14 + x)}{27} \times 20$$
$$10 = \frac{(36 - x) \times 20}{27} \;\Rightarrow\; 270 = 720 - 20x \;\Rightarrow\; 20x = 450 \;\Rightarrow\; x = 22.5$$
Substituting the value of x in the equation x + y = 44, we get 22.5 + y = 44, so y = 44 − 22.5 = 21.5.
Hence the frequency for the class 20–40 is 22.5 and for the class 60–80 is 21.5.

Mode
Mode is that value of the variable which occurs, or repeats itself, the maximum number of times. The mode is the most "fashionable" size in the sense that it is the most common and typical, and it is defined by Zizek as "the value occurring most frequently in a series of items and around which the other items are distributed most densely." In the words of Croxton and Cowden, the mode of a distribution is the value at the point where the items tend to be most heavily concentrated. According to A.M. Tuttle, mode is the value which has the greatest frequency density in its immediate neighbourhood. In the case of individual observations, the mode is that value which is repeated the maximum number of times in the series. The mode is also denoted by the letter Z.

Example: Calculate the mode from the following data:
Sr. No.: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Marks obtained: 10 | 27 | 24 | 12 | 27 | 27 | 20 | 18 | 15 | 30
Solution:
Marks | No. of Students
10 | 1
12 | 1
15 | 1
18 | 1
20 | 1
24 | 1
27 | 3
30 | 1
Mode is 27 marks.

Calculation of Mode in a Discrete Series: In a discrete series, the mode is quite often determined by inspection. We can understand this with the help of an example:
Size of shoe: 1 | 2 | 3 | 4 | 5 | 6 | 7
Frequency: 4 | … | … | … | … | 8 | 6
By inspection, the modal size is 3 as it has the maximum frequency.
But this test of greatest frequency is not foolproof, since it is not the frequency of a single class alone but also the frequencies of the neighbouring classes that decide the mode. In such cases, we use the method of grouping and the analysis table.
Size of shoe: 1 | 2 | 3 | 4 | 5 | 6 | 7
Frequency: 4 | … | … | … | … | 8 | 6
Solution: By inspection the mode is 3, but the mode may in fact be 5. This is because the neighbouring frequencies of size 5 are greater than the neighbouring frequencies of size 3. This effect of neighbouring frequencies is seen with the help of the grouping and analysis table technique. In the grouping table, column 1 lists the individual frequencies, columns 2 and 3 the totals of successive pairs of frequencies starting from the first and the second item, and columns 4, 5 and 6 the totals of successive threes starting from the first, second and third item; in each column the largest total is marked.
When two groups of frequencies in a column are of equal magnitude, we should consider either both or omit both while analysing the sizes of items.
Analysis Table
Column | Sizes of items with maximum frequency
1 | 3
2 | 5, 6
3 | 2, 3, 4, 5
4 | 4, 5, 6
5 | 5, 6, 7
6 | 3, 4, 5
Size 5 occurs the maximum number of times; therefore, the mode is 5. We can note that by inspection we had determined 3 to be the mode.

Determination of Mode in a Continuous Series: In a continuous series, the determination of the mode requires one additional step. Once the modal class is determined by inspection or with the help of the grouping technique, the following formula of interpolation is applied:
$$\text{Mode} = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (l_2 - l_1)$$
where $l_1$ = lower limit of the class where the mode lies, $l_2$ = upper limit of that class, $f_0$ = frequency of the class preceding the modal class, $f_1$ = frequency of the modal class, and $f_2$ = frequency of the class succeeding the modal class.

Example: Calculate the mode of the following frequency distribution:
Variable: 0–10 | 10–20 | 20–30 | 30–40 | 40–50 | 50–60 | 60–70
Frequency: 5 | 10 | 15 | 14 | 10 | 5 | 3
Solution:
Grouping Table
Column 1 (single frequencies): 0–10: 5; 10–20: 10; 20–30: 15; 30–40: 14; 40–50: 10; 50–60: 5; 60–70: 3
Column 2 (pairs, from the first class): 0–20: 15; 20–40: 29; 40–60: 15
Column 3 (pairs, from the second class): 10–30: 25; 30–50: 24; 50–70: 8
Column 4 (threes, from the first class): 0–30: 30; 30–60: 29
Column 5 (threes, from the second class): 10–40: 39; 40–70: 18
Column 6 (threes, from the third class): 20–50: 39
Analysis Table
Column | Classes with maximum frequency
1 | 20–30
2 | 20–30, 30–40
3 | 10–20, 20–30
4 | 0–10, 10–20, 20–30
5 | 10–20, 20–30, 30–40
6 | 20–30, 30–40, 40–50
The modal group is 20–30 because it occurs 6 times.
Applying the formula of interpolation to the modal group 20–30 ($l_1 = 20$, $l_2 = 30$, $f_0 = 10$, $f_1 = 15$, $f_2 = 14$):
$$\text{Mode} = 20 + \frac{15 - 10}{2(15) - 10 - 14} \times (30 - 20) = 20 + \frac{5}{6} \times 10 = 20 + 8.33 = 28.33$$

Calculation of mode where it is ill-defined: The above formula is not applied where there are many modal values in a series or distribution. For instance, there may be two or more items having the maximum frequency. In these cases the series is known as bimodal or multimodal, the mode is said to be ill-defined, and the following formula is applied:
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$

Example: Calculate the mode of the following frequency data:
Variate value: 10–20 | 20–30 | 30–40 | 40–50 | 50–60 | 60–70 | 70–80 | 80–90
Frequency: 5 | 9 | 13 | 21 | 20 | 15 | 8 | 3
Solution: First of all, ascertain the modal group with the help of the process of grouping.
Grouping Table
Column 1 (single frequencies): 10–20: 5; 20–30: 9; 30–40: 13; 40–50: 21; 50–60: 20; 60–70: 15; 70–80: 8; 80–90: 3
Column 2 (pairs, from the first class): 10–30: 14; 30–50: 34; 50–70: 35; 70–90: 11
Column 3 (pairs, from the second class): 20–40: 22; 40–60: 41; 60–80: 23
Column 4 (threes, from the first class): 10–40: 27; 40–70: 56
Column 5 (threes, from the second class): 20–50: 43; 50–80: 43
Column 6 (threes, from the third class): 30–60: 54; 60–90: 26
Analysis Table
Column | Classes with maximum frequency
1 | 40–50
2 | 50–60, 60–70
3 | 40–50, 50–60
4 | 40–50, 50–60, 60–70
5 | 20–30, 30–40, 40–50, 50–60, 60–70, 70–80
6 | 30–40, 40–50, 50–60
There are two groups, 40–50 and 50–60, which occur an equal number of times (5 each). Therefore, we apply the formula Mode = 3 Median − 2 Mean, and for this purpose the values of the mean and median must be computed.
Calculation of Mean and Median
Variate | f | m | d'x = (m − 45)/10 | f d'x | c.f.
10–20 | 5 | 15 | −3 | −15 | 5
20–30 | 9 | 25 | −2 | −18 | 14
30–40 | 13 | 35 | −1 | −13 | 27
40–50 | 21 | 45 | 0 | 0 | 48
50–60 | 20 | 55 | +1 | +20 | 68
60–70 | 15 | 65 | +2 | +30 | 83
70–80 | 8 | 75 | +3 | +24 | 91
80–90 | 3 | 85 | +4 | +12 | 94
Total | N = 94 | | | Σf d'x = +40 |
$$\text{Mean} = A + \frac{\sum f\,d'x}{N} \times i = 45 + \frac{40}{94} \times 10 = 45 + 4.255 = 49.255$$
The median is the value of the N/2 = 47th item, which lies in the 40–50 group:
$$\text{Median} = 40 + \frac{47 - 27}{21} \times 10 = 40 + 9.524 = 49.524$$
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean} = 3(49.524) - 2(49.255) = 148.572 - 98.51 = 50.06 \approx 50.1$$

Determination of mode by curve fitting (graphic method): The mode can also be computed graphically. The following steps are to be taken:
(i) Draw a histogram of the data.
(ii) Draw lines diagonally inside the modal class rectangle, starting from each upper corner of the rectangle to the upper corner of the adjacent rectangle.
(iii) Draw a perpendicular line from the intersection of the two diagonal lines to the X-axis. The abscissa of the point at which the perpendicular line meets the X-axis is the value of the mode.
Example: Construct a histogram for the following distribution and determine the mode graphically. Verify the result with the help of interpolation.
X: 0–10 | 10–20 | 20–30 | 30–40 | 40–50
f: 5 | 8 | 15 | 2 | 1
Solution:
[Figure: histogram for the above distribution]

Example: Calculate the mode from the following data:
Marks | No. of Students (cumulative)
Below 10 | 4
Below 20 | …
Below 30 | …
Below 40 | 46
…

Solution: Median = size of N/2 th item = 50th item. Because 50 is smaller than 67 in the c.f. column, the median class is 40–50.
$$\text{Median} = l_1 + \frac{\frac{N}{2} - c.f.}{f} \times i = 40 + \frac{50 - 46}{21} \times 10 = 40 + 1.9 = 41.9$$
Applying Mode = 3 Median − 2 Mean, with the mean of the distribution being 42.2:
$$\text{Mode} = 3 \times 41.9 - 2 \times 42.2 = 125.7 - 84.4 = 41.3$$

Example: The median and mode of the following wage distribution are known to be Rs. 33.5 and Rs. 34 respectively. Find the missing values.
Wages (Rs.): 0–10 | 10–20 | 20–30 | 30–40 | 40–50 | 50–60 | 60–70 | Total
No. of workers: 4 | 16 | ? | ? | ? | 6 | 4 | 230
Solution: We assume the missing frequencies for 20–30 as x, for 30–40 as y, and for 40–50 as 230 − (4 + 16 + x + y + 6 + 4) = 200 − x − y. We now proceed to compute the missing frequencies:
Wages (Rs.) | No. of workers (f) | c.f.
0–10 | 4 | 4
10–20 | 16 | 20
20–30 | x | 20 + x
30–40 | y | 20 + x + y
40–50 | 200 − x − y | 220
50–60 | 6 | 226
60–70 | 4 | 230
N = 230
The median (33.5) lies in the class 30–40, and N/2 = 115:
$$33.5 = 30 + \frac{115 - (20 + x)}{y} \times 10 \;\Rightarrow\; 3.5y = 950 - 10x \;\Rightarrow\; 10x + 3.5y = 950 \quad \text{... (i)}$$
The mode (34) also lies in the class 30–40, with $f_0 = x$, $f_1 = y$ and $f_2 = 200 - x - y$:
$$34 = 30 + \frac{y - x}{2y - x - (200 - x - y)} \times 10 = 30 + \frac{y - x}{3y - 200} \times 10$$
$$4(3y - 200) = 10(y - x) \;\Rightarrow\; 10x + 2y = 800 \quad \text{... (ii)}$$
Subtracting equation (ii) from equation (i): 1.5y = 150, so y = 100.
Substituting y = 100 in equation (i): 10x + 350 = 950, so 10x = 600 and x = 60.
Third missing frequency = 200 − x − y = 200 − 60 − 100 = 40.

LESSON 3
MEASURES OF DISPERSION

Need for Dispersion: Measures of central tendency (mean, median, mode, etc.) indicate the central position of a series. They indicate the general magnitude of the data but fail to reveal all the peculiarities and characteristics of the series. In other words, they fail to reveal the degree of spread, or the extent of the variability, of the individual items of the distribution. This can be known by certain other measures, known as "Measures of Dispersion" or Variation.
We can understand variation with the help of the following example:
Series I | Series II | Series III
10 | 2 | 10
10 | 8 | 12
10 | 20 | 8
ΣX = 30 | ΣX = 30 | ΣX = 30
X̄ = 10 | X̄ = 10 | X̄ = 10
In all three series, the value of the arithmetic mean is 10. On the basis of this average, we can say that the series are alike.
If we carefully examine the composition of the three series, we find the following differences:
(i) In the 1st series the values are equal, but in the 2nd and 3rd series the values are unequal and do not follow any specific order.
(ii) The magnitude of deviation, item-wise, is quite different for the 1st, 2nd and 3rd series. But these deviations cannot be ascertained if only the value of the simple mean is taken into consideration.
(iii) In these three series the value of the arithmetic mean is 10, but the value of the median may differ from one series to another. This can be understood as follows:
Series I | Series II | Series III
10 | 2 | 8
10 (Median) | 8 (Median) | 10 (Median)
10 | 20 | 12
The value of the median is 10 in the 1st series, 8 in the 2nd series and 10 in the 3rd series. Therefore, the values of the mean and median are not identical in every series.
(iv) Even though the average remains the same, the nature and extent of the distribution of the sizes of the items may vary. In other words, the structure of the frequency distributions may differ even though their means are identical.

What is Dispersion?
The simplest meaning that can be attached to the word "dispersion" is a lack of uniformity in the sizes or quantities of the items of a group or series. According to Reiglemen, "Dispersion is the extent to which the magnitudes or quantities of the items differ, the degree of diversity." The word dispersion may also be used to indicate the spread of the data. In all these definitions, we can find the basic property of dispersion: it is a value that indicates the extent to which all the other values are dispersed about the central value in a particular distribution.

Properties of a Good Measure of Dispersion
There are certain prerequisites for a good measure of dispersion:
1. It should be simple to understand.
2. It should be easy to compute.
3. It should be rigidly defined.
4. It should be based on each individual item of the distribution.
5. It should be capable of further algebraic treatment.
6. It should have sampling stability.
7. It should not be unduly affected by the extreme items.
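The point of the three-series example can be checked numerically: the means agree while the medians and the spreads do not. This small sketch uses Python's standard statistics module; the helper name summarise is hypothetical, and the spread it reports is simply the largest value minus the smallest.

```python
import statistics


def summarise(values):
    """Return (mean, median, largest - smallest) for a small series."""
    return statistics.mean(values), statistics.median(values), max(values) - min(values)


series = {
    "Series I": [10, 10, 10],
    "Series II": [2, 8, 20],
    "Series III": [10, 12, 8],
}

for name, values in series.items():
    mean, median, spread = summarise(values)
    print(f"{name}: mean = {mean}, median = {median}, largest - smallest = {spread}")
```

All three means equal 10, while the medians (10, 8, 10) and the spreads (0, 18, 4) differ, which is exactly the gap that measures of dispersion are meant to fill.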
Types of Dispersion
The measures of dispersion can be either 'absolute' or 'relative'. Absolute measures of dispersion are expressed in the same units in which the original data are expressed. For example, if the series is expressed as marks of the students in a particular subject, the absolute dispersion will also be in marks. The only difficulty is that if two or more series are expressed in different units, the series cannot be compared on the basis of absolute dispersion.
The 'relative' measure or 'coefficient' of dispersion is the ratio or the percentage of a measure of absolute dispersion to an appropriate average. The basic advantage of this measure is that two or more series can be compared with each other, even though they are expressed in different units.
Theoretically, the 'absolute measure' of dispersion is better. But from a practical point of view, the relative measure or coefficient of dispersion is considered better, as it is used to make comparisons between series.

Methods of Dispersion
Methods of studying dispersion are divided into two types:
(i) Mathematical Methods: We can study the 'degree' and 'extent' of variation by these methods. In this category, the commonly used measures of dispersion are:
(a) Range
(b) Quartile Deviation
(c) Average Deviation
(d) Standard Deviation and Coefficient of Variation
(ii) Graphic Methods: Where we want to study only the extent of variation, whether it is higher or lower, a Lorenz curve is used.

Mathematical Methods
(a) Range: It is the simplest method of studying dispersion. Range is the difference between the smallest value and the largest value of a series. While computing range, we do not take into account the frequencies of different groups.
Formula:
Absolute Range = $L - S$
Coefficient of Range = $\frac{L - S}{L + S}$
where L represents the largest value in a distribution and S represents the smallest value in a distribution.
We can understand the computation of range with the help of examples of different series.
(i) Raw Data: Marks out of 50 in a subject of 12 students in a class are given as follows: 12, 18, 20, 12, 16, 14, 30, 32, 28, 12, 12 and 35.
In this example, the highest marks obtained by a candidate are 35 and the lowest marks obtained by a candidate are 12. Therefore, we can calculate the range with L = 35 and S = 12:
Absolute Range = L − S = 35 − 12 = 23 marks
Coefficient of Range = $\frac{L - S}{L + S} = \frac{35 - 12}{35 + 12} = \frac{23}{47} = 0.49$ approx.

(ii) Discrete Series
Marks of the students in Accounts (out of 50) (X)   No. of students (f)
10 (smallest)                                        4
12                                                   10
18                                                   16
20 (largest)                                         15
Total                                                45

Absolute Range = 20 − 10 = 10 marks
Coefficient of Range = $\frac{20 - 10}{20 + 10} = \frac{10}{30} = 0.33$ approx.

(iii) Continuous Series
Class     Frequency
10–15     4
15–20     40
20–25     26
25–30     …

Absolute Range = L − S = 30 − 10 = 20 marks
Coefficient of Range = $\frac{L - S}{L + S} = \frac{30 - 10}{30 + 10} = \frac{20}{40} = 0.5$

Range is the simplest method of studying dispersion. It takes less time to compute the 'absolute' and 'relative' range. However, range does not take into account all the values of a series: it considers only the extreme items, and the middle items are given no importance. Therefore, range cannot tell us anything about the character of the distribution. Range also cannot be computed in the case of an 'open-end' distribution, i.e., a distribution where the lower limit of the first group or the upper limit of the highest group is not given.
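The three range calculations above can be reproduced with a short Python sketch. The function and variable names (`range_measures`, `RAW_MARKS`, and so on) are hypothetical. Following the examples, frequencies are ignored, and for the continuous series L and S are taken as the upper limit of the highest class and the lower limit of the lowest class.

```python
# Sketch: absolute range and coefficient of range for the three examples.

def range_measures(values):
    """Return (absolute range, coefficient of range).

    Absolute range = L - S and coefficient of range = (L - S) / (L + S),
    where L is the largest and S the smallest value in `values`.
    """
    values = list(values)
    largest, smallest = max(values), min(values)
    return largest - smallest, (largest - smallest) / (largest + smallest)

# (i) Raw data: marks of 12 students.
RAW_MARKS = [12, 18, 20, 12, 16, 14, 30, 32, 28, 12, 12, 35]

# (ii) Discrete series as {marks: no. of students}; only the marks (keys)
# enter the range, because frequencies are ignored.
DISCRETE = {10: 4, 12: 10, 18: 16, 20: 15}

# (iii) Continuous series: all class limits; the smallest is the lower limit
# of the first class and the largest is the upper limit of the last class.
CLASSES = [(10, 15), (15, 20), (20, 25), (25, 30)]
CLASS_LIMITS = [limit for lower_upper in CLASSES for limit in lower_upper]

if __name__ == "__main__":
    for name, data in [("Raw data", RAW_MARKS),
                       ("Discrete series", DISCRETE),
                       ("Continuous series", CLASS_LIMITS)]:
        absolute, coefficient = range_measures(data)
        print(f"{name}: range = {absolute} marks, coefficient = {coefficient:.2f}")
```

The results agree with the worked examples: 23 marks and about 0.49 for the raw data, 10 marks and about 0.33 for the discrete series, and 20 marks and 0.5 for the continuous series.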
