
Uploaded by Adyasha Deeya


The main objective of statistical analysis is to represent a data set by a single value that shows where the data are concentrated. Such a value is called a central value; it makes it easy to compare two or more series, which is hard to do with raw data. Quantitative data, whether organized or unorganized, tend to concentrate around certain values, usually somewhere near the centre of the distribution. The measures used to describe this tendency are called measures of central tendency.

Constructing a frequency distribution of the raw data is the first step towards condensing large data into a compact form. The next step is to condense the data into a single value, called an average. In most data sets the average is the centre of concentration of the values, which is why it is called a measure of central tendency. The values of the data cluster around the average, and it carries the important properties of the data; in that sense, it is representative of the distribution. Two well-known statisticians, Yule and Kendall, laid down the following requirements for an ideal average:

1. It should be rigidly defined.

2. Its computation should be based on all observations.

3. It should lend itself for algebraic treatment.

4. It should be least affected by extreme observations.

5. It should be easy to calculate and simple to understand.

6. It should not be affected by fluctuations of sampling.

The measures of central tendency discussed here are:

1. Arithmetic mean (AM)

2. Median

3. Mode

4. Quartiles

5. Geometric mean

6. Harmonic mean

7. Weighted mean

1. AM : It is the best known and most widely used measure of central tendency. It is the sum of all observations divided by the number of observations.

$$\text{Mean} = \frac{\text{Sum of all observations}}{\text{No. of observations}}$$

Symbolically, if $X_1, X_2, \dots, X_N$ are the values of a variable, the mean is computed by the formula

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

where

$\sum$ is read as sigma

$\bar{X}$ = the mean of the values

$X_i$ = values of the variable

$N$ = no. of values

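For a discrete frequency distribution:

$$\text{Mean} = \frac{\text{Sum of (value} \times \text{frequency)}}{\text{Total frequency}}$$

Symbolically, if $X_1, X_2, \dots, X_n$ are the values of a variable and $f_1, f_2, \dots, f_n$ are their corresponding frequencies, the mean is computed by the formula

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_n X_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i X_i, \qquad N = \sum_{i=1}^{n} f_i$$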
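The simple mean and the frequency-weighted mean can be sketched in Python as follows. This is a minimal illustration only; the function names and the sample numbers are assumptions and do not come from the notes.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


def frequency_mean(values, freqs):
    """Mean of a discrete frequency distribution: sum(f * x) / sum(f)."""
    total_freq = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / total_freq


# Hypothetical data: the values 1, 2, 3 occur 2, 3 and 5 times.
print(arithmetic_mean([4, 8, 15]))           # 9.0
print(frequency_mean([1, 2, 3], [2, 3, 5]))  # 2.3
```

Expanding the frequency table into the raw list [1, 1, 2, 2, 2, 3, 3, 3, 3, 3] and calling `arithmetic_mean` on it gives the same result, which is a quick way to check the weighted formula.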

When an assumed mean is used, the mean can be calculated by the formula

$$\bar{X} = A + \frac{\sum_{i=1}^{n} f_i \, dx_i}{N}$$

where $A$ stands for the assumed mean

$dx_i$ = deviations of the $x_i$ values from the assumed mean, $dx_i = x_i - A$

$f_i$ = frequencies

$N$ = total frequency

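For a continuous (grouped) frequency distribution, it is assumed that all the observations which fall in a given class are located at the mid-point of that class. This assumption holds good only when the number of observations is large.

Under this assumption we take $X_1, X_2, \dots, X_n$ as the mid-values of the class intervals and calculate the arithmetic mean as

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{n} f_i X_i, \qquad \text{where } \sum f = N$$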

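Computation procedure :

Step I : Write all class intervals serially in the first column and the corresponding frequencies in the second column.

Step II : Add the lower and upper limits of each class interval, divide the resulting quantity by 2 to obtain the mid-value, and put these values in the third column.

Step III : Multiply each mid-value by the corresponding frequency and put the products in the fourth column. The addition of this column gives $\sum fX$.

Step IV : Mean = (Sum of fourth column) / (Sum of second column)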

If the values of the variable are large in size, the calculation can be simplified by using the short-cut method.

Symbolically, $\bar{X} = a + \bar{d}$

Step – I Choose any value from the data, which is called the assumed mean ($a$).

Step – II Take the difference between each mid-value and the assumed mean, known as the deviation ($d = X - a$).

Step – III Multiply each $d$ by the corresponding $f$.

Step – IV Calculate $\bar{d}$ by using the formula $\bar{d} = \sum f d / N$.

Step – V The formula $\bar{X} = a + \bar{d}$ is used to find the mean of the original data.
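The direct method and the short-cut (assumed mean) method for grouped data can be compared with a small sketch. The class intervals, frequencies, assumed mean and function names below are hypothetical, chosen only to show that both procedures give the same mean.

```python
def grouped_mean_direct(classes, freqs):
    """Direct method: mean = sum(f * mid) / sum(f), using class mid-values."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)


def grouped_mean_shortcut(classes, freqs, assumed_mean):
    """Short-cut method: mean = a + sum(f * d) / sum(f), with d = mid - a."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    d_bar = sum(f * (m - assumed_mean) for m, f in zip(mids, freqs)) / sum(freqs)
    return assumed_mean + d_bar


# Hypothetical class intervals (lower, upper) and their frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 10, 20, 5]

print(grouped_mean_direct(classes, freqs))        # 21.25
print(grouped_mean_shortcut(classes, freqs, 25))  # 21.25
```

Any value can serve as the assumed mean; choosing one near the centre of the data simply keeps the deviations small.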

Merits of AM :-

1. It is rigidly defined.

2. It is easy to calculate and understand.

3. It is based upon all the observations.

4. It is capable of further mathematical treatment.

5. It is least affected by sampling fluctuations.

Demerits of AM :-

1. It can be used only for quantitative data; the mean cannot be calculated for qualitative data like caste, religion and sex.

2. It is unduly affected by extreme observations.

3. It cannot be calculated when the frequency dist is with open end classes.

4. Some times, AM may not be an observation in a data.

5. It cannot be determined graphically.

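Combined mean: the mean of two groups taken together is

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

where $\bar{x}_1, \bar{x}_2$ are the means of the first and second groups, with sizes $n_1, n_2$ respectively.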
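The combined mean of two groups, $(n_1\bar{x}_1 + n_2\bar{x}_2)/(n_1 + n_2)$, can be checked against the mean of the pooled observations. The sketch below uses made-up groups and a hypothetical function name.

```python
def combined_mean(mean1, n1, mean2, n2):
    """Mean of two groups taken together, from their means and sizes."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Hypothetical groups.
group1 = [2, 4, 6]   # mean 4, size 3
group2 = [10, 20]    # mean 15, size 2

m1 = sum(group1) / len(group1)
m2 = sum(group2) / len(group2)

combined = combined_mean(m1, len(group1), m2, len(group2))
pooled = sum(group1 + group2) / len(group1 + group2)
print(combined, pooled)  # 8.4 8.4
```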

Median:-

For a distribution with open-end classes, and for qualitative variables like honesty, sex, religion etc., we use other measures of central tendency such as the median.

Definition:-

Median may be defined as the central value of a variable when the values are

arranged in order of magnitude i.e., either in ascending order or in the descending order.

The median divides the series into two equal parts, 50% of the observations will be

smaller than the median while 50% of the observations will be larger than it.

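For ungrouped data, with the $n$ observations arranged in order of magnitude:

Median = the $\left(\frac{n+1}{2}\right)$th observation, if $n$ is odd,

or Median = $\frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th} + \left(\frac{n}{2} + 1\right)\text{th observation}\right]$, if $n$ is even.

For grouped data:

$$\text{Median} = L_1 + \frac{\frac{N}{2} - cf}{f} \times h$$

where

$L_1$ = Lower limit of median class

$L_2$ = Upper limit of median class

$f$ = Frequency of median class

$cf$ = Cumulative frequency of the pre-median class

$h = L_2 - L_1$ = class width

$N$ = Total frequency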
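A short sketch of both median rules follows: the positional rule for ungrouped data and the interpolation formula $L_1 + \frac{N/2 - cf}{f} \times h$ for grouped data. Function names and the sample distribution are assumptions made for illustration.

```python
def median_ungrouped(values):
    """Central value of the sorted data (mean of the two central values if n is even)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # ((n + 1) / 2)-th observation
    return (data[mid - 1] + data[mid]) / 2    # (n/2)-th and (n/2 + 1)-th


def median_grouped(classes, freqs):
    """Median = L1 + ((N/2 - cf) / f) * h, with class limits given as (L1, L2)."""
    half = sum(freqs) / 2
    if half == 0:
        raise ValueError("empty distribution")
    cf = 0  # cumulative frequency of the classes before the current one
    for (l1, l2), f in zip(classes, freqs):
        if cf + f >= half:                    # this is the median class
            return l1 + (half - cf) / f * (l2 - l1)
        cf += f


# Hypothetical data.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 10, 20, 5]

print(median_ungrouped([7, 1, 3]))     # 3
print(median_ungrouped([1, 2, 3, 4]))  # 2.5
print(median_grouped(classes, freqs))  # 22.5
```

In the grouped example N/2 = 20, the median class is 20 to 30 (cf = 15, f = 20, h = 10), so the median is 20 + (5/20) × 10 = 22.5.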


Merits of median:-

(1) Easy to understand and easy to calculate.

(2) Can be computed for a distribution with open-end classes.

(3) Not affected by extreme observations.

(4) Applicable for quantitative as well as qualitative data.

(5) Can be determined graphically.

Demerits:-

(1) It is not based on all the observations, hence it is not a proper representative.

Mode:- The mode is the value of a variable that occurs most frequently in a series.

(1) Ungrouped data:- In this case the mode is obtained by inspection. For a given data set, the mode may or may not exist, and even if it exists, it is not necessarily unique.

(2) Grouped data:- The mode is computed from the modal class, i.e. the class with the highest frequency, by the formula

$$\text{Mode} = L + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times h$$

where

$L$ = Lower boundary of modal class

$f_m$ = Frequency of modal class

$f_1$ = Frequency of pre-modal class

$f_2$ = Frequency of post-modal class

$h$ = Width of modal class
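The two cases for the mode can be sketched as below. The helper names and data are hypothetical. The grouped version assumes a single modal class whose frequency exceeds at least one neighbour, so that the denominator $2f_m - f_1 - f_2$ is not zero, and it treats a missing neighbouring class as having frequency 0.

```python
from collections import Counter


def modes_ungrouped(values):
    """All values with the highest count; more than one means the mode is not unique."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def mode_grouped(classes, freqs):
    """Mode = L + (fm - f1) / (2*fm - f1 - f2) * h for the modal class."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # first class with highest frequency
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                   # pre-modal class frequency
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0      # post-modal class frequency
    lower, upper = classes[i]
    return lower + (fm - f1) / (2 * fm - f1 - f2) * (upper - lower)


# Hypothetical data.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 10, 20, 5]

print(modes_ungrouped([1, 2, 2, 3, 3]))  # [2, 3]
print(mode_grouped(classes, freqs))      # 24.0
```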

given distribution. As compared with the mean & median, the mode has very limited

utility.

Merits:-

2) Not affected by extreme observations.

3) Can be determined even though the distribution has open-end classes.

4) Can be obtained graphically.

Demerits:-


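i. It is not based on all the observations.

ii. It is not capable of further mathematical treatment.

iii. It is not rigidly defined.

iv. The calculation of the mode is laborious and time consuming.

For a moderately skewed distribution, the mean, median and mode are connected by the empirical relation

Mean − Mode = 3 (Mean − Median)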

Quartiles :- The values which divide the given data into four equal parts when the observations are arranged in order of magnitude are known as quartiles. There are three quartiles, $Q_1$, $Q_2$ and $Q_3$. $Q_1$ is known as the lower or first quartile; 25% of the observations of the distribution lie below it and consequently 75% of the observations are greater than it. The second quartile $Q_2$ is the median. $Q_3$, the third quartile, has 75% of the observations below it and 25% above it.

For ungrouped data, if $n$ is even:

$$Q_1 = \frac{(n/4)\text{th} + (n/4 + 1)\text{th observation of the arranged data}}{2}$$

For grouped data:- The formula for determining the first quartile is

$$Q_1 = l + \frac{k - c.f.}{f} \times h$$

where $Q_1$ = first quartile, $l$ = lower limit of the first quartile class, $c.f.$ = cumulative frequency of the class previous to the first quartile class, $f$ = frequency of the first quartile class, $h$ = class width of the first quartile class, and $k = N/4$, where $N$ = total frequency.
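The grouped-data quartile formula can be sketched as a single function. Using k = qN/4 for the second and third quartiles is an assumption that extends the first-quartile formula in the obvious way; the function name and data are hypothetical.

```python
def quartile_grouped(classes, freqs, q):
    """q-th quartile (q = 1, 2, 3): l + ((k - cf) / f) * h, with k = q * N / 4."""
    k = q * sum(freqs) / 4
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf + f >= k:   # this is the quartile class
            return lower + (k - cf) / f * (upper - lower)
        cf += f
    raise ValueError("empty distribution")


# Hypothetical data.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 10, 20, 5]

q1 = quartile_grouped(classes, freqs, 1)
q2 = quartile_grouped(classes, freqs, 2)
q3 = quartile_grouped(classes, freqs, 3)
print(q1, q2, q3)  # 15.0 22.5 27.5
```

Here N = 40, so k = 10 for Q1; the first quartile class is 10 to 20 with c.f. = 5 and f = 10, giving 10 + (5/10) × 10 = 15.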

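Geometric mean (GM) :- The geometric mean of $N$ positive values is the $N$th root of their product. It is used when the data contain a few extremely large or small values.

If three values 3, 9 and 27 are given, the GM is computed as $G = (3 \times 9 \times 27)^{1/3} = 9$.

When the series consists of more than three numbers, it is difficult to extract the root. That is why logarithms are employed:

$$\log G = \frac{\log X_1 + \log X_2 + \dots + \log X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} \log X_i$$

or,

$$G = \text{Antilog}\left[\frac{1}{N}\sum_{i=1}^{N} \log X_i\right]$$

For a discrete series,

$$G = \text{Antilog}\left[\frac{1}{N}\sum_{i=1}^{n} f_i \log X_i\right], \qquad N = \sum f$$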
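The logarithm route to the geometric mean can be sketched as follows, using base-10 logs so that "antilog" is simply a power of 10. The function name is hypothetical, and the optional frequencies cover the discrete-series case.

```python
import math


def geometric_mean(values, freqs=None):
    """G = antilog( sum(f * log x) / sum(f) ); all values must be positive."""
    if freqs is None:
        freqs = [1] * len(values)   # ungrouped data: every value counted once
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)


# The example from the text: the GM of 3, 9 and 27 is 9 (up to rounding error).
print(round(geometric_mean([3, 9, 27]), 6))

# Discrete series: the value 2 occurring twice and 8 once, i.e. (2 * 2 * 8) ** (1/3).
print(round(geometric_mean([2, 8], [2, 1]), 6))
```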

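For ungrouped data, the harmonic mean (HM) $H$ is given by the formula

$$\frac{1}{H} = \frac{1}{N}\left(\frac{1}{X_1} + \frac{1}{X_2} + \dots + \frac{1}{X_N}\right)$$

or,

$$H = \frac{N}{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \dots + \dfrac{1}{X_N}}$$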
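A minimal sketch of the harmonic mean for ungrouped data, i.e. N divided by the sum of reciprocals. The function name and numbers are illustrative assumptions; the values must be non-zero.

```python
def harmonic_mean(values):
    """N divided by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / x for x in values)


# Hypothetical data: 1/2 + 1/4 + 1/4 = 1, so the HM is 3 / 1 = 3.
print(harmonic_mean([2, 4, 4]))  # 3.0
```

Python's standard library also offers `statistics.harmonic_mean`, which should agree with this sketch for positive inputs.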

MEASURES OF DISPERSION

As already discussed, the whole data set is represented by a single value known as the average. The average alone cannot describe the data completely: there may be two or more data sets with the same mean that are not identical.

Because observations are not uniform, it is necessary to study their variation. The variation is also known as dispersion. It gives information on how individual observations are scattered or dispersed about the mean.

Deviation = Observation − Mean

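Different Measures of Dispersion :

(i) Range : $A - B$, where $A$ is the largest and $B$ the smallest observation

(ii) Quartile deviation : $\dfrac{Q_3 - Q_1}{2}$

(iii) Coefficient of quartile deviation : $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

(iv) Mean deviation : $MD = \dfrac{\sum |x - \bar{x}|}{N}$

(v) Standard deviation : $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{N}}$

(vi) Variance : $\sigma^2 = \dfrac{\sum (x - \bar{x})^2}{N}$; for a frequency distribution, $N = \sum f$

(vii) Coefficient of variation : $\dfrac{\sigma}{\bar{x}} \times 100$

Coefficient of mean deviation about the mean = $\dfrac{\text{MD about mean}}{\text{mean}} = \dfrac{\sum |x - \bar{x}| / n}{\bar{x}}$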
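The simpler measures in this list (range, quartile deviation and mean deviation, with their coefficients) can be sketched as below. Function names and data are hypothetical, and the quartiles are passed in as already computed values rather than derived here.

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def quartile_deviation(q1, q3):
    """Half the distance between the third and first quartiles."""
    return (q3 - q1) / 2


def coeff_quartile_deviation(q1, q3):
    """Relative version: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


def mean_deviation(values):
    """Average absolute deviation of the observations from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def coeff_mean_deviation(values):
    """Mean deviation about the mean divided by the mean."""
    return mean_deviation(values) / (sum(values) / len(values))


data = [2, 4, 6, 8, 10]                      # hypothetical data, mean 6
print(value_range(data))                     # 8
print(mean_deviation(data))                  # 2.4
print(round(coeff_mean_deviation(data), 4))  # 0.4
print(quartile_deviation(15, 27.5))          # 6.25
```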

Standard deviation : The positive square root of the arithmetic mean of the squares of the deviations taken from the mean, denoted by $\sigma$:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

When the population mean is not known, we can take the sample mean as an estimate of the population mean. In this case, only $(n-1)$ of the deviations are independent. Therefore, when there are $n$ observations in the data, the divisor is $n-1$. In statistical language, $n-1$ is called the degrees of freedom.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$$

On simplification,

$$\sigma^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right)$$
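The difference between the divisors n and n − 1, and the simplified form $\left(\sum x^2 - n\bar{x}^2\right)/(n-1)$, can be verified numerically. The sketch below uses hypothetical data and function names, and compares against Python's `statistics` module as a cross-check.

```python
import math
import statistics


def std_dev(values, sample=False):
    """Root of the mean squared deviation; divisor n - 1 when sample=True."""
    n = len(values)
    mean = sum(values) / n
    squared = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared / (n - 1 if sample else n))


def sample_variance_simplified(values):
    """Simplified form: (sum(x^2) - n * mean^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
print(std_dev(data))               # 2.0 (divisor n)
print(std_dev(data, sample=True))  # about 2.138 (divisor n - 1)
print(statistics.stdev(data))      # library value for the n - 1 version
```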

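When the observations are large in size, the formula for SD is laborious to apply, and the short-cut method may be used.

I- Decide on an assumed mean '$a$'.

II- Obtain the deviations $d = x - a$.

III- Compute the mean of the deviations, $\bar{d}$.

IV- Apply the formula

$$\sigma = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n-1}}$$

For grouped data, with step deviations $d = (x - a)/h$ of the class mid-values $x$, class width $h$, $n = \sum f$ and $\bar{d} = \sum f d / n$:

$$\sigma = \sqrt{\frac{\sum f d^2 - n\bar{d}^2}{n-1}} \times h$$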
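The short-cut method for ungrouped data can be sketched as follows: subtract an assumed mean, then apply $\sqrt{(\sum d^2 - n\bar{d}^2)/(n-1)}$ to the deviations. The data, the assumed mean of 100 and the function name are hypothetical; the result is compared with `statistics.stdev`, which also uses the n − 1 divisor.

```python
import math
import statistics


def std_dev_shortcut(values, assumed_mean):
    """Sample SD computed from deviations d = x - a about an assumed mean a."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    d_bar = sum(d) / n
    return math.sqrt((sum(v * v for v in d) - n * d_bar ** 2) / (n - 1))


# Hypothetical large-valued data; the deviations from 100 are small numbers.
data = [102, 104, 104, 104, 105, 105, 107, 109]
shortcut = std_dev_shortcut(data, 100)
direct = statistics.stdev(data)
print(round(shortcut, 6), round(direct, 6))
```

Shifting by the assumed mean does not change the answer; it only keeps the intermediate squares small.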

6. Variance : The square of the standard deviation of a set of observations is called the variance and is denoted by $\sigma^2$.

Merits of Standard deviation :

(i) It is rigidly defined.

(ii) It is based upon all observations.

(iii) It does not ignore the algebraic sign of deviation.

(iv) It is capable of further treatment.

(v) It is not much affected by sampling fluctuation.

Demerits of Standard deviation :

(i) It is difficult to understand and calculate.

(ii) It cannot be calculated for qualitative data.

(iii) It is unduly affected by extreme deviations.

Coefficient of variation :

For comparing the variability of two frequency distributions, a relative measure of dispersion known as the coefficient of variation is used. It is always expressed as a percentage.

$$CV = \frac{\sigma}{\bar{x}} \times 100$$
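The coefficient of variation makes two series with different means comparable. The sketch below assumes the population standard deviation (divisor n) is meant; the series and the function name are made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (population SD assumed)."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


# Hypothetical series: A has the larger SD, but is it relatively more variable?
series_a = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, SD 2
series_b = [48, 50, 52]               # mean 50, SD about 1.63

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(round(cv_a, 2), round(cv_b, 2))
```

The series with the larger coefficient (here series A) is the more variable one relative to its mean.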

SUMMARY :

1. Standard deviation or variance is never negative.

2. When all observations are equal, standard deviation is zero.

3. When all the observations in the data are increased or decreased by a constant, the standard deviation remains the same.

4. When each of the observations is multiplied by a constant $K$, the standard deviation is $|K|$ times the standard deviation of the original data.
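These properties of the standard deviation are easy to confirm numerically. The sketch uses hypothetical data with `statistics.pstdev` (divisor n); the same behaviour holds for the n − 1 version.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data

sd = statistics.pstdev(data)
shifted = statistics.pstdev([x + 10 for x in data])   # add a constant to every value
scaled = statistics.pstdev([-3 * x for x in data])    # multiply every value by K = -3
constant = statistics.pstdev([7, 7, 7])               # all observations equal

print(sd, shifted, scaled, constant)
```

A negative multiplier is used on purpose: the standard deviation scales by the absolute value of K, so it stays non-negative.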

CORRELATION :

Many times in statistics the data relate to two variables; such data are known as a bivariate distribution. One of the variables is denoted by 'x' and the other by 'y', and the observations are paired like (x, y). Examples are blood pressure and weight, age of wife and husband, pulse rate and temperature, height of father and sons, etc.

We are interested in studying whether or not there is a mutual relation between the two variables under consideration. The joint relation is called correlation. Two variables are said to be correlated when a change in the value of one variable is accompanied by a corresponding change in the value of the other variable. To study correlation, there must be a logical relationship between the two variables.

Positive Correlation :

An increase in the value of one variable is accompanied by an increase in the value of the other variable, or a decrease in the value of one variable is accompanied by a decrease in the value of the other. Correlation between such variables is said to be positive. In other words, the direction of change in the values of the two variables is the same, e.g. temperature and pulse rate are positively correlated.

Negative Correlation :

An increase in the value of one variable is accompanied by a decrease in the value of the other variable and vice versa. The change in the values of the two variables is in opposite directions.

The simplest way to study correlation is the graphical method. Take the $n$ paired observations $(X_1, Y_1), \dots, (X_n, Y_n)$ and plot these points on graph paper. The points are scattered, so the diagram is known as a scatter diagram.

Correlation Coefficient :

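Prof. Karl Pearson suggested a measure of the degree of correlation, the correlation coefficient, denoted by $r_{xy}$. It is also called the product moment correlation coefficient. It is calculated by the formula

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

or

$$r = \frac{\frac{1}{n}\sum xy - \bar{x}\bar{y}}{\sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2}\,\sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2}}$$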
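The first (raw sums) form of the product moment formula translates directly into code. This is a sketch with hypothetical paired data and a hypothetical function name; it assumes neither variable is constant, so the denominator is non-zero.

```python
import math


def pearson_r(xs, ys):
    """r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]                               # hypothetical paired observations
print(round(pearson_r(xs, ys), 4))                 # 0.7746
print(pearson_r(xs, [2 * x + 1 for x in xs]))      # exact increasing line: r close to +1
print(pearson_r(xs, [10 - x for x in xs]))         # exact decreasing line: r close to -1
```

For the sample data the sums are n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, Σy² = 86, so r = 30 / √(50 × 30) ≈ 0.7746.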

Properties of Correlation Coefficient :

(i) It always lies between -1 & +1. symbolically -1≤ r ≤ +1

(ii) r is a pure number, i.e. a unitless quantity.

(iii) Two independent variables are uncorrelated: when x and y are independent, r = 0.

(iv) The absolute value of Correlation Coefficient r is independent of change of

origin & scale.

RANK CORRELATION :

Spearman's rank correlation coefficient is given by the formula

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where $n$ = no. of paired observations,

$d$ = difference between the respective ranks.
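A small sketch of the rank correlation formula $r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}$ follows. It assumes there are no tied values, since tied ranks need an adjustment that the formula does not cover; the helper names and scores are hypothetical.

```python
def ranks(values):
    """Rank 1 for the smallest value; assumes all values are distinct (no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference between ranks."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores given to five candidates by two judges.
judge_x = [10, 20, 30, 40, 50]
judge_y = [1, 3, 2, 5, 4]
print(spearman_rs(judge_x, judge_y))   # 0.8
```

The rank differences are 0, −1, 1, −1, 1, so Σd² = 4 and r_s = 1 − 24/120 = 0.8.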

LINEAR REGRESSION :

The term regression, first used by the British biometrician Galton, literally means stepping back towards the average. Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data. In regression analysis there are two types of variables. The variable whose value is to be predicted is called the dependent variable, and the variable which is used for prediction is called the independent variable. The independent variable is also known as the regressor, predictor or explanatory variable, while the dependent variable is also known as the regressed or explained variable.

Y = a + bX

LINE OF REGRESSION :

If the variables in a bivariate distribution are related, the points in the scatter diagram will cluster round some curve, called the curve of regression. If the curve is a straight line, it is called the line of regression, and there is said to be linear regression between the two variables. The line of regression is the line which gives the best estimate of the value of one variable for any specific value of the other variable. Thus the line of regression is the "line of best fit" and is obtained by the principle of least squares.


A linear equation has the form Y = a + bX, whose graph is a straight line, where a and b are constants.

Mathematically, a is the y-intercept and b is the slope of the line.
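The notes do not spell out how a and b are obtained. A common way to apply the principle of least squares to the line of Y on X is $b = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2}$ and $a = \bar{y} - b\bar{x}$. The sketch below uses these standard formulas as an assumption, with hypothetical data and a hypothetical function name.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a = (sy - b * sx) / n                           # intercept: mean(y) - b * mean(x)
    return a, b


xs = [1, 2, 3, 4, 5]   # independent variable (hypothetical)
ys = [2, 4, 5, 4, 5]   # dependent variable (hypothetical)

a, b = fit_line(xs, ys)
print(round(a, 4), round(b, 4))   # 2.2 0.6
print(round(a + b * 6, 4))        # predicted Y for X = 6: 5.8
```

The fitted line always passes through the point of means, here (3, 4), which gives a quick sanity check: 2.2 + 0.6 × 3 = 4.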

Difference between correlation and regression:

| Correlation | Regression |
|---|---|
| Summarises the degree of relationship between two variables. | Summarises the nature of relationship between two variables. |
| Pairs of observations of two variables are selected at random. | The values of one variable are selected at random by fixing the value of the other variable. |
| Applied to those cases where there is no direction of dependency. | Applied to those cases where there is a direction of dependency. |
| Cause and effect relationship between the two variables is not clear: x may be the cause of y, y may be the cause of x, or the correlation may be due to chance. | One variable is dependent and the other is independent. |
