Uploaded by Lei Yin

Data analysis for sampling


DATA ANALYSIS

(Part 1)

HANIM AWAB Department of Chemistry Faculty of Science UTM

assessment on fewer data (generally >25) or data accumulated from the analysis of similar samples

The problem is examined with respect to the precision, accuracy and reliability required of the results.

Analysis of the results obtained is resolved into two stages:
- examination of the reliability of the results
- assessment of the meaning of the results

TYPES OF ERROR

1. GROSS ERROR (eg. contaminated reagents, faulty instrument)
- Serious, obvious errors that give outlier readings
- Detectable with sufficient replicate measurements
- Experiments with gross errors must be repeated

2. RANDOM/INDETERMINATE ERROR (eg. inaccurate manipulation of procedure)
- Data scattered symmetrically about a mean value
- Deviations of measurements from the mean are shown using the Gaussian or normal error curve
- Cannot be eliminated, but can be minimized and approximated to an acceptable precision
- Error can be assessed by statistical tests, and the size of the errors calculated

Some ways to overcome errors
- Carry out replicate measurements
- Analyse accurately using known standards or standard reference materials (SRM)
- Perform statistical tests on data

3. SYSTEMATIC/DETERMINATE ERROR (operator error / instrument error / method error)
- All data too high or too low, or the error increases with the magnitude of the measurement
- Causes bias in technique (either +ve or -ve)
- Affects accuracy
- May be detected by:
  - blank determinations
  - analysis of standard samples
  - independent analyses by alternative/dissimilar methods
- Can be avoided/eliminated by correcting instrument, method and personal errors*

*Ways to minimize/eliminate systematic errors

Instrument errors:
- Careful recalibration and good maintenance of apparatus (eg. glassware) and instruments (eg. AAS, GC)

Method errors:
- Analysis of standard reference materials (SRM)
- Use 2 or more independent methods
- Analysis of blanks

Personal errors:
- Training of operator, care and self-discipline

when a standard sample is analyzed (value estimated from results of varying precision depending on the method used)

Accuracy - nearness of a measurement or result to the true value (expressed in terms of error)
Precision - variability of a measurement (standard deviations are precision indicators)
Spread - difference between the highest and lowest results in a set (spread is a measure of precision)
Mean - average of a replicate set of results
Median - middle value of a replicate set of results

Degree of Freedom - number of results in a set (each time another quantity is derived from the set, the degrees of freedom are reduced by 1)
Range - difference between the highest and lowest value of the results
Deviation - difference, with respect to sign, between an individual result and the mean or median of the set
Standard Deviation (s or σ) - measure of the spread of the individual results about the mean
Relative Standard Deviation (RSD) - also known as the coefficient of variation, often used in comparing precisions
Variance (V) - square of the value of the standard deviation (σ² or s²)

Determinations/Formula

MEAN (AVERAGE)
Sum of all measurements divided by the number of measurements:

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

MEDIAN
Arrange the data in ascending order. If the number of data is odd, the middle value is the median. If the number of data is even, the median is the average of the two middle values.

STANDARD DEVIATION
- Measure of spread about the mean
- Estimates the variability of an individual measurement
(The standard deviation is better estimated by the pooling of results from more than one set)

Sample standard deviation (N-1 = degrees of freedom):

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1}}$$

Population standard deviation (large number of data, aka population; N = number of replicates):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

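RELATIVE STANDARD DEVIATION (RSD) / COEFFICIENT OF VARIATION (CV)
Standard deviation divided by the mean, RSD = s / x̄. The numerical value depends on the units in which it is expressed, eg. as a percentage (multiply by 100%) or in parts per thousand (multiply by 1000).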

VARIANCE
The square of the standard deviation
- Sample variance (N < 30): V = s²
- Population variance (large N): V = σ²

Sample   Se (mg/g)   (xi - mean)²
1        0.07        4.9 x 10^-5
2        0.07        4.9 x 10^-5
3        0.08        9.0 x 10^-6
4        0.07        4.9 x 10^-5
5        0.07        4.9 x 10^-5
6        0.08        9.0 x 10^-6
7        0.08        9.0 x 10^-6
8        0.09        1.69 x 10^-4
9        0.08        9.0 x 10^-6

Mean = Σxi / N = 0.077
Σ(xi - mean)² = 4.01 x 10^-4

$$\text{S.D.} = s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} = 0.007$$
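The Se results above can be checked with a few lines of Python. This is only a sketch using the standard-library `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

# Se content (mg/g) of the nine samples in the table above
se = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]

mean = statistics.mean(se)          # sum of results / N
median = statistics.median(se)      # middle value of the sorted results
s = statistics.stdev(se)            # sample standard deviation (N-1 in the denominator)
variance = statistics.variance(se)  # sample variance, V = s^2
rsd_percent = 100 * s / mean        # relative standard deviation, in %
spread = max(se) - min(se)          # highest minus lowest result

print(f"mean     = {mean:.3f} mg/g")
print(f"median   = {median:.2f} mg/g")
print(f"s        = {s:.3f} mg/g")
print(f"variance = {variance:.1e}")
print(f"RSD      = {rsd_percent:.1f} %")
print(f"spread   = {spread:.2f} mg/g")
```

The same calls can be reused for the cell-diameter problem further down by swapping in that data set.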

STD. DEV. FOR POOLED DATA (s_pooled)
To achieve a value of s that is a good approximation to σ (N > 30), it is sometimes necessary to pool data from a number of sets of measurements.
Suppose there are t small sets of data, comprising N1, N2, ..., Nt measurements. The equation for the resultant sample standard deviation is:

$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1}(x_i - \bar{x}_1)^2 + \sum_{i=1}^{N_2}(x_i - \bar{x}_2)^2 + \sum_{i=1}^{N_3}(x_i - \bar{x}_3)^2 + \dots}{N_1 + N_2 + N_3 + \dots + N_t - t}}$$

Analysis of 6 bottles for sugar

Bottle   Sugar (%)   Obs   Deviations from mean
1        0.94        3     0.05, 0.10, 0.08
2        1.08        4     0.06, 0.05, 0.09, 0.06
3        1.20        5     0.05, 0.12, 0.07, 0.00, 0.08
4        0.67        4     0.05, 0.10, 0.06, 0.09
5        0.83        3     0.07, 0.09, 0.10
6        0.76        4     0.06, 0.12, 0.04, 0.03

For bottle 1:

$$\sum (x_i - \bar{x}_1)^2 = (0.05)^2 + (0.10)^2 + (0.08)^2 = 0.0189, \qquad s_1 = \sqrt{\frac{0.0189}{3-1}} = 0.097$$

Total sum of squares for bottles 1 to 6 = 0.1326

$$s_{pooled} = \sqrt{\frac{0.1326}{23 - 6}} = 0.088\ \%$$
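As a rough check of the sugar example, the pooled standard deviation can be computed directly from the tabulated deviations. The sketch below assumes the listed deviations are the absolute values of (xi - mean) for each bottle; the dictionary layout is illustrative only.

```python
import math

# Deviations from the bottle mean, one list per bottle (from the table above)
deviations = {
    1: [0.05, 0.10, 0.08],
    2: [0.06, 0.05, 0.09, 0.06],
    3: [0.05, 0.12, 0.07, 0.00, 0.08],
    4: [0.05, 0.10, 0.06, 0.09],
    5: [0.07, 0.09, 0.10],
    6: [0.06, 0.12, 0.04, 0.03],
}

# Sum of squared deviations for each bottle
sum_sq = {bottle: sum(d * d for d in devs) for bottle, devs in deviations.items()}

n_total = sum(len(devs) for devs in deviations.values())  # N1 + N2 + ... + Nt
t_sets = len(deviations)                                  # number of data sets, t

# One degree of freedom is lost for each set mean, hence (n_total - t_sets)
s_pooled = math.sqrt(sum(sum_sq.values()) / (n_total - t_sets))

for bottle, ss in sum_sq.items():
    print(f"bottle {bottle}: sum of squares = {ss:.4f}")
print(f"total sum of squares = {sum(sum_sq.values()):.4f}")
print(f"s_pooled = {s_pooled:.3f} %")
```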

Solve this Problem
Given a set of diameters of four cells in units of µm: 120, 135, 160, 150
(a) Use functions available in your calculator
(b) Use the Excel Spreadsheet (at your own time and submit the data and result printout)
Calculate the following:
- Mean
- Median
- Standard Deviation
- Relative Standard Deviation (RSD)
- Variance

PRECISION

- Reproducibility (repeatability) of repeated measurements, ie. how similar are values obtained in exactly the same way?
- Useful for measuring deviation from the mean:

$$d_i = |x_i - \bar{x}|$$

ACCURACY
Nearness (proximity) to the true value, ie. measurement of agreement between the experimental mean and the true value (which may not be known!)

Measures of accuracy (x_t = true value):

- Absolute error:

$$E = x_i - x_t$$

- Relative error:

$$E_R = \left| \frac{x_i - x_t}{x_t} \right| \times 100\%$$

Discussion Question 1 Four students analyzed Fe content in a sample. Each student performed 5 replicates and the results are illustrated below. Comment on the accuracy and precision of each set of results (Hint: Student C obtained the best results)

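[Figure: dot plots of the five replicate results of students A, B, C and D on a common axis (9.80, 10.00, 10.20) with the true value marked. Means: A = 10.10, B = 9.90, C = 10.01, D = 10.01]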

Discussion Question 2
- Comment on the accuracy and precision of the following results. Explain or show proof.
- Which set of data has to be thrown out (discarded)? Why?

Student      A       B       C       D       E
Data value   10.00   10.10   9.65    9.97    9.80
             10.00   10.08   9.75    9.98    9.89
             10.00   10.09   9.78    10.02   10.01
             10.00   10.07   10.07   10.03   10.13
             10.00   10.08   10.24   10.05   10.22
Mean (x̄)    10.00   10.10   9.90    10.01   10.01
s            0.00    0.01    0.25    0.03    0.17
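One way to put numbers on "accuracy" and "precision" for the student results is sketched below for sets C, D and E. It assumes a true value of 10.00 (an assumption read off the discussion question, not stated as a formula in the slides) and applies the error definitions to the mean of each set; the function name `summarize` is illustrative.

```python
import statistics

TRUE_VALUE = 10.00  # assumed true value for this illustration

results = {
    "C": [9.65, 9.75, 9.78, 10.07, 10.24],
    "D": [9.97, 9.98, 10.02, 10.03, 10.05],
    "E": [9.80, 9.89, 10.01, 10.13, 10.22],
}

def summarize(values, true_value):
    """Return mean, s (precision) and errors of the mean (accuracy)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    abs_error = mean - true_value                     # signed absolute error
    rel_error_pct = abs(abs_error) / true_value * 100  # relative error in %
    return {"mean": mean, "s": s, "abs_error": abs_error, "rel_error_pct": rel_error_pct}

summary = {student: summarize(values, TRUE_VALUE) for student, values in results.items()}

for student, stats in summary.items():
    print(
        f"{student}: mean = {stats['mean']:.2f}, s = {stats['s']:.2f}, "
        f"abs. error = {stats['abs_error']:+.2f}, rel. error = {stats['rel_error_pct']:.1f} %"
    )
```

A small s points to good precision and a small error of the mean to good accuracy, so a set can be accurate without being precise (E) or neither (C).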

CONFIDENCE LIMIT & CONFIDENCE INTERVAL
- Confidence Interval (CI) is the range of values surrounding the sample mean, x̄, within which the population mean, μ, is expected to lie with a certain degree of probability
- The boundaries of the range are called the Confidence Limits
- Confidence Level (CL) is the probability that the true mean lies within a certain interval (expressed as %)

Example: It is 99% probable that μ for a set of measurements is 7.25 ± 0.15 mg. Thus, the true mean should lie in the interval from 7.10 mg to 7.40 mg with 99% probability

CI for large no. of data (>30) with known population std deviation, σ:

$$\mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$

CI for small no. of data (<30) without knowing σ (know s):

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

Values of z for determining confidence limits

Confidence level (%)   z
50                     0.67
68                     1.0
80                     1.29
90                     1.64
95                     1.96
96                     2.00
99                     2.58
99.7                   3.00
99.9                   3.29

N = number of measurements
z = values from the normal distribution curve (read from the z-table)
t = values from the Student's t distribution, which depend on the degrees of freedom (N-1) (read from the t-table); Student's t is also generally used in hypothesis tests

Degrees of Freedom (N-1)   80%    90%    95%    99%
1                          3.08   6.31   12.7   63.7
2                          1.89   2.92   4.30   9.92
3                          1.64   2.35   3.18   5.84
4                          1.53   2.13   2.78   4.60
5                          1.48   2.02   2.57   4.03
6                          1.44   1.94   2.45   3.71
7                          1.42   1.90   2.36   3.50
8                          1.40   1.86   2.31   3.36
9                          1.38   1.83   2.26   3.25
19                         1.33   1.73   2.10   2.88
59                         1.30   1.67   2.00   2.66
∞                          1.29   1.64   1.96   2.58

SAMPLE QUESTION (CONFIDENCE INTERVAL)
Calculate the confidence interval (CI) at 95%, 90% & 99% confidence level given the following data for the analysis of Ca in a rock sample: 14.35, 14.41, 14.40, 14.32, 14.37

Mean = 14.37, s = 0.037
From table: @ confidence level 95% & N-1 = 4, t = 2.78

$$CI = \mu = \bar{x} \pm \frac{ts}{\sqrt{N}} = 14.37 \pm \frac{2.78 \times 0.037}{\sqrt{5}} = 14.37 \pm 0.05$$

Confidence interval is 14.37 ± 0.05 or 14.32 < μ < 14.42

Summary of results (calculate the rest by yourselves):

@ Confidence level   Confidence interval (CI)
90%                  μ = 14.37 ± 0.04
95%                  μ = 14.37 ± 0.05
99%                  μ = 14.37 ± 0.08

If the confidence level increases, the CI widens, and the probability that μ lies within the interval also increases
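The Ca calculation above can be reproduced with a short script. This is a minimal sketch: the t values are typed in from the 4-degrees-of-freedom row of the t-table, and s is rounded to three decimals as in the worked example (using the unrounded s shifts the last digit slightly). The names `T_DF4` and `half_width` are illustrative.

```python
import math
import statistics

ca = [14.35, 14.41, 14.40, 14.32, 14.37]  # Ca results for the rock sample

# t values for N-1 = 4 degrees of freedom, copied from the t-table
T_DF4 = {90: 2.13, 95: 2.78, 99: 4.60}

n = len(ca)
mean = round(statistics.mean(ca), 2)   # 14.37
s = round(statistics.stdev(ca), 3)     # rounded to 0.037 as in the worked example

def half_width(t, s, n):
    """Half-width of the confidence interval, t*s/sqrt(N)."""
    return t * s / math.sqrt(n)

intervals = {level: half_width(t, s, n) for level, t in T_DF4.items()}

for level, hw in intervals.items():
    print(f"{level}%: mu = {mean:.2f} +/- {hw:.2f}  ({mean - hw:.2f} to {mean + hw:.2f})")
```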

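AAS analysis of Cu in aircraft engine oil gave a mean value of 8.53 mg Cu/mL. Pooled results of many analyses showed that s = 0.32 mg Cu/mL. Calculate the confidence intervals (CI) at 90% & 99% confidence levels based on (a) 1 (b) 4 (c) 16 measurements

Confidence limit (CL) = μ = x̄ ± ts/√N. Because s is pooled from many analyses, s is a good estimate of σ and t is read at infinite degrees of freedom (t = z): 1.64 at 90% and 2.58 at 99%.

(a) N = 1
@ 90%, CL = 8.53 ± 1.64(0.32)/√1 = 8.53 ± 0.52 mg Cu/mL
@ 99%, CL = 8.53 ± 2.58(0.32)/√1 = 8.53 ± 0.83 mg Cu/mL

(b) N = 4
@ 90%, CL = 8.53 ± 1.64(0.32)/√4 = 8.53 ± 0.26 mg Cu/mL
@ 99%, CL = 8.53 ± 2.58(0.32)/√4 = 8.53 ± 0.41 mg Cu/mL

(c) N = 16
@ 90%, CL = 8.53 ± 1.64(0.32)/√16 = 8.53 ± 0.13 mg Cu/mL
@ 99%, CL = 8.53 ± 2.58(0.32)/√16 = 8.53 ± 0.21 mg Cu/mL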
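The Cu example shows how the interval narrows as more measurements are averaged. A small sketch of that calculation follows, assuming (as an interpretation of "pooled results of many analyses") that s is a good estimate of σ so that z values from the z-table apply; the names are illustrative.

```python
import math

MEAN = 8.53    # mg Cu/mL
SIGMA = 0.32   # mg Cu/mL, pooled s taken as a good estimate of sigma (assumption)
Z = {90: 1.64, 99: 2.58}  # z values copied from the z-table

def half_width(z, sigma, n):
    """Half-width of the confidence interval, z*sigma/sqrt(N)."""
    return z * sigma / math.sqrt(n)

table = {
    (n, level): half_width(z, SIGMA, n)
    for n in (1, 4, 16)
    for level, z in Z.items()
}

for (n, level), hw in table.items():
    print(f"N = {n:2d}, {level}%: CL = {MEAN} +/- {hw:.2f} mg Cu/mL")
```

Quadrupling the number of measurements halves the interval, since the half-width scales with 1/√N.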

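Analysis of an insecticide gave the following values for % of Lindane: 7.47, 6.98, 7.27. Calculate the CL for the mean value at the 90% confidence level

$$\bar{x} = \frac{\sum x_i}{N} = \frac{21.72}{3} = 7.24$$

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N-1}} = 0.25$$

@ 90% and N-1 = 2, t = 2.92:

$$CL = \bar{x} \pm \frac{ts}{\sqrt{N}} = 7.24 \pm \frac{2.92 \times 0.25}{\sqrt{3}} = 7.24 \pm 0.42$$

OTHER USAGE OF CONFIDENCE INTERVAL
- To determine # of replicates (N) needed for the mean to be within the confidence interval
- To determine systematic error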

Example 1: Calculate the number of replicates needed to narrow the confidence interval to ± 1.5 µg/mL at 95% confidence level. Given, s = 2.4 µg/mL

Example 2: Ten measurements on a sample gave a mean of 0.461, with std dev of 0.003. A solution gave a reading of 0.470. Show whether systematic error exists at 95% confidence level

At 95% confidence level, (N-1) = 9, t = 2.26

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{N}} = 0.461 \pm \frac{2.26 \times 0.003}{\sqrt{10}} = 0.461 \pm 0.002$$

This means 0.459 < μ < 0.463, ie. 95% of the time, the true value lies between 0.459 and 0.463. Therefore, the reading 0.470 is NOT in the range, and systematic error EXISTS
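Both uses of the confidence interval can be sketched in code. For Example 1 the slides give no worked answer, so the block below makes two assumptions: the 1.5 is read as the target half-width of the interval, and s is treated as a good estimate of σ so that z = 1.96 applies. Solving z·s/√N ≤ target for N gives N ≥ (z·s/target)². Example 2 is reproduced as stated. Function names are illustrative.

```python
import math

def replicates_needed(z, s, target_half_width):
    """Smallest N with z*s/sqrt(N) <= target half-width (assumes s ~ sigma)."""
    return math.ceil((z * s / target_half_width) ** 2)

def lies_outside_ci(reading, mean, s, n, t):
    """True if the reading falls outside mean +/- t*s/sqrt(N)."""
    half_width = t * s / math.sqrt(n)
    return not (mean - half_width <= reading <= mean + half_width)

# Example 1: 95% confidence level, s = 2.4, target half-width 1.5 (same units)
n_required = replicates_needed(z=1.96, s=2.4, target_half_width=1.5)
print(f"Example 1: about {n_required} replicates needed")

# Example 2: ten measurements, mean 0.461, s 0.003, t = 2.26 for 9 degrees of freedom
systematic_error = lies_outside_ci(reading=0.470, mean=0.461, s=0.003, n=10, t=2.26)
print(f"Example 2: systematic error indicated? {systematic_error}")
```

If σ is not well known, t would replace z in Example 1; since t itself depends on N, the estimate then has to be refined iteratively.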

DISTRIBUTION OF ERRORS

NORMAL or GAUSSIAN distribution (bell shaped, symmetrical curve) gives limits within which the population mean (μ) is expected to lie with a given degree of probability (without any systematic error)

[Figure: three normal error curves, dN/N plotted against deviation from the mean (-4s to +4s), with the central areas marked: 50% between -0.67s and +0.67s, 80% between -1.29s and +1.29s, 95% between -1.96s and +1.96s]

Based on the curve, percentages of area under the curve between certain limits of z are as follows:
- 50% of area lies between ±0.67s
- 80% of area lies between ±1.29s
- 90% of area lies between ±1.64s
- 95% of area lies between ±1.96s
- 99% of area lies between ±2.58s

When we say that at a confidence level of 80% the confidence limits are ±1.29s, we mean that:
- 80% of the time the true mean will lie within ±1.29s of the measurements made
- or in other words, 20% of the time the true mean will NOT lie within ±1.29s
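The areas quoted for the normal curve can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `area_within` is an illustrative helper name.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def area_within(z):
    """Fraction of the area under the normal curve between -z and +z."""
    return STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)

for z in (0.67, 1.29, 1.64, 1.96, 2.58):
    print(f"+/-{z:.2f}s encloses {100 * area_within(z):.1f} % of the area")
```

The small departures from the round percentages come from the z values being quoted to two decimals.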

SIGNIFICANCE TESTS

Tests whether the difference between two results is significant (or merely due to random variations)
- used to decide whether the difference between the measured and known values can be explained by random errors

The NULL HYPOTHESIS, Ho
- If Ho is accepted: there is NO significant difference between the observed and known values (other than that due to random variation)
- If Ho is rejected: the difference is significant

Has two uses:

(1) Comparison of the true value, μ, and the mean, x̄, to detect if the difference is significant
- Used to detect the existence of systematic error or bias
- Calculate t (generally for 95% confidence level)
- If tcalc < tcritical (ie. tcalc < ttable), ACCEPT the null hypothesis, Ho: μ = x̄
- Accepting Ho means that there is NO significant difference (or no systematic error) at the 95% confidence level, but there is a 5% probability that there is a significant difference

(2) Comparison of the means (x̄1 and x̄2) of two samples
- eg. compare the mean of a new method with that of a reference (or standard) method
- Accept the null hypothesis (Ho) if there is NO significant difference between the methods, ie. the results are the same, or x̄1 - x̄2 = 0
- Calculate t; if tcalc < ttable, accept Ho to show that there is NO significant difference in results
- Use pooled estimate of std dev, s² = {(n1-1)s1² + (n2-1)s2²} / (n1+n2-2), or
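The two uses of the t-test can be sketched as follows. The slides say "calculate t" without giving the expressions, so both formulas here are assumptions: the one-sample form t = |x̄ - μ|√N / s is a rearrangement of μ = x̄ ± ts/√N, and the two-sample form is the usual pooled-variance t statistic built on the pooled s² given above. The one-sample numbers come from the earlier systematic-error example; the two-sample numbers are hypothetical and only for illustration.

```python
import math

def t_one_sample(mean, mu, s, n):
    """t for comparing an experimental mean with a known value mu."""
    return abs(mean - mu) * math.sqrt(n) / s

def t_two_sample(mean1, s1, n1, mean2, s2, n2):
    """t for comparing two means using the pooled standard deviation."""
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return abs(mean1 - mean2) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))

# Use (1): mean 0.461, s 0.003, N = 10 against a known value of 0.470
t_calc_1 = t_one_sample(mean=0.461, mu=0.470, s=0.003, n=10)
T_TABLE_DF9 = 2.26  # 95%, 9 degrees of freedom, from the t-table
print(f"(1) t_calc = {t_calc_1:.2f}, accept Ho: {t_calc_1 < T_TABLE_DF9}")

# Use (2): hypothetical results for a new and a reference method
t_calc_2 = t_two_sample(mean1=10.10, s1=0.03, n1=5, mean2=10.05, s2=0.04, n2=5)
T_TABLE_DF8 = 2.31  # 95%, n1 + n2 - 2 = 8 degrees of freedom, from the t-table
print(f"(2) t_calc = {t_calc_2:.2f}, accept Ho: {t_calc_2 < T_TABLE_DF8}")
```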

The F Test

- F is the ratio of two sample variances:

$$F = \frac{s_1^2}{s_2^2}$$

- One-tailed test: test whether method A is more precise than method B (assumes A is always precise)
- Two-tailed test: test whether methods A and B differ in their precision (ie. any method can be precise)

Ho: Population variances are equal (σ1² = σ2², or σ1²/σ2² = 1)
[F is always > 1, thus the smaller s, ie. the more precise, is always the denominator]
If Fcalc < Ftable, accept Ho, which means that there is NO significant difference in precision between the two methods

Example Question: ONE-TAILED F TEST
A proposed method for COD of wastewater was compared with a standardized method. The results are given as follows:
- Standardized method (8 determinations): mean = 72 mg/L, s = 3.31 mg/L
- Proposed method (9 determinations): mean = 72 mg/L, s = 1.51 mg/L (set as denominator)

Is the proposed method significantly more precise than the standardized method?

F = (s_std)² / (s_prop)² = (3.31)² / (1.51)² = 4.8

Data values: 8 for Std & 9 for Proposed, thus from the F-table with degrees of freedom (N-1) = 7 (numerator) and 8 (denominator), Fcrit = 3.50

Since Fcalc > Ftable, reject Ho. Thus there is a significant difference between the methods and the proposed method is significantly more precise

Example: Determination of CO using a Standard Procedure gave an s value of 0.21 ppm. The method was modified twice, giving s1 of 0.15 and s2 of 0.12 (both 9 degrees of freedom). Are the modified methods significantly more precise than the std?

Ho: s1 = s_std
Ho: s2 = s_std

$$F_1 = \frac{s_{std}^2}{s_1^2} = \frac{(0.21)^2}{(0.15)^2} = 1.96 \qquad F_2 = \frac{s_{std}^2}{s_2^2} = \frac{(0.21)^2}{(0.12)^2} = 3.06$$

In standard methods the # of data is large, thus s → σ and the degrees of freedom become infinite. From the F-table, num = ∞, den = 9; Fcrit = 2.71
F1 < Ftable: accept Ho, but F2 > Ftable: reject Ho
Only the 2nd modified method is significantly more precise than the standard method
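The two F-test examples above follow the same steps, which can be sketched as a small helper. The critical values are copied from the examples rather than computed; `f_ratio` is an illustrative name.

```python
def f_ratio(s_a, s_b):
    """F = larger variance / smaller variance, so that F >= 1."""
    var_a, var_b = s_a**2, s_b**2
    return max(var_a, var_b) / min(var_a, var_b)

# COD example: standardized s = 3.31 (7 d.o.f.), proposed s = 1.51 (8 d.o.f.)
f_cod = f_ratio(3.31, 1.51)
F_CRIT_COD = 3.50
print(f"COD: F = {f_cod:.1f}, reject Ho: {f_cod > F_CRIT_COD}")

# CO example: standard s = 0.21 (infinite d.o.f.), modified s = 0.15 and 0.12 (9 d.o.f.)
F_CRIT_CO = 2.71
f1 = f_ratio(0.21, 0.15)
f2 = f_ratio(0.21, 0.12)
print(f"CO, 1st modification: F = {f1:.2f}, reject Ho: {f1 > F_CRIT_CO}")
print(f"CO, 2nd modification: F = {f2:.2f}, reject Ho: {f2 > F_CRIT_CO}")
```

Rejecting Ho in a one-tailed test means the method in the denominator is significantly more precise.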

The Q TEST or DIXON'S TEST (Detection of gross errors)
The Q-test is used for detecting an outlier (suspected unreasonable data) which statistically does not belong to the set

Example: 10.05, 10.10, 10.15, 10.05, 10.45, 10.10
The value 10.45 appears to lie outside the normal range (more easily observed when the numbers are arranged in decreasing or increasing order): 10.05, 10.05, 10.10, 10.10, 10.15, 10.45

Qcal is compared with Qtable and the null hypothesis, Ho, is checked

$$Q_{expt} = \frac{|\text{suspect value} - \text{nearest value}|}{\text{largest value} - \text{smallest value}} = \frac{10.45 - 10.15}{10.45 - 10.05} = 0.75$$

From the Q-table (@ 95% & N = 6), Q = 0.625
Qcal > Qtable, so the data (10.45) can be rejected

will change from the original value if changed!)


Q TABLE

No. of Observations   Confidence Level
                      90%     95%     99%
3                     0.941   0.970   0.994
4                     0.765   0.829   0.926
5                     0.642   0.710   0.821
6                     0.560   0.625   0.740
7                     0.507   0.568   0.680
8                     0.468   0.526   0.634
9                     0.437   0.493   0.599
10                    0.412   0.466   0.568

EXAMPLE QUESTION: Q-TEST
The following data was obtained for the determination of nitrite concentration (mg/L) in a sample of river water: 0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411
Should the data 0.380 be retained?

Q = |0.380 - 0.400| / |0.413 - 0.380| = 0.606

From the Q-table: sample size 7 at the 95% confidence level, Qtable = 0.568
Qcalc > Qtable, thus the suspect outlier is rejected
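The Q-test procedure lends itself to a short function. The sketch below assumes the suspect value is either the smallest or the largest result and uses the 95% column of the Q table; `dixon_q` and `Q_TABLE_95` are illustrative names.

```python
# 95% confidence level column of the Q table, keyed by number of observations
Q_TABLE_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}

def dixon_q(values, suspect):
    """Q = gap between the suspect and its nearest neighbour / overall range."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if suspect == ordered[0]:
        gap = ordered[1] - ordered[0]
    elif suspect == ordered[-1]:
        gap = ordered[-1] - ordered[-2]
    else:
        raise ValueError("suspect must be the smallest or largest value")
    return gap / data_range

first_set = [10.05, 10.10, 10.15, 10.05, 10.45, 10.10]
nitrite = [0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411]

q_first = dixon_q(first_set, 10.45)
q_nitrite = dixon_q(nitrite, 0.380)

for label, q, n in (("10.45", q_first, len(first_set)), ("0.380", q_nitrite, len(nitrite))):
    q_crit = Q_TABLE_95[n]
    print(f"suspect {label}: Q = {q:.3f}, Q_table = {q_crit}, reject: {q > q_crit}")
```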
