ASTM - G1695 - Applying Statistics To Analysis of Corrosion Data

AMERICAN SOCIETY FOR TESTING AND MATERIALS
Designation: G 16 - 95
1916 Race St., Philadelphia, PA 19103
Reprinted from the Annual Book of ASTM Standards. Copyright ASTM

Standard Guide for
Applying Statistics to Analysis of Corrosion Data¹

This guide is issued under the fixed designation G 16; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope

1.1 This guide presents briefly some generally accepted methods of statistical analyses which are useful in the interpretation of corrosion test results.
1.2 This guide does not cover detailed calculations and methods, but rather covers a range of approaches which have found application in corrosion testing.
1.3 Only those statistical methods that have found wide acceptance in corrosion testing have been considered in this guide.

2. Referenced Documents

2.1 ASTM Standards:
E 178 Practice for Dealing with Outlying Observations²
E 380 Practice for Use of the International System of Units (SI) (the Modernized Metric System)²
E 691 Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method²
G 46 Guide for Examination and Evaluation of Pitting Corrosion³

3. Significance and Use

3.1 Corrosion test results often show more scatter than many other types of tests because of a variety of factors, including the fact that minor impurities often play a decisive role in controlling corrosion rates. Statistical analysis can be very helpful in allowing investigators to interpret such results, especially in determining when test results differ from one another significantly. This can be a difficult task when a variety of materials are under test, but statistical methods provide a rational approach to this problem.
3.2 Modern data reduction programs in combination with computers have allowed sophisticated statistical analyses on data sets with relative ease. This capability permits investigators to determine if associations exist between many variables and, if so, to develop quantitative expressions relating the variables.
3.3 Statistical evaluation is a necessary step in the analysis of results from any procedure which provides quantitative information. This analysis allows confidence intervals to be estimated from the measured results.

¹ This guide is under the jurisdiction of ASTM Committee G-1 on Corrosion of Metals and is the direct responsibility of Subcommittee G01.05 on Laboratory Corrosion Tests.
Current edition approved Jan. 15, 1995. Published March 1995. Originally published as G 16 - 71. Last previous edition G 16 - 94.
²,³ Annual Book of ASTM Standards.

4. Errors

4.1 Distributions: In the measurement of values associated with the corrosion of metals, a variety of factors act to produce measured values that deviate from expected values for the conditions that are present. Usually the factors which contribute to the error of measured values act in a more or less random way so that the average of several values approximates the expected value better than a single measurement. The pattern in which data are scattered is called its distribution, and a variety of distributions are seen in corrosion work.
4.2 Histograms: A bar graph called a histogram may be used to display the scatter of the data. A histogram is constructed by dividing the range of data values into equal intervals on the abscissa axis and then placing a bar over each interval of a height equal to the number of data points within that interval. The number of intervals should be few enough so that almost all intervals contain at least three points; however, there should be a sufficient number of intervals to facilitate visualization of the shape and symmetry of the bar heights. Twenty intervals are usually recommended for a histogram. Because so many points are required to construct a histogram, it is unusual to find data sets in corrosion work that lend themselves to this type of analysis.
4.3 Normal Distribution: Many statistical techniques are based on the normal distribution. This distribution is bell-shaped and symmetrical. Use of analysis techniques developed for the normal distribution on data distributed in another manner can lead to grossly erroneous conclusions. Thus, before attempting data analysis, the data should either be verified as being scattered like a normal distribution, or a transformation should be used to obtain a data set which is approximately normally distributed. Transformed data may be analyzed statistically and the results transformed back to give the desired results, although the process of transforming the data back can create problems in terms of not having symmetrical confidence intervals.
4.4 Normal Probability Paper: If the histogram is not confirmatory in terms of the shape of the distribution, the data may be examined further to see if it is normally distributed by constructing a normal probability plot as described as follows (1).
4.4.1 It is easiest to construct a normal probability plot if normal probability paper is available. This paper has one linear axis, and one axis which is arranged to reflect the shape of the cumulative area under the normal distribution. In practice, the "probability" axis has 0.5 or 50 % at the center, a number approaching 0 % at one end, and a number approaching 1.0 or 100 % at the other end. The marks are spaced far apart in the center and close together at the ends. A normal probability plot may be constructed as follows with normal probability paper:

NOTE 1: Data that plot approximately on a straight line on the probability plot may be considered to be normally distributed. Deviations from a normal distribution may be recognized by the presence of deviations from a straight line, usually most noticeable at the extreme ends of the data.

4.4.1.1 Number the data points starting at the largest negative value and proceeding to the largest positive value. The numbers of the data points thus obtained are called the ranks of the points.
4.4.1.2 Plot each point on the normal probability paper. When the data are arranged in order, y(1), y(2), y(3), ..., these values are called the order statistics; the linear axis reflects the value of the data, while the probability axis location is calculated by subtracting 0.5 from the number (rank) of that point and dividing by the total number of points in the data set.

NOTE 2: Occasionally two or more identical values are obtained in a data set. In this case, a point may be located at the average of the plotting positions for all the identical values.

4.4.2 If normal probability paper is not available, the location of each point on the probability plot may be determined as follows:
4.4.2.1 Mark the probability axis using linear graduations from 0.0 to 1.0.
4.4.2.2 For each point, subtract 0.5 from the rank and divide the result by the total number of points in the data set. This is the area to the left of that value under the standardized normal distribution. The cumulative distribution function is the number, always between 0 and 1, that is plotted on the probability axis.
4.4.2.3 The value of the data point defines its location on the other axis of the graph.
4.5 Other Probability Paper: If the histogram is not symmetrical and bell-shaped, or if the probability plot shows nonlinearity, a transformation may be used to obtain a new, transformed data set that may be normally distributed. Although it is sometimes possible to guess at the type of distribution by looking at the histogram, and thus determine the exact transformation to be used, it is usually just as easy to use a computer to calculate a number of different transformations and to check each for the normality of the transformed data. Some transformations based on known non-normal distributions, or that have been found to work in some situations, are listed as follows:

$$y = 1/x$$

where:
y = transformed datum,
x = original datum, and
n = number of data points.
Time to failure in stress corrosion cracking usually is best fitted with a log x transformation (2, 3).
Once a set of transformed data is found that yields an approximately straight line on a probability plot, the statistical procedures of interest can be carried out on the transformed data. Results, such as predicted data values or confidence intervals, must be transformed back using the reverse transformation.
4.6 Unknown Distribution: If there are insufficient data points, or if for any other reason the distribution type of the data cannot be determined, then two possibilities exist for analysis:
4.6.1 A distribution type may be hypothesized based on the behavior of similar types of data. If this distribution is not normal, a transformation may be sought which will normalize that particular distribution. See 4.5 above for suggestions. Analysis may then be conducted on the transformed data.
4.6.2 Statistical analysis procedures that do not require any specific data distribution type, known as non-parametric methods, may be used to analyze the data. Non-parametric tests do not use the data as efficiently.
4.7 Extreme Value Analysis: In the case of determining the probability of perforation by a pitting or cracking mechanism, the usual descriptive statistics for the normal distribution are not the most useful. In this case, Guide G 46 should be consulted for the procedure (4).
4.8 Significant Digits: Practice E 380 should be followed to determine the proper number of significant digits when reporting numerical results.
4.9 Propagation of Variance: If a calculated value is a function of several independent variables and those variables have errors associated with them, the error of the calculated value can be estimated by a propagation of variance technique. See Refs. (5) and (6) for details.
4.10 Mistakes: Mistakes either in carrying out an experiment or in calculations are not a characteristic of the population and can preclude statistical treatment of data or lead to erroneous conclusions if included in the analysis. Sometimes mistakes can be identified by statistical methods by recognizing that the probability of obtaining a particular result is very low.
4.11 Outlying Observations: See Practice E 178 for procedures for dealing with outlying observations.

5. Central Measures

5.1 It is accepted practice to employ several independent (replicate) measurements of any experimental quantity to improve the estimate of precision and to reduce the variance of the average value. If it is assumed that the processes operating to create error in the measurement are random in nature and are as likely to overestimate the true unknown value as to underestimate it, then the average value is the best estimate of the unknown value in question. The average value is usually indicated by placing a bar over the symbol representing the measured variable.

NOTE 3: In this standard, the term "mean" is reserved to describe a central measure of a population, while "average" refers to a sample.

5.2 If processes operate to exaggerate the magnitude of the
error either in overestimating or underestimating the correct measurement, then the median value is usually a better estimate.
5.3 If the processes operating to create error affect both the probability and magnitude of the error, then other approaches must be employed to find the best estimation procedure. A qualified statistician should be consulted in this case.
5.4 In corrosion testing, it is generally observed that average values are useful in characterizing corrosion rates. In cases of penetration from pitting and cracking, failure is often defined as the first through penetration and in these cases, average penetration rates or times are of little value. Extreme value analysis has been used in these cases, see Guide G 46.
5.5 When the average value is calculated and reported as the only result in experiments when several replicate runs were made, information on the scatter of data is lost.

6. Variability Measures

6.1 Several measures of distribution variability are available which can be useful in estimating confidence intervals and making predictions from the observed data. In the case of normal distribution, a number of procedures are available and can be handled with computer programs. These measures include the following: variance, standard deviation, and coefficient of variation. The range is a useful non-parametric estimate of variability and can be used with both normal and other distributions.
6.2 Variance: Variance, $\sigma^2$, may be estimated for an experimental data set of n observations by computing the sample estimated variance, $S^2$, assuming all observations are subject to the same errors:

$$S^2 = \frac{\sum d^2}{n - 1} \qquad (1)$$

where:
d = the difference between the average and the measured value, and
n - 1 = the degrees of freedom available.
Variance is a useful measure because it is additive in systems that can be described by a normal distribution; however, the dimensions of variance are square of units. A procedure known as analysis of variance (ANOVA) has been developed for data sets involving several factors at different levels in order to estimate the effects of these factors. (See Section 9.)
6.3 Standard Deviation: Standard deviation, $\sigma$, is defined as the square root of the variance. It has the property of having the same dimensions as the average value and the original measurements from which it was calculated and is generally used to describe the scatter of the observations.
6.3.1 Standard Deviation of the Average: The standard deviation of an average, $S_{\bar{x}}$, is different from the standard deviation of a single measured value, but the two standard deviations are related as in Eq (2):

$$S_{\bar{x}} = \frac{S}{\sqrt{n}} \qquad (2)$$

where:
n = the total number of measurements which were used to calculate the average value.
When reporting standard deviation calculations, it is important to note clearly whether the value reported is the standard deviation of the average or of a single value. In either case, the number of measurements should also be reported. The sample estimate of the standard deviation is s.
6.4 Coefficient of Variation: The population coefficient of variation is defined as the standard deviation divided by the mean. The sample coefficient of variation may be calculated as $S/\bar{x}$ and is usually reported in percent. This measure of variability is particularly useful in cases where the size of the errors is proportional to the magnitude of the measured value so that the coefficient of variation is approximately constant over a wide range of values.
6.5 Range: The range is defined as the difference between the maximum and minimum values in a set of replicate data values. The range is non-parametric in nature, that is, its calculation makes no assumption about the distribution of error. In cases when small numbers of replicate values are involved and the data are normally distributed, the range, w, can be used to estimate the standard deviation by the relationship:

$$S \approx \frac{w}{\sqrt{n}}, \quad n < 12 \qquad (3)$$

where:
S = the estimated sample standard deviation,
w = the range, and
n = the number of observations.
The range has the same dimensions as standard deviation. A tabulation of the relationship between $\sigma$ and w is given in Ref (7).
6.6 Precision: Precision is closeness of agreement between randomly selected individual measurements or test results. The standard deviation of the error of measurement may be used as a measure of imprecision.
6.6.1 One aspect of precision concerns the ability of one investigator or laboratory to reproduce a measurement previously made at the same location with the same method. This aspect is sometimes called repeatability.
6.6.2 Another aspect of precision concerns the ability of different investigators and laboratories to reproduce a measurement. This aspect is sometimes called reproducibility.
6.7 Bias: Bias is the closeness of agreement between an observed value and an accepted reference value. When applied to individual observations, bias includes a combination of a random component and a component due to systematic error. Under these circumstances, accuracy contains elements of both precision and bias. Bias refers to the tendency of a measurement technique to consistently under- or overestimate. In cases where a specific quantity such as corrosion rate is being estimated, a quantitative bias may be determined.
6.7.1 Corrosion test methods which are intended to simulate service conditions, for example, natural environments, often are more severe on some materials than others, as compared to the conditions which the test is simulating. This is particularly true for test procedures which produce damage rapidly as compared to the service experience. In such cases, it is important to establish the correspondence between results from the service environment and test results for the class of material in question. Bias in this case refers to the
variation in the acceleration of corrosion for different materials.
6.7.2 Another type of corrosion test method measures a characteristic that is related to the tendency of a material to suffer a form of corrosion damage, for example, pitting potential. Bias in this type of test refers to the inability of the test to properly rank the materials to which the test applies as compared to service results. Ranking may also be used as a qualitative estimate of bias in the test method types described in 6.7.1.

7. Statistical Tests

7.1 Null Hypothesis: Statistical tests are usually carried out by postulating a hypothesis of the form: the distribution of data under test is not significantly different from some postulated distribution. It is necessary to establish a probability that will be acceptable for rejecting the null hypothesis. In experimental work it is conventional to use probabilities of 0.05 or 0.01 to reject the null hypothesis.
7.1.1 Type I errors occur when the null hypothesis is rejected falsely. The probability of rejecting the null hypothesis falsely is described as the significance level and is often designated as $\alpha$.
7.1.2 Type II errors occur when the null hypothesis is accepted falsely. If the significance level is set too low, the probability of a Type II error, $\beta$, becomes larger. When a value of $\alpha$ is set, the value of $\beta$ is also set. With a fixed value of $\alpha$, it is possible to decrease $\beta$ only by increasing the sample size, assuming no other factors can be changed to improve the test.
7.2 Degrees of Freedom: The degrees of freedom of a statistical test refer to the number of independent measurements that are available for the calculation.
7.3 t Test: The t statistic may be written in the form:

$$t = \frac{|\bar{x} - \mu|}{S(\bar{x})} \qquad (4)$$

where:
$\bar{x}$ is the sample average,
$\mu$ is the population mean, and
$S(\bar{x})$ is the estimated standard deviation of the sample average.
The t distribution is usually tabulated in terms of significance levels and degrees of freedom.
7.3.1 The t test may be used to test the null hypothesis:

$$m = \mu \qquad (5)$$

For example, the value m is not significantly different than $\mu$, the population mean. The t test is then:

$$t = \frac{|\bar{x} - m|}{S(\bar{x})} \qquad (6)$$

The calculated value of t may be compared to the value of t for the degrees of freedom, n, and the significance level.
7.3.2 The t statistic may be used to obtain a confidence interval for an unknown value, for example, a corrosion rate value calculated from several independent measurements:

$$(\bar{x} - t\,S(\bar{x})) < \mu < (\bar{x} + t\,S(\bar{x})) \qquad (7)$$

where:
$t\,S(\bar{x})$ represents the one half width confidence interval associated with the significance level chosen.
7.3.3 The t test is often used to test whether there is a significant difference between two sample averages. In this case, the expression becomes:

$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{S(x)\sqrt{1/n_1 + 1/n_2}} \qquad (8)$$

where:
$\bar{x}_1$ and $\bar{x}_2$ are the sample averages,
$n_1$ and $n_2$ are the number of measurements used in calculating $\bar{x}_1$ and $\bar{x}_2$ respectively, and
$S(x)$ is the pooled estimate of the standard deviation from both sets of data:

$$S(x) = \sqrt{\frac{(n_1 - 1)S^2(x_1) + (n_2 - 1)S^2(x_2)}{n_1 + n_2 - 2}} \qquad (9)$$

7.3.4 One sided test: The t function is symmetrical and can have negative as well as positive values. In the above examples, only absolute values of the differences were discussed. In some cases, a null hypothesis of the form:

$$\mu > m \quad \text{or} \quad \mu < m \qquad (10)$$

may be desired. This is known as a one sided t test and the significance level associated with this t value is half of that for a two sided t.
7.4 F Test: The F test is used to test whether the variance associated with a variable, $x_1$, is significantly different from a variance associated with variable $x_2$. The F statistic is then:

$$F = \frac{S^2(x_1)}{S^2(x_2)} \qquad (11)$$

The F test is an important component in the analysis of variance used in experimental designs. Values of F are tabulated for significance levels, and degrees of freedom for both variables. In cases where the data are not normally distributed, the F test approach may falsely show a significant effect because of the non-normal distribution rather than an actual difference in variances being compared.
7.5 Correlation Coefficient: The correlation coefficient, r, is a measure of a linear association between two random variables. Correlation coefficients vary between -1 and +1 and the closer to either -1 or +1, the better the correlation. The sign of the correlation coefficient simply indicates whether the correlation is positive (y increases with x) or negative (y decreases as x increases). The correlation coefficient, r, is given by:

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\left[\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2\right]^{1/2}} \qquad (12)$$

where:
$x_i$ = observed values of random variable x,
$y_i$ = observed values of random variable y,
$\bar{x}$ = average value of x, and
$\bar{y}$ = average value of y.
Generally, $r^2$ values are preferred because they avoid the problem of sign and the $r^2$ values relate directly to variance. Values of r or $r^2$ have been tabulated for different significance levels and degrees of freedom. In general, it is desirable
to report values of r or $r^2$ when presenting correlations and regression analyses.

NOTE 4: The procedure for calculating correlation coefficient does not require that the x and y variables be random and consequently, some investigators have used the correlation coefficient as an indication of goodness of fit of data in a regression analysis. However, the significance test using correlation coefficient requires that the x and y values be independent variables of a population measured on randomly selected samples.

7.6 Sign Test: The sign test is a non-parametric test used in sets of paired data to determine if one component of the pair is consistently larger than the other (8). In this test method, the values of the data pairs are compared, and if the first entry is larger than the second, a plus sign is recorded. If the second term is larger, then a minus sign is recorded. If both are equal, then no sign is recorded. The total number of plus signs, P, and minus signs, N, is computed. Significance is determined by the following test:

$$|P - N| > k\sqrt{P + N} \qquad (13)$$

where k = a function of the significance level.
The sign test does not depend on the magnitude of the difference and so can be used in cases where normal statistics would be inappropriate or impossible to apply.
7.7 Outside Count: The outside count test is a useful non-parametric technique to evaluate whether the magnitude of one of two data sets of approximately the same number of values is significantly larger than the other. The details of the procedure may be found elsewhere (8).
7.8 Corner Count: The corner count test is a non-parametric graphical technique for determining whether there is correlation between two variables. It is simpler to apply than the correlation coefficient, but requires a graphical presentation of the data. The detailed procedure may be found elsewhere (8).

8. Curve Fitting: Method of Least Squares

8.1 It is often desirable to determine the best algebraic expression to fit a data set with the assumption that a normally distributed random error is operating. In this case, the best fit will be obtained when the condition of minimum variance between the measured value and the calculated value is obtained for the data set. The procedures used to determine equations of best fit are based on this concept. Software is available for computer calculation of regression equations, including linear, polynomial and multiple variable regression equations.
8.2 Linear Regression (2 Variables): Linear regression is used to fit data to a linear relationship of the following form:

$$y = mx + b \qquad (14)$$

In this case, the best fit is given by:

$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \qquad (15)$$

$$b = \frac{1}{n}\left(\sum y - m\sum x\right) \qquad (16)$$

where:
y = the dependent variable,
x = the independent variable,
m = the slope of the estimated line,
b = the y intercept of the estimated line,
$\sum x$ = the sum of x values etc., and
n = the number of observations of x and y.
The standard deviation of m and the standard error of the expression are often of interest and may be calculated easily (5, 7, 9). One problem with linear regression is that all the errors are assumed to be associated with the dependent variable, y, and this may not be a reasonable assumption. A variation of the linear regression approach is available, assuming the fitting equation passes through the origin. In this case, only one adjustable parameter will result from the fit. It is possible to use statistical tests, such as the F test, to compare the goodness of fit between this approach and the two adjustable parameter fits described above.
8.3 Polynomial Regression: Polynomial regression analysis is used to fit data to a polynomial equation of the following form:

$$y = a + bx + cx^2 + dx^3 \ \text{etc.} \qquad (17)$$

where:
a, b, c, d = adjustable constants to be used to fit the data set,
x = the observed independent variable, and
y = the observed dependent variable.
The equations required to carry out the calculation of the best fit constants are complex and best handled by a computer. It is usually desirable to run a series of expressions and compute the residual variance for each expression to find the simplest expression fitting the data.
8.4 Multiple Regression: Multiple regression analysis is used when data sets involving more than one independent variable are encountered. An expression of the following form is desired in a multiple linear regression:

$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 \ \text{etc.} \qquad (18)$$

where:
$a, b_1, b_2, b_3$, etc. = adjustable constants used to obtain the best fit of the data set,
$x_1, x_2, x_3$, etc. = the observed independent variables, and
y = the observed dependent variable.
Because of the complexity of this problem, it is generally handled with the help of a computer. One strategy is to compute the value of all the "b's," together with the standard deviation for each "b." It is usually necessary to run several regression analyses, dropping variables, to establish the relative importance of the independent variables under consideration.

9. Comparison of Effects: Analysis of Variance

9.1 Analysis of variance is useful to determine the effect of a number of variables on a measured value when a small number of discrete levels of each independent variable is studied (5, 7, 9, 10, 11). This is best handled by using a factorial or similar experimental design to establish the magnitude of the effects associated with each variable and the magnitude of the interactions between the variables.
9.2 The two-level factorial design experiment is an excellent method for determining which variables have an effect on the outcome.
9.2.1 Each time an additional variable is to be studied,
twice as many experiments must be performed to complete the two-level factorial design. When many variables are involved, the number of experiments becomes prohibitive.
9.2.2 Fractional replication can be used to reduce the amount of testing. When this is done, the amount of information that can be obtained from the experiment is also reduced.
9.3 In the design and analysis of interlaboratory test programs, Practice E 691 should be consulted.
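
The doubling described in 9.2.1 and the reduction described in 9.2.2 can be illustrated with a short sketch. This is only an illustration under stated assumptions, not a procedure given in this guide: the factor names are hypothetical, levels are coded as -1 (low) and +1 (high), and the half fraction is selected with one common choice of defining relation (keep the runs whose coded levels multiply to +1).

```python
from itertools import product
from math import prod


def full_factorial(factors):
    """Return every run of a two-level full factorial design.

    Each run maps a factor name to its coded level: -1 (low) or +1 (high).
    """
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]


def half_fraction(runs):
    """Keep the runs whose coded levels multiply to +1 (one half of the design)."""
    return [run for run in runs if prod(run.values()) == 1]


# Hypothetical factor names, used only for illustration.
three_factors = full_factorial(["temperature", "chloride", "pH"])
four_factors = full_factorial(["temperature", "chloride", "pH", "aeration"])
half_of_four = half_fraction(four_factors)

# Adding a fourth factor doubles the run count; the half fraction halves it again.
print(len(three_factors), len(four_factors), len(half_of_four))  # prints: 8 16 8
```

The half fraction needs as many runs as the three-factor full design, but some effects can then no longer be separated from one another, which is the loss of information noted in 9.2.2.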

REFERENCES

(1) Tufte, E. R., The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT, 1983.
(2) Booth, F. F., and Tucker, G. E. G., "Statistical Distribution of Endurance in Electrochemical Stress-Corrosion Tests," Corrosion, CORRA, Vol 21, 1965, pp. 173-177.
(3) Haynie, F. H., Vaughan, D. A., Phalen, D. I., Boyd, W. K., and Frost, P. D., A Fundamental Investigation of the Nature of Stress-Corrosion Cracking in Aluminum Alloys, AFML-TR-66-267, June 1966.
(4) Aziz, P. M., "Application of Statistical Theory of Extreme Values to the Analysis of Maximum Pit Depth Data for Aluminum," Corrosion, CORRA, Vol 12, 1956, pp. 495-506.
(5) Volk, W., Applied Statistics for Engineers, 2nd ed., Robert E. Krieger Publishing Company, Huntington, NY, 1980.
(6) Mickley, H. S., Sherwood, T. K., and Reed, C. E., Editors, Applied Mathematics in Chemical Engineering, 2nd ed., McGraw-Hill, New York, NY, 1957, pp. 46-99.
(7) Snedecor, G. W., Statistical Methods Applied to Experiments in Agriculture and Biology, 4th Edition, Iowa State College Press, Ames, IA, 1946.
(8) Brown, B. S., "6 Quick Ways Statistics Can Help You," Chemical Engineers Calculation and Shortcut Deskbook, X254, McGraw-Hill, New York, NY, 1968, pp. 37-43.
(9) Freeman, H. A., Industrial Statistics, John Wiley & Sons, Inc., New York, NY, 1942.
(10) Davies, O. L., ed., Design and Analysis of Industrial Experiments, Hafner Publishing Co., New York, NY, 1954.
(11) Box, G. E. P., Hunter, W. G., and Hunter, J. S., Statistics for Experimenters, Wiley, New York, NY, 1978.

APPENDIX
(Nonmandatory Information)
X1. SAMPLE CALCULATIONS

X1.1 Calculation of Variance and Standard Deviation
X1.1.1 Data: The 27 values shown in Table X1.1 are calculated mass loss based corrosion rates for copper panels in a one year rural atmospheric exposure.
X1.1.2 Calculation of Statistics:
X1.1.2.1 Let $x_i$ = corrosion rate of the i-th panel. The average corrosion rate of 27 panels, $\bar{x}$:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{54.43}{27} = 2.016 \qquad (X1.1)$$

The variance estimate based on this sample, $s^2(x)$:

$$s^2(x) = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} = \frac{110.085 - 27 \times (2.016)^2}{26} = \frac{0.350}{26} = 0.0135 \qquad (X1.2)$$

The standard deviation is:

$$s(x) = (0.0135)^{1/2} = 0.116 \qquad (X1.3)$$

The coefficient of variation is:

$$\frac{0.116}{2.016} \times 100 = 5.75\ \% \qquad (X1.4)$$

The standard deviation of the average is:

$$s(\bar{x}) = \frac{0.116}{\sqrt{27}} = 0.0223 \qquad (X1.5)$$

The range, w, is the difference between the largest and smallest values:

$$w = 2.21 - 1.70 = 0.51 \qquad (X1.6)$$

The mid-range value is:

$$\frac{2.21 + 1.70}{2} = 1.955 \qquad (X1.7)$$

TABLE X1.1 Copper Corrosion Rate, One-Year Exposure [columns: Panel, Corrosion Rate (CR), Rank, Plotting Position (%)]

FIG. X1.1 Probability Plot for Corrosion Rate of Copper Panels in 1-Year Rural Atmospheric Exposure
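
The arithmetic of X1.1 can be checked with a short sketch. It assumes only the summary values quoted in the text (27 panels, a sum of 54.43, a sum of squares of 110.085, and extreme values of 2.21 and 1.70); the function and variable names are illustrative and not part of this guide. Because the sketch carries the unrounded average through the calculation, the standard deviation comes out near 0.117 rather than the 0.116 obtained in X1.1 with the average rounded to 2.016.

```python
import math


def summary_statistics(n, sum_x, sum_x2):
    """Average, variance, standard deviation and related measures from sums.

    Uses the shortcut form of the sample variance:
    s^2 = (sum of x^2 - n * average^2) / (n - 1).
    """
    average = sum_x / n
    variance = (sum_x2 - n * average ** 2) / (n - 1)
    std_dev = math.sqrt(variance)
    return {
        "average": average,
        "variance": variance,
        "std_dev": std_dev,
        "coeff_of_variation_pct": 100.0 * std_dev / average,
        "std_dev_of_average": std_dev / math.sqrt(n),
    }


# Summary values quoted in X1.1 for the 27 copper panels.
stats = summary_statistics(n=27, sum_x=54.43, sum_x2=110.085)

# Range and mid-range from the largest and smallest values quoted in X1.1.
largest, smallest = 2.21, 1.70
sample_range = largest - smallest
mid_range = (largest + smallest) / 2.0

for name, value in stats.items():
    print(f"{name}: {value:.4f}")
print(f"range: {sample_range:.2f}, mid-range: {mid_range:.3f}")
```

Working from the sums rather than the 27 individual values keeps the sketch independent of Table X1.1; with the raw data available, the same quantities could be computed directly from the list of corrosion rates.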
X1.2 Calculation of Rank and Plotting Points for Probability Paper Plots
X1.2.1 The lowest corrosion rate value (1.70) is assigned a rank, r, of 1 and the remaining values are arranged in ascending order. Multiple values are assigned a rank of the average rank. For example, both the third and fourth panels have corrosion rates of 1.88 so that the rank is 3.5. See the third column in Table X1.1.
X1.2.2 The plotting positions for probability paper plots are expressed in percentages in Table X1.1. They are derived from the rank by the following expression:
$\text{Plotting position} = 100\,(r - 1/2)/n$, expressed as percent (X1.8)
See Table X1.1, fourth column, for plotting positions for this data set.
NOTE: For extreme value statistics the plotting position formula is $100\,r/(n + 1)$ (see Guide G 46). The median is the corrosion rate at the 50 % plotting position and is 2.03 for panel 142.

X1.3 Probability Paper Plot of Data: See Table X1.1
X1.3.1 The corrosion rate is plotted versus plotting position on probability paper, see Fig. X1.1.
X1.3.2 Normal Distribution Plotting Position Reference:
X1.3.2.1 In order to compare the data points shown in Fig. X1.1 to what would be expected for a normal distribution, a straight line on the plot may be constructed to show a normal distribution.
X1.3.2.1 (1) Plot the average value at 50 %, 2.016 at 50 %.
X1.3.2.1 (2) Plot the average + 1 standard deviation at 84.13 %, i.e., 2.016 + 0.116 = 2.132 at 84.13 %.
X1.3.2.1 (3) Plot the average - 1 standard deviation at 15.87 %, i.e., 2.016 - 0.116 = 1.900 at 15.87 %.
X1.3.2.1 (4) Connect these three points with a straight line.

X1.4 Evaluation of Outlier
X1.4.1 Data: See X1.1, Table X1.1, and Fig. X1.1.
X1.4.2 Is the 1.70 result (panel 411) an outlier? Note that this point appears to be out of line in Fig. X1.1.
X1.4.3 Reference Practice E 178 (Dixon's Test): We choose α = 0.05 for this example, that is, the probability that this point could be this far out of line based on normal probability is 5 % or less.
X1.4.4 Number of data points is 27:
$r_{22} = \dfrac{x_3 - x_1}{x_{n-2} - x_1} = \dfrac{1.88 - 1.70}{2.16 - 1.70} = 0.391$ (X1.9)
The Dixon Criterion at α = 0.05, n = 27 is 0.393 (see Practice E 178, Table 2).
X1.4.4.1 The $r_{22}$ value does not exceed the Dixon Criterion for the value of n and the value of α chosen so that the 1.70 value is not an outlier by this test.
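The rank, plotting-position, and Dixon-ratio steps of X1.2 and X1.4.4 can be sketched in Python as follows. This is a minimal illustration, not part of the guide: the function names are hypothetical, and the five-value sample is made up for the example (it is not the Table X1.1 data). Only the three values 1.70, 1.88, and 2.16 and the criterion 0.393 are taken from the text.

```python
from collections import defaultdict


def average_ranks(values):
    """Sort ascending and give tied values the average of their ranks (1-based)."""
    ordered = sorted(values)
    positions = defaultdict(list)
    for i, v in enumerate(ordered, start=1):
        positions[v].append(i)
    return [(v, sum(positions[v]) / len(positions[v])) for v in ordered]


def plotting_position(rank, n):
    """Percent plotting position 100 (r - 1/2) / n for probability paper."""
    return 100.0 * (rank - 0.5) / n


def dixon_r22_low(x1, x3, x_n_minus_2):
    """Dixon ratio r22 for a suspect lowest value x1."""
    return (x3 - x1) / (x_n_minus_2 - x1)


# Hypothetical five-value sample with one tie (NOT the Table X1.1 data).
sample = [1.88, 1.70, 1.95, 1.88, 2.02]
ranked = average_ranks(sample)
positions = [plotting_position(r, len(sample)) for _, r in ranked]

# Values quoted in X1.4.4: x1 = 1.70, x3 = 1.88, x(n-2) = 2.16, criterion 0.393.
r22 = dixon_r22_low(1.70, 1.88, 2.16)
is_outlier = r22 > 0.393
```

With the quoted values the ratio comes out just under the criterion, which is why the Dixon test does not flag the 1.70 result.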
X1.4.4.2 Practice E 178 recommends using a T test as the best test in this case:
$T_1 = \dfrac{\bar{x} - x_1}{s} = \dfrac{2.016 - 1.70}{0.116} = 2.72$
Critical value T for α = 0.05 and n = 27 is 2.698 (Practice E 178, Table 1). Therefore, by this criterion the 1.70 value is an outlier because the calculated $T_1$ value exceeds the critical T value.
X1.4.5 Discussion:
X1.4.5.1 The 1.70 value for panel 411 does appear to be out of line as compared to the other values in this data set. The T test confirms this conclusion if we choose α = 0.05. The next step should be to review the calculations that lead to the determination of a 1.70 value for this panel. The original and final mass values and panel size measurements should be checked and compared to the values obtained from the other panels.
X1.4.5.2 If no errors are found, then the panel itself should be retrieved and examined to determine if there is any evidence of corrosion products or other extraneous material that would cause its final mass to be greater than it should have been. If a reason can be found to explain the low mass loss value, then the result can be excluded from the data set without reservation. If this point is excluded, the statistics for this distribution become:
$\bar{x} = 2.028$
$s^2(x) = 0.0102$
$s(x) = 0.101$
$s(\bar{x}) = 0.101/\sqrt{26} = 0.0198$
Median = 2.035
$w = 2.21 - 1.86 = 0.35$
Mid range $= (2.21 + 1.86)/2 = 2.035$
The average, median, and mid range are closer together excluding the 1.70 value, as expected, although the changes are relatively small. In cases where deviations occur on both ends of the distribution a different procedure is used to check for outliers. Please refer to Practice E 178 for a discussion of this procedure.

X1.5 Confidence Interval for Corrosion Rate
X1.5.1 Data: See X1.1, Table X1.1, and X1.4.1, excluding the panel 411 result.
Significance level α = 0.05
Confidence interval calculation:
Confidence interval $= \bar{x} \pm t\,s(\bar{x})$
t for α = 0.05, DF = 25, is 2.060
95 % confidence interval for the average corrosion rate:
$\bar{x} \pm (2.060)(0.0198) = 2.028 \pm 0.041$, or 1.987 to 2.069
Note that this interval refers to the average corrosion rate. If one is interested in the interval in which 95 % of measurements of the corrosion rate of a copper panel exposed under those identical conditions will fall, it may be calculated as follows:
... (X1.13)

X1.6 Difference Between Average Values
X1.6.1 Data: Triplicate zinc flat panels and wire helices were exposed for a one year period at the 250 m lot at Kure Beach, NC. The corrosion rates were calculated from the loss in mass after cleaning the specimens. The corrosion rate values are given in Table X1.2.
X1.6.2 Statistics:
Panel Average: $\bar{x}_p = 2.24$
Panel Standard Deviation = 0.18
Helix Average: $\bar{x}_h = 2.55$
Helix Standard Deviation = 0.065
X1.6.3 Question: Are the helices corroding significantly faster than the panels? The null hypothesis is therefore that the helices are corroding at the same or a lower rate than the panels. We will choose α = 0.05, that is, the probability of erroneously rejecting the null hypothesis is one chance in twenty.
X1.6.4 Calculations:
X1.6.4.1 Note that the standard deviations for the panels and helices are different. If they are not significantly different then they may be pooled to yield a larger data set to test the hypothesis. The F test may be used for this purpose.
$F = \dfrac{s^2(x_p)}{s^2(x_h)} = \dfrac{(0.18)^2}{(0.065)^2} = 7.67$
The critical F for α = 0.05 and both numerator and denominator degrees of freedom of 2 is 19.00. The calculated F is less than the critical F value so that the hypothesis that the two standard deviations are not significantly different may be accepted. As a consequence the standard deviations may be pooled.
X1.6.4.2 Calculation of pooled variance, $s_p^2(x)$:
$s_p^2(x) = \dfrac{(n_1 - 1)\,s^2(x_1) + (n_2 - 1)\,s^2(x_2)}{n_1 + n_2 - 2}$ (X1.15)
substituting:
$s_p^2(x) = \dfrac{2(0.18)^2 + 2(0.065)^2}{3 + 3 - 2} = 0.0183$
X1.6.4.3 Calculation of t statistic:
$t = \dfrac{\bar{x}_h - \bar{x}_p}{\sqrt{s_p^2(x)\,(1/n_1 + 1/n_2)}} = \dfrac{2.55 - 2.24}{\sqrt{0.0183\,(1/3 + 1/3)}} = 2.81$
DF = 2 + 2 = 4
X1.6.5 Conclusion: The critical value of t for α = 0.05 and DF = 4 is 2.132. The calculated value for t exceeds the critical value and therefore the null hypothesis can be rejected, that is, the helices are corroding at a significantly higher rate than the panels.
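The variance-ratio check and pooled t test of X1.6.4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the sample sizes of 3 come from the triplicate exposure described in X1.6.1, and the critical value 2.132 is simply the figure quoted in X1.6.5 rather than one computed here.

```python
import math


def f_ratio(s_a, s_b):
    """Variance ratio with the larger variance in the numerator."""
    v_a, v_b = s_a ** 2, s_b ** 2
    return max(v_a, v_b) / min(v_a, v_b)


def pooled_variance(s1, n1, s2, n2):
    """Pooled variance of two samples with standard deviations s1 and s2."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def pooled_t(mean1, mean2, s1, n1, s2, n2):
    """Two-sample t statistic for mean1 - mean2 using the pooled variance."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    return (mean1 - mean2) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))


# Statistics from X1.6.2: panels 2.24 (s = 0.18), helices 2.55 (s = 0.065), n = 3 each.
F = f_ratio(0.18, 0.065)
sp2 = pooled_variance(0.18, 3, 0.065, 3)
t = pooled_t(2.55, 2.24, 0.065, 3, 0.18, 3)

T_CRITICAL = 2.132  # one-sided, alpha = 0.05, DF = 4, as quoted in X1.6.5
reject_null = t > T_CRITICAL
```

Putting the helix mean first makes a positive t correspond to the one-sided question asked in X1.6.3, namely whether the helices corrode faster than the panels.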
Note that the critical value above is listed for α = 0.1 in most tables. This is because the tables are set up for a two-sided t test, and this example is for a one-sided test, that is, is $\bar{x}_h > \bar{x}_p$?
X1.6.6 Discussion: Usually the α level for the F test shown in X1.6.4.1 should be carried out at a more stringent significance level than in the t test, for example, 0.01 rather than 0.05. In the event that the F test did show a significant difference then a different procedure must be used to carry out the t test. It is also desirable to consider the power of the t test. Details on these procedures are beyond the scope of this appendix but are covered in Ref. (10).

TABLE X1.2 Corrosion Rate Values

X1.7 Curve Fitting: Regression Analysis Example
X1.7.1 The mass loss per unit area of zinc is usually assumed to be linear with exposure time in atmospheric exposures. However, most other metals are better fitted with power function kinetics in atmospheric exposures. An exposure program was carried out with a commercial purity rolled zinc alloy for 20 years in an industrial site. How can the mass loss results be converted to an expression that describes the results?
X1.7.2 Experimental: Forty panels of 16 gage rolled zinc strips were cut to approximately 4 in. by 6 in. in size (100 by 150 mm). The panels were cleaned, weighed, and exposed at the same time. Five panels were removed after 0.5, 1, 2, 4, 6, 10, 15, and 20 years exposure. The panels were then cleaned and reweighed. The mass loss values were calculated and converted to mass loss per unit area. The results are shown in Table X1.3 below:
X1.7.3 Analysis: Corrosion of zinc in the atmosphere is usually assumed to be a constant rate process. This would imply that the mass loss per unit area m is related to exposure time T by:
$m = k_1 T$
where:
$k_1$ = is the corrosion rate.
Most other metals are better fitted by a power function such as:
$m = k T^b$
where:
k = is the mass loss coefficient and b is the time exponent.
The data in Table X1.3 may be handled in several ways. Linear regression can be applied to yield a value of $k_1$ that minimizes the variance for the constant rate expression above, or any linear expression such as:
$m = a + k_1 T$
where:
a = is a constant.
Alternatively, a nonlinear regression analysis may be used that yields values for k and b that minimize the variance from the measured values to the calculated value for m at any time using the power function above. All of these approaches assume that the variance observed at short exposure times is comparable to variances at long exposure times. However, the data in Table X1.3 show standard deviations that are roughly proportional to the average value at each time, and so the assumption of comparable variance is not justified by the data at hand.
Another approach to handle this problem is to employ a logarithmic transformation of the data. A transformed data set is shown in Table X1.4 where x = log T and y = log m. These data may be handled in a linear regression analysis. Such an analysis is equivalent to the power function fit with the k and b values minimizing the variance of the transformed variable, y.
The logarithmic transformation becomes:
$\log m = \log k + b \log T$
that is, $y = a + bx$ with $a = \log k$.
Note that the standard deviation values, $s(y_i)$, in Table X1.4 are approximately constant for both short and long exposure times.
X1.7.4 Calculations: The values in Table X1.4 were used to calculate the following:
$\Sigma x = 23.11056$, $\Sigma y = 20.92232$, $n = 39$
$\Sigma x^2 = 24.742305$, $\Sigma y^2 = 24.159116$, $\Sigma xy = 24.341352$
$\bar{x} = 0.592578$, $\bar{y} = 0.536470$
$\Sigma' x^2 = 24.742305 - \dfrac{(23.11056)^2}{39} = 11.047485$
$\Sigma' y^2 = 24.159116 - \dfrac{(20.92232)^2}{39} = 12.934924$
$\Sigma' xy = 24.341352 - \dfrac{(23.11056)(20.92232)}{39} = 11.943236$
TABLE X1.3 Mass Loss per Unit Area, Zinc in the Atmosphere (All values in mg/cm²)

Exposure Duration, Panel Designation
years 1 2 3 4 5 Avg Std Dev
O.SB7 0.392 0.362 0413 0.319 0.376S o.oaaa
0.8? 0.759 0.30; 0.738 0.780 0.7814 0.0355
un 1,64? 1.5828 0.0800
un 3.406 3.237 3.260 3.297 3.3933 0.1720
6 5.325 S.2S7 S.391 5.280 5.333 5.41TZ 0.2336
10 10759 9.7T2 9,553 9966 9935 9.3970 0.4407
II 15.4*0 14,557 15.102 14.910 LOSt 15.0022 0.3689
K 20.700 19.507 18.963 19.336 ta.esB 19.4326 0.7S12
TABLE X1.4 Log of Data from Table X1.3
x y ȳi Std Dev
-0,30103 -0.413C9 -0.40671 -0.441 -0.3736 -0.49621 -O.2e03 0.04GOO

-O144 -0.11976 -0.09637 -0.13194 -0.107BO -O.J0748 0.01969
3 0.30103 0.2535S 023194 0.20003 0.23729 0.21537 0.2S6X 0.02056
0.56679 0.53224 <J.B1fl1! O.B1687 0.5181? 0.53023 0.09144
S 0.77H1S 0.76530 073074 0.73187 0.72SB3 0.72B97 0.73346 0.01 626
s 1.00000 1.03177 0.98998 0.9B466 0.99852 0.99277 0.96954 0.01870

1 17609 ! 18865 1.16307 1.17903 1 17346 tm 0.01069
U1597 1. 2M19 1.27791 1,28637 1.Z7W6 1 28826 0.01721
8 1.30103
$b = \dfrac{\Sigma' xy}{\Sigma' x^2} = \dfrac{11.943236}{11.047485} = 1.08108$
$a = \bar{y} - b\bar{x} = 0.53647 - 1.08108\,(0.59258) = -0.10416$
$\log k = -0.10416$
$k = 0.7868$
X1.7.5 Analysis of Variance: One approach to test the adequacy of the analysis is to compare the residual variance from the regression to the error variance as estimated by the variance found in replication. The null hypothesis in this case is that the residual variance from the calculated regression expression is not significantly greater than the replication variance.
X1.7.6 Comparison of Power Function to Linear Kinetics: Is the expression found better than a linear expression? A linear kinetics function would have a slope of one after transformation to a log function. Therefore, this question can be reformulated to: Is the b value significantly different from one? The null hypothesis is therefore that the b value is not different from 1.000 with α = 0.05. A t test will be used:
$b = 1.08108$
$t = \dfrac{b - 1}{s(b)} = \dfrac{1.08108 - 1.00000}{0.00911} = 8.90$
The critical t at 6 degrees of freedom and α = 0.05 is 2.45, which is smaller than the calculated t value above. Therefore the null hypothesis may be rejected, and the slope is significantly different from one.
X1.7.7 Confidence Interval for Regression:
X1.7.7.1 A confidence interval calculated from the replicate information at each exposure time represents an interval in which the unknown mean mass loss value will be located unless a 1 in 20 chance has occurred in the sampling of this experiment. On the other hand, a confidence interval calculated from the regression results represents an interval that will cover the unknown regression line unless a 1 in 20 chance has occurred in the sampling of this experiment.
X1.7.7.2 The confidence interval for each exposure time is equally spaced around the average of the log values. This will also be true for the regression confidence interval. However, when these intervals are plotted on linear coordinates the interval will appear to be unsymmetrical. An example of a confidence interval calculation is shown as follows:
Calculation of the confidence interval, CI, for the average value:
Exposure time, T = 6 years, α = 0.05, DF = 4, t = 2.78
$\bar{y}_6 = 0.73346$, $s(y_6) = 0.01829$
$CI = \bar{y}_6 \pm \dfrac{t\,s(y_6)}{\sqrt{5}} = 0.73346 \pm 0.02274$
CI = 0.71072 to 0.75620
Converting y to m: CI = 5.137 to 5.704 mg/cm²
Calculation of the confidence interval for the regression expression at exposure time T = 6, α = 0.05, DF = 6, t = 2.45:
$s^2(b) = \dfrac{s^2(y)}{\Sigma' x^2} = \dfrac{0.000916}{11.04749} = 0.0000829$
$\Sigma' x^2 = 11.0475$
$s^2(y) = 0.000916$
Note that a pooled estimate of this variance could have been used.
$\bar{x} = 0.59258$, $x_6 = \log 6 = 0.77815$
TABLE X1.5 Analysis of Variance
...
SOS = sum of squares value
DF = degrees of freedom
MS = mean squares value
F-test on hypothesis in X1.7.5: F at a significance level of 0.05 and 6/31 degrees of freedom is 2.41. Therefore, the residual variance from the ... variance when tested at the 5 % level. Thus, the regression ...
$s(\hat{y}_6) = 0.01247$
$\hat{y}_6 = -0.10416 + 1.08108\,x_6 = -0.10416 + 1.08108\,(0.77815) = 0.73708$
$CI\,\hat{y}_6 = \hat{y}_6 \pm t\,s(\hat{y}_6) = 0.73708 \pm 2.45\,(0.01247) = 0.70653$ to $0.76763$
CI = 5.088 to 5.856
Note that the confidence interval calculated from the regression is slightly larger than that calculated from the replicate values at that exposure time.
X1.7.7.3 Figure X1.2 is a log-log plot showing the regression equation with the 95 % confidence interval for the regression shown as dashed lines and the averages and corresponding confidence intervals shown as bars. Figure X1.3 shows the same information on linear coordinates.
X1.7.8 Other Statistics from the Regression: Standard error of estimate, s(y), for the logarithmic expression.
$s(y) = \sqrt{0.000916} = 0.03026$
Correlation Coefficient, R, for the logarithmic expression.
$R^2 = \dfrac{(\Sigma' xy)^2}{\Sigma' x^2 \cdot \Sigma' y^2} = \dfrac{(11.943236)^2}{11.047485 \times 12.934924}$
$R = 0.9991$
Note that R or $R^2$ is often quoted as a measure of the quality of fit of a regression expression. However, it should be noted that the correlation coefficient calculated for the logarithmic expression is not comparable to a correlation coefficient calculated for a nontransformed regression.
X1.7.9 Discussion:
X1.7.9.1 The use of a log transformation to obtain a power function fit is convenient and simple but has some limitations. The log transformation tends to depress the calculated values to the low side of the linear average. It also produces a nonlinear error function. In the example above the use of a log transformation produces an almost constant standard deviation over the range of exposure times.
X1.7.9.2 A linear regression analysis also may be used with these mass loss results, and the corresponding expression may be a reasonable estimate of mass loss performance for rolled zinc in this atmosphere. However, neither the linear nor a nonlinear power function regression analysis will yield a confidence interval that matches the replicate data confidence intervals as closely as the logarithmic transformation shown above.
X1.7.9.3 The regression expression can be used to project future results by extrapolation of the results beyond the range of data available. This type of calculation is generally not advisable unless there is good information indicating that the procedure is valid, that is, that no changes have occurred in any of the environmental and surface conditions that govern the kinetics of the corrosion reaction.
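The log-log regression of X1.7.4 through X1.7.8 can be reproduced from the summary sums alone. The Python sketch below assumes the sums and the slope standard deviation s(b) = 0.00911 quoted in the appendix are correct as read. The variable names and the helper `predict_mass_loss` are hypothetical, introduced only for illustration.

```python
import math

# Summary sums from X1.7.4 (x = log T, y = log m, base-10 logs).
n = 39
sum_x, sum_y = 23.11056, 20.92232
sum_x2, sum_y2, sum_xy = 24.742305, 24.159116, 24.341352

# Sums corrected for the mean.
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

# Least-squares slope and intercept of y = a + b x.
b = sxy / sxx
a = sum_y / n - b * (sum_x / n)
k = 10 ** a  # mass loss coefficient in m = k * T**b

# Correlation coefficient of the logarithmic expression.
r = sxy / math.sqrt(sxx * syy)

# t statistic for "slope differs from one", using s(b) as quoted in X1.7.6.
S_B = 0.00911
t_slope = (b - 1.0) / S_B


def predict_mass_loss(years):
    """Mass loss per unit area (mg/cm^2) from the fitted power function."""
    return k * years ** b
```

As X1.7.9.3 cautions, calling `predict_mass_loss` for times beyond the 20-year range of the data is an extrapolation and is only as good as the assumption that the exposure conditions do not change.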
FIG. X1.3 Linear Plot of Mass Loss versus Exposure Time for Replicate Rolled Zinc Panels in an Industrial Atmosphere
