Goals of the lecture: (1) Find a representative value that best characterizes the average of the data set. (2) Find a representative value that provides a measure of the variation in the measured data set. (3) Establish an interval about the representative average value in which the true value is expected to lie.
Error Distribution: Characterizes the probability that an error of a given size will occur during repeat-sample experiments.
Probability: an expression of the likelihood of a particular event taking place, measured with reference to all possible events.
The probability density function (PDF) for the entire population of possible precision error values is generally assumed to be Gaussian (normal, bell-shaped). Since the total precision error is random, each individual measurement in the sample will have a distinct error whose likelihood of occurrence (roughly) decreases with its size.
For a normal distribution, if we know the mean and standard deviation, we can estimate the probability that a single measurement will lie within a band around the mean: about 68.27% within one standard deviation, 95.45% within two, and 99.73% within three.
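As a rough numerical check of such band probabilities, the sketch below uses the standard result for a normal distribution, P(|x - mean| <= k·sigma) = erf(k/√2). The helper name `prob_within` is hypothetical and chosen only for this illustration.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of the mean (k >= 0)."""
    return math.erf(k / math.sqrt(2.0))


# Print the band probabilities for one, two, and three standard deviations.
for k in (1, 2, 3):
    print(f"within +/- {k} sigma: {100.0 * prob_within(k):.2f}%")
```

The same function can be called with a non-integer k (for example `prob_within(1.96)`) when the band is not a whole number of standard deviations.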
Probability density function: In simple terms, a probability density function (PDF) is constructed by drawing a smooth curve fit through the vertically normalized histogram.
Histograms: A histogram is constructed by dividing the n measurements of a sample into J bins or intervals (also called classes) such that for the first bin (j = 1), $x_1 < x \le x_2$, for the second bin (j = 2), $x_2 < x \le x_3$, etc. We define $x_{mid,j}$ as the middle value of x in bin j. For example, $x_{mid,2} = (x_2 + x_3)/2$. Generally between 6 and 15 bins are used, i.e., $6 \le J \le 15$. Then a bar plot is made of the frequency (the number of measurements in each bin) versus the value of x, as sketched. (The frequency is also called the class frequency.) The bin width (also called the interval width or class width) is usually constant, although it does not have to be. In the sketch, the bin width $\Delta x_3 = x_4 - x_3$ for the third bin (j = 3) is shown.
The histogram can be modified by dividing the vertical axis by the total number of measurements, n. The resulting probability histogram has the same shape, but the vertical axis represents a relative frequency or probability, i.e., relative frequency $= n_j / n$, where $n_j$ is the number of measurements in bin j.
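We can also define a vertically normalized histogram by further dividing the vertical axis by the bin width or interval width. The vertical axis of the vertically normalized histogram is defined as: $\dfrac{n_j}{n \, \Delta x_j}$, where $n_j$ is the number of measurements in bin j, n is the total number of measurements, and $\Delta x_j$ is the width of bin j.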
To display this data as a histogram, we divide the range of measured values into K small intervals (bins).
For small N, K should be chosen so that at least one bin contains five or more measurements ($n_j \ge 5$).
$n_j$ is the number of measurements in bin j. p(x) is the probability density function: it describes the probability that the measured variable might assume a particular value upon any individual measurement.
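The three versions of the histogram described above (frequency, relative frequency, and vertically normalized) can be sketched as follows. The data values, bin edges, and the function name `histogram_tables` are made up for illustration, and only four bins are used to keep the example short, fewer than the 6 to 15 suggested in the text.

```python
def histogram_tables(data, edges):
    """Return (counts, relative frequencies, normalized heights) per bin.

    Bin j covers edges[j] < x <= edges[j + 1], matching the convention
    x_j < x <= x_(j+1) used in the text.
    """
    n = len(data)
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in data:
        for j in range(n_bins):
            if edges[j] < x <= edges[j + 1]:
                counts[j] += 1
                break
    # Probability histogram: divide each count by the number of measurements.
    rel_freq = [c / n for c in counts]
    # Vertically normalized histogram: divide further by each bin width.
    heights = [rel_freq[j] / (edges[j + 1] - edges[j]) for j in range(n_bins)]
    return counts, rel_freq, heights


# Hypothetical sample of n = 10 measurements and constant bin width 0.5.
data = [0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 1.1, 1.2, 1.4, 1.8]
edges = [0.0, 0.5, 1.0, 1.5, 2.0]

counts, rel_freq, heights = histogram_tables(data, edges)
# Area under the normalized histogram: sum of height times bin width.
area = sum(h * (edges[j + 1] - edges[j]) for j, h in enumerate(heights))
print(counts, rel_freq, heights, area)
```

When every measurement falls inside the bin range, the relative frequencies sum to one and so does the area under the vertically normalized histogram, which is what allows a smooth curve through it to be read as a PDF.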
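Mean Value (true mean, for a very large number of measurements): $x' = \lim_{N \to \infty} \dfrac{1}{N} \sum_{i=1}^{N} x_i$
Variance (true variance, for a very large number of measurements): $\sigma^2 = \lim_{N \to \infty} \dfrac{1}{N} \sum_{i=1}^{N} (x_i - x')^2$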
Finite Statistics
Unless we have made a very large number of measurements, we don't have an accurate estimate of the mean or standard deviation of a data set. If we assume the values are normally distributed, we can estimate the mean and standard deviation from the data.
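The sample mean and sample variance are given by: $\bar{x} = \dfrac{1}{N} \sum_{i=1}^{N} x_i$ and $S_x^2 = \dfrac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$. The sample standard deviation is $S_x = \sqrt{S_x^2}$.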
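A minimal sketch of the sample mean and sample variance, assuming the usual finite-sample definitions (the variance divides by N - 1 rather than N). The data values are made up, and the results are cross-checked against Python's standard `statistics` module.

```python
import math
import statistics


def sample_mean(values):
    """Arithmetic mean of the N sample values."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance with the N - 1 divisor used for finite data sets."""
    m = sample_mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


# Hypothetical sample of N = 8 measurements.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

x_bar = sample_mean(values)
s2 = sample_variance(values)
s = math.sqrt(s2)  # sample standard deviation
print(x_bar, s2, s)

# Cross-check against the standard library implementations.
print(statistics.mean(values), statistics.variance(values))
```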
How close are these values to the true mean and standard deviation? That depends on how many samples we have.
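For a normally distributed data set, we can say that the probability of a sample, $x_i$, differing from the data set mean value, $\bar{x}$, is given by $x_i = \bar{x} \pm t_{\nu,P} S_x$ (P%), where $S_x$ is the sample standard deviation and $t_{\nu,P}$ is the Student's t value for $\nu = N - 1$ degrees of freedom at probability P.
The mean of a sample of N values differs from the true mean of the distribution by an amount $\pm t_{\nu,P} S_x / \sqrt{N}$ (P%).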
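The sketch below estimates an interval for the true mean from a small sample. It assumes the amount in question is the usual Student's t estimate, t·S_x/√N with N - 1 degrees of freedom. The readings are made up, the function name `mean_interval` is hypothetical, and the t value is passed in from a table rather than computed.

```python
import math
import statistics


def mean_interval(values, t_value):
    """Interval x_bar +/- t * S_x / sqrt(N) for the true mean.

    t_value must be looked up for N - 1 degrees of freedom and the
    desired probability; it is not computed here.
    """
    n = len(values)
    x_bar = statistics.mean(values)
    s_x = statistics.stdev(values)  # sample standard deviation (N - 1 divisor)
    half_width = t_value * s_x / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample of N = 10 readings.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]

# Tabulated two-sided Student's t for 9 degrees of freedom at 95%.
T_9_95 = 2.262

low, high = mean_interval(readings, T_9_95)
print(f"true mean expected in [{low:.3f}, {high:.3f}] at 95%")
```

Because the half-width scales with 1/√N, quadrupling the number of readings roughly halves the interval, apart from the small change in the t value.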
Confidence level
The confidence level is defined as the probability that a random variable lies within a specified range of values. The range of values itself is called the confidence interval. For example, as discussed above, we are about 95.45% confident that a purely random (normally distributed) variable lies within two standard deviations of the mean.
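Going the other way, from a chosen confidence level to the width of the confidence interval, can be sketched with the inverse normal CDF from Python's standard library. This assumes a normally distributed variable with known mean and standard deviation. The function names and the numerical values are hypothetical.

```python
from statistics import NormalDist


def sigma_multiple(confidence: float) -> float:
    """Number of standard deviations k for which a normal variable lies
    within mean +/- k * sigma with the given confidence level (0 to 1)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Two-sided interval: put half of the remaining probability in each tail.
    return NormalDist().inv_cdf(0.5 * (1.0 + confidence))


def confidence_interval(mean: float, sigma: float, confidence: float):
    """Symmetric confidence interval about the mean."""
    k = sigma_multiple(confidence)
    return mean - k * sigma, mean + k * sigma


# Hypothetical mean and standard deviation, at a 95% confidence level.
lower, upper = confidence_interval(10.0, 0.2, 0.95)
print(sigma_multiple(0.95), lower, upper)
```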
Regression Analysis
Regression analysis is used to find an equation for y as a function of x that provides the best fit to the data. Typically, y is some measured output as a function of some known input, x. Recall that the linear correlation coefficient is used to determine if there is a trend.
The error $e_i$, the difference between the fitted value and the actual value at data point i, is also called the residual. Note: Here, the actual value does not necessarily mean the correct value, but rather the value of the actual measured data point.
We define E as the sum of the squared errors of the fit, a global measure of the error associated with all n data points. The equation for E is: $E = \sum_{i=1}^{n} e_i^2$, where $e_i$ is the residual at data point i.
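A minimal sketch of a least-squares fit, assuming a straight-line model Y = a·x + b (the surviving text does not state the model). The coefficients a and b are the closed-form values that minimize E, and the data points are made up.

```python
def linear_fit(xs, ys):
    """Least-squares slope a and intercept b for the line Y = a * x + b."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - a * sx) / n
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """E: sum over all points of the squared residual (fitted minus actual)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical x-y measurements with a little scatter.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = linear_fit(xs, ys)
E = sum_squared_errors(xs, ys, a, b)
print(a, b, E)
```

Any other choice of slope or intercept gives a larger E for the same data, which is the sense in which this line is the best fit.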
Correlation coefficient:
In engineering analysis, we often want to fit a trend line or curve to a set of x-y data. Consider a set of n measurements of some variable y as a function of another variable x. Typically, y is some measured output as a function of some known input, x. In general, in such a set of measurements, there may be: (1) some scatter (precision error or random error); (2) a trend: in spite of the scatter, y may show an overall increase with x, or perhaps an overall decrease with x. The linear correlation coefficient is used to determine if there is a trend. If there is a trend, regression analysis is used to find an equation for y as a function of x which provides the best fit to the data. The linear correlation coefficient $r_{xy}$ is defined as: $r_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$, where $\bar{x}$ and $\bar{y}$ are the mean values of x and y.
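The linear correlation coefficient can be computed directly from its usual definition, the sum of products of deviations divided by the square root of the product of the summed squared deviations. The sketch below assumes that definition. The function name and the data are made up for illustration.

```python
import math


def linear_correlation(xs, ys):
    """Linear correlation coefficient r_xy of paired x-y data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one perfectly increasing, one perfectly decreasing,
# and one scattered but still increasing overall.
xs = [1.0, 2.0, 3.0, 4.0]
r_up = linear_correlation(xs, [2.0, 4.0, 6.0, 8.0])
r_down = linear_correlation(xs, [8.0, 6.0, 4.0, 2.0])
r_scatter = linear_correlation(xs, [2.0, 3.5, 3.0, 5.0])
print(r_up, r_down, r_scatter)
```

Values of r_xy near +1 or -1 indicate a strong increasing or decreasing linear trend, while values near 0 indicate little or no linear trend.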