This HowTo document describes how to use Scientific Python (SciPy) for Design
for Six Sigma (DfSS). DfSS is an informal collection of best practices and
methods for product development and design. Some of these methods support
project management and requirements gathering, while others are of a
mathematical nature. The coming chapters show, with examples, how to
attack these numerical and statistical problems using SciPy.
DfSS tasks addressed:
• Statistical observations
All examples are written for Python 2.6, and will most probably work in any
other version of Python.
2 Statistical Observations
The basic concept of DfSS is that there is variation everywhere. Every process
has some variability in its inputs and outputs. The key to improving the
process is to understand the nature of the variation, and the best way to do so
is to use statistical models. This section describes how to look at measurement
data in a statistical way.
First of all, measurement data must be imported into the SciPy environment,
either by typing it into the source code using the r_[...] construct to create
a row vector, or by importing the data from a file, e.g. a .csv file
(comma-separated values).
A simple example with twelve temperature observations, taken once per hour, is
entered and plotted as follows:
timeVector = r_[1:13]
dataVector = r_[2.3,3.3,4.1,5.5,5.1,6.7,6.3,6.9,7.0,7.7,7.2,7.4]
plot( timeVector, dataVector )
xlabel('Time [h]')
ylabel('Temperature [C]')
The same data could be read from a data file with the following format:
1,2.3
2,3.3
3,4.1
...
The code would then look like this, and the plot is shown in Fig. 1:
measurementVector = sp.loadtxt('import_csv_example.csv',delimiter=',')
timeVector = measurementVector[:,0]
dataVector = measurementVector[:,1]
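The import step can also be tried without an external file. The following is a minimal sketch, assuming a current Python 3 with NumPy installed (unlike the Python 2.6 listings in this document). It calls numpy.loadtxt directly and reads from an in-memory buffer; the buffer and the variable names are illustrative assumptions, not part of the original example.

```python
import io
import numpy as np

# In-memory stand-in for the CSV file, holding the first three rows shown above.
csv_text = "1,2.3\n2,3.3\n3,4.1\n"

# numpy.loadtxt accepts a file name or any file-like object.
measurement = np.loadtxt(io.StringIO(csv_text), delimiter=",")

time_vector = measurement[:, 0]  # first column: time in hours
data_vector = measurement[:, 1]  # second column: temperature

print(measurement.shape)
print(time_vector, data_vector)
```

Replacing the buffer with a file name gives the same column layout as in the listing above.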
measurementVector = sp.loadtxt('example_weather_observations.csv',\
delimiter=',', skiprows=1, converters = {0: datestr2num}, \
usecols=(0,1,2,3,4,5) )
timeVector = measurementVector[:,0]
dateVector = mplt.dates.num2date( timeVector )
pressureVector = measurementVector[:,1]
Figure 2: Example data imported from an external file. Barometric pressure in Maarn, in March
2010.
hold(True)
p_mean = mean(pressureVectorW23)
p_std = std(pressureVectorW23)
p_values = linspace( p_mean-3*p_std, p_mean+3*p_std, 100)
plot( p_values, 1400*stats.norm.pdf( p_values, p_mean, p_std ) )
# plt.savefig('example_weather_observations.png', dpi=300)
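To overlay a normal density on a histogram of counts, the density has to be scaled by the number of observations times the bin width. The constant 1400 in the listing above appears to serve this purpose, but that reading is an assumption. The sketch below shows the scaling explicitly, assuming a current Python 3 with NumPy and SciPy. The data are synthetic and only stand in for the pressure measurements; the bin count of 30 is also an arbitrary choice.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the pressure data (NOT the Maarn measurements).
rng = np.random.default_rng(0)
pressure = rng.normal(loc=1015.0, scale=5.0, size=1400)

p_mean = np.mean(pressure)
p_std = np.std(pressure, ddof=1)  # ddof=1: sample standard deviation

# Histogram counts and the bin width needed to scale the density.
counts, edges = np.histogram(pressure, bins=30)
bin_width = edges[1] - edges[0]

# Expected count per bin under the fitted normal distribution.
centers = 0.5 * (edges[:-1] + edges[1:])
expected = pressure.size * bin_width * stats.norm.pdf(centers, p_mean, p_std)

print("sum of observed counts:", counts.sum())
print("sum of expected counts: %.1f" % expected.sum())
```

With this scaling the two sums are close to each other, so the curve and the histogram bars share the same vertical axis.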
The second test, the comparison of the means of the two weeks, is done using
Student's t-test. Visually, the levels clearly differ between the two weeks,
and it is straightforward to compare the datasets:
pressureVectorW1 = pressureVector[0:700]
pressureVectorW2 = pressureVector[700:1400]
(t,p) = stats.ttest_ind(pressureVectorW1, pressureVectorW2)
print('T-test of different means, p = %.2f (significant if < 0.05)' % p)
# plt.savefig('example_weather_observations.png', dpi=300)
Note that we use the ttest_ind() function for independent measurements.
The measurements were not of the same item and not correlated pairwise in
any way. However, if we compared the measurements of two different
weather stations for the same time period, the measurements would
be correlated, and we would be interested only in the pairwise differences
between them. In that case the other t-test function, ttest_rel(), is used.
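The difference between the two functions can be illustrated with a small sketch, assuming a current Python 3 with NumPy and SciPy. The two stations, the shared weather signal and the 0.3 hPa offset are all hypothetical and generated synthetically; they are not taken from the measurements used in this document.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic data: one shared weather signal seen by two hypothetical stations.
weather = rng.normal(1015.0, 8.0, size=200)
station_a = weather + rng.normal(0.0, 0.5, size=200)
station_b = weather + 0.3 + rng.normal(0.0, 0.5, size=200)  # 0.3 hPa offset

# Independent test: ignores the pairing, so the weather variation hides the offset.
t_ind, p_ind = stats.ttest_ind(station_a, station_b)

# Paired test: works on the pairwise differences, so the weather cancels out.
t_rel, p_rel = stats.ttest_rel(station_a, station_b)

print("independent t-test: p = %.3f" % p_ind)
print("paired t-test:      p = %.3g" % p_rel)
```

Because the large common variation drops out of the pairwise differences, the paired test is far more sensitive to a small systematic offset in this setup.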
Figure 3: Histogram of the pressure during weeks 2-3, with an estimated normal distribution.
For each sample, i.e. N items, the mean (x-bar) and the standard deviation are
computed and plotted in the two graphs. The sample mean is plotted in
the top chart, with dotted lines for the Upper Control Limit and Lower Control
Limit.
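The text does not state how the control limits are computed. The sketch below assumes the common Shewhart convention of 3-sigma limits, estimated from the average sample standard deviation with the bias-correction constant c4. It is written for a current Python 3 with NumPy; the function names and the synthetic process data are illustrative assumptions.

```python
import math
import numpy as np

def c4(n):
    """Bias-correction constant for the sample standard deviation."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)

def xbar_s_limits(samples):
    """Return (LCL, centre line, UCL) for the x-bar chart and the S chart.

    samples: 2-D array, one row per sample of N items.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[1]
    xbar = samples.mean(axis=1)        # mean of each sample
    s = samples.std(axis=1, ddof=1)    # standard deviation of each sample
    xbar_bar, s_bar = xbar.mean(), s.mean()
    k = c4(n)
    half_width = 3.0 * s_bar / (k * math.sqrt(n))
    s_spread = 3.0 * s_bar * math.sqrt(1.0 - k ** 2) / k
    return {
        "xbar": (xbar_bar - half_width, xbar_bar, xbar_bar + half_width),
        "s": (max(0.0, s_bar - s_spread), s_bar, s_bar + s_spread),
    }

# Synthetic process data: 25 samples of N = 5 items each.
rng = np.random.default_rng(2)
limits = xbar_s_limits(rng.normal(10.0, 0.2, size=(25, 5)))
print("x-bar chart LCL, CL, UCL:", limits["xbar"])
print("S chart     LCL, CL, UCL:", limits["s"])
```

The lower limit of the S chart is clipped at zero, since a standard deviation cannot be negative.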
4 Statistical Inference
Based on a parametrized statistical distribution, say something about the
probabilities for something to happen...
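As a minimal sketch of this idea, assuming a current Python 3 with SciPy, a normal distribution can be fitted to the temperature observations from section 2 and used to estimate the probability of exceeding a limit. Treating these observations as independent draws from one normal distribution ignores their visible time trend, and the limit of 8.0 is a hypothetical value; both are assumptions made only for illustration.

```python
from scipy import stats

# Temperature observations from the first example in this document.
data = [2.3, 3.3, 4.1, 5.5, 5.1, 6.7, 6.3, 6.9, 7.0, 7.7, 7.2, 7.4]

# Fit a normal distribution (maximum-likelihood estimates of mean and std).
mu, sigma = stats.norm.fit(data)

# Hypothetical limit, chosen only for illustration.
limit = 8.0

# Survival function: probability that a value exceeds the limit, P(X > limit).
p_exceed = stats.norm.sf(limit, loc=mu, scale=sigma)

print("fitted mean = %.2f, std = %.2f" % (mu, sigma))
print("P(X > %.1f) = %.3f" % (limit, p_exceed))
```

The complementary probability, P(X <= limit), is available from stats.norm.cdf with the same arguments.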
5 Measurement Assurance?
Measurement Capability - repeatability/reproducibility.
6 Linear Regression
Linear model parameter estimation.
sp.stats.linregress()
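A straight-line fit can be sketched with scipy.stats.linregress, here applied to the temperature observations from section 2. This is an illustration under the assumption of a current Python 3 with NumPy and SciPy, and the variable names are not part of the original text.

```python
import numpy as np
from scipy import stats

# Hourly temperature observations from the first example in this document.
time_h = np.arange(1, 13)
temp = np.array([2.3, 3.3, 4.1, 5.5, 5.1, 6.7, 6.3, 6.9, 7.0, 7.7, 7.2, 7.4])

# Least-squares fit of temp = intercept + slope * time_h.
slope, intercept, r_value, p_value, std_err = stats.linregress(time_h, temp)

print("slope     = %.3f C/h" % slope)
print("intercept = %.3f C" % intercept)
print("r^2       = %.3f" % r_value ** 2)
print("p-value   = %.2g (null hypothesis: slope is zero)" % p_value)
```

The returned standard error of the slope, std_err, can be used to judge how well the slope is determined by the twelve points.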
11 Conclusions