TSL - TuanShipland - Com - Data Mining

DATA MINING WITH SPREADSHEET PROGRAM
IN EXPERIMENTAL STATISTICS
Nguyen Anh Tuan(1) , Vo Trong Cang(2)
(1) Computational Engineering Laboratory, Institute for Computational Science and Technology,
Ho Chi Minh City, Vietnam
(2) Department of Naval Architecture and Marine Engineering, Faculty of Transportation Engineering
Ho Chi Minh City University of Technology (HCMUT), Vietnam
ABSTRACT
Data processing is a task that people regularly face, not only in engineering but also in
business activities. Experimental statistic data processing is an essential part of data
processing. It is used in many fields, such as evaluating the reliability of structural systems
and engineering system maintenance. This study presents to engineers and researchers a method
for processing experimental statistic data in common spreadsheet programs.
Keywords:
Experimental statistic, data set processing, spreadsheet program, experimental density function
E-mail: tuanshipland@gmail.com

1. INTRODUCTION
In the field of engineering system maintenance, engineers routinely collect and process data,
such as failure statistics of engineering systems. By analyzing failure statistic data,
engineers can set up a suitable maintenance period. Consequently, engineers in factories can
reduce future operating costs. In another case, when evaluating the reliability of complex
structures, engineers also process experimental data.
Up to now, the worldwide development of spreadsheet programs, such as LANPAR, VisiCalc,
Lotus 1-2-3, Apple Numbers, Gnumeric, etc., has helped researchers and engineers solve the
above data processing problems effectively. One of these programs is Microsoft Excel, which is
familiar to people who work in statistic data processing. Moreover, engineers can quickly learn
Excel functions by themselves. For these reasons, the authors choose Microsoft Excel for this
paper.

2. EXPERIMENTAL STATISTIC DATA PROCESSING
After the initial processing of the data set, the experimental statistic data processing is
carried out, following the diagram in Figure 1.
First, the data are divided into small groups of width h. According to formula (1.81) in
reference [1], the group width h can be determined from the following equation:
$h = \dfrac{x_{max} - x_{min}}{1 + 3.32 \lg n}$  (2.1)
where:
n: the total number of statistic data
$x_{max}$, $x_{min}$: the maximum and minimum random values in the statistic data
Group 1 from $x_{min}$ to $x_{min} + h$
Group 2 from $x_{min} + h$ to $x_{min} + 2h$
Group 3 from $x_{min} + 2h$ to $x_{min} + 3h$
Continue in the same way up to the final group, which contains the maximum value of the
experimental statistic data.
Next, define the median (midpoint) $x_i$ of each group.
The median of group 1: $x_1 = x_{min} + \dfrac{h}{2}$
The median of group 2: $x_2 = x_{min} + \dfrac{3h}{2}$
The median of group 3: $x_3 = x_{min} + \dfrac{5h}{2}$
Continue in the same way up to the final group. Next, the probability of each group can be
calculated:
$r_i = \dfrac{m_i}{n}$  (2.2)
where:
$r_i$: the probability in each group
$m_i$: the frequency in each group
n: the total number of values in the experimental statistic data
Note: $\sum_{i=1}^{k} m_i = n$, and therefore $\sum_{i=1}^{k} r_i = 1$, where k is the number
of groups.
Then, the experimental distribution density $p_i$ is given by the probability in each group
divided by the group width:
$p_i = \dfrac{r_i}{h}$  (2.3)

Figure 1. The diagram of the data set processing and the experimental data processing in Excel
(Do Duc Tuan, 2007). Flowchart boxes, in order: Experimental Data Input; Initial Data
Processing; Divided Group; Calculating the Median $x_i$, the Frequency $m_i$, the Probability
$r_i$; Calculating the Experimental Distribution Density $p_i$; The Graph $p_i$; Calculating
the Expected Value E(X), the Variance D(X), the Standard Deviation σ; Theoretical Distribution
Definition. Side label: The Experimental Data Processing.

The expected value of the experimental distribution is calculated by
$E(X) = \sum_{i=1}^{k} x_i r_i$  (2.4)
The variance of the experimental distribution is expressed by the following formula:
$D(X) = \sum_{i=1}^{k} \left( x_i - E(X) \right)^2 r_i$  (2.5)
The standard deviation of the experimental distribution is defined as the square root of the
variance:
$\sigma = \sqrt{D(X)} = \sqrt{\sum_{i=1}^{k} \left( x_i - E(X) \right)^2 r_i}$  (2.6)
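The chain from grouping to standard deviation can also be traced outside a spreadsheet. The following Python sketch is only an illustration under stated assumptions, not the authors' workbook: the function name group_statistics and the sample values are hypothetical, the group width is assumed to follow a Sturges-type rule h = (x_max - x_min) / (1 + 3.32 lg n), and each group is assumed to be half-open with the maximum value assigned to the last group.

```python
import math


def group_statistics(data, coeff=3.32):
    """Grouped statistics in the spirit of equations (2.1)-(2.6).

    coeff is an ASSUMED Sturges-type coefficient for the group width.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    x_min, x_max = min(data), max(data)
    if x_max == x_min:
        raise ValueError("need at least two distinct values")

    groups = 1 + coeff * math.log10(n)   # real-valued number of groups
    h = (x_max - x_min) / groups         # (2.1) group width
    k = math.ceil(groups)                # whole groups needed to reach x_max

    mids = [x_min + (i + 0.5) * h for i in range(k)]  # x_i, group midpoints

    freq = [0] * k                       # m_i, frequency of each group
    for x in data:
        i = min(int((x - x_min) / h), k - 1)  # clamp keeps x_max in last group
        freq[i] += 1

    prob = [m / n for m in freq]         # (2.2) r_i = m_i / n
    dens = [r / h for r in prob]         # (2.3) p_i = r_i / h
    mean = sum(x * r for x, r in zip(mids, prob))               # (2.4)
    var = sum((x - mean) ** 2 * r for x, r in zip(mids, prob))  # (2.5)
    std = math.sqrt(var)                                        # (2.6)
    return {"h": h, "k": k, "mids": mids, "freq": freq, "prob": prob,
            "dens": dens, "mean": mean, "var": var, "std": std}


# Hypothetical sample: 8 made-up values between 190 and 400.
sample = [190, 230, 250, 270, 290, 310, 350, 400]
stats = group_statistics(sample)
print(stats["k"], stats["freq"])  # 4 [2, 3, 1, 2]
print(stats["mean"], stats["std"])
```

With 8 values the assumed rule gives 4 groups, and the frequencies sum to the sample size, so the probabilities sum to 1.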
3. APPLICATION IN EXCEL
Figure 2. Data Input
The data input contains 8 values that are divided into 4 groups (see Fig. 2 and Fig. 5).
Fig. 2 to Fig. 11 demonstrate the results step by step.
The data in this paper are limited to 513 random values, which can be divided into at most 10
groups. After values are entered in the data input, all calculations and the graph are updated
automatically.
Figure 3. The Excel formulas in initial data processing
Figure 4. The result of initial data processing
Figure 5. Divided data in 25th – 34th rows
Figure 6. The median of each group in 35th – 44th rows
Figure 7. The frequency of each group in 45th – 54th rows
Figure 8. The probability of each group in 55th – 64th rows
Figure 9. The experimental distribution density in 65th – 74th rows
E45 = IF(D25<>" ",COUNTIF($E$3:$E$12,">="&E25)-COUNTIF($E$3:$E$12,">"&G25)," ")
The COUNTIF functions in E45 count the values that belong to the first group, [190, 242.522),
whose bounds are in cells E25 and G25 (see Fig. 5).
C47 checks the size of the data (Fig. 7) and C57 checks that the probabilities always sum to 1
(Fig. 8).
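The counting logic of the E45 formula can be mirrored in a few lines of Python. This is a sketch for illustration only: the helper names countif and group_frequency and the sample values are hypothetical, and only the bounds 190 and 242.522 come from the text. The last two lines imitate the size and probability checks that the text attributes to C47 and C57.

```python
def countif(values, op, bound):
    """Tiny stand-in for Excel's COUNTIF with a '>=' or '>' criterion."""
    if op == ">=":
        return sum(1 for v in values if v >= bound)
    if op == ">":
        return sum(1 for v in values if v > bound)
    raise ValueError("unsupported criterion: " + op)


def group_frequency(values, lower, upper):
    """Mirror of COUNTIF(range,">="&lower) - COUNTIF(range,">"&upper)."""
    return countif(values, ">=", lower) - countif(values, ">", upper)


# Hypothetical data; only the bounds 190 and 242.522 are taken from the text.
sample = [190, 230, 250, 270, 290, 310, 350, 400]
h = 242.522 - 190                         # width of the first group
edges = [190 + i * h for i in range(5)]   # bounds of 4 consecutive groups
freq = [group_frequency(sample, edges[i], edges[i + 1]) for i in range(4)]
prob = [m / len(sample) for m in freq]

print(freq)                        # [2, 3, 1, 2]
print(sum(freq) == len(sample))    # size check, in the spirit of C47
print(abs(sum(prob) - 1) < 1e-12)  # probability check, in the spirit of C57
```

As written, the difference of the two counts also includes a value equal to the upper bound, so it matches the half-open interval [190, 242.522) only when no data value lies exactly on a group boundary.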
Figure 10. SUMPRODUCT function calculates the expected value and the variance
E85 = SUMPRODUCT(E55:E64,E35:E44) returns the expected value defined by equation (2.4). The
SUMPRODUCT function multiplies the corresponding values of E55 – E64 and E35 – E44 (Fig. 6 and
Fig. 8) and then adds the products.
E86 = SUMPRODUCT(E75:E84,E55:E64) returns the variance calculated by equation (2.5).
E87 = SQRT(E86) calculates the standard deviation as the square root of the variance.
Figure 11. The scatter graph of the experimental distribution density with smooth line
(horizontal axis from 200.000 to 400.000; vertical axis from 0 to 0.008)
Advanced applications for readers:
- Process larger data sets that need more than 10 groups.
- Apply this method step by step in other spreadsheet programs that are familiar to the
reader.
- Readers can carry out the whole application by themselves.

4. CONCLUSION
This solution is feasible and useful for applying Excel formulas intelligently in experimental
statistic data processing. In addition, this study proposes a simple diagram for statistical
data analysis in engineering. Furthermore, engineers and researchers can apply this method to
process large statistic data sets in Excel without difficulty.
In conclusion, this paper provides the preliminary step for the next stage, which defines
seven theoretical distributions: the normal, lognormal, exponential, gamma, Weibull, Rayleigh,
and Maxwell distributions.

REFERENCES
[1] Do Duc Tuan (2007). Lý thuyết độ tin cậy (Theory of Reliability). University of
Transportation. Ha Noi. Viet Nam.
[2] E. Joseph Billo (2007). Excel for Scientists and Engineers - Numerical Methods. John
Wiley & Sons Inc Publication. New Jersey. USA.
[3] Nguyen Minh Tuan (2007). Application Statistics in Business – Using Excel. Statistic
Publisher. Viet Nam.
[4] Microsoft Office 2007. Excel Help. USA.
[5] Paul McFedries (2007). Formulas and Functions with Microsoft Office Excel 2007. Pearson
Education Inc. USA.
