,
where the summation index i takes the values from 0 to j.
The binomial distribution depends on two theoretical parameters p, n.
The significance of binomial distribution
A typical example of independent random attempts is the random selection of elements from
a set in which each selected element is returned before the next draw, the so-called selection
with return. It can be shown that, when the extent of the selective set is small in comparison
with the extent of the basic set, the difference between selection with return and selection
without return is insignificant. The binomial distribution can therefore serve as a suitable
criterion of whether the selective statistical set was created on the basis of random selection.
b) Normal distribution: the example of a continuous theoretical distribution
The characteristic of collective random phenomenon
A continuous random variable with values $x \in (-\infty, \infty)$ can have a normal distribution. The
graph of the function that assigns probabilities to these values is the well-known bell-shaped
Gauss curve. What is sought is the probability assigned to a unit interval of values of the
continuous random variable, in the sense that this interval contains the value x.
Theoretical distribution, distribution function
In the continuous case the theoretical distribution is called the probability density. (The
values of the random variable follow one another continuously, so probabilities must be assigned
to unit intervals of values, because no nearest neighbouring value to a value x can be found.)
The probability density has the form
$$\rho(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
The corresponding distribution function (cumulative probability) F is given by the integral
$$F(t) = \int_{-\infty}^{t} \rho(x)\,dx,$$
where the lower limit of integration is $-\infty$ and the upper limit is t.
The normal distribution depends on two theoretical parameters $\mu$, $\sigma$. This dependence
is usually recorded as $N(\mu, \sigma)$. The theoretical parameter $\mu$ is the theoretical analogy
of the general moment of the 1st order $O_1(x)$, and so of the empirical arithmetic mean $\bar{x}$.
The theoretical parameter $\sigma$ is the theoretical analogy of the square root of the central
moment of the 2nd order $C_2(x)$, and so of the empirical standard (determinative) deviation $S_x$.
The normal distribution can be normalized to the parameter values $\mu = 0$, $\sigma = 1$ by means
of the standardized random variable
$$u = \frac{x - \mu}{\sigma}.$$
The standardized normal distribution obtained in this way is recorded as N(0,1) (see Fig.4).
Because of the introduced variable u, its probability density will be denoted $\rho(u)$; its
distribution function is often called the Laplace function and denoted F(u). Very detailed
statistical tables of the Laplace function values are available.
Fig.4 Graphical representation of the probability density $\rho(u)$ of the standardized normal
distribution (the values u are plotted on the horizontal axis, the values of the
probability density $\rho(u)$ on the vertical axis)
The significance of normal distribution
The significance of the normal distribution is described by the central limit theorem. Its essence
is the statement that a random variable created as the sum of a large number of mutually
independent random variables has, under very general conditions, approximately the normal
distribution. The exact formulation is given by the Lyapunov theorem, which includes the
condition that permits working with a normal distribution for a sufficiently large extent of the
selective set. Special forms of that theorem are also useful: the Lindeberg-Lévy theorem and the
de Moivre-Laplace theorem (the latter shows that, for a sufficiently large number of independent
attempts, the binomial distribution converges to the normal distribution).
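The de Moivre-Laplace statement can be checked numerically. The following is a minimal sketch, assuming illustrative values n = 100 and p = 0.5 (not taken from the text) and a continuity correction of ±0.5, which is a common convention rather than something the text prescribes. The function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(i, n, p):
    """Probability function P_i of the binomial distribution Bi(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def normal_approx(i, n, p):
    """Normal approximation of P_i: area under N(np, sqrt(np(1-p))) on (i-0.5, i+0.5)."""
    nd = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return nd.cdf(i + 0.5) - nd.cdf(i - 0.5)


n, p = 100, 0.5  # illustrative values, an assumption of this sketch
max_gap = max(abs(binomial_pmf(i, n, p) - normal_approx(i, n, p)) for i in range(n + 1))
print(f"largest difference between the two probability functions: {max_gap:.6f}")
```

The largest pointwise difference shrinks as n grows, which is the behaviour the theorem describes.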
c) Parameters of theoretical distributions
For discrete theoretical distributions, $P_i$ will denote the probability function and $x_i$ the
values of the random variable RV. For continuous theoretical distributions, $\rho(x)$ will denote
the probability density and x the values of the continuous random variable.
The theoretical general, central and standardized moments $O_j$, $C_j$ and $N_j$ are important
parameters of all theoretical distributions. They can be expressed through the formulas
(continuous case on the left, discrete case on the right):
$$O_j = \int_a^b x^j \rho(x)\,dx, \qquad O_j = \sum_{i=1}^{n} i^j P_i,$$
$$C_j = \int_a^b (x - O_1)^j \rho(x)\,dx, \qquad C_j = \sum_{i=1}^{n} (i - O_1)^j P_i,$$
$$N_j = \int_a^b \left(\frac{x - O_1}{\sqrt{C_2}}\right)^j \rho(x)\,dx, \qquad N_j = \sum_{i=1}^{n} \left(\frac{i - O_1}{\sqrt{C_2}}\right)^j P_i.$$
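The discrete moment formulas translate directly into code. The sketch below is an illustration only: the helper names are hypothetical, the example distribution Bi(4, 0.5) is an assumption, and the sums run over the values i = 0, 1, ... so that the value 0 is included in the central moments.

```python
from math import comb, sqrt


def general_moment(probs, j):
    """O_j: sum of i**j * P_i, where probs[i] is the probability of the value i."""
    return sum(i**j * P for i, P in enumerate(probs))


def central_moment(probs, j):
    """C_j: sum of (i - O_1)**j * P_i."""
    O1 = general_moment(probs, 1)
    return sum((i - O1) ** j * P for i, P in enumerate(probs))


def standardized_moment(probs, j):
    """N_j: sum of ((i - O_1) / sqrt(C_2))**j * P_i."""
    O1 = general_moment(probs, 1)
    C2 = central_moment(probs, 2)
    return sum(((i - O1) / sqrt(C2)) ** j * P for i, P in enumerate(probs))


# Example distribution (an assumption of this sketch): Bi(4, 0.5)
n, p = 4, 0.5
probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]

print("O_1 =", general_moment(probs, 1))       # location
print("C_2 =", central_moment(probs, 2))       # variability
print("N_3 =", standardized_moment(probs, 3))  # skewness
print("N_4 =", standardized_moment(probs, 4))  # kurtosis
```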
The names and symbols mean value (expected value) E and dispersion (variance) D are often used
as well. The expected value E is a location parameter, which measures the level of the random
variable RV. The dispersion D is a variability parameter, which measures the spread of the
values of the random variable. The expected value E is equal to the theoretical general moment
of the 1st order $O_1$; the dispersion D is equal to the theoretical central moment of the
2nd order $C_2$.
The theoretical general moment of the 1st order $O_1$ is the location parameter, the theoretical
central moment of the 2nd order $C_2$ is the variability parameter, the theoretical standardized
moment of the 3rd order $N_3$ is the skewness parameter and the theoretical standardized moment
of the 4th order $N_4$ is the kurtosis parameter.
The relation between empirical and theoretical parameters is described by the law of large
numbers. Subject to certain conditions, the empirical distribution and its empirical parameters
can be expected to approximate the theoretical distribution and its theoretical parameters, and
the more closely, the greater the extent of the selective statistical set (the larger the number
of realized random attempts). The approach of the empirical parameters to the theoretical
parameters has the character not of mathematical convergence but of convergence in probability.
2.1.3. Description of Selected Probability (Theoretical) Distributions
a) Discrete theoretical distribution: Alternative distribution
The alternative distribution is the discrete theoretical distribution A(p) with one theoretical
parameter p of a zero-one random variable RV (the random variable has the values $x_i = i = 0, 1$).
The probability and distribution functions $P_i$ and $F_i$ (analogies of the empirical relative
and cumulative frequencies) and the theoretical moments $O_j$, $C_j$ have, for the alternative
distribution, the forms
$$P_i = p^i (1-p)^{1-i}, \ \text{where } i = 0, 1, \qquad F_i = \sum_{j=0}^{i} P_j, \ \text{where } i \le 1,$$
theoretical moments $O_1$, $C_2$, $C_3$, $C_4$:
$$O_1 = E = p, \quad C_2 = D = p(1-p), \quad C_3 = p(1-p)(1-2p),$$
$$C_4 = p(1-p)(1 - 3p + 3p^2).$$
b) Discrete theoretical distribution: Binomial distribution
The binomial distribution is the discrete theoretical distribution Bi(n, p) with two
theoretical parameters n, p of the random variable RV (the random variable has the values
$x_i = i = 0, 1, \ldots, n$).
The probability and distribution functions $P_i$ and $F_i$ (analogies of the empirical relative
and cumulative frequencies) and the theoretical moments $O_j$, $C_j$ have, for the binomial
distribution, the forms
$$P_i = \binom{n}{i} p^i (1-p)^{n-i}, \ \text{where } i = 0, 1, \ldots, n, \qquad F_i = \sum_{j=0}^{i} P_j, \ \text{where } i \le n,$$
theoretical moments $O_1$, $C_2$, $C_3$, $C_4$:
$$O_1 = E = np, \quad C_2 = D = np(1-p), \quad C_3 = np(1-p)(1-2p),$$
$$C_4 = 3n^2p^2(1-p)^2 + np(1-p)(1 - 6p + 6p^2).$$
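The closed forms for the binomial moments can be compared with the defining sums. This is a hedged sketch with hypothetical function names and illustrative parameter values (n = 10, p = 0.3 are assumptions, not values from the text).

```python
from math import comb


def binomial_pmf(i, n, p):
    """P_i of Bi(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binomial_cdf(i, n, p):
    """F_i = P_0 + P_1 + ... + P_i."""
    return sum(binomial_pmf(j, n, p) for j in range(i + 1))


def central_moment(n, p, j):
    """C_j computed from its definition, using O_1 = np."""
    mean = n * p
    return sum((i - mean) ** j * binomial_pmf(i, n, p) for i in range(n + 1))


n, p = 10, 0.3  # illustrative values
q = 1 - p
c3_formula = n * p * q * (1 - 2 * p)
c4_formula = 3 * n**2 * p**2 * q**2 + n * p * q * (1 - 6 * p + 6 * p**2)

print("C_3:", central_moment(n, p, 3), "closed form:", c3_formula)
print("C_4:", central_moment(n, p, 4), "closed form:", c4_formula)
print("F_n:", binomial_cdf(n, n, p))
```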
c) Discrete theoretical distribution: Poisson distribution
The Poisson distribution is the discrete theoretical distribution Po($\lambda$) with one
theoretical parameter $\lambda$ of the random variable RV (the random variable has the values
$x_i = i = 0, 1, \ldots, \infty$).
The probability and distribution functions $P_i$ and $F_i$ (analogies of the empirical relative
and cumulative frequencies) and the theoretical moments $O_j$, $C_j$ have, for the Poisson
distribution, the forms
$$P_i = \frac{\lambda^i}{i!}\, e^{-\lambda}, \ \text{where } i = 0, 1, \ldots, \infty, \qquad F_i = \sum_{j=0}^{i} P_j, \ \text{where } i \le \infty,$$
theoretical moments $O_1$, $C_2$, $C_3$, $C_4$:
$$O_1 = E = \lambda, \quad C_2 = D = \lambda, \quad C_3 = \lambda, \quad C_4 = \lambda + 3\lambda^2.$$
The binomial distribution Bi(n, p) may be approximated by the Poisson distribution Po($\lambda$)
with $\lambda = np$ for n > 30 and for $p \to 0$ ($p \le 0.1$ is sufficient).
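The quality of the Poisson approximation can be inspected numerically. The sketch below assumes illustrative values n = 50 and p = 0.04 that satisfy the stated conditions, and it takes the Poisson parameter as the product n·p; the function names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(i, n, p):
    """P_i of Bi(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_pmf(i, lam):
    """P_i of Po(lam)."""
    return lam**i / factorial(i) * exp(-lam)


n, p = 50, 0.04  # illustrative values with n > 30 and p <= 0.1
lam = n * p      # Poisson parameter used for the approximation

for i in range(6):
    print(i, round(binomial_pmf(i, n, p), 4), round(poisson_pmf(i, lam), 4))

max_gap = max(abs(binomial_pmf(i, n, p) - poisson_pmf(i, lam)) for i in range(n + 1))
```

Printing the two columns side by side shows where the approximation is tightest; `max_gap` summarizes the worst case over all values i.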
d) Discrete theoretical distribution: Geometric distribution
The geometric distribution is the discrete theoretical distribution Ge(p) with one theoretical
parameter p of the random variable RV (the random variable has the values
$x_i = i = 0, 1, \ldots, \infty$).
The probabilities $P_i$ decrease geometrically with increasing values of i. Independent
attempts are carried out, and the probability that the observed phenomenon occurs (i.e. the
probability of success) is the same for all attempts and equal to p. The probability that the
first success occurs only in attempt i + 1 is given by the probability function $P_i$.
The probability and distribution functions $P_i$ and $F_i$ (analogies of the empirical relative
and cumulative frequencies) and the theoretical moments $O_j$, $C_j$ have, for the geometric
distribution Ge(p), the forms
$$P_i = p(1-p)^i, \ \text{where } i = 0, 1, 2, \ldots, \infty, \qquad F_i = \sum_{j=0}^{i} P_j, \ \text{where } i \le \infty,$$
theoretical moments $O_1$, $C_2$:
$$O_1 = E = \frac{1-p}{p}, \quad C_2 = D = \frac{1-p}{p^2}.$$
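As a quick plausibility check of the geometric moments, the infinite sums can be truncated after enough terms. The following sketch is illustrative only: p = 0.25 and the truncation length are assumptions, and the function names are hypothetical.

```python
def geometric_pmf(i, p):
    """P_i = p (1 - p)**i: the first success occurs in attempt i + 1."""
    return p * (1 - p) ** i


def geometric_cdf(i, p):
    """F_i = P_0 + ... + P_i."""
    return sum(geometric_pmf(j, p) for j in range(i + 1))


p = 0.25      # illustrative probability of success
terms = 2000  # truncation of the infinite sums; the neglected tail is negligible here

mean = sum(i * geometric_pmf(i, p) for i in range(terms))
variance = sum((i - mean) ** 2 * geometric_pmf(i, p) for i in range(terms))

print("O_1:", mean, "closed form:", (1 - p) / p)
print("C_2:", variance, "closed form:", (1 - p) / p**2)
```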
e) Discrete theoretical distribution: Hypergeometric distribution
The hypergeometric distribution is the discrete theoretical distribution HGe(N, M, n) with
three theoretical parameters N, M, n of the random variable RV (the random variable has the
values $x_i = i = \max(0, M - N + n), \ldots, \min(M, n)$).
The hypergeometric distribution, unlike the previous discrete distributions, involves
dependent repeated random attempts (e.g. there are N elements, M of which have the observed
sign, and n elements are selected from these N elements without return).
The probability function $P_i$ (the analogy of the empirical relative frequency) and the
theoretical moments $O_j$, $C_j$ have, for the hypergeometric distribution HGe(N, M, n), the forms
$$P_i = \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}, \ \text{where } i = \max(0, M - N + n), \ldots, \min(M, n),$$
theoretical moments $O_1$, $C_2$:
$$O_1 = E = n\frac{M}{N}, \quad C_2 = D = n\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{N-n}{N-1}.$$
For N sufficiently large compared with n, the forms of the theoretical parameters $O_1$, $C_2$
correspond to the forms of the theoretical parameters $O_1$, $C_2$ of the binomial distribution
Bi(n, p) with probability $p = \frac{M}{N}$.
The hypergeometric distribution HGe(N, M, n) may be approximated for
$\frac{n}{N} \le 0.05$, $p = \frac{M}{N}$
by the binomial distribution Bi(n, p).
The hypergeometric distribution HGe(N, M, n) may be approximated for small fractions
$\frac{n}{N}$, $\frac{M}{N}$ and for large n, namely
$$\frac{n}{N} \le 0.05, \quad \frac{M}{N} \le 0.1, \quad n \ge 31, \quad \lambda = n\frac{M}{N},$$
by the Poisson distribution Po($\lambda$).
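The hypergeometric probability function and its binomial approximation can be compared directly. The sketch below assumes illustrative values N = 1000, M = 100, n = 20 (so that n/N = 0.02); neither these values nor the function names come from the text.

```python
from math import comb


def hypergeom_pmf(i, N, M, n):
    """P_i of HGe(N, M, n): i marked elements among n drawn without return."""
    return comb(M, i) * comb(N - M, n - i) / comb(N, n)


def binomial_pmf(i, n, p):
    """P_i of Bi(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


N, M, n = 1000, 100, 20  # illustrative values with n/N <= 0.05
lo, hi = max(0, M - N + n), min(M, n)
probs = {i: hypergeom_pmf(i, N, M, n) for i in range(lo, hi + 1)}

mean = sum(i * P for i, P in probs.items())
variance = sum((i - mean) ** 2 * P for i, P in probs.items())

p = M / N
max_gap = max(abs(P - binomial_pmf(i, n, p)) for i, P in probs.items())
print("O_1:", mean, "C_2:", variance, "largest gap to Bi(n, p):", max_gap)
```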
f) Discrete theoretical distribution: Multinomial distribution
The s-multiple multinomial distribution is the discrete theoretical distribution
s-Multi($n, p_1, \ldots, p_{s-1}$) with s theoretical parameters $n, p_1, \ldots, p_{s-1}$ (the random
variables $RV_1, \ldots, RV_s$ have values denoted $i_1, \ldots, i_s = 0, 1, \ldots, n$).
The distribution s-Multi($n, p_1, \ldots, p_{s-1}$) is connected with incompatible random
phenomena $A_1, \ldots, A_s$, which can occur in n independent attempts with the probabilities
$p_1, \ldots, p_s$ (the probabilities sum to 1, so the s-multiple multinomial distribution has only
s − 1 independent probabilities). The numbers of occurrences of the random phenomena $A_i$
in n attempts have the binomial distributions Bi($n, p_i$).
The probability function for the multinomial distribution s-Multi($n, p_1, \ldots, p_{s-1}$) has, as
the analogy of the empirical relative frequency, the form
$$P_{i_1,\ldots,i_{s-1}} = \frac{n!}{i_1! \cdots i_{s-1}! \left(n - \sum_{j=1}^{s-1} i_j\right)!}\; p_1^{i_1} \cdots p_{s-1}^{i_{s-1}} \left(1 - \sum_{j=1}^{s-1} p_j\right)^{n - \sum_{j=1}^{s-1} i_j}.$$
The individual binomial distributions Bi($n, p_i$) have the theoretical parameters
$$O_1 = E_i = np_i, \quad C_2 = D_i = np_i(1 - p_i).$$
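The multinomial probability function can be sketched as follows. For convenience this hypothetical helper takes all s counts and all s probabilities explicitly, whereas the formula in the text expresses the last count and the last probability through the first s − 1; the numerical values are illustrative assumptions.

```python
from math import comb, factorial


def multinomial_pmf(counts, probs):
    """Probability of observing counts (i_1, ..., i_s) in n = sum(counts) attempts.

    probs = (p_1, ..., p_s) must sum to 1.
    """
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # each partial quotient is an exact integer
    value = float(coefficient)
    for c, p in zip(counts, probs):
        value *= p**c
    return value


# s = 2 reduces to the binomial distribution Bi(n, p)
two_class = multinomial_pmf((3, 7), (0.3, 0.7))
binomial = comb(10, 3) * 0.3**3 * 0.7**7

# s = 3 (trinomial): the probabilities of all outcomes with n = 5 sum to 1
n = 5
probs = (0.2, 0.3, 0.5)
total = sum(
    multinomial_pmf((i, j, n - i - j), probs)
    for i in range(n + 1)
    for j in range(n + 1 - i)
)
print(two_class, binomial, total)
```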
The distribution of one random variable (s = 2) is the binomial distribution Bi($n, p_i$). The
distribution of two random variables (s = 3) is the trinomial distribution Tr($n, p_i, p_j$). The
probability function $P_{ij}$ for the trinomial distribution Tr($n, p_i, p_j$) has the form
$$P_{ij} = \frac{n!}{i!\, j!\, (n-i-j)!}\; p_1^{i}\, p_2^{j}\, (1 - p_1 - p_2)^{n-i-j}.$$
The multinomial distribution for $n \to \infty$, $p_i \to 0$ ($i = 1, \ldots, s$) may be approximated, for
$\lambda_i = np_i$ (the $\lambda_i$ are finite numbers), by the multi-dimensional Poisson distribution
Po($\lambda_i$).
g) Continuous theoretical distribution: Normal and standardized normal distribution
The normal distribution is the continuous theoretical distribution $N(\mu, \sigma)$ of the random
variable RV (the random variable takes the values $x \in (-\infty; \infty)$). The normal distribution has
two theoretical parameters $\mu$, $\sigma$. The standardized normal distribution is the continuous
theoretical distribution N(0,1) of the random variable U (the random variable takes the values
$u \in (-\infty; \infty)$). For the standardized normal distribution the parameters $\mu$, $\sigma$ are
standardized to the values 0, 1 by substituting the new random variable U for the random variable RV:
$$u = \frac{x-\mu}{\sigma}, \quad E\left(\frac{x-\mu}{\sigma}\right) = \frac{E(x)-\mu}{\sigma} = 0, \quad D\left(\frac{x-\mu}{\sigma}\right) = \frac{D(x)}{\sigma^2} = 1.$$
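The probability densities $\rho(x)$, $\rho(u)$ (corresponding to the relative frequency), the
distribution functions F(x), F(u) (corresponding to the cumulative frequency) and the
standardizing conditions (corresponding to the empirical standardizing condition) have the forms
$$\rho(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad \rho(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}},$$
$$F(t) = \int_{-\infty}^{t} \rho(x)\,dx, \qquad F(t) = \int_{-\infty}^{t} \rho(u)\,du,$$
$$F(\infty) = \int_{-\infty}^{\infty} \rho(x)\,dx = 1, \qquad F(\infty) = \int_{-\infty}^{\infty} \rho(u)\,du = 1.$$
The theoretical parameters $O_1$, $C_2$ can be calculated in the form
$$O_1 = E(x) = \int_{-\infty}^{\infty} x\,\rho(x)\,dx = \mu, \qquad O_1 = E(u) = \int_{-\infty}^{\infty} u\,\rho(u)\,du = 0,$$
$$C_2 = D(x) = \int_{-\infty}^{\infty} (x - O_1)^2 \rho(x)\,dx = \sigma^2, \qquad C_2 = D(u) = \int_{-\infty}^{\infty} u^2 \rho(u)\,du = 1.$$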
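The standardized density, the Laplace function and the standardizing conditions can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, the Laplace function is expressed through the error function `math.erf`, and the integrals are approximated by a midpoint rule on the interval (−8, 8), which is an assumption made for the illustration.

```python
from math import erf, exp, pi, sqrt


def density_u(u):
    """Probability density of the standardized normal distribution N(0,1)."""
    return exp(-u * u / 2) / sqrt(2 * pi)


def laplace_F(u):
    """Distribution (Laplace) function of N(0,1), written with the error function."""
    return 0.5 * (1 + erf(u / sqrt(2)))


def standardize(x, mu, sigma):
    """u = (x - mu) / sigma."""
    return (x - mu) / sigma


def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / steps
    return h * sum(f(a + (m + 0.5) * h) for m in range(steps))


total = integrate(density_u, -8, 8)                       # standardizing condition
mean_u = integrate(lambda u: u * density_u(u), -8, 8)     # O_1 of N(0,1)
var_u = integrate(lambda u: u * u * density_u(u), -8, 8)  # C_2 of N(0,1)
print(total, mean_u, var_u, laplace_F(0.0))
```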
h) Continuous theoretical distribution: Lognormal distribution
The lognormal distribution is the continuous theoretical distribution $LN(\mu, \sigma)$ of a random
variable RV that is an increasing function of a random variable Y in the form $x = e^y$ (the
random variable Y has the normal distribution $N(\mu, \sigma)$). The lognormal distribution has two
theoretical parameters $\mu$, $\sigma$.
The probability density $\rho(x)$ (corresponding to the relative frequency) has the form
$$\rho(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \ \text{where } 0 < x < \infty.$$
The theoretical parameters $O_k$, $O_1$, $C_2$ can be calculated in the form
$$O_k = E(x^k) = \int_0^{\infty} x^k \rho(x)\,dx = \exp\left(k\mu + \frac{k^2\sigma^2}{2}\right),$$
$$O_1 = \exp\left(\mu + \frac{\sigma^2}{2}\right), \quad O_2 = \exp\left(2\mu + 2\sigma^2\right),$$
$$C_2 = D(x) = O_2 - O_1^2 = \exp\left(2\mu + \sigma^2\right)\left(\exp(\sigma^2) - 1\right).$$
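The closed form for the lognormal moments can be compared with a numerical integral. The sketch below is an illustration under stated assumptions: it substitutes y = ln x, so that the k-th moment becomes the integral of $e^{ky}$ against the normal density of Y, uses a midpoint rule on μ ± 12σ, and takes μ = 0, σ = 0.5 as example values. The function names are hypothetical.

```python
from math import exp, pi, sqrt


def lognormal_moment(k, mu, sigma):
    """Closed form O_k = exp(k*mu + k**2 * sigma**2 / 2)."""
    return exp(k * mu + k * k * sigma * sigma / 2)


def lognormal_moment_numeric(k, mu, sigma, steps=20000):
    """Midpoint-rule integral of exp(k*y) times the N(mu, sigma) density of y = ln x."""
    lo, hi = mu - 12 * sigma, mu + 12 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for m in range(steps):
        y = lo + (m + 0.5) * h
        total += exp(k * y) * exp(-((y - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
    return total * h


mu, sigma = 0.0, 0.5  # illustrative parameter values
O1 = lognormal_moment(1, mu, sigma)
O2 = lognormal_moment(2, mu, sigma)
C2 = O2 - O1**2
print("O_1:", O1, "numeric:", lognormal_moment_numeric(1, mu, sigma))
print("C_2:", C2, "closed form:", exp(2 * mu + sigma**2) * (exp(sigma**2) - 1))
```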
2.1.4. Apparatus of Non-parametric Testing
The apparatus of the zero (null) hypothesis $H_0$ and the alternative hypothesis $H_a$ is the
foundation of testing non-parametric (and also parametric) hypotheses.
In the case of non-parametric hypotheses, the zero hypothesis supposes that the empirical
distribution can be substituted by the intended theoretical distribution (when the substitute
is the normal distribution, the test is a test of normality). The alternative hypothesis then
supposes that this presumption is not correct. The essence of testing non-parametric hypotheses
is a comparison between theoretical and empirical absolute frequencies. The empirical
absolute frequencies are calculated by means of elementary statistical processing of the
empirical distribution. The theoretical absolute frequencies are calculated through the
probability function or probability density of the intended theoretical distribution.
Parametric hypotheses relate to a comparison of empirical and theoretical parameters, and
the zero and alternative hypotheses play a similar role there.
For the verification of non-parametric and parametric hypotheses a special group of
theoretical distributions was developed. These distributions are not intended to replace
empirical distributions; they work as statistical criteria. The normal distribution is the only
exception: in its standardized shape it may play the role of a statistical criterion, while in
its non-standardized shape it may substitute for empirical distributions.
The standardized normal distribution (u-test), the Student distribution (t-test), the Pearson
$\chi^2$ distribution ($\chi^2$-test, chi-square) and the Fisher-Snedecor distribution (F-test) are
among the most frequent statistical criteria. Detailed statistical tables are available for all
of these statistical criteria.
To verify the hypotheses $H_0$ and $H_a$, a suitable statistical criterion must be selected. The
$\chi^2$-test is used most frequently for the verification of a non-parametric hypothesis.
If its application requires an interval division of frequencies, each partial interval must
have an absolute frequency of at least 5. If this condition is not fulfilled, neighbouring
partial intervals must be merged. Similarly, it is necessary to proceed to the interval
division of frequencies.
After the selection of a statistical criterion (e.g. the $\chi^2$-test), the experimental value
of this criterion (e.g. $\chi^2_{exp}$) and its critical theoretical value (e.g. $\chi^2_{teor}$) must be
determined. The so-called critical domain W of the relevant statistical criterion is then
recorded by means of the critical theoretical value.
If the experimental value of the selected criterion is an element of the critical domain W,
the alternative hypothesis $H_a$ must be accepted, i.e. the empirical distribution cannot be
substituted by the intended theoretical distribution. In the contrary case (the experimental
value is not an element of the critical domain W) the zero hypothesis $H_0$ can be accepted,
i.e. the empirical distribution can be substituted by the intended theoretical distribution.
The determination of the significance level $\alpha$ is an essential element of testing
non-parametric and parametric hypotheses. The significance level gives the probability of
erroneously rejecting the tested hypothesis (i.e. the probability of an error of type I). The
most frequent significance levels are $\alpha = 0.05$ and $\alpha = 0.01$. For example, for a positive
test of normality (i.e. the hypothesis $H_0$ that the empirical distribution can be substituted
by the normal distribution is accepted and the hypothesis $H_a$ is refused), the significance
level 0.05 permits the conclusion that, if the selective statistical set SSS were selected
100 times from the basic statistical set BSS, in 95 cases the empirical distribution could be
substituted by the normal distribution.
The procedure of non-parametric testing can be practised by solving the assigned example.
2.1.5. Illustration of Non-parametric Testing
Within the assigned example it is now possible to follow the procedure for verifying the
zero hypothesis $H_0$ that the empirical distribution in Fig.2 can be substituted by a normal
distribution (see Fig.4).
The $\chi^2$-test will be applied. In its application the letter k refers to the number of
intervals of the frequency interval division and the letter r to the number of theoretical
parameters of the normal distribution (i.e. r = 2). The expression $\nu = k - r - 1$ gives the
number of degrees of freedom, which, together with a selected significance level $\alpha$, makes it
possible to determine the critical theoretical value $\chi^2_{teor} = \chi^2_{k-r-1}$ using statistical
tables. The selected significance level is $\alpha = 0.05$.
The letter F denotes the Laplace function of the standardized random variable $u_i$ ($u_i$ is
the standardized value corresponding to the upper limit $x_i$ of the relevant interval of the
frequency interval division). The probabilities $p_i$ (expressed by integrals) are given by
differences of Laplace function values, the products $n \cdot p_i$ give the theoretical absolute
frequencies, and the values $n_i$ denote the empirical absolute frequencies (see tables Tab.1
and Tab.2).
The calculation of the standardized values $u_i$ using the relation
$$u_i = \frac{x_i - O_1}{S_x}$$
(general moment of the 1st order $O_1 = 2.5$, standard deviation $S_x = 1$, the upper limits $x_i$
are $x_1 = 1.5$, $x_2 = 2.5$, $x_3 = 3.5$, $x_4 = 4.5$, $x_5 = \infty$) leads to the values
$$u_1 = -1, \quad u_2 = 0, \quad u_3 = 1, \quad u_4 = 2, \quad u_5 = \infty.$$
The calculation of the probabilities $p_i$ by integration, using the values of the Laplace
function F(u):
$$p_1 = \int_{-\infty}^{1.5} \rho(x)\,dx, \qquad p_1 = \int_{-\infty}^{-1} \rho(u)\,du = F(-1),$$
$$p_2 = \int_{1.5}^{2.5} \rho(x)\,dx, \qquad p_2 = \int_{-1}^{0} \rho(u)\,du = F(0) - F(-1),$$
$$p_3 = \int_{2.5}^{3.5} \rho(x)\,dx, \qquad p_3 = \int_{0}^{1} \rho(u)\,du = F(1) - F(0),$$
$$p_4 = \int_{3.5}^{4.5} \rho(x)\,dx, \qquad p_4 = \int_{1}^{2} \rho(u)\,du = F(2) - F(1),$$
$$p_5 = \int_{4.5}^{\infty} \rho(x)\,dx, \qquad p_5 = \int_{2}^{\infty} \rho(u)\,du = F(\infty) - F(2).$$
The $\chi^2$-test form
$$\chi^2_{exp} = \sum_{i=1}^{k} \frac{(n_i - np_i)^2}{np_i}, \qquad p_i = F(u_i) - F(u_{i-1}),$$
makes it possible to carry out the necessary partial calculations (see table Tab.3).
x_i   Interval       n_i   u_i   F(u_i)   p_i      n·p_i
1     (−∞; 1.5)        9   −1    0.1625   0.1625    8.125
2     ⟨1.5; 2.5)      15    0    0.5000   0.3375   16.875
3     ⟨2.5; 3.5)      20    1    0.8175   0.3175   15.875
4     ⟨3.5; 4.5)       4    2    0.9754   0.1579    7.895
5     ⟨4.5; ∞)         2    ∞    1.0000   0.0246    1.230
Table Tab.3: The calculations of u_i, F(u_i), p_i and n·p_i
Table Tab.4 responds to the requirement that, in a normality test, each interval must contain
at least 5 measurement results. Neighbouring intervals are merged until they contain 5 or more
measurement results. The same table also carries out the additional calculations needed to
establish the experimental value of the statistical criterion.
x_i      n_i   n·p_i   (n_i − n·p_i)² / (n·p_i)
1          9     8.1   0.100
2         15    16.9   0.214
3         20    15.9   1.057
4 + 5      6     9.1   1.056
                       Σ = 2.427 = χ²_exp
Table Tab.4: The adjustment of the number of intervals, the calculation of χ²_exp
In the final part of the non-parametric testing the critical theoretical value
$\chi^2_{teor} = \chi^2_{\nu} = \chi^2_{k-r-1} = \chi^2_{4-2-1} = \chi^2_{1} = 3.84$ was determined, using the calculated
number of degrees of freedom $\nu = k - r - 1 = 4 - 2 - 1 = 1$ and the statistical tables with
significance level $\alpha = 0.05$. By means of the critical theoretical value the right-sided
critical domain could then be recorded: $W = \langle \chi^2_{\nu}(\alpha), \infty) = \langle 3.84, \infty)$.
For the experimental value of the statistical criterion $\chi^2_{exp} = 2.427$ (i.e. $\chi^2_{exp} \notin W$)
the following conclusion of the non-parametric hypothesis test can be drawn:
The experimental value $\chi^2_{exp}$ does not belong to the critical domain, so the zero hypothesis
$H_0$ can be accepted and the empirical distribution (empirical polygon) can be substituted by
the theoretical normal distribution at the significance level $\alpha = 0.05$. This conclusion is of
considerable importance for deducing additional information: not only can the simple
mathematical apparatus connected with the normal distribution be used, but in parametric
hypotheses testing the testing techniques bound to the normal distribution can also be applied.
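The arithmetic of the worked example can be retraced in a few lines. The sketch below takes the empirical frequencies and the rounded theoretical frequencies exactly as they are printed in Tab.4 and the critical value 3.84 from the text; it does not recompute them from the normal distribution. The variable names are hypothetical.

```python
# Empirical absolute frequencies n_i (intervals 4 and 5 merged, as in Tab.4)
n_emp = [9, 15, 20, 6]
# Theoretical absolute frequencies n*p_i, rounded as printed in Tab.4
n_theo = [8.1, 16.9, 15.9, 9.1]

# chi-square criterion: sum of (n_i - n*p_i)^2 / (n*p_i)
terms = [(ni - npi) ** 2 / npi for ni, npi in zip(n_emp, n_theo)]
chi2_exp = sum(terms)

k, r = 4, 2          # number of intervals, number of normal-distribution parameters
dof = k - r - 1      # degrees of freedom
chi2_crit = 3.84     # table value for 1 degree of freedom and alpha = 0.05 (from the text)

in_critical_domain = chi2_exp >= chi2_crit
print([round(t, 3) for t in terms], round(chi2_exp, 3), dof, in_critical_domain)
```

Because the experimental value stays below the critical value, the sketch reaches the same decision as the text: the zero hypothesis is not rejected.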
2.2. Comparison of Empirical and Theoretical Parameters: Estimations of
Theoretical Parameters, Testing Parametric Hypotheses
Goals:
- Probable investigation of selective statistical set: Quantification of theoretical
parameters, Comparison between theoretical and empirical parameters
- Probable picture of selective statistical set: Point & interval estimation e.g.
confidence interval, Testing parametric hypotheses
Acquired concepts and knowledge pieces:
Point estimation
Interval estimation
Confidence interval
Confidence interval for mean value
Confidence interval for standard deviation
Testing parametric hypotheses
Computed u-statistic
Computed t-statistic
Computed F-statistic
Computed chi-square statistic
Check questions:
Why do the estimations of theoretical parameters come before the comparison of theoretical
and empirical parameters?
What conditions must good point estimation fulfil?
What are the methods of point estimations?
What are the advantages of interval estimations?
Describe the way of confidence intervals construction
Which are the statistical criteria used for confidence intervals construction?
What is the apparatus of parametric testing?
What is the difference between one-selective and two-selective testing parametric
hypotheses?
What is the procedure for parametric testing?
Present a survey of the most general statistical criteria
Another of the main methods of statistics, the comparison of empirical and theoretical
parameters, builds on the assignment of a theoretical distribution to the empirical distribution.
The theoretical distribution is identified and assigned by non-parametric testing, but it still
contains unknown values of the theoretical parameters. Before empirical and theoretical
parameters can be compared, the theoretical parameters must be estimated. It is then possible
to proceed to the comparison of empirical and theoretical parameters using the apparatus of
parametric testing.
2.2.1. Basics of Estimation Theory
It is necessary to estimate the theoretical parameters (e.g. the mean value $E = \mu$ and the
dispersion $D = \sigma^2$ for the normal distribution). There are two kinds of estimations of
theoretical parameters: point estimations and interval estimations.
Good point estimations should fulfil the conditions of consistency, impartiality
(unbiasedness), abundance (efficiency) and sufficiency. These conditions are only recalled
here; more detailed information can be found in the literature on estimation theory. Point
estimation can be carried out by the moment method or by the method of maximum likelihood. The
moment method takes the empirical parameters as the estimations of the corresponding
theoretical parameters. The method of maximum likelihood is mathematically considerably more
demanding. The main disadvantage of point estimations is that the exactness with which the
estimation was made is not known.
Interval estimations remove this problem. They construct an interval that provides a
reasonable guarantee (a sufficiently high probability) that the real value of the theoretical
parameter lies inside it. This probability is again related to the selected significance level
$\alpha$, and the constructed interval is called the $100(1-\alpha)\%$ confidence interval (e.g. for
$\alpha = 0.05$ it is the 95% confidence interval).
a) The construction of the confidence interval for the mean value $\mu$ of the normal
distribution using the u-test (condition of the construction: the variance $\sigma^2$ is given in
advance) is based on the statistical criterion
$$u = \frac{O_1 - \mu}{\sigma}\sqrt{n}.$$
The critical values are $-u(\alpha/2)$, $u(\alpha/2)$; the conditions for the construction of the
confidence interval can be recorded as the inequalities $-u(\alpha/2) < u < u(\alpha/2)$. Solving these
inequalities gives the confidence interval (the interval estimation of $\mu$):
$$\mu \in \left(O_1 - \frac{u(\alpha/2)\,\sigma}{\sqrt{n}};\; O_1 + \frac{u(\alpha/2)\,\sigma}{\sqrt{n}}\right).$$
b) The construction of the confidence interval for the mean value $\mu$ of the normal
distribution using the t-test (condition of the construction: the variance $\sigma^2$ is not given
in advance) is based on the statistical criterion
$$t = \frac{O_1 - \mu}{S_x}\sqrt{n}.$$
The critical values are $-t_{n-1}(\alpha/2)$, $t_{n-1}(\alpha/2)$; the conditions for the construction of the
confidence interval can be recorded as the inequalities $-t_{n-1}(\alpha/2) < t < t_{n-1}(\alpha/2)$. Solving
these inequalities gives the confidence interval (the interval estimation of $\mu$):
$$\mu \in \left(O_1 - \frac{t_{n-1}(\alpha/2)\,S_x}{\sqrt{n}};\; O_1 + \frac{t_{n-1}(\alpha/2)\,S_x}{\sqrt{n}}\right).$$
c) The construction of the confidence interval for the variance $\sigma^2$ of the normal
distribution using the $\chi^2$-test (condition of the construction: the empirical variance $S_x^2$
must be calculated) is based on the statistical criterion
$$\chi^2 = \frac{(n-1)\,S_x^2}{\sigma^2}.$$
The critical values are $\chi^2_{n-1}(1 - \alpha/2)$, $\chi^2_{n-1}(\alpha/2)$; the conditions for the construction
of the confidence interval can be recorded as the inequalities
$\chi^2_{n-1}(1 - \alpha/2) < \chi^2 < \chi^2_{n-1}(\alpha/2)$. Solving these inequalities gives the confidence
interval (the interval estimation of $\sigma^2$):
$$\sigma^2 \in \left(\frac{(n-1)\,S_x^2}{\chi^2_{n-1}(\alpha/2)};\; \frac{(n-1)\,S_x^2}{\chi^2_{n-1}(1-\alpha/2)}\right).$$
2.2.2. Illustration of Confidence Intervals Construction
a) Within the assigned example the confidence interval for the mean value $\mu$ will be
constructed using the t-test.
The confidence interval is given by the form:
$$\mu \in \left(O_1 - \frac{t_{n-1}(\alpha/2)\,S_x}{\sqrt{n}};\; O_1 + \frac{t_{n-1}(\alpha/2)\,S_x}{\sqrt{n}}\right).$$
For the significance level $\alpha = 0.05$, the extent n = 50 of the selective statistical set SSS,
the standard deviation $S_x = 1$ (approximate value) and the arithmetic mean $O_1 = 2.5$, the
critical values are, according to the statistical tables, $\pm t_{49}(0.025) = \pm 1.96$ (for a number
of degrees of freedom n − 1 > 33 the statistical table for the u-test can be applied).
Substituting into the 95% confidence interval gives
$$\mu \in (2.221;\ 2.779).$$
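The substitution can be retraced numerically. This sketch uses the values stated in the example (O_1 = 2.5, n = 50, critical value 1.96) and the rounded standard deviation S_x = 1; since the text calls that value approximate, the computed bounds agree with the printed interval only to about two decimal places. The function name is hypothetical.

```python
from math import sqrt


def mean_confidence_interval(mean, s_x, n, t_crit):
    """Interval mean -/+ t_crit * s_x / sqrt(n) for the theoretical mean value."""
    half_width = t_crit * s_x / sqrt(n)
    return mean - half_width, mean + half_width


low, high = mean_confidence_interval(mean=2.5, s_x=1.0, n=50, t_crit=1.96)
print(round(low, 3), round(high, 3))
```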
b) Within the assigned example the confidence interval for the variance $\sigma^2$ will be
constructed using the $\chi^2$-test.
The confidence interval is given by the form:
$$\sigma^2 \in \left(\frac{(n-1)\,S_x^2}{\chi^2_{n-1}(\alpha/2)};\; \frac{(n-1)\,S_x^2}{\chi^2_{n-1}(1-\alpha/2)}\right).$$
For the significance level $\alpha = 0.05$, the extent n = 50 of the selective statistical set SSS
and the standard deviation $S_x = 1$ (approximate value), the critical values are, according to
the statistical tables,
$$\chi^2_{49}(1 - \alpha/2) = \chi^2_{49}(0.975) = 30.60, \qquad \chi^2_{49}(\alpha/2) = \chi^2_{49}(0.025) = 70.22.$$
Substituting into the 95% confidence interval gives
$$\sigma^2 \in (0.705;\ 1.617), \qquad \sigma \in (0.839;\ 1.272).$$
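The interval for the variance can be retraced in the same way. The sketch uses the two critical values as printed in the text and the rounded value S_x = 1; because the text marks S_x as approximate, the computed bounds are close to, but not identical with, the printed interval. The function name is hypothetical.

```python
from math import sqrt


def variance_confidence_interval(s2, n, chi2_lower, chi2_upper):
    """Interval ((n-1)*s2/chi2_upper, (n-1)*s2/chi2_lower) for the theoretical variance."""
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


# Critical values as printed in the text for 49 degrees of freedom
var_low, var_high = variance_confidence_interval(s2=1.0, n=50, chi2_lower=30.60, chi2_upper=70.22)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(round(var_low, 3), round(var_high, 3), round(sd_low, 3), round(sd_high, 3))
```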
2.2.3. Basics of Parametric Hypotheses Testing
Parametric hypotheses testing again works with the apparatus of the zero hypothesis $H_0$ and
the alternative hypothesis $H_a$, accompanied by the usual apparatus of the critical domain W.
Owing to the central limit theorem, it is natural to assume that the normal distribution, as
the most suitable theoretical distribution, may be assigned to the empirical distribution.
Parametric testing can be divided into one-selective (one-sample) testing of hypotheses about
the mean value or the variance (the one-selective u-test and t-test are used for the mean value
and the one-selective $\chi^2$-test for the variance) and two-selective (two-sample) testing of
hypotheses about the equality of mean values or of variances (the two-selective u-test and
t-test are used for the equality of mean values and the two-selective F-test for the equality
of variances).
In the case of one-selective testing the hypotheses $H_0$ and $H_a$ can be written in the form
$$H_0: \mu = \mu_0 \ \text{ or } \ H_0: \sigma = \sigma_0, \qquad H_a: \mu \ne \mu_0 \ \text{ or } \ H_a: \sigma \ne \sigma_0.$$
One-selective parametric testing is based on the comparison between an empirical parameter
$\mu$ or an empirical parameter $\sigma$ (these symbols denote the results of the elementary
statistical processing of the selective statistical set SSS; by means of these results the
relevant theoretical parameters $\mu$, $\sigma$ of the corresponding normal distribution were estimated)
and some external theoretical data $\mu_0$, $\sigma_0$, whose origin can vary (study of the literature,
research reports, commercial indicators and the like). What these external data have in common
is that they probably characterize a certain significant basic statistical set BSS. From the
point of view of mathematical statistics, one-selective parametric testing then answers the
question whether the investigated selective statistical set SSS could have been chosen from the
described significant basic statistical set BSS. If the hypothesis $H_0$ is verified, the results
of the investigation of the selective statistical set SSS can be viewed in the context created
by the basic statistical set BSS. If the hypothesis $H_a$ is accepted, this context cannot be used.
In the case of two-selective testing the hypotheses $H_0$ and $H_a$ can be written in the form
$$H_0: \mu_1 = \mu_2 \ \text{ or } \ H_0: \sigma_1 = \sigma_2, \qquad H_a: \mu_1 \ne \mu_2 \ \text{ or } \ H_a: \sigma_1 \ne \sigma_2.$$
Two-selective parametric testing is based on the comparison between an empirical parameter
$\mu_1$ or an empirical parameter $\sigma_1$ (these symbols denote the results of the elementary
statistical processing of the selective statistical set $SSS_1$; by means of these results the
relevant theoretical parameters $\mu_1$, $\sigma_1$ of the corresponding normal distribution were
estimated) and some external theoretical data $\mu_0$, $\sigma_0$, whose origin can usually be found in
the investigation results of another selective statistical set $SSS_2$. From the point of view
of mathematical statistics, two-selective parametric testing then answers the question whether
both selective statistical sets $SSS_1$ and $SSS_2$ have investigated an analogous problem and
whether these sets can be used together. If the hypothesis $H_0$ is confirmed, the selective sets
$SSS_1$ and $SSS_2$ can be considered selective sets chosen from the same basic statistical set
BSS, and it is usually worth trying to identify that set BSS. If the hypothesis $H_a$ is
accepted, doubts must be expressed, from the point of view of mathematical statistics, as to
the compatibility of the sets $SSS_1$ and $SSS_2$.
The procedure for parametric testing is similar to the procedure for non-parametric testing. First, it is needful to formulate a zero and an alternative hypothesis and to select the significance level $\alpha$. Then it is needful to select a suitable statistical criterion (u-test, t-test, $\chi^2$-test, F-test), to find its critical value and to record the corresponding critical domain W. Finally, it is necessary to calculate the empirical value of the statistical criterion and to determine whether or not it is an element of the critical domain W. If the empirical value is an element of the domain W, it is necessary to accept the alternative hypothesis $H_a$; in the opposite case the zero hypothesis $H_0$ is accepted.
Survey of some one-selective statistical criteria (n is the extent of the set SSS):
a) One-selective u-test (testing a hypothesis about the mean value when the variance $\sigma^2$ is known)
$$u_{exp} = \frac{\mu - \mu_0}{\sigma}\,\sqrt{n}, \qquad W = (-\infty;\, -u(\alpha/2)] \cup [u(\alpha/2);\, \infty).$$
b) One-selective t-test (testing a hypothesis about the mean value when the variance $\sigma^2$ is unknown)
$$t_{exp} = \frac{\mu - \mu_0}{S_x}\,\sqrt{n}, \qquad W = (-\infty;\, -t_{n-1}(\alpha/2)] \cup [t_{n-1}(\alpha/2);\, \infty).$$
c) One-selective $\chi^2$-test (testing a hypothesis about the variance when the parameters $\mu$, $\sigma^2$ are unknown)
$$\chi^2_{exp} = \frac{(n-1)\,\sigma^2}{\sigma_0^2}, \qquad W = [0;\, \chi^2_{n-1}(1-\alpha/2)] \cup [\chi^2_{n-1}(\alpha/2);\, \infty).$$
Survey of some two-selective statistical criteria:
a) Two-selective u-test (testing a hypothesis about the equality of mean values when the variances $\sigma_1^2$, $\sigma_2^2$ are known); $n_1$, $n_2$ are the extents of the selective statistical sets SSS$_1$, SSS$_2$
$$u_{exp} = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}, \qquad W = (-\infty;\, -u(\alpha/2)] \cup [u(\alpha/2);\, \infty).$$
b) Two-selective t-test (testing a hypothesis about the equality of mean values when the variances $\sigma_1^2$, $\sigma_2^2$ are unknown); $n_1$, $n_2$ are the extents of the selective statistical sets SSS$_1$, SSS$_2$, and $S_{x1}$, $S_{x2}$ are the empirical standard deviations of the selective statistical sets SSS$_1$, SSS$_2$
$$t_{exp} = \frac{\mu_1 - \mu_2}{\sqrt{(n_1 - 1)\,S_{x1}^2 + (n_2 - 1)\,S_{x2}^2}}\,\sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}},$$
$$W = (-\infty;\, -t_{n_1+n_2-2}(\alpha/2)] \cup [t_{n_1+n_2-2}(\alpha/2);\, \infty).$$
c) Two-selective F-test (testing a hypothesis about the equality of variances when the parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$ are unknown); $n_1$, $n_2$ are the extents of the selective statistical sets SSS$_1$, SSS$_2$, and $S_{x1}$, $S_{x2}$ are the empirical standard deviations of the selective statistical sets SSS$_1$, SSS$_2$
$$F_{exp} = \frac{S_{x1}^2}{S_{x2}^2}, \qquad W = [0;\, F_{n_1-1,\,n_2-1}(1-\alpha/2)] \cup [F_{n_1-1,\,n_2-1}(\alpha/2);\, \infty).$$
Remark: The larger of the two variances $S_{x1}^2$, $S_{x2}^2$ is usually put into the numerator of the statistical criterion $F_{exp} = S_{x1}^2 / S_{x2}^2$. From this point of view, the right-sided critical domain $W = [F_{n_1-1,\,n_2-1}(\alpha);\, \infty)$, with the value $\alpha$ instead of the value $\alpha/2$, is usually used.
d) The paired t-test (the transformation of the two-selective t-test into a one-selective t-test on the basis of the zero hypothesis $H_0: \mu_1 - \mu_2 = \Delta$, where most frequently $\Delta = 0$).
2.2.4. Illustration of Parametric Testing
a) Assigned example: testing hypotheses about the mean value
Determine whether the investigated selective statistical set SSS ($\mu = 2.5$, $n = 50$) could be, for the significance level $\alpha = 0.05$, selected from the basic statistical set BSS which is characterized by the mean value a1) $\mu_0 = 2.6$, a2) $\mu_0 = 2.9$.
The information about the variance is missing, so it is needful to use the one-selective t-test:
$$t_{exp} = \frac{\mu - \mu_0}{S_x}\,\sqrt{n}, \qquad W = (-\infty;\, -t_{n-1}(\alpha/2)] \cup [t_{n-1}(\alpha/2);\, \infty).$$
The formulation of the zero and alternative hypothesis: $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$.
The determination of the critical values and the critical domain:
$t_{49}(0.025) \approx u(0.025) = 1.96$, $W = (-\infty;\, -1.96] \cup [1.96;\, \infty)$.
The calculation of the experimental value of the statistical criterion for the case a1):
$|t_{exp}| = 0.704$, $t_{exp} \notin W$.
The result interpretation:
The experimental value $t_{exp}$ does not belong to the critical domain, so on the significance level $\alpha = 0.05$ it is possible to accept the zero hypothesis $H_0$. The investigated selective statistical set could be selected from the external set BSS. The difference $\mu - \mu_0$ is statistically insignificant for the significance level $\alpha = 0.05$ (it can be noted that the value $\mu_0$ is an element of the 95% confidence interval in the case a1)).
The calculation of the experimental value of the statistical criterion for the case a2):
$|t_{exp}| = 2.814$, $t_{exp} \in W$.
The result interpretation:
The experimental value $t_{exp}$ is an element of the critical domain, so on the significance level $\alpha = 0.05$ it is possible to reject the zero hypothesis $H_0$. The investigated selective statistical set SSS could not be selected from the external set BSS. The difference $\mu - \mu_0$ is, on the significance level $\alpha = 0.05$, statistically significant (it can be noted that the value $\mu_0$ is not an element of the 95% confidence interval in the case a2)).
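The two experimental values of example a) can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the standard deviation $S_x = 1.005$ is not stated in example a) itself but is assumed here to be the value quoted in example b).

```python
import math

def one_sample_t(mean, mu0, s_x, n):
    """Experimental value t_exp = (mean - mu0) / s_x * sqrt(n)."""
    return (mean - mu0) / s_x * math.sqrt(n)

def in_two_sided_domain(value, critical):
    """True if value lies in W = (-inf; -critical] U [critical; inf)."""
    return abs(value) >= critical

# Data of example a); s_x = 1.005 is an assumption borrowed from example b).
mean, s_x, n, critical = 2.5, 1.005, 50, 1.96
for label, mu0 in (("a1", 2.6), ("a2", 2.9)):
    t_exp = one_sample_t(mean, mu0, s_x, n)
    print(label, round(abs(t_exp), 3), in_two_sided_domain(t_exp, critical))
```

The sign of $t_{exp}$ does not matter for the decision, because the critical domain is symmetric; only the absolute value is compared with the critical value.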
b) Assigned example: testing a hypothesis about the variance
Determine whether the investigated selective statistical set SSS ($\mu = 2.5$, $S_x = \sigma = 1.005$, $n = 50$) could be, for the significance level $\alpha = 0.05$, selected from the basic statistical set BSS which is characterized by the standard deviation b1) $\sigma_0 = 1$, b2) $\sigma_0 = 0.5$.
The one-selective $\chi^2$-test will be used:
$$\chi^2_{exp} = \frac{(n-1)\,\sigma^2}{\sigma_0^2}, \qquad W = [0;\, \chi^2_{n-1}(1-\alpha/2)] \cup [\chi^2_{n-1}(\alpha/2);\, \infty).$$
The formulation of the zero and alternative hypothesis: $H_0: \sigma = \sigma_0$, $H_a: \sigma \neq \sigma_0$.
The determination of the critical values and the critical domain:
$\chi^2_{49}(0.975) = 30.60$, $\chi^2_{49}(0.025) = 70.22$, $W = [0;\, 30.60] \cup [70.22;\, \infty)$.
The calculation of the experimental value of the statistical criterion for the case b1):
$\chi^2_{exp} = 49.49$, $\chi^2_{exp} \notin W$.
The result interpretation:
The experimental value $\chi^2_{exp}$ does not belong to the critical domain, so on the significance level $\alpha = 0.05$ it is possible to accept the zero hypothesis $H_0$. The investigated selective statistical set SSS could be selected from the external set BSS. The quotient of $\sigma$ and $\sigma_0$ is statistically insignificant for the significance level $\alpha = 0.05$ (it can be noted that the value $\sigma_0$ is an element of the 95% confidence interval in the case b1)).
The calculation of the experimental value of the statistical criterion for the case b2):
$\chi^2_{exp} = 197.96$, $\chi^2_{exp} \in W$.
The result interpretation:
The experimental value $\chi^2_{exp}$ belongs to the critical domain, so on the significance level $\alpha = 0.05$ it is not possible to accept the zero hypothesis $H_0$. The investigated selective statistical set SSS could not be selected from the external set BSS. The quotient of $\sigma$ and $\sigma_0$ is, on the significance level $\alpha = 0.05$, statistically significant (it can be noted that the value $\sigma_0$ is not an element of the 95% confidence interval in the case b2)).
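The arithmetic of example b) can be reproduced by the following sketch. The function names are hypothetical; the critical values 30.60 and 70.22 are simply taken over from the text, not recomputed.

```python
def chi2_statistic(s_x, sigma0, n):
    """Experimental value chi2_exp = (n - 1) * s_x**2 / sigma0**2."""
    return (n - 1) * s_x ** 2 / sigma0 ** 2

def in_chi2_domain(value, lower, upper):
    """True if value lies in W = [0; lower] U [upper; inf)."""
    return value <= lower or value >= upper

# Data of example b); the critical values are those quoted in the text.
s_x, n, lower, upper = 1.005, 50, 30.60, 70.22
for label, sigma0 in (("b1", 1.0), ("b2", 0.5)):
    chi2_exp = chi2_statistic(s_x, sigma0, n)
    print(label, round(chi2_exp, 2), in_chi2_domain(chi2_exp, lower, upper))
```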
c) Assigned example: testing hypotheses about the equality of mean values
An analogous observation of the export ability as in the assigned example (there the selective statistical set SSS$_1$ of $n_1 = 50$ enterprises was investigated with the result $\mu_1 = 2.5$) has led to the average export ability c1) $\mu_2 = 2.6$, c2) $\mu_2 = 2.9$ for $n_2 = 100$ enterprises (the variances were comparable, but the information about their size is missing, so it is needful to use the two-selective t-test). Determine whether this selective statistical set SSS$_2$ could be, for the statistical significance level $\alpha = 0.05$, selected from the same basic statistical set BSS as the set SSS$_1$.
The two-selective t-test will be used:
$$t_{exp} = \frac{\mu_1 - \mu_2}{\sqrt{(n_1 - 1)\,S_{x1}^2 + (n_2 - 1)\,S_{x2}^2}}\,\sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}},$$
$$W = (-\infty;\, -t_{n_1+n_2-2}(\alpha/2)] \cup [t_{n_1+n_2-2}(\alpha/2);\, \infty).$$
The formulation of the zero and alternative hypothesis: $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$.
The determination of the critical values and the critical domain:
$t_{148}(0.025) \approx 1.96$, $W = (-\infty;\, -1.96] \cup [1.96;\, \infty)$.
The calculation of the experimental value of the statistical criterion for the case c1):
$|t_{exp}| = 0.574$, $t_{exp} \notin W$.
The result interpretation:
The experimental value $t_{exp}$ does not belong to the critical domain, so it is possible to accept the zero hypothesis $H_0$ for the significance level $\alpha = 0.05$. The investigated selective statistical set SSS$_1$ and the additional selective set SSS$_2$ could be selected from one and the same external set BSS. The difference between $\mu_1$ and $\mu_2$ is statistically insignificant at the significance level $\alpha = 0.05$.
The calculation of the experimental value of the statistical criterion for the case c2):
$|t_{exp}| = 2.298$, $t_{exp} \in W$.
The result interpretation:
The experimental value $t_{exp}$ belongs to the critical domain, so on the significance level $\alpha = 0.05$ it is not possible to accept the zero hypothesis $H_0$. The investigated selective set SSS$_1$ and the additional selective set SSS$_2$ could not be selected from one and the same external set BSS. The difference between $\mu_1$ and $\mu_2$ is statistically significant at the significance level $\alpha = 0.05$.
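The values 0.574 and 2.298 of example c) can be reproduced by the sketch below. It rests on an assumption that the text does not state explicitly: since the variances are only described as comparable, both empirical standard deviations are set to 1.005 (the value quoted in example b)). The function name is hypothetical.

```python
import math

def two_sample_t(m1, m2, s1, s2, n1, n2):
    """Two-selective t criterion with pooled sum of squares."""
    pooled = (n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2
    scale = n1 * n2 * (n1 + n2 - 2) / (n1 + n2)
    return (m1 - m2) / math.sqrt(pooled) * math.sqrt(scale)

# Data of example c); s1 = s2 = 1.005 is an assumption, not given in the text.
for label, m2 in (("c1", 2.6), ("c2", 2.9)):
    t_exp = two_sample_t(2.5, m2, 1.005, 1.005, 50, 100)
    print(label, round(abs(t_exp), 3), abs(t_exp) >= 1.96)
```

With these assumed standard deviations the sketch agrees with the experimental values printed in the example, which suggests the assumption matches the original calculation.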
d) Assigned example: testing hypotheses about the equality of variances
An analogous observation of the export ability as in the assigned example (there the selective statistical set SSS$_1$ of $n_1 = 50$ enterprises was investigated with the result $S_{x1}^2 = \sigma_1^2 = 1.01$) has led to the average export ability for $n_2 = 100$ enterprises, which enabled the calculation of the variance d1) $S_{x2}^2 = \sigma_2^2 = 1$, d2) $S_{x2}^2 = \sigma_2^2 = 1.631$. Determine whether this selective statistical set SSS$_2$ could be, for the statistical significance level $\alpha = 0.05$, selected from the same basic statistical set BSS as the set SSS$_1$.
The two-selective F-test (with the right-sided critical domain W) will be used:
$$F_{exp} = \frac{S_{x1}^2}{S_{x2}^2}, \qquad W = [F_{n_1-1,\,n_2-1}(\alpha);\, \infty) \quad \text{for the case d1)},$$
$$F_{exp} = \frac{S_{x2}^2}{S_{x1}^2}, \qquad W = [F_{n_1-1,\,n_2-1}(\alpha);\, \infty) \quad \text{for the case d2)}.$$
The formulation of the zero and right-sided alternative hypothesis:
$H_0: \sigma_1 = \sigma_2$, i.e. $S_{x1} = S_{x2}$; $H_a: \sigma_1 > \sigma_2$, i.e. $S_{x1} > S_{x2}$ (the case d1))
$H_0: \sigma_2 = \sigma_1$, i.e. $S_{x2} = S_{x1}$; $H_a: \sigma_2 > \sigma_1$, i.e. $S_{x2} > S_{x1}$ (the case d2))
The determination of the critical value and the right-sided critical domain:
$F_{49,99}(0.05) = 1.545$, $W = [1.545;\, \infty)$.
The calculation of the experimental value of the statistical criterion for the case d1):
$F_{exp} = 1.01$, $F_{exp} \notin W$.
The result interpretation:
The experimental value $F_{exp}$ does not belong to the critical domain, so it is possible to accept the zero hypothesis $H_0$ for the significance level $\alpha = 0.05$. The investigated selective statistical set SSS$_1$ and the additional selective set SSS$_2$ could be selected from one and the same external set BSS. The difference between $S_{x1}^2 = 1.01$ and $S_{x2}^2 = 1$ is statistically insignificant at the significance level $\alpha = 0.05$.
The calculation of the experimental value of the statistical criterion for the case d2):
$F_{exp} = 1.615$, $F_{exp} \in W$.
The result interpretation:
The experimental value $F_{exp}$ belongs to the critical domain, so on the significance level $\alpha = 0.05$ it is possible to reject the zero hypothesis $H_0$. The investigated selective set SSS$_1$ and the additional selective set SSS$_2$ could not be selected from one and the same external set BSS. The difference between $S_{x1}^2 = 1.01$ and $S_{x2}^2 = 1.631$ is statistically significant at the significance level $\alpha = 0.05$.
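The F criterion of example d), with the larger variance placed in the numerator, can be sketched as follows. The function name is hypothetical and the critical value 1.545 is taken over from the text rather than recomputed.

```python
def f_statistic(var_a, var_b):
    """F criterion with the larger variance in the numerator."""
    return max(var_a, var_b) / min(var_a, var_b)

# Data of example d); 1.545 is the critical value quoted in the text.
critical = 1.545
for label, var2 in (("d1", 1.0), ("d2", 1.631)):
    f_exp = f_statistic(1.01, var2)
    print(label, round(f_exp, 3), f_exp >= critical)
```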
2.3. Measurement of Statistical Dependences: Some Fundaments
of Regression and Correlation Analysis
Goals:
Association investigation: Statistical dependence causal, non-causal
Association picture of selective statistical set: Regression analysis, Correlation analysis
Acquired concepts and knowledge pieces:
Simple and multiple selective statistical set
Statistical dependence
Simple and multiple regression dependence
Linear and nonlinear regression dependence
Regression analysis
Simple and multiple correlation
Correlation analysis
Pearson correlation coefficient
Check questions:
What is the difference between simple and multiple statistical set?
What is the statistical dependence?
What is the difference between simple and multiple regression and correlation analysis?
Wherein do the regression analysis basic tasks lie?
Wherein do the correlation analysis basic tasks lie?
What is the method of the least squares?
What is the normal equations system for simple linear and quadratic regression?
What is the difference between the Pearson correlation coefficient and the correlation index?
2.3.1. Delimitation of Problem
Hitherto the simple selective set SSS was investigated: only one statistical sign was explored for the statistical units of this set. The measurement of statistical dependences is connected with a multiple selective set SSS, in which several statistical signs are explored simultaneously for the statistical units.
The statistical dependence between the signs x, s is given by a rule which assigns exactly one empirical distribution of the frequencies of the statistical sign s (the values of sign s must have the character of a random variable) to the measured or entered values of sign x (the values of sign x, by contrast, need not have the character of a random variable).
The simple (paired) regression dependence is then, in general, a one-sided dependence of the given random variable s on another variable x (not necessarily random); the point is an investigation of a two-dimensional selective statistical set SSS. The multi-dimensional (multiple) regression dependence is the dependence of the given random variable s on a larger number of other variables x, y, z, ... (not necessarily random); the point is an investigation of a multiple set SSS.
The concept of correlation dependence is narrower than the concept of regression dependence. The simple (paired) correlation can be understood as the mutual dependence of two random variables (two statistical signs x, s) in which a change of the values of one statistical sign (either x or s) is associated with a change of the arithmetic mean deduced from the exploration of the second statistical sign (either s or x). The multiple correlation could be defined analogously for the dependence of a larger number of random variables (statistical signs).
The definitions of regression and correlation dependence differ from the definitions of functions of one or more variables, and so from functional dependences.
The part of mathematical statistics which deals with the study of regression and correlation dependences is called regression and correlation analysis.
The basic tasks of regression analysis consist in finding a suitable regression function for the expression of the observed dependence, in the point and interval estimation of the parameters and values of the theoretical regression function, and in verifying the agreement of the regression function with the experimental data. According to the type of the appropriate theoretical regression function it is also possible to speak about types of regression analysis, e.g. polynomial regression, exponential regression, logarithmic regression, hyperbolic regression and the like. The following explanation will be aimed at seeking suitable theoretical regression functions.
The basic tasks of correlation analysis consist in the measurement of correlation tightness (strength, intensity). The problems of simple linear and non-linear correlation are usually investigated provided that the changes of the random variables x, s (statistical signs x, s) are correctly expressed by a linear or non-linear regression function. An investigation of multiple correlation also works with the description of the dependence given by a regression function. The tasks of correlation analysis can then be transferred to seeking the correlation coefficients as the basic measures of tightness of the given correlation type. In addition to the correlation coefficients associated with metric scales it is also essential to explore the coefficients of ordinal correlation, which work with ordinal scales. The following explanation will be aimed only at the use of a simple relation for the linear correlation coefficient.
By reducing the number of investigated statistical signs to two, the problem of measuring regression dependences can be described in a simplified form. The two-dimensional selective statistical set SSS is connected with the exploration of two statistical signs SS-x and SS-s. The metric scale with elements $x_1, x_2, \ldots, x_n$ is associated with the sign x (the elements of the scale were measured and the results of these measurements are given by the absolute frequencies of the individual elements); the measurement results $s_1, s_2, \ldots, s_n$ are then connected with the sign s (the absolute frequencies measured for the sign x are included in these results). In this way the measurement results are available in the form of n ordered pairs $[x_i, s_i]$.
On the basis of the described simplification it is possible to use the method of least squares in measuring the dependence between the signs SS-x and SS-s (the condition is that the measurement errors of the sign SS-s, whose values have the character of a special random variable, have zero mean value and the same finite, although unknown, variance). Let the theoretical regression function within the simple regression be described generally by an equation y = f(x). The sum of squares can then be expressed by the relation
$$S = \sum_i (s_i - y_i)^2,$$
where $y_i$ are the values of the function y = f(x) corresponding to the values $x = x_i$. The method of least squares then consists in seeking the regression function y = f(x) for which the sum S attains its minimum value.
2.3.2. Simple Linear and Quadratic Regression Analysis
The way of seeking the regression function will be described by means of the graphical delimitation of the problem in Fig.5 Simple linear regression analysis. The figure works with n = 5 ordered pairs $[x_i, s_i]$ which characterize the statistical dependence between the statistical signs SS-x and SS-s. The scale elements $x_1, x_2, \ldots, x_5$, connected with the statistical sign x, are plotted on the horizontal axis. The measurement results $s_1, s_2, \ldots, s_5$ of the sign s (the absolute frequencies, measured for the sign x, are already included in these results) are plotted on the vertical axis. The ordered pairs $[x_i, s_i]$ are the coordinates of five points $A_1[x_1, s_1]$, $A_2[x_2, s_2]$, $A_3[x_3, s_3]$, $A_4[x_4, s_4]$, $A_5[x_5, s_5]$. These 5 points graphically express the dependence between the signs SS-x and SS-s. The goal of simple linear regression analysis is to express this statistical dependence by a straight line whose analytical expression is given by the usual form $y = b_0 + b_1 x$ of a polynomial function of the 1st order.
Fig.5 Simple linear regression analysis
The least squares method is aimed at seeking the minimum value of the expression $S = \sum_i (s_i - y_i)^2$, in which the summation index i acquires the values i = 1, 2, ..., 5. For $y_i$ the expression $y_i = b_0 + b_1 x_i$ is substituted, and the minimum of the function S is sought; S is a function of the two variables $b_0$ and $b_1$, i.e. $S = g(b_0, b_1)$.
The conditions for the minimum are obtained by taking the partial derivatives of the function S with respect to both variables and setting them equal to zero (readers interested in the exact search for extremes of functions of several variables can be recommended to acquaint themselves with Sylvester's theorem from the area of mathematical analysis).
The conditions for the minimum of the function S can be recorded in the form
$$\frac{\partial S}{\partial b_0} = 0, \qquad \frac{\partial S}{\partial b_1} = 0.$$
The obtained system of equations is called the system of normal equations for simple linear regression, and after the derivatives are carried out it acquires the known form
$$\sum s_i = n b_0 + b_1 \sum x_i$$
$$\sum s_i x_i = b_0 \sum x_i + b_1 \sum x_i^2.$$
The summation index i generally acquires the values i = 1, 2, ..., n. The values of the parameters $b_0$, $b_1$ can be obtained by solving the system of normal equations, and then it is possible to record the straight line equation $y = b_0 + b_1 x$. The predictions of the values $s_i$ corresponding to the relevant values $x_i$ for i > 5 can then be made, according to Fig.5, through the obtained regression function. The predictions of time trends or comparative trends would not be possible without carrying out the linear regression analysis.
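Solving the two normal equations for $b_0$ and $b_1$ can be sketched in a few lines of Python. The function name and the five data pairs are hypothetical and serve only as an illustration; each pair is assumed to have absolute frequency 1.

```python
def fit_line(xs, ss):
    """Solve the normal equations of simple linear regression for b0, b1."""
    n = len(xs)
    sum_x = sum(xs)
    sum_s = sum(ss)
    sum_xx = sum(x * x for x in xs)
    sum_sx = sum(s * x for x, s in zip(xs, ss))
    det = n * sum_xx - sum_x * sum_x  # determinant of the 2x2 system
    if det == 0:
        raise ValueError("all x values are equal, the line is not determined")
    b1 = (n * sum_sx - sum_x * sum_s) / det
    b0 = (sum_s - b1 * sum_x) / n
    return b0, b1

# Hypothetical data: five points lying exactly on s = 1 + 2x.
b0, b1 = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
prediction = b0 + b1 * 6  # prediction for a sixth scale element x = 6
```

For real measurements the points do not lie exactly on a line, and the same function then returns the coefficients that minimize the sum of squares S.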
In an analogous way it is possible to explain the fundaments of simple quadratic regression. In this case the investigated statistical dependence would be expressed by a polynomial function of the 2nd order, the graph of which is a parabola. The analytical expression y = f(x) of a parabola is given by the equation $y = b_0 + b_1 x + b_2 x^2$; the method of least squares leads again to seeking the minimum of the function $S = \sum_i (s_i - y_i)^2$. This function $S = h(b_0, b_1, b_2)$ is a function of three variables; to find the minimum, three partial derivatives are needed, and setting them equal to zero leads to the system of normal equations
$$\frac{\partial S}{\partial b_0} = 0, \qquad \frac{\partial S}{\partial b_1} = 0, \qquad \frac{\partial S}{\partial b_2} = 0.$$
After the derivatives are carried out, the system of normal equations for simple quadratic regression acquires the form
$$\sum s_i = n b_0 + b_1 \sum x_i + b_2 \sum x_i^2$$
$$\sum s_i x_i = b_0 \sum x_i + b_1 \sum x_i^2 + b_2 \sum x_i^3$$
$$\sum s_i x_i^2 = b_0 \sum x_i^2 + b_1 \sum x_i^3 + b_2 \sum x_i^4.$$
The summation index i acquires the values i = 1, 2, ..., 5 in Fig.5, and in the general case the values i = 1, 2, ..., n (in the case of quadratic regression the group of points $A_1[x_1, s_1]$, $A_2[x_2, s_2]$, $A_3[x_3, s_3]$, $A_4[x_4, s_4]$, $A_5[x_5, s_5]$ should naturally follow the course of a parabola instead of a straight line). The values of the parameters $b_0$, $b_1$, $b_2$ can be obtained by solving the system of normal equations, and then it is possible to record the parabola equation $y = b_0 + b_1 x + b_2 x^2$. The predictions of the values $s_i$ corresponding to the relevant values $x_i$ for i > 5 can then be made, according to Fig.5, by means of the obtained regression function. The predictions of time trends or comparative trends would not be possible without carrying out the quadratic regression analysis.
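The three normal equations of quadratic regression form a linear system in $b_0$, $b_1$, $b_2$. The sketch below builds this system from the power sums and solves it by Gauss-Jordan elimination. The function names and the data are hypothetical, and each pair is assumed to have absolute frequency 1.

```python
def solve_linear(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system of normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_parabola(xs, ss):
    """Return [b0, b1, b2] from the normal equations of quadratic regression."""
    p = [sum(x ** k for x in xs) for k in range(5)]  # p[k] = sum of x_i^k, p[0] = n
    r = [sum(s * x ** k for x, s in zip(xs, ss)) for k in range(3)]
    a = [[p[0], p[1], p[2]],
         [p[1], p[2], p[3]],
         [p[2], p[3], p[4]]]
    return solve_linear(a, r)

# Hypothetical data: five points lying exactly on s = 1 + 2x + 3x^2.
coefficients = fit_parabola([1, 2, 3, 4, 5], [6, 17, 34, 57, 86])
```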
2.3.3. Simple Linear and Quadratic Correlation Analysis
For the delimitation of the problem it is again possible to use the graphical way indicated by Fig.5. After carrying out the simple linear regression analysis (the result is indicated by the straight line drawn in Fig.5) it is possible to proceed to determining the tightness of the statistical dependence between the statistical signs SS-x and SS-s of the investigated selective statistical set SSS.
The most used measure of simple linear correlation tightness is the Pearson correlation coefficient $k_{xs}$. This coefficient is given by the relation
$$k_{xs} = \frac{S_{xs}}{S_x \cdot S_s},$$
and it acquires values from the interval $k_{xs} \in [-1, +1]$ (this conclusion can easily be deduced from the so-called Schwarz inequality). Values close to +1 correspond to positive correlation (the values of both statistical signs SS-x and SS-s increase or decrease at the same time; Fig.5 is connected with this case). Values close to -1 describe negative correlation (while the values of one statistical sign are increasing, the values of the second sign are decreasing). Values around 0 indicate that the signs do not correlate (no common trend can be found in the increases or decreases of the values of the signs). The Pearson correlation coefficient, as an empirical parameter, has the character of a random variable and can be used as a point estimate of the theoretical correlation coefficient.
In the relation for the Pearson correlation coefficient, the mixed central moment of the 2nd order $C_2(x,s) = S_{xs}$ occurs in addition to the usual standard deviations $S_x$ and $S_s$ (i.e. the square roots of the central moments $C_2(x)$ and $C_2(s)$) connected with the investigation of the statistical signs SS-x and SS-s. The mixed central moment of the 2nd order is defined by the relation (k is the number of scale elements for both statistical signs)
$$S_{xs} = \frac{1}{n} \sum_{i=1}^{k} n_i\,(x_i - O_x)(s_i - O_s).$$
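The relation $k_{xs} = S_{xs} / (S_x \cdot S_s)$ can be sketched as follows. The function name and the data are hypothetical; for simplicity every ordered pair is assumed to have absolute frequency 1, so the weights $n_i$ drop out and the sums are divided by the number of pairs.

```python
import math

def pearson_coefficient(xs, ss):
    """Pearson coefficient k_xs = S_xs / (S_x * S_s) for unweighted pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_s = sum(ss) / n
    # Mixed central moment of the 2nd order.
    s_xs = sum((x - mean_x) * (s - mean_s) for x, s in zip(xs, ss)) / n
    # Standard deviations as square roots of the central moments.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_s = math.sqrt(sum((s - mean_s) ** 2 for s in ss) / n)
    return s_xs / (s_x * s_s)

# Hypothetical data: an exactly increasing and an exactly decreasing dependence.
k_positive = pearson_coefficient([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
k_negative = pearson_coefficient([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
```

The first call gives a value at the upper end of the interval, the second a value at the lower end, which matches the interpretation of positive and negative correlation.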
The basic formulas for the Binomial model (the Value Function, i.e. the Fair Price, for a call option is marked $C(\cdot)$, the Value Function, i.e. the Fair Price, for a put option is marked $P(\cdot)$):
$$C(\cdot) = \frac{1}{q^n} \sum_{j=0}^{n} \Pi_j C_j, \qquad C_j = \max(0,\, S_j - X)$$
$$P(\cdot) = \frac{1}{q^n} \sum_{j=0}^{n} \Pi_j P_j, \qquad P_j = \max(0,\, X - S_j)$$
$$\Pi_j = \binom{n}{j}\, p^j (1 - p)^{n-j}$$
$$S_{jk} = u^j d^k S, \qquad S_j = u^j d^{\,n-j} S,$$
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad m! = 1 \cdot 2 \cdot \ldots \cdot m$$
$$p = \frac{q - d}{u - d}, \qquad 1 - p = \frac{u - q}{u - d}.$$
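The binomial formulas can be turned into a short numerical sketch. The function name and the input values are hypothetical; q is assumed here to be the growth factor of the risk-free investment per time step, which is how it enters the relation $p = (q - d)/(u - d)$.

```python
from math import comb

def binomial_prices(spot, strike, u, d, q, n):
    """Return (call, put) fair prices from the binomial formulas."""
    p = (q - d) / (u - d)  # probability of an up move
    call = 0.0
    put = 0.0
    for j in range(n + 1):
        prob = comb(n, j) * p ** j * (1 - p) ** (n - j)  # binomial probability
        s_j = spot * u ** j * d ** (n - j)               # price after j up moves
        call += prob * max(0.0, s_j - strike)
        put += prob * max(0.0, strike - s_j)
    return call / q ** n, put / q ** n

# Hypothetical one-step example: S = 100, X = 100, u = 1.1, d = 0.9, q = 1.05.
call_1, put_1 = binomial_prices(100, 100, 1.1, 0.9, 1.05, 1)
```

In this one-step case p = 0.75, so the call price is 0.75 · 10 / 1.05 and the put price is 0.25 · 10 / 1.05. For any n the two prices satisfy the put-call parity C - P = S - X / q^n, which is a convenient consistency check.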
The Trinomial model observes the evolution of the option's key underlying variables in
discrete-time. This is done by means of a trinomial tree, for a number of time steps between
the valuation and expiration dates (the number of time steps is marked n). Each node, in the
tree, represents a possible price of the underlying at a given point in time.
The fair price can be determined numerically. The Binomial model after Cox-Ross-Rubinstein can be used. In this section a less complex but numerically efficient approach based on trinomial trees will be introduced. It is related to the classical numerical procedures for solving partial differential equations, which are also used to solve the Black-Scholes differential equations.
The Trinomial model follows the procedure of the binomial model whereby the price at
each time step can change to three instead of two directions.
At each step, it is assumed that the underlying instrument will move up or down by a specific factor (e.g. two up factors $u_1$, $u_2$ and one down factor d) per step of the tree (where, by definition, $u_1, u_2 \geq 1$ and $0 < d \leq 1$). So, if S is the Spot price, then in the next period the price will be either $S_{u1} = S \cdot u_1$, $S_{u2} = S \cdot u_2$ or $S_d = S \cdot d$. The probabilities with which the price moves from S to $S_{u1}$, $S_{u2}$, $S_d$ are denoted $p_1$, $p_2$, $p_3$ ($p_1 + p_2 + p_3 = 1$).
The number of $u_1$ factors is marked j, the number of $u_2$ factors is marked i, and the number of d factors is n - j - i.
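The basic formulas for the Trinomial model (the Value Function, i.e. the Fair Price, for a call option is marked $C(\cdot)$, the Value Function, i.e. the Fair Price, for a put option is marked $P(\cdot)$):
$$C(\cdot) = \frac{1}{q^n} \sum_{i=0}^{n} \sum_{j=0}^{n} \Pi_{ij} C_{ij}, \quad (i + j)_{max} = n, \qquad C_{ij} = \max(0,\, S_{ij} - X)$$
$$S_{ij} = u_1^{\,j}\, u_2^{\,i}\, d^{\,n-i-j}\, S, \qquad S(\cdot) = \sum_{i=0}^{n} \sum_{j=0}^{n} \Pi_{ij} S_{ij}, \quad (i + j)_{max} = n$$
$$\Pi_{ij} = \binom{n}{ij}\, p_1^{\,j}\, p_2^{\,i}\, (1 - p_1 - p_2)^{n-i-j}$$
$$\sum_{i=0}^{n} \sum_{j=0}^{n} \Pi_{ij} = 1, \quad (i + j)_{max} = n$$
$$\binom{n}{ij} = \frac{n!}{i!\, j!\, (n - i - j)!}$$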
3.1.5. Statistical and Probability Data Mining Tools: Normal, Binomial and Trinomial
Distribution
a) Standard normal probability density $\varphi(x)$ and standard normal distribution function N(x)
$$N(x) = \int_{-\infty}^{x} \varphi(x)\, dx, \qquad \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$$
b) Binomial and Trinomial probability function
$$\Pi_j = \binom{n}{j}\, p^j (1 - p)^{n-j}$$
$$\Pi_{ij} = \binom{n}{ij}\, p_1^{\,j}\, p_2^{\,i}\, (1 - p_1 - p_2)^{n-i-j}$$
3.1.6. Conclusion
The statistical and probability base of financial options as a part of statistical data mining
tools is created by
- Normal distribution,
- Binomial distribution,
- Trinomial distribution.
3.2. Description of Statistical and Probability Base of Greeks
3.2.1. Introduction
In mathematical finance, the Greeks are the quantities representing the sensitivities of
derivatives such as options to a change in underlying parameters on which the value function
of an instrument or portfolio of financial instruments is dependent. The name is used because
the most common of these sensitivities are often denoted by Greek letters.
The Greeks in the Black-Scholes model are relatively easy to calculate, a desirable property of financial models, and are very useful for derivatives traders, especially those who seek to hedge their portfolios from unfavourable changes in market conditions. For this reason, those Greeks which are particularly useful for hedging (Delta, Gamma and Vega) are well-defined for measuring changes in Price, Time and Volatility.
The statistical and probability base of financial options is also connected with the
Greeks. These statistical applications will be described by means of data mining approach.
3.2.2. Greeks
(quoted according to http://en.wikipedia.org/wiki/Greeks_(finance) )
The Greeks are the quantities describing the sensitivities of financial options to
a change in underlying parameters on which the fair price (the value function) of an
instrument or portfolio of financial instruments is dependent. Collectively these have also
been called the Risk Sensitivities, Risk Measures or Hedge Parameters.
The Greeks are vital tools in Risk Management. Each Greek measures the sensitivity
of the fair price (the value function) of a financial instrument or portfolio to a small change in
a given underlying parameter, so that component risks may be treated in isolation, and the
portfolio rebalanced accordingly to achieve a desired state (see for example Delta Hedging).
According to 3.2.1. the Greeks in the Black-Scholes model are relatively easy to calculate, a desirable property of financial models, and are very useful for derivatives traders, especially those who seek to hedge their portfolios from adverse changes in market conditions. For this reason, those Greeks which are particularly useful for hedging (Delta, Gamma and Vega) are well-defined for measuring changes in Price, Time and Volatility.
The most common of the Greeks are the first order derivatives: Delta, Dual Delta, Vega, Theta and Rho, as well as Gamma, a second order derivative of the fair price (value function). Although Rho is a primary input into the Black-Scholes model, the overall impact on the fair price (the value function) of an option corresponding with changes in the risk-free rate is generally insignificant, and therefore higher order derivatives involving the risk-free interest rate are not common.
The most used second order derivatives among the Greeks are Gamma, Dual Gamma, Vomma, Vanna, Charm and DvegaDtime. The most used third order derivatives are Speed, Zomma, Color and Ultima.
The Greeks in the Binomial model observe the evolution of the option's key
underlying variables in discrete-time. The most used of the Greeks are the Delta and Gamma.
Those Greeks are well-defined for Hedging Delta and Gamma.
The most common of the Greeks in the Black-Scholes and Binomial models are the
Delta, Vega, Theta and Gamma. The most used of the Option Hedging are the Hedging Delta
and Gamma. The remaining sensitivities (and hedging connected with them) in this list are
common enough that they have common names, but this list is by no means exhaustive.
3.2.3. Value Function
(quoted according to Záškodný, P., Havlíček, I., Budinský, P. (2010-2011), Partial Data Mining Tools in Statistics Education in Greeks and Option Hedging. In: Tarábek, P., Záškodný, P. (2010-2011), Educational and Didactic Communication 2010, Bratislava, Slovak Republic: Didaktis, www.didaktis.sk.)
According to 3.1.2. the financial options are those derivative contracts in which the
underlying assets are financial instruments such as stocks, bonds or an interest rate. The
options on financial instruments provide a buyer with the right to either buy or sell the
underlying financial instruments at a specified price on a specified future date. Although the
buyer gets the rights to buy or sell the underlying options, there is no obligation to exercise
this option. However, the seller of the contract is under an obligation to buy or sell the
underlying instruments if the option is exercised.
According to 3.1.2. two types of financial options exist, namely call options and put
options. Under a call option, the buyer of the contract gets the right to buy the financial
instrument at the specified price at a future date, whereas a put option gives the buyer the
right to sell the same at the specified price at the specified future date. The price that is paid
by the buyer to the seller for this flexibility is called the premium (the fair
price, the value function). The prescribed future price is called the strike price.
The theoretical calculation of the premium is connected mainly with the Black-Scholes
Model (a continuous statistical model based on the normal distribution) and with the Binomial
or Trinomial Model (discrete statistical models based on the binomial or trinomial distribution).
In this explanation, priority is given to the Black-Scholes Model.
The Black-Scholes model traces the evolution of the option's key underlying variables
in continuous time. This is done by means of both the standard normal probability densities
$\varphi(d_1)$, $\varphi(d_2)$ and the standard normal distribution functions $N(d_1)$, $N(d_2)$.
The variables $d_1$, $d_2$ are connected with the Spot Price $S$, Strike Price $X$, Risk-Free Rate $r$,
Time to Maturity $\tau$, Volatility $\sigma$, and Annual Dividend Yield $d$.
The Value Function $V$ (as Fair Price or as Premium) can be expressed as a function of five
quantities, $V = f(S, X, r, \tau, \sigma)$.
The basic formulas of the Black-Scholes model (the Value Function V as Fair Price is
denoted $C$ for a call option and $P$ for a put option):
$$C = S e^{-d\tau} N(d_1) - X e^{-r\tau} N(d_2), \qquad P = X e^{-r\tau} N(-d_2) - S e^{-d\tau} N(-d_1),$$
$$d_1 = \frac{\ln\frac{S}{X} + \left(r - d + \frac{\sigma^2}{2}\right)\tau}{\sigma\sqrt{\tau}}, \qquad d_2 = d_1 - \sigma\sqrt{\tau},$$
$$N(d_1) = \int_{-\infty}^{d_1} \varphi(u)\,du, \qquad N(d_2) = \int_{-\infty}^{d_2} \varphi(u)\,du,$$
$$\varphi(d_1) = \frac{1}{\sqrt{2\pi}}\, e^{-d_1^2/2}, \qquad \varphi(d_2) = \frac{1}{\sqrt{2\pi}}\, e^{-d_2^2/2}.$$
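The value function can be evaluated directly from these formulas. The following is a minimal sketch in Python, assuming the dividend-yield form of the Black-Scholes model reconstructed from the formulas above; the function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Standard normal distribution: .cdf plays the role of N(.)
STD_NORMAL = NormalDist()


def d1_d2(S, X, r, d, tau, sigma):
    """Return (d1, d2) for spot S, strike X, risk-free rate r,
    dividend yield d, time to maturity tau and volatility sigma."""
    d1 = (log(S / X) + (r - d + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return d1, d2


def call_value(S, X, r, d, tau, sigma):
    """Fair price C of a European call option."""
    d1, d2 = d1_d2(S, X, r, d, tau, sigma)
    return S * exp(-d * tau) * STD_NORMAL.cdf(d1) - X * exp(-r * tau) * STD_NORMAL.cdf(d2)


def put_value(S, X, r, d, tau, sigma):
    """Fair price P of a European put option."""
    d1, d2 = d1_d2(S, X, r, d, tau, sigma)
    return X * exp(-r * tau) * STD_NORMAL.cdf(-d2) - S * exp(-d * tau) * STD_NORMAL.cdf(-d1)


if __name__ == "__main__":
    # Hypothetical inputs: S = X = 100, r = 5 %, d = 0, tau = 1 year, sigma = 20 %
    print(call_value(100, 100, 0.05, 0.0, 1.0, 0.2))
    print(put_value(100, 100, 0.05, 0.0, 1.0, 0.2))
```

A useful sanity check of any such implementation is the difference C - P, which should equal S e^{-d tau} - X e^{-r tau} for the same inputs.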
3.2.4. Segmentation and Definitions of Greeks
a) Greeks of first order
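The speeds of value function change:
$$\Delta = \frac{\partial V}{\partial S}, \qquad \text{Dual } \Delta = \frac{\partial V}{\partial X}, \qquad \nu\ (\text{vega}) = \frac{\partial V}{\partial \sigma}, \qquad \Theta = -\frac{\partial V}{\partial \tau}, \qquad \rho = \frac{\partial V}{\partial r}$$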
b) Greeks of individual second order
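The accelerations of value function change and the speeds of first order greeks change:
$$\Gamma = \frac{\partial^2 V}{\partial S^2}, \qquad \text{Dual } \Gamma = \frac{\partial^2 V}{\partial X^2}, \qquad \text{Vomma} = \frac{\partial^2 V}{\partial \sigma^2}$$
$$\frac{\partial^2 V}{\partial \tau^2}\ \text{(Out of Use)}, \qquad \frac{\partial^2 V}{\partial r^2}\ \text{(Out of Use)}$$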
c) Greeks of combined second order
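The speeds of first order greeks change:
$$\text{Vanna} = \frac{\partial^2 V}{\partial S\,\partial \sigma}, \qquad \text{Charm} = \frac{\partial^2 V}{\partial S\,\partial \tau}, \qquad \text{DvegaDtime} = \frac{\partial^2 V}{\partial \sigma\,\partial \tau}$$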
d) Greeks of third order
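The speeds of second order greeks change:
$$\text{Speed} = \frac{\partial^3 V}{\partial S^3}, \qquad \text{Zomma} = \frac{\partial^3 V}{\partial S^2\,\partial \sigma}, \qquad \text{Color} = \frac{\partial^3 V}{\partial S^2\,\partial \tau}, \qquad \text{Ultima} = \frac{\partial^3 V}{\partial \sigma^3}$$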
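Because the Greeks are defined as partial derivatives of the value function, they can be approximated numerically by finite differences. The sketch below assumes the Black-Scholes call value with dividend yield d and takes Theta with a minus sign with respect to time to maturity (an assumed convention); the helper names and step sizes are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf  # standard normal distribution function


def call_value(S, X, r, d, tau, sigma):
    """Black-Scholes fair price of a European call option."""
    d1 = (log(S / X) + (r - d + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * exp(-d * tau) * N(d1) - X * exp(-r * tau) * N(d2)


def first_derivative(f, x, h):
    """Central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h):
    """Central difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def numerical_greeks(S, X, r, d, tau, sigma):
    """Approximate some first and second order Greeks of a call option."""
    return {
        "delta": first_derivative(lambda s: call_value(s, X, r, d, tau, sigma), S, 1e-2),
        "dual_delta": first_derivative(lambda x: call_value(S, x, r, d, tau, sigma), X, 1e-2),
        "vega": first_derivative(lambda v: call_value(S, X, r, d, tau, v), sigma, 1e-4),
        # Assumed sign convention: Theta = -dV/dtau
        "theta": -first_derivative(lambda t: call_value(S, X, r, d, t, sigma), tau, 1e-4),
        "rho": first_derivative(lambda q: call_value(S, X, q, d, tau, sigma), r, 1e-4),
        "gamma": second_derivative(lambda s: call_value(s, X, r, d, tau, sigma), S, 1e-2),
    }


if __name__ == "__main__":
    for name, value in numerical_greeks(100.0, 100.0, 0.05, 0.0, 1.0, 0.2).items():
        print(name, value)
```

Such numerical values are mainly useful as a cross-check of closed-form formulas; the closed forms are faster and more accurate when they are available.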
3.2.5. Indications of Greeks
a) Greeks of First Order
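$$\Delta = \text{DvalueDspot} = \frac{\partial V}{\partial S}$$
$$\text{Dual } \Delta = \text{DvalueDstrike} = \frac{\partial V}{\partial X}$$
$$\nu\ (\text{Vega}) = \text{DvalueDvol} = \frac{\partial V}{\partial \sigma}$$
$$\Theta = \text{DvalueDtime} = -\frac{\partial V}{\partial \tau}$$
$$\rho = \text{DvalueDrate} = \frac{\partial V}{\partial r}$$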
b) Greeks of Second Order
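$$\Gamma = \text{DdeltaDspot} = \frac{\partial^2 V}{\partial S^2} = \frac{\partial \Delta}{\partial S}$$
$$\text{Dual } \Gamma = \text{DdualdeltaDstrike} = \frac{\partial^2 V}{\partial X^2} = \frac{\partial\,(\text{Dual } \Delta)}{\partial X}$$
$$\text{Vomma} = \text{DvegaDvol} = \frac{\partial^2 V}{\partial \sigma^2} = \frac{\partial \nu}{\partial \sigma}$$
$$\text{Vanna} = \text{DdeltaDvol} = \text{DvegaDspot} = \frac{\partial^2 V}{\partial S\,\partial \sigma} = \frac{\partial \Delta}{\partial \sigma} = \frac{\partial \nu}{\partial S}$$
$$\text{Charm} = \text{DdeltaDtime} = \text{D}(-\text{theta})\text{Dspot} = \frac{\partial^2 V}{\partial S\,\partial \tau} = \frac{\partial \Delta}{\partial \tau} = \frac{\partial(-\Theta)}{\partial S}$$
$$\text{DvegaDtime} = \text{D}(-\text{theta})\text{Dvol} = \frac{\partial^2 V}{\partial \sigma\,\partial \tau} = \frac{\partial \nu}{\partial \tau} = \frac{\partial(-\Theta)}{\partial \sigma}$$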
c) Greeks of Third Order
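$$\text{Speed} = \text{DgammaDspot} = \frac{\partial^3 V}{\partial S^3} = \frac{\partial \Gamma}{\partial S} = \frac{\partial^2 \Delta}{\partial S^2}$$
$$\text{Zomma} = \text{DgammaDvol} = \frac{\partial^3 V}{\partial S^2\,\partial \sigma} = \frac{\partial \Gamma}{\partial \sigma} = \frac{\partial^2 \Delta}{\partial S\,\partial \sigma} = \frac{\partial^2 \nu}{\partial S^2}$$
$$\text{Color} = \text{DgammaDtime} = \frac{\partial^3 V}{\partial S^2\,\partial \tau} = \frac{\partial \Gamma}{\partial \tau} = \frac{\partial^2 \Delta}{\partial S\,\partial \tau} = \frac{\partial^2 (-\Theta)}{\partial S^2}$$
$$\text{Ultima} = \text{DvommaDvol} = \frac{\partial^3 V}{\partial \sigma^3} = \frac{\partial\,(\text{vomma})}{\partial \sigma} = \frac{\partial^2 \nu}{\partial \sigma^2}$$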
3.2.6. Formulas for Greeks (CO: Call Option, PO: Put Option)
a) Formulas for Delta Greek $\Delta$
$$\Delta_{CO} = e^{-d\tau} N(d_1), \qquad \Delta_{PO} = -e^{-d\tau} N(-d_1)$$
b) Formulas for Dual Delta Greek Dual $\Delta$
$$\text{Dual } \Delta_{CO} = -e^{-r\tau} N(d_2), \qquad \text{Dual } \Delta_{PO} = e^{-r\tau} N(-d_2)$$
c) Formulas for Vega Greek $\nu$
$$\nu_{CO,PO} = S e^{-d\tau} \varphi(d_1) \sqrt{\tau} = X e^{-r\tau} \varphi(d_2) \sqrt{\tau}$$
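d) Formulas for Theta Greek $\Theta$
$$\Theta_{CO} = -e^{-d\tau}\,\frac{S \varphi(d_1)\,\sigma}{2\sqrt{\tau}} - r X e^{-r\tau} N(d_2) + d\,S e^{-d\tau} N(d_1)$$
$$\Theta_{PO} = -e^{-d\tau}\,\frac{S \varphi(d_1)\,\sigma}{2\sqrt{\tau}} + r X e^{-r\tau} N(-d_2) - d\,S e^{-d\tau} N(-d_1)$$
The last term in each formula is the contribution of the dividend yield; it vanishes for $d = 0$.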
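e) Formulas for Rho Greek $\rho$
$$\rho_{CO} = X \tau e^{-r\tau} N(d_2), \qquad \rho_{PO} = -X \tau e^{-r\tau} N(-d_2)$$
f) Formula for Gamma Greek $\Gamma$
$$\Gamma_{CO,PO} = e^{-d\tau}\,\frac{\varphi(d_1)}{S \sigma \sqrt{\tau}}$$
g) Formula for Dual Gamma Greek Dual $\Gamma$
$$\text{Dual } \Gamma_{CO,PO} = e^{-r\tau}\,\frac{\varphi(d_2)}{X \sigma \sqrt{\tau}}$$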
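The closed-form expressions for Delta, Dual Delta, Vega, Rho, Gamma and Dual Gamma translate directly into code. The following sketch assumes the dividend-yield Black-Scholes model; the function name, the `call` flag and the dictionary keys are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_SN = NormalDist()  # standard normal: .cdf is N(.), .pdf is phi(.)


def closed_form_greeks(S, X, r, d, tau, sigma, call=True):
    """Value and selected Greeks of a call (call=True) or put (call=False) option."""
    vol_sqrt = sigma * sqrt(tau)
    d1 = (log(S / X) + (r - d + 0.5 * sigma ** 2) * tau) / vol_sqrt
    d2 = d1 - vol_sqrt
    sign = 1.0 if call else -1.0  # +1 for CO, -1 for PO
    disc_d = exp(-d * tau)  # dividend discount factor
    disc_r = exp(-r * tau)  # risk-free discount factor
    return {
        "value": sign * (S * disc_d * _SN.cdf(sign * d1) - X * disc_r * _SN.cdf(sign * d2)),
        "delta": sign * disc_d * _SN.cdf(sign * d1),
        "dual_delta": -sign * disc_r * _SN.cdf(sign * d2),
        "vega": S * disc_d * _SN.pdf(d1) * sqrt(tau),
        "rho": sign * X * tau * disc_r * _SN.cdf(sign * d2),
        "gamma": disc_d * _SN.pdf(d1) / (S * vol_sqrt),
        "dual_gamma": disc_r * _SN.pdf(d2) / (X * vol_sqrt),
    }


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration
    print(closed_form_greeks(100.0, 95.0, 0.05, 0.02, 0.5, 0.25, call=True))
    print(closed_form_greeks(100.0, 95.0, 0.05, 0.02, 0.5, 0.25, call=False))
```

Two simple consistency checks are available: the value equals S times Delta plus X times Dual Delta, and Gamma times S squared equals Dual Gamma times X squared.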
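i) Formulas for Vomma Greek Vomma
$$\text{Vomma}_{CO,PO} = S e^{-d\tau} \varphi(d_1) \sqrt{\tau}\,\frac{d_1 d_2}{\sigma} = \nu\,\frac{d_1 d_2}{\sigma}$$
j) Formulas for Vanna Greek Vanna
$$\text{Vanna}_{CO,PO} = -e^{-d\tau} \varphi(d_1)\,\frac{d_2}{\sigma} = -\frac{\nu}{S}\,\frac{d_2}{\sigma\sqrt{\tau}} = \frac{\nu}{S}\left(1 - \frac{d_1}{\sigma\sqrt{\tau}}\right)$$
k) Formulas for Charm Greek Charm
$$\text{Charm}_{CO} = -d\,e^{-d\tau} N(d_1) + e^{-d\tau} \varphi(d_1)\,\frac{2(r - d)\tau - d_2\,\sigma\sqrt{\tau}}{2\tau\,\sigma\sqrt{\tau}}$$
$$\text{Charm}_{PO} = d\,e^{-d\tau} N(-d_1) + e^{-d\tau} \varphi(d_1)\,\frac{2(r - d)\tau - d_2\,\sigma\sqrt{\tau}}{2\tau\,\sigma\sqrt{\tau}}$$
l) Formulas for DvegaDtime Greek DvegaDtime
$$\text{DvegaDtime}_{CO,PO} = -S e^{-d\tau} \varphi(d_1) \sqrt{\tau} \left( d + \frac{(r - d)\,d_1}{\sigma\sqrt{\tau}} - \frac{1 + d_1 d_2}{2\tau} \right)$$
$$\text{DvegaDtime}_{CO,PO} = -\nu \left( d + \frac{(r - d)\,d_1}{\sigma\sqrt{\tau}} - \frac{1 + d_1 d_2}{2\tau} \right)$$
m) Formulas for Speed Greek Speed
$$\text{Speed}_{CO,PO} = -e^{-d\tau}\,\frac{\varphi(d_1)}{S^2 \sigma\sqrt{\tau}} \left( \frac{d_1}{\sigma\sqrt{\tau}} + 1 \right) = -\frac{\Gamma}{S} \left( \frac{d_1}{\sigma\sqrt{\tau}} + 1 \right)$$
n) Formulas for Zomma Greek Zomma
$$\text{Zomma}_{CO,PO} = e^{-d\tau}\,\frac{\varphi(d_1)\,(d_1 d_2 - 1)}{S \sigma^2 \sqrt{\tau}} = \Gamma \left( \frac{d_1 d_2 - 1}{\sigma} \right)$$
o) Formulas for Color Greek Color
$$\text{Color}_{CO,PO} = -e^{-d\tau}\,\frac{\varphi(d_1)}{2 S \tau\,\sigma\sqrt{\tau}} \left( 2 d \tau + 1 + \frac{2(r - d)\tau - d_2\,\sigma\sqrt{\tau}}{\sigma\sqrt{\tau}}\,d_1 \right)$$
$$\text{Color}_{CO,PO} = -\frac{\Gamma}{2\tau} \left( 2 d \tau + 1 + \frac{2(r - d)\tau - d_2\,\sigma\sqrt{\tau}}{\sigma\sqrt{\tau}}\,d_1 \right)$$
p) Formulas for Ultima Greek Ultima
$$\text{Ultima}_{CO,PO} = -\frac{S e^{-d\tau} \varphi(d_1) \sqrt{\tau}}{\sigma^2} \left( d_1 d_2 (1 - d_1 d_2) + d_1^2 + d_2^2 \right)$$
$$\text{Ultima}_{CO,PO} = -\frac{\nu}{\sigma^2} \left( d_1 d_2 (1 - d_1 d_2) + d_1^2 + d_2^2 \right)$$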
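Several higher order Greeks can be written compactly in terms of Vega and Gamma, which makes them easy to compute and to cross-check. The sketch below assumes the dividend-yield Black-Scholes model and the standard closed forms for Vomma, Vanna, Speed, Zomma and Ultima; the function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

phi = NormalDist().pdf  # standard normal probability density


def higher_order_greeks(S, X, r, d, tau, sigma):
    """Vega, Gamma and selected higher order Greeks (same for call and put)."""
    vol_sqrt = sigma * sqrt(tau)
    d1 = (log(S / X) + (r - d + 0.5 * sigma ** 2) * tau) / vol_sqrt
    d2 = d1 - vol_sqrt
    vega = S * exp(-d * tau) * phi(d1) * sqrt(tau)
    gamma = exp(-d * tau) * phi(d1) / (S * vol_sqrt)
    return {
        "vega": vega,
        "gamma": gamma,
        "vomma": vega * d1 * d2 / sigma,
        "vanna": vega / S * (1.0 - d1 / vol_sqrt),
        "speed": -gamma / S * (d1 / vol_sqrt + 1.0),
        "zomma": gamma * (d1 * d2 - 1.0) / sigma,
        "ultima": -vega / sigma ** 2 * (d1 * d2 * (1.0 - d1 * d2) + d1 ** 2 + d2 ** 2),
    }


def central_difference(f, x, h):
    """Central difference approximation of f'(x), used only for cross-checking."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration
    args = (100.0, 95.0, 0.05, 0.02, 0.5, 0.25)
    g = higher_order_greeks(*args)
    # Vomma should match the numerical derivative of Vega with respect to sigma
    numeric_vomma = central_difference(
        lambda v: higher_order_greeks(100.0, 95.0, 0.05, 0.02, 0.5, v)["vega"], 0.25, 1e-4)
    print(g["vomma"], numeric_vomma)
```

Comparing each closed form with a finite difference of the Greek one order lower is a cheap way to detect sign errors, which are easy to make in these formulas.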
3.2.7. Needful Statistical and Probability Relations for Deduction of Greeks Formulas
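a) Value Function
$$C = S e^{-d\tau} N(d_1) - X e^{-r\tau} N(d_2), \qquad P = X e^{-r\tau} N(-d_2) - S e^{-d\tau} N(-d_1)$$
$$d_1 = \frac{\ln\frac{S}{X} + \left(r - d + \frac{\sigma^2}{2}\right)\tau}{\sigma\sqrt{\tau}}, \qquad d_2 = \frac{\ln\frac{S}{X} + \left(r - d - \frac{\sigma^2}{2}\right)\tau}{\sigma\sqrt{\tau}}$$
$$d_2 = d_1 - \sigma\sqrt{\tau}, \qquad d_1 = d_2 + \sigma\sqrt{\tau}, \qquad \frac{d_1^2 - d_2^2}{2} = \ln\frac{S}{X} + (r - d)\tau$$
b) Standard Normal Probability Densities
$$\varphi(d_1) = \frac{1}{\sqrt{2\pi}}\, e^{-d_1^2/2}, \qquad \varphi(d_2) = \frac{1}{\sqrt{2\pi}}\, e^{-d_2^2/2}$$
$$\varphi(d_2) = \varphi(d_1)\, e^{(d_1^2 - d_2^2)/2} = \varphi(d_1)\,\frac{S}{X}\, e^{(r - d)\tau}, \qquad \varphi(d_1) = \varphi(d_2)\, e^{-(d_1^2 - d_2^2)/2} = \varphi(d_2)\,\frac{X}{S}\, e^{-(r - d)\tau}$$
c) Standard Normal Distribution Functions
$$N(d_1) = \int_{-\infty}^{d_1} \varphi(u)\,du, \qquad N(d_2) = \int_{-\infty}^{d_2} \varphi(u)\,du$$
$$N(d_1) + N(-d_1) = 1, \qquad N(d_2) + N(-d_2) = 1$$
$$\frac{\partial N(d_1)}{\partial d_1} = \varphi(d_1), \qquad \frac{\partial N(d_2)}{\partial d_2} = \varphi(d_2)$$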
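As an illustration of how these relations are used, the following sketch derives the Delta of a call option. It assumes the dividend-yield form of the value function $C = S e^{-d\tau} N(d_1) - X e^{-r\tau} N(d_2)$ and uses the density relation in the equivalent form $S e^{-d\tau} \varphi(d_1) = X e^{-r\tau} \varphi(d_2)$.

$$\frac{\partial C}{\partial S} = e^{-d\tau} N(d_1) + S e^{-d\tau} \varphi(d_1)\,\frac{\partial d_1}{\partial S} - X e^{-r\tau} \varphi(d_2)\,\frac{\partial d_2}{\partial S}$$

Because $d_2 = d_1 - \sigma\sqrt{\tau}$, both derivatives are equal, $\frac{\partial d_1}{\partial S} = \frac{\partial d_2}{\partial S} = \frac{1}{S \sigma\sqrt{\tau}}$, so the last two terms combine into

$$\left( S e^{-d\tau} \varphi(d_1) - X e^{-r\tau} \varphi(d_2) \right) \frac{1}{S \sigma\sqrt{\tau}} = 0,$$

and therefore

$$\Delta_{CO} = \frac{\partial C}{\partial S} = e^{-d\tau} N(d_1).$$

The same cancellation appears in the deduction of the other first order Greeks, which is why the density relation is listed among the needful relations.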
3.2.8. Conclusion, References
The results of explanation:
- Description of Value Function as Fair Price
- Description of Greeks of First Order
- Description of Greeks of Second Order
- Description of Greeks of Third Order
- Names and Indications of Greeks
- Survey of Formulas for Greeks Calculation
- Survey of Needful Relations for Greeks Calculation
References
- Keim, D.A. (2002)
Information Visualization and Visual Data Mining.
IEEE Transactions on Visualization and Computer Graphics. Vol.7, No.1, January-March 2002
- Záškodný, P., Tarábek, P. (2010-2011)
Data Mining Tools in Statistics Education
In: Tarábek, P., Záškodný, P. (2010-2011), Educational and Didactic Communication 2010
Bratislava, Slovak Republic: Didaktis, ISBN 978-80-89160-78-5
www.didaktis.sk
- Záškodný, P., Havlíček, I., Budinský, P. (2010-2011)
Partial Data Mining Tools in Statistics Education in Greeks and Option Hedging
In: Tarábek, P., Záškodný, P. (2010-2011), Educational and Didactic Communication 2010
Bratislava, Slovak Republic: Didaktis, ISBN 978-80-89160-78-5
www.didaktis.sk
3.3. Data Mining Tools in Statistics Education
3.3.1. Introduction
In the introduction of chapter 3.3, quotations showing the importance of educational
data mining are presented. The quotations i) to vi) are selected according to
C.Romero, S.Ventura (2006) (In: Tarábek, P., Záškodný, P. (2009) Educational and Didactic
Communication 2009, Bratislava, Slovak Republic: Didaktis, www.didaktis.sk,
ISBN 978-80-89160-69-3).
i) Currently there is an increasing interest in data mining and educational systems (well-known
learning content management systems, adaptive and intelligent web-based educational systems),
making educational data mining a new growing research community
ii) After preprocessing the available data in each case, data mining techniques can be applied in
educational systems: statistics and visualization, clustering, classification and detection, association
rule mining and pattern mining, text mining
iii) Data mining oriented towards students: to show recommendations and to let students use, interact,
participate and communicate within educational systems
iv) Data mining oriented towards educators (and responsible academic administrators): to
show discovered knowledge and to let educators (administrators) design, plan, build and maintain
within educational systems
v) Data mining tools provide mining algorithms, filtering and visualization techniques. Examples
of data mining tools:
- Tool name: Mining tool, Authors: Zaïane and Luo (2001), Mining task: Association and patterns
- Tool name: Multistar, Authors: Silva and Vieira (2002), Mining task: Association and classification
- Tool name: Synergo/ColAT, Authors: Avouris et al. (2005), Mining task: Visualization
vi) Future research lines in educational data mining
- Mining tools that make it easier for educators or non-expert users to apply data mining
- Standardization of data and methods (preprocessing, discovering, postprocessing)
- Integration with the e-learning system
- Specific data mining techniques
The main principle of chapter 3.3.:
Data Mining in Statistics Education (DMSTE) as Problem Solving
The main goal of chapter 3.3.:
Delimitation of Complex Tool and Partial Tool of DMSTE
The procedure of chapter 3.3.:
- Data Preprocessing in Statistics Education
- Data Processing in Statistics Education
- Complex Tool of DMSTE: Curricular Process (CP-DMSTE)
- Partial Tool of DMSTE: Analytical Synthetic Modelling (ASM-DMSTE)
- Application of CP-DMSTE and ASM-DMSTE
- Supplement describing the principles of data mining approach
The results of chapter 3.3.:
1. Educational Communication of Statistics as Result of Data Preprocessing
2. Educational Communication of Statistics as Five Transformations T1-T5 of Knowledge
from Statistics to Mind of Educant
3. Curricular Process of Statistics as Result of Data Processing
4. Curricular Process of Statistics as Structuring, Algorithm Development and Formalization
of Educational Communication of Statistics
5. Curricular Process as Succession of Five Transformations T1-T5 of Curriculum Variant
Forms
6. Curriculum Variant Forms as Forms of Education Content Existence
7. Formalization of Curriculum Variant Form (Four of Universal Structural Elements: Sense
and Interpretation, Set of Objectives, Conceptual Knowledge System, Factor of Following
Transformation)
8. Variant Forms of Curriculum: Conceptual Curriculum (Communicable Scientific System
of Statistics), Intended Curriculum (Educational System of Statistics), Projected
Curriculum (Instructional Project of Statistics and Its Textbook), Implemented
Curriculum-1 (Preparedness of Educator to Education), Implemented Curriculum-2
(Results of Education in Mind of Educant), Attained Curriculum (Applicable Results of
Education)
9. Curricular Process as CP-DMSTE (Structuring, Algorithm Development and
Formalization of Five Transformations Succession T1-T5)
10. Analytical Synthetic Modeling as ASM-DMSTE (Modeling Inputs and Outputs of
Transformations T1-T5)
11. Analytical Synthetic Models as Results of Problems Solving (Real or Mediated Problems)
12. Application of CP-DMSTE and ASM-DMSTE (Visualia of Conceptual Curriculum in
Area of Statistics with Concrete Basic Statistical Set, Need of Visualiae of All Curriculum
Variant Forms as Application of CP-DMSTE)
3.3.2. Data Mining (see also Supplement of chapter 3.3.)
Data Mining: an analytical synthetic way of extracting hidden and potentially useful information
from large data files (continuum data-information-knowledge, knowledge discovery)
Data Mining Techniques: the system functions of the structure of formerly hidden relations and patterns
(e.g. classification, association, clustering, prediction)
Data Mining Tool: a concrete procedure for reaching the intended system functions
Complex Tool: a resolution of a complex problem of the relevant science branch
Partial Tool: a resolution of a partial problem of the relevant science branch (e.g. analytical synthetic
modeling, needful mathematical or statistical procedures)
Result of Data Mining: a result of the application of a data mining tool
Representation of Data Mining Result: a description of what is expressed
Visualization of Data Mining Result: optical retrieval of the data mining result
Data Mining Cycle: Data Definition, Data Gathering, Data Preprocessing, Data Processing,
Discovering Knowledge or Patterns, Representation and Visualization of Results
See P.Tarabek, P.Zaskodny, V.Pavlat, P.Prochazka, V.Novak, J.Skrabankova (2009-2010,
2009-2010abcde and quoted sources).
Quoted sources in 2009-2010abcde:
E.g. American Library Association, M.C.Borba, E.M.Villarreal, G.M.Bowen, W-M Roth, C.Brunk,
J.Kelly, R.Kohavi, Mineset, B.V.Carolan, G.Natriello, N.Delavari, M.R.Beikzadeh, S.Phon-Amnuaisuk,
U-D Ehlers, J.M.Pawlowski, U.M.Fayyad, G.Piatetsky-Shapiro, P.Smyth, J.Fox, D.Gabel,
J.K.Gilbert, O.de Jong, R.Justi, D.F.Treagust, J.H.Van Driel, M.Reiner, M.Nakhleh, W.Hämäläinen,
T.H.Laine, E.Sutinen, M.Hesse, A.H.Johnstone, M.J.Kearns, U.V.Vazirani, D.A.Keim, R.Kwan,
R.Fox, FT Chan, P.Tsang, Le Jun, J.Luan, J.Manak, National Research Council-NRC, R.Newburgh,
I.Nonaka, H.Takeuchi, C.J.Petroselli, E.F.Redish, D.Reisberg, C.Romero, S.Ventura, N.Rubenking,
R.E.Scherr, M.Sabella, D.A.Simovici, C.Djeraba, V.Spousta, L.Talavera, E.Gaudioso, E.R.Tufte,
J.Tuminaro, R.Vilalta, C.Giraud-Carrier, P.Brazdil, C.Soares, D.M.Wolpert.
3.3.3. Data Preprocessing in Statistics Education
Result of Data Preprocessing: Educational Communication of Statistics as
a succession of transformations of education content forms (taken over from physics education):
- The transformation T1 is transformation of scientific system of statistics to communicable
scientific system of statistics (the first form of education content existence),
- The transformation T2 is transformation of communicable scientific system of statistics to
educational system of statistics (the second form of education content existence),
- The transformation T3 is transformation of educational system of statistics to both instructional
project of statistics and preparedness of educator to education (the third and fourth forms of education
content existence),
- The transformation T4 is transformation of both instructional project of statistics and preparedness
of educator to results of education (the fifth form of education content existence),
- The transformation T5 is transformation of results of statistics education to applicable results of
statistics education (the sixth form of education content existence)
See J.Brockmeyer (1982), P.Zaskodny et al. (2004, 2007), P.Tarabek, P.Zaskodny (2001,
2007-2008abc, 2008-2009, 2009-2010), P.Zaskodny (2001, 2006, 2009).
3.3.4. Data Processing in Statistics Education
Result of Data Processing: Curricular Process of Statistics as a succession of transformations
of algorithmized and formalized education content forms (taken over from physics education):
i. The form of education content existence - variant form of curriculum
ii. The curriculum - education content (see Prucha, 2005)
iii. The variant forms of curriculum have got the universal structure (four structural elements -
sense and interpretation, set of objectives, conceptual knowledge system, factor of following
transformation)
iv. The variant forms of curriculum were selected on the basis of fusion of Anglo-American
curricular tradition and European didactic tradition
v. The curricular process is defined as the succession of transformations T1-T5 of curriculum
variant forms:
conceptual curriculum (output of T1, the first variant form of curriculum) - the communicable
scientific system
intended curriculum (output of T2, the second variant form of curriculum) - the educational
system of statistics
projected curriculum (output of T3, the third variant form of curriculum) - the instructional project
of statistics
implemented curriculum-1 (output of T3, the fourth variant form of curriculum) - the preparedness
of educator to education
implemented curriculum-2 (output of T4, the fifth variant form of curriculum) - the results of
education
attained curriculum (output of T5, the sixth variant form of curriculum) - applicable results of
education
See P.Prochazka, P.Zaskodny (2009-2010c).
Quoted sources in 2009-2010c:
E.g. A.V.Kelly, M.K.Smith, W.Doyle, M.Pasch, A.M.Sochor, V.V.Krajevskij, I.J.Lerner, J.McVittie,
K.Carter, G.M.Blenkin, L.Stenhouse, E.Newman, G.Ingram, F.Bobbitt, R.W.Tyler, H.Taba,
C.Cornbleth, S.Grundy, D.Lawton, P.Gordon, M.Cetron, M.Gayle, G.J.Posner.
3.3.5. Complex and Partial Tool of DMSTE: CP-DMSTE, ASM-DMSTE
The complex tool of DMSTE is given by the curricular process of statistics (CP-DMSTE).
CP-DMSTE delimits the correct education content via the succession of transformations T1-T5.
The partial tool of DMSTE is given by analytical synthetic modeling (ASM-DMSTE).
ASM-DMSTE describes the solving of mediated or real problems within the inputs and outputs of
the individual transformations T1-T5. In this paper, ASM-DMSTE is described
by means of both the visualia Vis.1 and the Legend to Vis.1.
Legend to Vis.1
a (Identified Complex Problem): Investigated area of reality, investigated phenomenon
B_k (Analysis): Analytical segmentation of the complex problem into partial problems
b_k (Partial problems PP-k): Result of analysis: essential attributes and features
of the investigated phenomenon
C_k (Abstraction): Delimitation of the essences of partial problems by abstraction, with the goal
of acquiring the partial solutions
c_k (Partial solutions PS-k): Result of abstraction: partial concepts, partial pieces of
knowledge, various relations, etc.
D_k (Synthesis): Synthetic finding of dependences among the results of abstraction
d_k (Partial conclusions PC-k): Result of synthesis: principle, law, dependence, continuity
E_k (Intellectual reconstruction): Intellectual reconstruction of the investigated phenomenon /
investigated area of reality
e (Total solution of complex problem a): Result of intellectual reconstruction:
analytical synthetic structure of final knowledge (conceptual knowledge system)
Vis.1 General Analytical Synthetic Model of Problem Solving
[Figure Vis.1. Stage labels: ANALYSIS B_k; ABSTRACTION C_k (C_1, C_2, C_3, C_4); SYNTHESIS D_k (D_1, D_2); RECONSTRUCTION E_k (E_1, E_2)]
5. Application of Partial Tool ASM-DMSTE
The application of ASM-DMSTE is the visualia Vis.2 from the area of statistics education.
The visualia Vis.2 is an analytical synthetic model of statistics with a concrete basic statistical set. This
visualia constitutes a part of the conceptual curriculum of statistics, i.e. a part of the communicable scientific
system of statistics (a part of the output of transformation T1).
The visualized result Vis.2 of data mining in statistics education constitutes a paramorphic
model and a hypertextual representation; it represents the external conceptual knowledge systems as
an external representation of general social experience. The visualized result also represents a concrete
type of data file: the representation of statistics with a concrete basic statistical set.
[Figure Vis.1, node labels:]
a - Identified Complex Problem
b_1, b_2, ..., b_k - Partial Problems No. 1, No. 2, ..., No. k (PP-1, PP-2, ..., PP-k)
c_1, c_2, c_3, c_4, ..., c_k - Partial Solutions No. 1, No. 2, No. 3, No. 4, ..., No. k (PS-1, PS-2, PS-3, PS-4, ..., PS-k)
d_1, d_2, ..., d_k - Partial Conclusions No. 1, No. 2, ..., No. k (PC-1, PC-2, ..., PC-k)
e - Total Solution of Complex Problem "a" formed by means of PC-1, PC-2, ..., PC-k
Vis.2: Analytical synthetic model of statistics formed by four partial models
a1-e1, a2-e2, a3-e3, a4-e4
(a part of the conceptual curriculum of statistics, i.e. a part of the communicable scientific system
of statistics, the output of transformation T1)
[Figure Vis.2. Node labels recovered from the diagram, grouped by partial model:]
a-1: Collective random phenomenon and reason of its investigation
- Choice of statistical units, Statistical unit, Statistical sign, Variants (values) of statistical sign,
Creating of scale, Measurement
e-1 = a-2: Selective statistical set (SSS) as a part of basic statistical set, Goals of statistical examination
- Frequencies tables (Empirical distribution), Graphical expression, Empirical parameters
e-2 = a-3: Empirical picture of selective statistical set, Necessity of probable investigation
- Choice of acceptable theoretical distribution, Quantification of theoretical parameters,
Comparison of theoretical and empirical parameters, Testing of non-parametric hypotheses,
Point & interval estimation (e.g. confidence interval), Testing of parametric hypotheses
e-3 = a-4: Empirical & probable picture of selective statistical set, Necessity of association investigation
- Statistical dependence (causal, non-causal), Regression analysis, Correlation analysis
e-4: Empirical & probable & association picture of selective statistical set;
Interpretation and conclusions as the statistical & probable dimension
of investigation of the collective random phenomenon
Applied statistics
(e.g. financial options and their mathematical and statistical elaboration by means of greeks calculation and
option hedging models)
LEGEND to whole visualia Vis.2
a-1 to e-1, a-2 to e-2, a-3 to e-3, a-4 to e-4:
One Sample Analysis, Two / Multiple Sample Analysis
LEGEND to partial models of visualia Vis.2
a-1 to e-1:
Formulation of statistical examination
a-2 to e-2:
Relative & Cumulative Frequencies (Empirical distribution)
Plotting functions: e.g. Plot Frequency Polygon (Graphical expression)
Average-Means, Variance-Standard Deviation, Obliqueness (Skewness), Pointedness
(Kurtosis) (Empirical parameters)
a-3 to e-3:
Theoretical Distribution (partial survey in alphabetical order):
Bernoulli, Beta, Binomial, Chi-square, Discrete Uniform, Erlang, Exponential, F, Gamma,
Geometric, Lognormal, Negative binomial, Normal, Poisson, Student's, Triangular,
Trinomial, Uniform, Weibull
Testing of Non-parametric Hypotheses (Hypothesis test for H_0: accept or reject H_0):
e.g. computed Wilcoxon's test, Kolmogorov-Smirnov test, Chi-square test
e.g. at alpha = 0.05
Point & Interval Estimation:
e.g. confidence interval for Mean, confidence interval for Standard Deviation
Testing of Parametric Hypotheses (Hypothesis test for H_0: accept or reject H_0):
e.g. computed u-statistic, t-statistic, F-statistic, Chi-square statistic, Cochran's test, Bartlett's
test, Hartley's test
e.g. at alpha = 0.05
a-4 to e-4:
Statistical dependence:
e.g. confidence interval for difference in Means (Equal variances, Unequal variances)
e.g. confidence interval for Ratio of Variances
Regression analysis:
simple or multiple, linear or non-linear
Correlation analysis:
e.g. Rank correlation coefficient, Pearson correlation coefficient
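The empirical parameters and the interval estimation named in the legend can be illustrated on a small sample. The sketch below is only an assumption-laden example: it uses moment-based skewness and kurtosis (divisor n) and a large-sample normal approximation for the confidence interval of the mean at alpha = 0.05; the function names and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def empirical_parameters(sample):
    """Mean, variance, standard deviation, skewness and kurtosis of a sample."""
    n = len(sample)
    m = mean(sample)
    # Central moments of order 2, 3 and 4 (population form, divisor n)
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    return {
        "mean": m,
        "variance": m2,
        "std_dev": sqrt(m2),
        "skewness": m3 / m2 ** 1.5,  # obliqueness
        "kurtosis": m4 / m2 ** 2,    # pointedness
    }


def mean_confidence_interval(sample, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the mean (normal quantile)."""
    n = len(sample)
    m = mean(sample)
    s = sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample standard deviation
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * s / sqrt(n)
    return m - half_width, m + half_width


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values of a statistical sign
    print(empirical_parameters(data))
    print(mean_confidence_interval(data))
```

For small samples, a Student's t quantile would normally replace the normal quantile; the normal approximation is used here only to keep the sketch within the standard library.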
3.3.6. Conclusion, References
Modeling as a partial tool of data mining: quotation according to J.K.Gilbert (2008)
(In: Tarábek, P., Záškodný, P. (2009) Educational and Didactic Communication 2009,
Bratislava, Slovak Republic: Didaktis, www.didaktis.sk, ISBN 978-80-89160-69-3):
In a nightmare world, we would perceive the world around us being continuous and
without structure. However, our survival as a species has been possible because we have
evolved the ability to cut up that world mentally into chunks about which we can think and
hence give meaning to.
This process of chunking, a part of all cognition, is modelling and the products of the
mental actions that have taken place are models. Science, being concerned with the provision
of explanations about the natural world, places an especial reliance on the generation and
testing of models.
References
1. Used Publications
i. Brockmeyerová, J. (1982) Introduction into Theory and Methodology of Physics Education. Prague, Czech
Republic: SPN
ii. CSRG (2009). Curriculum Studies Research Group.
České Budějovice: University of South Bohemia, Czech Republic, http://sites.google.com/site/csrggroup/
iii. Gilbert, J.K. (2008) Visualization: An Emergent Field of Practice and Enquiry. In: Visualization: Theory and Practice
in Science (Models and Modeling in Science Education). New York: Springer Science + Business Media
iv. Keim, D.A. (2002) Information Visualization and Visual Data Mining. IEEE Transactions on Visualization
and Computer Graphics. Vol.7, No.1, January-March 2002
v. Průcha, J. (2005) Moderní pedagogika (Modern Educational Science), Prague, Czech Republic: Portál
2. Used Papers, Monographs, and Books of Author (2001-2010)
i. Tarábek, P., Záškodný, P. (2001)
Structural Textbook and Its Creation.
Bratislava, Slovak Republic: Didaktis, ISBN 80-85456-76-1
ii. Záškodný, P. (2001)
Statistical Dimension of Scientific Research.
KONTAKT, 2, 5, 2001 ISSN 1212-4117
iii. Tarábek, P., Záškodný, P. (2007-2008a)
Educational and Didactic Communication 2007, Vol.1 Theory.
Bratislava, Slovak Republic: Didaktis, www.didaktis.sk, ISBN 987-80-89160-56-3
iv. Tarábek, P., Záškodný, P. (2007-2008b)
Educational and Didactic Communication 2007, Vol.2 Methods.
Bratislava, Slovak Republic: Didaktis, www.didaktis.sk, ISBN 987-80-89160-56-3
v. Tarábek, P., Záškodný, P. (2007-2008c)
Educational and Didactic Communication 2007, Vol.3 Applications.
Bratislava, Slovak Republic: Didaktis, www.didaktis.sk, ISBN 987-80-89160-56-3
vi. Tarábek, P., Záškodný, P. (2008-2009)
Educational and Didactic Communication 2008.
Bratislava, Slovak Republic: Didaktis, www.didaktis.sk, ISBN 978-80-89160-62-4
vii. Tarábek, P., Záškodný, P. (2009-2010)
Educational and Didactic Communication 2009.
Bratislava, Slovak Republic: Didaktis, www.didaktis.sk, ISBN 978-80-89160-69-3
viii. Záškodný, P. et al. (2004)
Základy zdravotnické statistiky.
České Budějovice, Czech Republic: South Bohemia University ISBN 80-7040-663-1
ix. Záškodný, P. (2006)
Survey of Principles of Theoretical Physics (with Application to Radiology)
(in English). Lucerne, Switzerland, Ostrava, Czech Republic: Avenira, Algoritmus, ISBN 80-902491-9-1
x. Záškodný, P. et al. (2007)
Základy ekonomické statistiky.
Prague, Czech Republic: Institute of Finance and Administration ISBN 80-86754-00-6
xi. Záškodný, P. (2009)
Curricular Process of Physics (with Survey of Principles of Theoretical Physics)
(in Czech). Lucerne, Switzerland, Ostrava, Czech Republic: Avenira, Algoritmus, ISBN 978-80-902491-0-3
xii. Záškodný, P. (2009-2010)
Data Mining Tools in Science Education (in: vii.)
xiii. Záškodný, P., Pavlát, V. (2009-2010a)
Data Mining - A Brief Recherche (in: vii.)
xiv. Záškodný, P., Novák, V. (2009-2010b)
Data Mining - A Brief Summary (in: vii.)
xv. Záškodný, P., Procházka, P. (2009-2010c)
Collective Scheme of Both Educational Communication and Curricular Process (in: vii.)
xvi. Záškodný, P., Škrabánková, J. (2009-2010d)
Modelling and Visualization of Problem Solving (in: vii.)
xvii. Záškodný, P. (2009-2010e)
Representation of Results of Data Mining (in: vii.)
3.3.7. Supplement of Chapter 3.3: The Principles of Data Mining Approach
3.3.7.1. Quotations from Sources
i) Definitions of Data Mining
J.Luan (2002)
Definition of Data Mining
a) Data Mining is the process of discovering meaningful new correlations, patterns, and trends by
sifting through large amounts of data stored in repositories and by using pattern recognition
technologies as well as statistical and mathematical techniques
b) The notion of Data Mining for higher education: Data Mining is a process of uncovering hidden
trends and patterns that lend themselves to predictive modeling using a combination of explicit knowledge
base, sophisticated analytical skills and academic domain knowledge
N.Rubenking (2001)
Definition of Data Mining
Data Mining is the process of automatically extracting useful information and relationships from
immense quantities of data. In its purest form, Data Mining doesn't involve looking for specific
information. Rather than starting from a question or a hypothesis, Data Mining simply finds patterns
that are already present in the data.
R.Kohavi (2000)
Definition of Data Mining as Knowledge Discovery
Data Mining (or Knowledge Discovery) is the process of identifying new patterns and insights in data
Interpretation of Data Mining
As the volume of data collected and stored in databases grows, there is a growing need to provide data
summarization, identify important patterns and trends, and act upon findings
Le Jun (2008)
Definition of Data Mining as New Technology
Data Mining is the extraction of hidden predictive information from large databases. Data Mining is
a powerful new technology with great potential to help a scientific area focus on the most important
information in its data
N.Delavari, M.R.Beikzadeh, S.Phon-Amnuaisuk (2005)
Definition of Data Mining
Searched knowledge (meaningful knowledge, previously unknown and potentially useful information
discovered) is hidden among the raw educational data set and it is extractable through Data Mining
R.Kwan, R.Fox, FT Chan, P.Tsang (2008), Le Jun (2008)
Data, Information, Knowledge
Data, Information and Knowledge are different terms, which differ in meaning and value.
a) Data is a collection of facts and quantitative measures, which exists outside of any context from
which conclusions can be drawn.
b) Information is data that people interpret and place in meaningful context, highlighting patterns,
causes or relationships in data.
c) Knowledge is the understanding that humans develop in reaction to and through the use of information, either
individually or as an organization.
Data-Information-Knowledge Continuum
a) Data, information and knowledge are separated but linked concepts which can form a data-
information-knowledge continuum.
b) Data becomes information when people place it in context through interpretation that might seek to
highlight patterns.
c) Knowledge can be described as a belief that is justified through discussion, experience and perhaps
action. It can be shared with others by exchanging information in appropriate contexts.
ii) Data Mining and Problem Solving
L.Talavera, E.Gaudioso (2002)
Data Mining as Analysis Problem
In this paper we propose to shape the analysis problem as a data mining.
J.Tuminaro, E.F.Redish (2005), E.F.Redish (2005)
Problem solving
Problem solving and the use of math in physics courses
Student Use of Math in the Context of Physics Problem Solving: A Cognitive Model
M.C.Borba, E.M.Villarreal (2005)
Problem solving
Problem solving as context
Problem solving as skill
Problem solving as art
Process of modeling, process of problem solving
The process of modeling or model building is a part of the process of problem solving
Steps of problem solving process (process of problem solving as entailing several steps):
The starting point is a real problematic situation
The first step is to create a real model, making simplifications, idealizations, establishing conditions
and assumptions, but respecting original situation
In the second step, the real model is mathematized, to get a mathematical model
The third step implies the selection of suitable mathematical methods and working within
mathematics in order to get some mathematical results
In the fourth step, these results are interpreted for and translated into the real situation
iii) Forms of Data Mining, Data Mining System, Goals of Data Mining, Scope of
Data Mining
R.Kohavi (2000)
Forms of Data Mining (Structured mining etc.)
Structured mining, Text mining, Information retrieval
W.Hämäläinen, T.H.Laine, E.Sutinen (2003)
Data Mining system, educational system
Data Mining system in educational system: the educational system should be served by Data Mining
system to monitor, intervene in, and counsel the teaching-studying-learning process
R.Kohavi (2000)
Goals of Data Mining
Data Mining serves two goals:
-Insight: Identified patterns and trends are comprehensible
-Prediction: A model is built that predicts (scores) based on input data. Prediction as classification
(discrete variable) or as regression (continuous variable)
Scope of Data Mining
The majority of research in DM has concentrated on building the best models for prediction.
A learning algorithm is given the training set and produces a model that can map new unseen data into
the prediction.
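The idea of a learning algorithm that receives a training set and produces a model for unseen data can be sketched with the simplest regression case (prediction of a continuous variable). The example below assumes ordinary least squares for a straight line; the function names and the training data are hypothetical.

```python
def fit_line(xs, ys):
    """Learning step: fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the "model"


def predict(model, x):
    """Prediction step: map a new, unseen input x to a predicted value."""
    slope, intercept = model
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical training set
    train_x = [1.0, 2.0, 3.0, 4.0]
    train_y = [3.0, 5.0, 7.0, 9.0]
    model = fit_line(train_x, train_y)
    print(model, predict(model, 10.0))
```

Classification (prediction of a discrete variable) follows the same fit-then-predict pattern, only with a different kind of model.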
iv) Results of Data Mining, Applications of Data Minings, Interdisciplinarity of Data
Mining
R.Kohavi (2000), D.M.Wolpert (1994), M.J .Kearns, U.V.Vazivani (1994)
Some theoretical results in Data Mining
- No free lunch (All concepts are equally likely, then learning is impossible)
- Consistency (non-parametric models - target concept given enough data, parametric models as linear
regression are known to be of limited power) - enough data = consistency
- PAC learning (probably approximately correct learning) is a concept introduced to provide
guarantees about learning
- Bias-Variance decomposition
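As an illustration of the last item, the decomposition is commonly written as follows for squared-error loss. The notation is an assumption introduced here, not taken from the quoted sources (which may have a different loss in mind): the target is y = f(x) + ε with zero-mean noise of variance σ², and \hat f is the model learned from a randomly drawn training set.

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```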
U.M.Fayyad, G.Piatetsky-Shapiro, P.Smyth (1996)
Interdisciplinarity of Data Mining
Data Mining, sometimes referred to as Knowledge Discovery, is at the intersection of multiple
research areas, including machine learning, statistics, pattern recognition, databases and visualization
J.Luan (2002)
Potential applications of Data Mining
There are several ways to examine the potential applications of Data Mining
a) One is to start with the functions of the algorithms and reason about what they can be utilized for
b) Another is to examine the attributes of a specific area where data are rich, but mining activities are
scarce
c) And another is to examine the different functions of a specific area to identify the needs that can
translate themselves into Data Mining projects
Notes: a) - See Curricular Process as Data Mining Algorithm
b) - See Curriculum: Theory and Practice as scientific area in which mining activities are
scarce
c) - Some of the most likely places where data miners (educational researchers who wear
this hat) may initiate Data Mining projects are: Variant Forms of Curriculum
v) Data Mining techniques
N.Delavari, M.R.Beikzadeh, S.Phon-Amnuaisuk (2005)
Data Mining techniques
DM techniques can be used to extract unknown patterns from a set of data and discover useful
knowledge. It results in extracting greater value from the raw data set, and making use of strategic
resources efficiently and effectively.
J.Luan (2001)
Data Mining techniques as Data Mining functions
Prediction, clustering, classification, association
Le Jun (2008)
Data Mining techniques - application of Data Mining tools
Application of DM tools: To solve the tasks of prediction, classification, explicit modeling and
clustering. The application can help to understand learners' learning behaviors
C.Romero, S.Ventura (2006)
Data Mining techniques in educational systems
After preprocessing the available data in each case, Data Mining techniques can be applied in
educational systems: statistics and visualization, clustering, classification and outlier detection,
association rule mining and pattern mining, text mining
J.Luan (2002)
Clustering and prediction - the most striking aspects of Data Mining techniques
- The clustering aspect of Data Mining offers comprehensive characteristics analysis of investigated
area
- The predicting function estimates the likelihood for a variety of outcomes
B.V.Carolan, G.Natriello (2001)
Clustering
Data Mining resources to identify structural attributes of the educational research community - e.g.
clustering as collaboration of physicists and biologists
D.A.Simovici, C.Djeraba (2008)
Clustering, Taxonomy of clustering
a) Clustering is the process of grouping together objects that are similar. The groups formed by
clustering are referred to as clusters.
b) Clustering can be regarded as a special type of classification, where the clusters serve as
classes of objects
c) It is a widely used data mining activity with multiple applications in a variety of scientific activities
from biology and astronomy to economics and sociology
d) Taxonomy of clustering (we follow here the taxonomy of clustering)
- Exclusive or nonexclusive: an exclusive clustering technique yields clusters that are disjoint,
while a nonexclusive technique produces overlapping clusters.
- Intrinsic or extrinsic: intrinsic clustering is based only on dissimilarities between the objects to be
clustered; in extrinsic clustering, the information about which objects should be clustered together
and which should not is provided by an external source.
- Hierarchical or partitional: in hierarchical clustering algorithms, a sequence of partitions is
constructed; partitional clustering creates a partition of the set of objects whose blocks are the
clusters, such that objects in a cluster are more similar to each other than to objects that belong to
different clusters
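A clustering that is exclusive, intrinsic and partitional in the sense of this taxonomy can be sketched with a very small k-means-style procedure on one-dimensional data. The algorithm choice, the data and the function name are assumptions made for illustration; the quoted source does not prescribe them.

```python
def k_means_1d(points, centres, iterations=10):
    """Partition 1-D points into len(centres) disjoint clusters (k-means style)."""
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            # Exclusive: each point goes to exactly one cluster, the nearest centre.
            # Intrinsic: only the dissimilarity |p - centre| is used.
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters

clusters = k_means_1d([1.0, 1.5, 2.0, 9.0, 10.0, 11.0], centres=[0.0, 5.0])
print(clusters)
```

The result is a single partition of the input: every point appears in exactly one block, and points within a block are closer to each other than to points in the other block.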
vi) Data Mining tools
C.Brunk, J.Kelly, R.Kohavi (1997)
Data Mining tool
MineSet is a Data Mining tool that integrates Data Mining and visualization very tightly. Models
built can be viewed and interacted with.
C.Romero, S.Ventura (2006)
Data Mining tools
Data Mining tools provide mining algorithms, filtering and visualization techniques. The examples
of Data Mining tool:
- Tool name: Mining tool, Authors: Zaïane and Luo (2001), Mining task: Association and patterns
- Tool name: Multistar, Authors: Silva and Vieira (2002), Mining task: Association and classification
- Tool name: Synergo/ColAT, Authors: Avouris et al (2005), Mining task: Visualization
D.A.Simovici, C.Djeraba (2008)
Mathematical tools for Data Mining
a) This book was born from the experience of the authors as researchers and educators, which suggests
that many students of Data Mining are handicapped in their research by the lack of formal,
systematic education in its mathematics. The book is intended as a reference for the working data
miner.
b) In our opinion, three areas of math are vital for DM:
- set theory, including partially ordered sets and combinatorics,
- linear algebra, with its many applications in principal component analysis and neural networks,
- and probability theory, which plays a foundational role in statistics, machine learning and DM
vii) Modeling, Model
J.K.Gilbert, M.Reiner, M.Nakhleh (2008), J.K.Gilbert (2008), J.K.Gilbert, R.Justi (2002)
Definition of Modelling, Model
We have evolved the ability to cut up that world mentally into chunks about which we can think
and hence give meaning to. This process of chunking (Data Mining clustering),
a part of all cognition, is modelling and the products of the mental actions that have taken place are
models
Significance of Modelling, Model
Modelling as an element in scientific methodology and models as the outcome of modelling are both
important aspects of the conduct of science and hence of science education
Categorization of models
a) Historical models (Curriculum models) - learning specific consensus (the P-N junction model of
transistor). Curriculum models can be used to provide an acceptable explanation of
a wide range of phenomena and specific facts; that is why they are a useful way of reducing, by chunking,
the ever-growing factual load of the science curriculum
b) New qualitative models - developed by following the sequence of learning: To revise an
established model, To construct a model de novo (to reconstruct an established model)
c) New quantitative models - developed by following the sequence of learning: quantitative version
of a useable qualitative model of phenomenon
d) Progress in scientific enquiry is indicated by the value of a particular combination of
qualitative and quantitative models in making successful predictions about its properties
C.M.Borba, E.M.Villarreal (2005)
Definition of modeling
Modeling can be understood as a pedagogical approach that emphasizes students' choice of
a problem to be investigated in the classroom. Students, therefore, play an active role in curriculum
development instead of being just the recipients of tasks designed by others.
Problem solving
- problem solving as context
- problem solving as skill
- problem solving as art
Process of modeling, process of problem solving
The process of modeling or model building is a part of the process of problem solving.
Steps of problem solving process
Process of problem solving as entailing several steps:
a) The starting point is a real problematic situation
b) The first step is to create a real model, making simplifications, idealizations, establishing
conditions and assumptions, but respecting original situation
c) In the second step, the real model is mathematized, to get a mathematical model
d) The third step implies the selection of suitable mathematical methods and working within
mathematics in order to get some mathematical results
e) In the fourth step, these results are interpreted for and translated into the real situation
J.K.Gilbert, O.de Jong, R.Justi, D.F.Treagust, J.H.van Driel (2002)
Model as a major learning and teaching tool
Models are one of the main products of science, modelling is an element in scientific methodology,
(and) models are a major learning and teaching tool in science education
Model of Modeling Framework
1. Decide on purpose - Select source for model and Have experience - Produce mental model
2. Produce mental model - Express in mode(s) of representation
3. Express in mode(s) of representation - Conduct thought experiments
4a. Conduct thought experiments (pass) - Design and perform empirical tests
4b. Conduct thought experiments (fail) - Reject mental model (Modify mental model) and back to
Select source for model (negative result)
5a. Design and perform empirical tests (pass) - Fulfill purpose and Consider scope and limitations of
model and back to Decide on purpose (positive result)
5b. Design and perform empirical tests (fail) - Reject mental model (Modify mental model) and back
to Select source for model (negative result)
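The framework above reads as a small state machine with two test steps that either pass or fail. The sketch below encodes it as lookup tables; the shortened step names, the table layout and the decision to stop at "fulfil purpose" (rather than looping back to "decide on purpose") are simplifying assumptions made here.

```python
# Steps that always lead to the same next step.
NEXT = {
    "decide on purpose": "select source for model",
    "select source for model": "produce mental model",
    "produce mental model": "express in mode(s) of representation",
    "express in mode(s) of representation": "conduct thought experiments",
}

# Test steps: the next step depends on whether the test passes or fails.
TESTS = {
    "conduct thought experiments": {
        "pass": "design and perform empirical tests",
        "fail": "select source for model",   # reject or modify the mental model
    },
    "design and perform empirical tests": {
        "pass": "fulfil purpose",
        "fail": "select source for model",   # reject or modify the mental model
    },
}

def walk(outcomes):
    """Follow the framework, consuming one 'pass'/'fail' outcome per test step."""
    outcomes = iter(outcomes)
    state = "decide on purpose"
    path = [state]
    while state != "fulfil purpose":
        if state in TESTS:
            result = next(outcomes, None)
            if result is None:        # no more outcomes supplied: stop here
                break
            state = TESTS[state][result]
        else:
            state = NEXT[state]
        path.append(state)
    return path

path = walk(["fail", "pass", "pass"])
print(" -> ".join(path))
```

With the outcomes fail, pass, pass the walk returns once to "select source for model" after the failed thought experiment and then runs through to "fulfil purpose".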
R.Justi, J.K.Gilbert (2002)
Role of chemistry textbooks in the teaching and learning of models and modelling
This role may be discussed from two main angles:
- the way that chemical models are introduced in textbooks
(note: projected curriculum, a learning model)
- and the teaching models that they present
(note: Implemented curriculum-1, a teaching model)
Teaching model, Learning model, Analogies
A teaching model is a representation produced with the specific aim of helping students to
understand some aspect of content. Assuming the abstract nature of chemical knowledge, they
(learning models) are used very frequently in chemical textbooks mainly in the form of overt
analogies, as drawings and as diagrams (specifically to the atom, chemical bonding and chemical
equilibrium)
Some future research directions
a) How can teachers' pedagogical content knowledge about models and modelling be improved?
b) The role of models and modelling in the development of chemical knowledge?
c) How can it be made evident to teachers that the introduction of a model-based teaching and learning
approach can be a way to shift the emphasis in chemical education from transmission of existing
knowledge to a more contemporary perspective in which students will really understand the
nature of chemistry and be able to deal critically with chemistry-related situations?
viii) Representation (Creativity)
J.K.Gilbert, M.Reiner, M.Nakhleh (2008), J.K.Gilbert (2008)
Levels of Representation
The Representation in Science Education is concerned with challenges that students face in
understanding the three levels at which models can be represented - macro, sub-micro,
symbolic - and the relationships between them.
A.H.Johnstone (1993), D.Gabel (1999)
Representations as distinct representational levels
a) The models produced by science are expressed in three distinct representational levels
b) The macroscopic level - this consists of what is seen in that which is studied
c) The sub-microscopic level - this consists of representations of those entities that are inferred to
underlie the macroscopic level, giving rise to the properties that it displays (molecules and ions are
used to explain the properties of pure solutions, of radiotherapy)
d) The symbolic level - this consists of any qualitative abstractions used to represent each item at the
sub-microscopic level (chemical equations, mathematical equations)
J.K.Gilbert (2008), M.Hesse (1966), G.M.Bowen, W.-M.Roth (2005)
The ontological categorization of representations
a) Two approaches to the ontological categorization of representations are put forward, one based on
the purpose which the representation is intended to serve, the other on the dimensionality -
1D,2D,3D - of the representation.
b) The purpose for which a Model is Produced
- All models are produced by the use of analogy. The target (which is the subject of the model) is
depicted by a partial comparison with a source. The classification is binary: either the target and the
source are the same things (they are homomorphs - an aeroplane, a virus), or they are not (they are
paramorphs - paramorphs are used to model processes rather than objects)
c) The dimensionality of the Representation
The idea that modelling involves the progressive reduction of the experienced world to a set of
abstract signs can be set out in terms of dimensions as follows:
- Macro level - Perception of the world-as-experienced - 3D, 2D
- Sub-micro level - Gestures, concrete representations (structural representations) - 3D
- Photographs, virtual representations, diagrams, graphs, data arrays - 2D
- Symbolic level - Symbols and equations - 1D
E.R.Tufte (1983), J.K.Gilbert (2008), D.Reisberg (1997)
External and internal representations, Series of internal representations and creativity
a) Visualization is concerned with External Representation, the systematic and focused public
display of information in the form of pictures, diagrams, tables, and the like
b) Visualization is also concerned with Internal Representation, the mental production, storage and
use of an image that often (but not always) is the result of external representation
c) External and internal representations are linked in that their perception uses similar mental
processes
d) Visualization is thus concerned with the formation of an internal representation from an
external representation. An internal representation must be capable of mental use in the making of
predictions about the behaviour of a phenomenon under specific conditions
e) It is entirely possible that, once a series of internal representations has been visualized, they
are amalgamated/recombined to form a novel internal representation that is capable of external
representation - this is creativity
ix) Visualization
J.K.Gilbert, M.Reiner, M.Nakhleh (2008), J.K.Gilbert (2008)
Definition of Visualization
The making of meaning for any such representation is visualization. Visualization is central to
the production of representations of these models (curriculum models, qualitative and quantitative
models and their combinations).
J.K.Gilbert (2008)
Visualization and Internal Representation
Visualization is also concerned with Internal Representation, the mental production, storage and
use of an image that often (but not always) is the result of external representation.
R.Kohavi (2000)
Essence of Visualization - Data Summarization
As the volume of data collected and stored in databases grows, there is a growing need to provide data
summarization (e.g. through visualization), identify important patterns and trends, and act upon
findings.
C.Brunk, J.Kelly, R.Kohavi (1997)
Serviceability of Visualization
One way to aid users in understanding the models is to visualize them.
D.A.Keim (2002)
Serviceability of Visualization
a) Information Visualization techniques may help to solve the problem
b) Data Mining will use Information Visualization technology for an improved data analysis
Application of Visualization
Application of Visualization is Visual Data Exploration
Benefits of Visual Data Exploration
- University of Berkeley - every year 1 Exabyte of data (10^18 bytes; 1 Gigabyte = 10^9 bytes)
- Finding the valuable information hidden in them, however, is a difficult task
- The data presented textually - The range of some one hundred data items can be displayed
(a drop in the ocean)
- The basic idea of visual data exploration is to present the data in some visual form, allowing the
human to get insight into the data, draw conclusions, and directly interact with the data (to combine
the flexibility, creativity and general knowledge of the human with the enormous storage capacity and
the computational power of today's computers)
- The visual data exploration process can be seen as a hypothesis generation process (coming up with
new hypotheses and the verification of the hypotheses can be done via visual data exploration)
- The main advantages of visual data exploration: Visual data exploration can easily deal with
inhomogeneous and noisy data, visual data exploration is intuitive and requires no understanding of
mathematical and statistical algorithms, visual data exploration techniques are indispensable in
conjunction with automatic exploration techniques
- Visual data exploration paradigm: overview first, zoom and filter, details-on-demand
x) Metavisualization
N.R.C. (2006)
Metavisualization - spatial thinking
The associated visualization which can be called spatial thinking
J.K.Gilbert, M.Reiner, M.Nakhleh (2008), J.K.Gilbert (2008)
Metavisualization - learning from representations
It is of such importance in science and hence in science education that the acquisition of fluency in
visualization is highly desirable and may be called metavisual capability or metavisualization. A
fluent performance in visualization has been described as requiring metavisualization and involving
the ability to acquire, monitor, integrate, and extend learning from representations. Metavisualization
- learning from representations.
Criteria for Metavisualisation
Four criteria are suggested for attainment of metavisual status. The person concerned must be able to:
a) demonstrate an understanding of the convention of representation for all the modes and sub-modes
of 3D,2D,1D representations (what they can and cannot represent)
b) demonstrate a capacity to translate a given model between the modes and sub-modes in which it can
be depicted
c) demonstrate the capacity to be able to construct a representation within any mode and sub-mode of
dimensionality for a given purpose
d) demonstrate the ability to solve novel problems using a model-based approach
Developing the Skills of Metavisualization
level 1 - representation as depiction
level 2 - early symbolic skills
level 3 - syntactic use of formal representations
level 4 - semantic use of formal representations
level 5 - reflective, rhetorical use of representations
xi) Visual DM techniques
D.A.Keim (2002)
Classification of Visual Data Mining Techniques (abstraction criterion)
- Techniques such as x-y plots, line plots, and histograms, which are limited to relatively small and
low-dimensional data sets
- Novel information visualization techniques allowing visualization of multidimensional data without
inherent 2D or 3D semantics.
D.A.Keim (2002)
Classification of Visual DM Techniques based on three criteria a), b), c)
a) The data to be visualized (one or two- dimensional data, multidimensional data, text and
hypertext, hierarchies and graphs, algorithms and software):
Dimensionality of data set = the number of variables of the data set.
Text and hypertext = in the age of the world wide web one important data type is text and hypertext
Hierarchies and graphs = data records often have some relationship to other pieces of information;
a graph consists of a set of objects, called nodes, and connections between these objects, called edges.
Algorithms and software = the goal of visualization is to support software development by helping to
understand algorithms, e.g. by showing the flow of information in a program, and to enhance the
understanding of written code, e.g. by representing the structure of thousands of source code lines as graphs
b) The visualization techniques (Standard 2D/3D displays, Geometrically-transformed displays,
Icon-based displays, Dense pixel displays, Stacked displays-treemaps, dimensional stacking)
Geometrically-transformed displays = these techniques aim at finding interesting transformations of
multidimensional data sets. The class of geometric display techniques includes also the well-known
Parallel Coordinate Technique (PCT). The PCT maps the k-dimensional space onto the two display
dimensions by using k equidistant axes which are parallel to one of display axes
Icon-based displays = the idea is to map the attribute values of a multidimensional data item to the
features of an icon
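The Parallel Coordinate Technique described above can be sketched as a mapping from one k-dimensional data item to the vertices of a polyline on a two-dimensional display. The function name, the unit-square display and the per-axis min/max scaling are assumptions made for illustration; the sketch also assumes k is at least 2 and that each axis has a maximum larger than its minimum.

```python
def parallel_coordinates(item, minima, maxima):
    """Map a k-dimensional item to k (x, y) polyline vertices in the unit square."""
    k = len(item)
    vertices = []
    for i, value in enumerate(item):
        x = i / (k - 1)  # k equidistant, parallel axes spread over [0, 1]
        y = (value - minima[i]) / (maxima[i] - minima[i])  # position along axis i
        vertices.append((x, y))
    return vertices

polyline = parallel_coordinates(
    [5.0, 20.0, 0.5],
    minima=[0.0, 0.0, 0.0],
    maxima=[10.0, 40.0, 1.0],
)
print(polyline)
```

Each data item becomes one polyline crossing all k axes, so a whole data set appears as a bundle of polylines whose shapes can be compared by eye.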
c) The interaction (IT) and distortion (DT) techniques used (interactive projection, interactive
filtering, interactive zooming, interactive distortion, interactive linking and brushing)
Interaction techniques allow the data analyst to directly interact with visualizations and dynamically
change the visualizations according to exploration objectives
Distortion techniques help in the data exploration process by providing means for focusing on details
while preserving an overview of the data
Interactive filtering, Interactive zooming - in exploring large data sets it is important to interactively
partition the data into segments and focus on interesting subsets. This can be done by a direct selection
of the desired subset (BROWSING) or by a specification of properties of the desired subset
(QUERYING).
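The difference between browsing and querying can be shown on a tiny data set. The records, field names and helper functions below are hypothetical and serve only to illustrate the two ways of choosing a subset.

```python
# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"id": 1, "course": "physics", "score": 55},
    {"id": 2, "course": "chemistry", "score": 82},
    {"id": 3, "course": "physics", "score": 91},
]

def browse(data, selected_ids):
    """BROWSING: the analyst selects the desired records directly."""
    return [r for r in data if r["id"] in selected_ids]

def query(data, predicate):
    """QUERYING: the analyst specifies a property the desired records must have."""
    return [r for r in data if predicate(r)]

browsed = browse(records, {1, 3})
queried = query(records, lambda r: r["course"] == "physics" and r["score"] > 60)

print([r["id"] for r in browsed])
print([r["id"] for r in queried])
```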
xii) Educational Data Mining
C.Romero, S.Ventura (2006)
Educational Data Mining
a) Currently there is an increasing interest in data mining and educational systems (well-known
learning content management systems, adaptive and intelligent web-based educational systems),
making educational data mining as a new growing research community
b) After preprocessing the available data in each case, data mining techniques can be applied in
educational systems: statistics and visualization, clustering, classification and outlier detection, association
rule mining and pattern mining, text mining
c) Data Mining oriented towards students: to show recommendations to students, who use, interact,
participate and communicate within educational systems
d) Data Mining oriented towards educators (and responsible academic administrators): to show
discovered knowledge to educators (administrators), who design, plan, build and maintain
educational systems
e) Data Mining tools provide mining algorithms, filtering and visualization techniques. The examples
of Data Mining tool:
- Tool name: Mining tool, Authors: Zaïane and Luo (2001), Mining task: Association and patterns
- Tool name: Multistar, Authors: Silva and Vieira (2002), Mining task: Association and classification
- Tool name: Synergo/ColAT, Authors: Avouris et al (2005), Mining task: Visualization
f) Future research lines in educational Data Mining
- Mining tools that make data mining easier to apply for educators and non-expert users
- Standardization of data and methods (preprocessing, discovering, postprocessing)
- Integration with the e-learning system
- Specific data mining techniques
W.Hämäläinen, T.H.Laine, E.Sutinen (2003)
Data Mining system, educational system
Data Mining system in educational system: the educational system should be served by Data Mining
system to monitor, intervene in, and counsel the teaching-studying-learning process
R.E.Scherr, M.Sabella, E.F.Redish (2007)
Curriculum development
Conceptual knowledge is only one aspect of good knowledge structure: how and when knowledge is
activated and used are also important.
Representation of knowledge structure
The nodes represent knowledge. The lines represent relations between different nodes.
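A knowledge structure of this kind, with nodes for pieces of knowledge and lines for relations between them, can be held in a simple adjacency structure. The physics concepts and the function name below are illustrative assumptions, not taken from the quoted source.

```python
from collections import defaultdict

def build_graph(relations):
    """Build an undirected graph: each node maps to the set of nodes it is related to."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph

# Hypothetical relations between pieces of knowledge.
relations = [
    ("force", "acceleration"),
    ("force", "momentum"),
    ("momentum", "velocity"),
]
graph = build_graph(relations)

print(sorted(graph["force"]))
```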
R.Newburgh (2008)
Linear and lateral (structural) thought process (in physics)
Why do we lose physics students?
a) There is a wide spectrum in thought process. Of the two major types one is linear (i.e. sequential)
and the other lateral (i.e. seeking horizontal connections).
b) Those who developed physics - from Galileo to Newton to Einstein to Heisenberg - were almost
exclusively linear thinkers. The paradigm for linear thought is Euclidean thinking, Euclidean logic
(many physicists chose physics for their career as a result of their exposure to geometry - a
consequence of this is that textbooks are usually written in a Euclidean format). The sense of
discovery is lost. Many students do not recognize that the Euclidean format is not a valid description
of how we do physics. Their way of approaching problems is different but just as valid. Too many
physics teachers refuse to recognize the limitations of this approach (thereby causing would-be
students who do not think in a Euclidean fashion to leave).
c) The format of our textbooks is Euclidean. Newton's laws, Hamilton-Jacobi theory, and
Maxwell's equations are often presented as quasi-axioms in advanced texts. The laboratories become
fixed exercises in which the student must confirm some principle already established. He knows the
answer before he does the experiment.
d) Now I yield to no one in my admiration for Euclid. He has been an inspiration to many of us. We
understand his genius but also see his limitations. Unfortunately there are many who do not follow
his way of thinking.
e) By presenting alternate approaches to students (specifically uses of lateral thinking), false starts
that must be corrected, and lessons that are discoveries not memorization, we can retain more
students in physics.
f) We should remember that lateral thinking is essential to the formation of analogies, an activity
that one cannot describe as Euclidean. Doing science without analogies seems to me an impossibility.
J.K.Gilbert, O.de Jong, R.Justi, D.F.Treagust, J.H.van Driel (2002), R.Justi, J.K.Gilbert (2002)
Model as a major learning and teaching tool
Models are one of the main products of science, modelling is an element in scientific methodology,
(and) models are a major learning and teaching tool in science education.
Role of chemistry textbooks in the teaching and learning of models and modelling
This role may be discussed from two main angles:
- the way that chemical models are introduced in textbooks
- and the teaching models that they present.
Teaching model, Learning model, Analogies
A teaching model is a representation produced with the specific aim of helping students to
understand some aspect of content. Assuming the abstract nature of chemical knowledge, they
(learning models) are used very frequently in chemical textbooks mainly in the form of overt
analogies, as drawings and as diagrams (specifically to the atom, chemical bonding and chemical
equilibrium)
Some future research directions
a) How can teachers' pedagogical content knowledge about models and modelling be improved?
b) The role of models and modelling in the development of chemical knowledge?
c) How can it be made evident to teachers that the introduction of a model-based teaching and learning
approach can be a way to shift the emphasis in chemical education from transmission of existing
knowledge to a more contemporary perspective in which students will really understand the
nature of chemistry and be able to deal critically with chemistry-related situations?
J.K.Gilbert, O.de Jong, R.Justi, D.F.Treagust, J.H.van Driel (2002), J.H.van Driel (2002)
Curriculum for Chemical Education
a) The central question concerns the design of curricula for chemical education (note: curricular
process) which make chemistry interesting and relevant for various groups of learners (professional
chemists, general educational purposes-it is useful for all citizens in the future)
b) In recent decades, curricula have been changed, on the one hand for general educational
purposes, this has led to context-based approaches to teaching chemistry, on the other hand for
professional chemists specific chemistry courses have been developed in the context of vocational
training, aimed at developing the specific chemical competencies that are needed for various
professions.
c) Finally, chemistry is nowadays also presented in informal ways, for instance, in science centres and
through chemistry shows.
U.-D.Ehlers, J.M.Pawlowski (2006)
Quality and Standardization in E-learning
- Quality development: Methods and approaches
Methods, models, concepts and approaches for the development, management and assurance of quality
in e-learning are introduced
- E-learning standards
The main goal of e-learning standards is to provide solutions to enable and ensure interoperability and
stability of systems, components and objects.
R.Kwan, R.Fox, F.T.Chan, P.Tsang (2008), Le Jun (2008)
Knowledge management, Data Mining
We set up a few objects and value propositions of the initiative which was set up to improve teaching
and learning, to enhance the quality of curriculum, and to extend learning support. We apply Data
Mining tools to discover behavioral characteristics. A few strategies for knowledge management in the
curriculum development in distance education will be discussed.
Le Jun (2008), I.Nonaka, H.Takeuchi (1995), I.Nonaka, H.Takeuchi (2005)
Types of knowledge, Interaction of types
Many knowledge management experts agree that there are two general types of knowledge:
a) Tacit knowledge is linked to personal perspective, intuition, emotion, belief, experience and value. It
is intangible, not easy to articulate, and difficult to share with others.
b) Explicit knowledge has a tangible dimension that can be more easily captured, codified and
communicated
Based on I.Nonaka, H.Takeuchi, these two types of knowledge can interact when the
knowledge conversion occurs:
- socialization: from tacit to tacit
- externalization: from tacit to explicit
- combination: from explicit to explicit
- internalization: from explicit to tacit
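The four conversions listed above form a complete two-by-two table over the knowledge types, which a lookup makes explicit. The dictionary and function names are assumptions made for illustration.

```python
# (source type, target type) -> name of the knowledge conversion, as listed above.
CONVERSIONS = {
    ("tacit", "tacit"): "socialization",
    ("tacit", "explicit"): "externalization",
    ("explicit", "explicit"): "combination",
    ("explicit", "tacit"): "internalization",
}

def conversion_mode(source, target):
    """Return the conversion that turns `source` knowledge into `target` knowledge."""
    return CONVERSIONS[(source, target)]

print(conversion_mode("tacit", "explicit"))
```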
Le Jun (2008), I.Nonaka, H.Takeuchi (2005)
Research methods for knowledge management
a) Data Mining techniques
b) Web text mining is the discovery of knowledge from web-based non-structured text (text representation,
feature extraction, text categorization, text clustering, text summarization, semantic analysis, and
information extraction)
c) Learning theory
Learning theories are classified into four paradigms: behavioral theory, cognitive theory,
constructive theory, social learning theory.
We emphasize: Learning is a continuous process that is indistinguishable from ongoing work practice
- by discovering the problems, recognizing their types, and by solving problems in routine work and
learning. Learners can continuously refine their cognitive, information, social and learning
competencies.
d) Knowledge management
Knowledge sharing and application of the SECI model (see I.Nonaka, H.Takeuchi)
xiii) Metadata Mining Process
R.Vilalta, C.Giraud-Carrier, P.Brazdil, C.Soares (2004)
Meta-learning Support Data Mining
Current data mining tools are characterized by a plethora of algorithms but a lack of guidelines to
select the right method according to the nature of the problem under analysis. Producing such
guidelines is a primary goal of the field of meta-learning; the research objective is to understand the
interaction between the mechanism of learning and the concrete contexts in which that mechanism is
applicable. The field of meta-learning has seen continuous growth in the past years with interesting
new developments in the construction of practical model-selection assistants, task-adaptive learners,
and a solid conceptual framework. In this paper, we give an overview of different techniques
necessary to build meta-learning systems. We begin by describing an idealized meta-learning
architecture comprising a variety of relevant component techniques. We then look at how each
technique has been studied and implemented by previous research. In addition, we show how
metalearning has already been identified as an important component in real-world applications.
J.Fox (2007)
Definition of the Metadata Mining process
Since metadata is just another type of data, applying data mining to metadata is technically
straightforward. XML - eXtensible Markup Language
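Because metadata is just another type of data, a first mining step can be as simple as counting values in XML-encoded metadata records. The record layout, the element names and the sample values below are hypothetical; only the standard-library XML parser is used.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical metadata records describing learning resources.
METADATA = """
<records>
  <record><title>Course A</title><subject>physics</subject><format>video</format></record>
  <record><title>Course B</title><subject>chemistry</subject><format>text</format></record>
  <record><title>Course C</title><subject>physics</subject><format>text</format></record>
</records>
""".strip()

root = ET.fromstring(METADATA)

# Count how often each subject occurs across the metadata records.
subject_counts = Counter(record.findtext("subject") for record in root.iter("record"))

print(subject_counts.most_common(1))
```

The same counts could feed any of the techniques named earlier, for example clustering resources by subject and format.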
American Library Association (1999)
Definition of Metadata
a) As for most people the difference between data and information is merely a philosophical
one of no relevance in practical use, other definitions are:
Metadata is information about data.
Metadata is information about information.
Metadata contains information about that data or other data
b) There are more sophisticated definitions, such as:
Metadata is structured, encoded data that describe characteristics of information-bearing
entities to aid in the identification, discovery, assessment, and management of the described
entities.
3.3.7.2. Brief Summary
Data Mining - an analytical-synthetic way of extracting hidden and potentially useful information
from large data files (continuum data-information-knowledge, knowledge discovery)
Data Mining Techniques - system functions of the structure of formerly hidden relations and patterns
(e.g. classification, association, clustering, prediction)
Data Mining Tool - a concrete procedure for reaching the intended system functions
Complex Tool - a resolution of a complex problem of the relevant branch of science
Partial Tool - a resolution of a partial problem of the relevant branch of science
Result of Data Mining - a result of applying the data mining tool
Representation of Data Mining Result - a description of what is expressed
Visualization of Data Mining Result - a visual presentation of the data mining result
3.3.7.3. Data Mining Cycle, References
i) Quotations from Sources
U.M.Fayyad, G.Piatetsky-Shapiro, P.Smyth (1996)
Cycle of Data mining
Data Mining can be viewed as a cycle that consists of several steps:
- Identify a problem where analyzing data can provide value
- Collect the data
- Preprocess the data to obtain a clean, mineable table
- Build a model that summarizes patterns of interest in a particular representational form
- Interpret/Evaluate the model
- Deploy the results by incorporating the model into another system for further action.
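The cycle above can be sketched as a chain of small functions. This is only an illustrative toy, not a method taken from the quoted source: the records, the threshold model and all function names are hypothetical.

```python
# Toy walk through the cycle: collect -> preprocess -> model -> evaluate -> deploy.
# All data and names are hypothetical and serve only as an illustration.

def collect():
    # Raw records: (hours of study, exam passed?), one record is incomplete
    return [(2, "no"), (9, "yes"), (None, "yes"), (7, "yes"), (3, "no"), (8, "yes")]

def preprocess(rows):
    # Drop incomplete records to obtain a clean, mineable table
    return [(hours, label) for hours, label in rows if hours is not None]

def class_mean(table, label):
    values = [hours for hours, lab in table if lab == label]
    return sum(values) / len(values)

def build_model(table):
    # Model: one threshold placed halfway between the two class means
    return (class_mean(table, "yes") + class_mean(table, "no")) / 2

def evaluate(threshold, table):
    # Share of records that the threshold rule classifies correctly
    hits = sum((hours >= threshold) == (label == "yes") for hours, label in table)
    return hits / len(table)

def deploy(threshold):
    # Wrap the model so that another system can call it
    return lambda hours: "yes" if hours >= threshold else "no"

table = preprocess(collect())
threshold = build_model(table)
accuracy = evaluate(threshold, table)
predict = deploy(threshold)
print(threshold, accuracy, predict(6))
```

Each function corresponds to one step of the list; the first step (identifying the problem) is the choice of the question itself and has no code counterpart.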
J.Luan (2002)
Steps for Data Mining preparation (algorithm, building, visualization)
a) Investigate the possibility of overlaying Data Mining algorithms directly on a data warehouse
b) Select a solid querying tool to build Data Mining files. These files closely resemble
multidimensional cubes
c) Data Visualization and Validation. This means both examining frequency counts as well as
generating scatter plots, histograms, and other graphics, including clustering models
d) Mine your data
Le Jun (2008)
Main processes of Data Mining
- The main processes include data definition, data gathering, preprocessing, data processing and
discovering knowledge or patterns (Data Mining techniques can be implemented rapidly on existing
software and hardware)
- Application of Data Mining tools: To solve the task of prediction, classification, explicit modeling
and clustering. The application can help to understand learners' learning behaviors.
ii) Brief Summary of Data Mining Cycle
- Data Definition, Data Gathering
- Data Preprocessing, Data Processing
- Data Mining Techniques and Data Mining Tools,
- Discovering Knowledge or Patterns,
- Representation and Visualization of Data Mining Results,
- Application.
References
i. Tarábek,P., Záškodný,P. (2009-2010)
Educational and Didactic Communication 2009.
Bratislava, Slovak Republic: Didaktis, www.didaktis.sk, ISBN 978-80-89160-69-3
ii. Záškodný,P., Pavlát,V. (2009-2010a)
Data Mining - A Brief Recherche (in: i.)
iii. Záškodný,P., Novák,V. (2009-2010b)
Data Mining - A Brief Summary (in: i.)
Part 4. STATISTICAL TABLES
Table I.: Values of distribution function of standardized normal distribution
u F(u) u F(u) u F(u) u F(u)
0,00 0,500 00 0,35 0,636 83 0,70 0,758 04 1,05 0,853 14
0,01 0,503 99 0,36 0,640 58 0,71 0,761 15 1,06 0,855 43
0,02 0,507 98 0,37 0,644 31 0,72 0,764 24 1,07 0,857 69
0,03 0,511 97 0,38 0,648 03 0,73 0,767 30 1,08 0,859 93
0,04 0,515 95 0,39 0,651 73 0,74 0,770 35 1,09 0,862 14
0,05 0,519 94 0,40 0,655 42 0,75 0,773 37 1,10 0,864 33
0,06 0,523 92 0,41 0,659 10 0,76 0,776 37 1,11 0,866 50
0,07 0,527 90 0,42 0,662 76 0,77 0,779 35 1,12 0,868 64
0,08 0,531 88 0,43 0,666 40 0,78 0,782 30 1,13 0,870 76
0,09 0,535 86 0,44 0,670 03 0,79 0,785 24 1,14 0,872 86
0,10 0,539 83 0,45 0,673 64 0,80 0,788 14 1,15 0,874 93
0,11 0,543 80 0,46 0,677 24 0,81 0,791 03 1,16 0,876 98
0,12 0,547 76 0,47 0,680 82 0,82 0,793 89 1,17 0,879 00
0,13 0,551 72 0,48 0,684 39 0,83 0,796 73 1,18 0,881 00
0,14 0,555 67 0,49 0,687 93 0,84 0,799 55 1,19 0,882 98
0,15 0,559 62 0,50 0,691 46 0,85 0,802 34 1,20 0,884 93
0,16 0,563 56 0,51 0,694 97 0,86 0,805 11 1,21 0,886 86
0,17 0,567 49 0,52 0,698 47 0,87 0,807 85 1,22 0,888 77
0,18 0,571 42 0,53 0,701 94 0,88 0,810 57 1,23 0,890 65
0,19 0,575 35 0,54 0,705 40 0,89 0,813 27 1,24 0,892 51
0,20 0,579 26 0,55 0,708 84 0,90 0,815 94 1,25 0,894 35
0,21 0,583 17 0,56 0,712 26 0,91 0,818 59 1,26 0,896 17
0,22 0,587 06 0,57 0,715 66 0,92 0,821 21 1,27 0,897 96
0,23 0,590 95 0,58 0,719 04 0,93 0,823 81 1,28 0,899 73
0,24 0,594 83 0,59 0,722 40 0,94 0,826 39 1,29 0,901 47
0,25 0,598 71 0,60 0,725 75 0,95 0,828 94 1,30 0,903 20
0,26 0,602 57 0,61 0,729 07 0,96 0,831 47 1,31 0,904 90
0,27 0,606 42 0,62 0,732 37 0,97 0,833 98 1,32 0,906 58
0,28 0,610 26 0,63 0,735 65 0,98 0,836 46 1,33 0,908 24
0,29 0,614 09 0,64 0,738 91 0,99 0,838 91 1,34 0,909 88
0,30 0,617 91 0,65 0,742 15 1,00 0,841 34 1,35 0,911 49
0,31 0,621 72 0,66 0,745 37 1,01 0,843 75 1,36 0,913 09
0,32 0,625 52 0,67 0,748 57 1,02 0,846 14 1,37 0,914 66
0,33 0,629 30 0,68 0,751 75 1,03 0,848 50 1,38 0,916 21
0,34 0,633 07 0,69 0,754 90 1,04 0,850 83 1,39 0,917 74
1,40 0,919 24 1,85 0,967 84 2,30 0,989 28 3,00 0,998 65
1,41 0,920 73 1,86 0,968 56 2,31 0,989 56 3,02 0,998 74
1,42 0,922 20 1,87 0,969 26 2,32 0,989 83 3,04 0,998 82
1,43 0,923 64 1,88 0,969 95 2,33 0,990 10 3,06 0,998 89
1,44 0,925 07 1,89 0,970 62 2,34 0,990 36 3,08 0,998 97
1,45 0,926 47 1,90 0,971 28 2,35 0,990 61 3,10 0,999 03
1,46 0,927 86 1,91 0,971 93 2,36 0,990 86 3,12 0,999 10
1,47 0,929 22 1,92 0,972 57 2,37 0,991 11 3,14 0,999 16
1,48 0,930 56 1,93 0,973 20 2,38 0,991 34 3,16 0,999 21
1,49 0,931 89 1,94 0,973 81 2,39 0,991 58 3,18 0,999 26
1,50 0,933 19 1,95 0,974 41 2,40 0,991 80 3,20 0,999 31
1,51 0,934 48 1,96 0,975 00 2,41 0,992 02 3,22 0,999 36
1,52 0,935 74 1,97 0,975 58 2,42 0,992 24 3,24 0,999 40
1,53 0,936 99 1,98 0,976 15 2,43 0,992 45 3,26 0,999 44
1,54 0,938 22 1,99 0,976 70 2,44 0,992 66 3,28 0,999 48
1,55 0,939 43 2,00 0,977 25 2,45 0,992 86 3,30 0,999 52
1,56 0,940 62 2,01 0,977 78 2,46 0,993 05 3,32 0,999 55
1,57 0,941 79 2,02 0,978 31 2,47 0,993 24 3,34 0,999 58
1,58 0,942 95 2,03 0,978 82 2,48 0,993 43 3,36 0,999 61
1,59 0,944 08 2,04 0,979 32 2,49 0,993 61 3,38 0,999 64
1,60 0,945 20 2,05 0,979 82 2,50 0,993 79 3,40 0,999 66
1,61 0,946 30 2,06 0,980 30 2,52 0,994 13 3,42 0,999 69
1,62 0,947 38 2,07 0,980 77 2,54 0,994 46 3,44 0,999 71
1,63 0,948 45 2,08 0,981 24 2,56 0,994 77 3,46 0,999 73
1,64 0,949 50 2,09 0,981 69 2,58 0,995 06 3,48 0,999 75
1,65 0,950 53 2,10 0,982 14 2,60 0,995 34 3,50 0,999 77
1,66 0,951 54 2,11 0,982 57 2,62 0,995 60 3,55 0,999 81
1,67 0,952 54 2,12 0,983 00 2,64 0,995 85 3,60 0,999 84
1,68 0,953 52 2,13 0,983 41 2,66 0,996 09 3,65 0,999 87
1,69 0,954 49 2,14 0,983 82 2,68 0,996 32 3,70 0,999 89
1,70 0,955 43 2,15 0,984 22 2,70 0,996 53 3,75 0,999 91
1,71 0,956 37 2,16 0,984 61 2,72 0,996 74 3,80 0,999 93
1,72 0,957 28 2,17 0,985 00 2,74 0,996 93 3,85 0,999 94
1,73 0,958 18 2,18 0,985 37 2,76 0,997 11 3,90 0,999 95
1,74 0,959 07 2,19 0,985 74 2,78 0,997 28 3,95 0,999 96
1,75 0,959 94 2,20 0,986 10 2,80 0,997 44 4,00 0,999 97
1,76 0,960 80 2,21 0,986 45 2,82 0,997 60 4,05 0,999 97
1,77 0,961 64 2,22 0,986 79 2,84 0,997 74 4,10 0,999 98
1,78 0,962 46 2,23 0,987 13 2,86 0,997 88 4,15 0,999 98
1,79 0,963 27 2,24 0,987 45 2,88 0,998 01 4,20 0,999 99
1,80 0,964 07 2,25 0,987 78 2,90 0,998 13 4,25 0,999 99
1,81 0,964 85 2,26 0,988 09 2,92 0,998 25 4,30 0,999 99
1,82 0,965 62 2,27 0,988 40 2,94 0,998 36 4,35 0,999 99
1,83 0,966 38 2,28 0,988 70 2,96 0,998 46 4,40 0,999 99
1,84 0,967 12 2,29 0,988 99 2,98 0,998 56 4,45 1,000 00
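The values of Table I can be recomputed from the error function, since F(u) = (1 + erf(u/√2))/2 for the standardized normal distribution. The following is a minimal sketch using only the Python standard library; the function name `phi` is an assumption, and the code uses decimal points where the table uses decimal commas.

```python
import math

def phi(u: float) -> float:
    """Distribution function F(u) of the standardized normal distribution."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# Recompute a few table entries (rounded to five decimal places)
for u in (0.00, 0.75, 1.00, 1.96, 3.00):
    print(f"{u:.2f}  {phi(u):.5f}")

# The table lists u >= 0 only; negative arguments follow from symmetry:
# F(-u) = 1 - F(u)
print(f"{phi(-1.96):.5f}")
```

For example, `phi(1.96)` rounds to 0.97500 and `phi(0.75)` to 0.77337, in agreement with the corresponding rows of the table.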
Table II.: Critical values of u-test
α 0,20 0,10 0,05 0,025 0,01 0,005
u(α) 0,842 1,282 1,645 1,960 2,326 2,576
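The critical values of Table II are upper quantiles of the standardized normal distribution: u(α) is the point that is exceeded with probability α. A minimal sketch with the standard library class `statistics.NormalDist` follows; the function name `u_critical` is an assumption.

```python
from statistics import NormalDist

def u_critical(alpha: float) -> float:
    """Value exceeded with probability alpha under the standardized normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha)

for alpha in (0.20, 0.10, 0.05, 0.025, 0.01, 0.005):
    print(f"{alpha:<6} {u_critical(alpha):.3f}")
```

For a two-sided test at level α the value u(α/2) is used, for instance 1,960 for α = 0,05.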
Table III.: Critical values of t-test
df \ α 0,05 0,025 0,01 0,005
1 6,31 12,71 31,82 63,66
2 2,92 4,30 6,96 9,92
3 2,35 3,18 4,54 5,84
4 2,13 2,78 3,75 4,60
5 2,02 2,57 3,36 4,03
6 1,94 2,45 3,14 3,71
7 1,90 2,36 3,00 3,50
8 1,86 2,31 2,90 3,36
9 1,83 2,26 2,82 3,25
10 1,81 2,23 2,76 3,17
11 1,80 2,20 2,72 3,11
12 1,78 2,18 2,68 3,06
13 1,77 2,16 2,65 3,01
14 1,76 2,14 2,62 2,98
15 1,75 2,13 2,60 2,95
16 1,75 2,12 2,58 2,92
17 1,74 2,11 2,57 2,90
18 1,73 2,10 2,55 2,88
19 1,73 2,09 2,54 2,86
20 1,72 2,09 2,53 2,84
21 1,72 2,08 2,52 2,83
22 1,72 2,07 2,51 2,82
23 1,71 2,07 2,50 2,81
24 1,71 2,06 2,49 2,80
25 1,71 2,06 2,48 2,79
26 1,71 2,06 2,48 2,78
27 1,70 2,05 2,47 2,77
28 1,70 2,05 2,47 2,76
29 1,70 2,04 2,46 2,76
30 1,70 2,04 2,46 2,75
31 1,70 2,04 2,45 2,75
32 1,69 2,03 2,45 2,74
33 1,69 2,03 2,45 2,74
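A typical use of Table III is the one-sample t-test. The sketch below is an illustration under stated assumptions: the sample values and the hypothesized mean `mu0` are hypothetical, and the table columns are read as upper-tail probabilities, so that a two-sided test at level 0,05 uses the column 0,025.

```python
import math
from statistics import mean, stdev

# Hypothetical sample of n = 10 measurements (illustrative values only)
sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.3, 5.7, 5.2, 5.5, 5.4]
mu0 = 5.0        # hypothesized mean (assumption)
t_crit = 2.26    # Table III: 9 degrees of freedom, column 0,025

n = len(sample)
df = n - 1
# Test statistic: distance of the sample mean from mu0 in standard-error units
t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
reject = abs(t) > t_crit

print(f"df = {df}, t = {t:.2f}, reject H0: {reject}")
```

With these data the statistic is about 3,66, which exceeds 2,26, so the hypothesis that the mean equals 5,0 would be rejected at the 0,05 level.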
Table IV.: Critical values of χ²-test
df \ α 0,995 0,975 0,05 0,025 0,01 0,005
1 0,00 0,00 3,84 5,02 6,63 7,88
2 0,01 0,05 5,99 7,38 9,21 10,60
3 0,07 0,22 7,81 9,35 11,34 12,84
4 0,21 0,48 9,49 11,14 13,28 14,86
5 0,41 0,83 11,07 12,83 15,09 16,75
6 0,68 1,24 12,59 14,45 16,81 18,55
7 0,99 1,69 14,07 16,01 18,48 20,28
8 1,34 2,18 15,51 17,53 20,09 21,95
9 1,73 2,70 16,92 19,02 21,67 23,59
10 2,16 3,25 18,31 20,48 23,21 25,19
11 2,60 3,82 19,68 21,92 24,72 26,76
12 3,07 4,40 21,03 23,34 26,22 28,30
13 3,57 5,01 22,36 24,74 27,69 29,82
14 4,07 5,63 23,68 26,12 29,14 31,32
15 4,60 6,26 25,00 27,49 30,58 32,80
16 5,14 6,91 26,30 28,85 32,00 34,27
17 5,70 7,56 27,59 30,19 33,41 35,72
18 6,26 8,23 28,87 31,53 34,81 37,16
19 6,84 8,91 30,14 32,85 36,19 38,58
20 7,43 9,59 31,41 34,17 37,57 40,00
21 8,03 10,28 32,67 35,48 38,93 41,40
22 8,64 10,98 33,92 36,78 40,29 42,80
23 9,26 11,69 35,17 38,08 41,64 44,18
24 9,89 12,40 36,42 39,36 42,98 45,56
25 10,52 13,12 37,65 40,65 44,31 46,93
30 13,79 16,79 43,77 46,98 50,89 53,67
35 17,19 20,57 49,80 53,20 57,34 60,27
40 20,71 24,43 55,76 59,34 63,69 66,77
45 24,31 28,37 61,66 65,41 69,96 73,17
50 27,99 32,36 67,50 71,42 76,15 79,49
60 35,53 40,48 79,08 83,30 88,38 91,95
70 43,28 48,76 90,53 95,02 100,43 104,21
80 51,17 57,15 101,88 106,63 112,33 116,32
90 59,20 65,65 113,15 118,14 124,12 128,30
100 67,33 74,22 124,34 129,56 135,81 140,17
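For degrees of freedom that are not listed in Table IV, a critical value can be estimated with the Wilson-Hilferty approximation, which transforms a normal quantile into an approximate χ² quantile. This approximation is not part of the original text; it is added here only as a sketch, the function name is an assumption, and the result is approximate (it is least accurate for very small degrees of freedom).

```python
import math
from statistics import NormalDist

def chi2_critical_approx(df: int, alpha: float) -> float:
    """Approximate value exceeded with probability alpha by chi-square with df degrees of freedom.

    Wilson-Hilferty approximation: chi2 ~ df * (1 - c + z * sqrt(c))**3, c = 2 / (9 * df).
    """
    z = NormalDist().inv_cdf(1.0 - alpha)   # upper normal quantile
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

# Compare with tabulated rows, then estimate an unlisted row (df = 55)
for df in (10, 100, 55):
    print(df, round(chi2_critical_approx(df, 0.05), 2))
```

For df = 100 and α = 0,05 the approximation gives about 124,34, matching the tabulated value; for df = 10 it gives about 18,29 against the tabulated 18,31.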
Table V.: Critical values of F-test for α = 0,05
(columns: degrees of freedom of the numerator; rows: degrees of freedom of the denominator)
1 2 3 4 5 6 7 8 9 10 20 40 60 120
1 161 200 216 225 230 234 237 239 241 242 248 251 252 253
2 18,5 19,0 19,2 19,2 19,3 19,3 19,4 19,4 19,4 19,4 19,4 19,5 19,5 19,5
3 10,1 9,55 9,28 9,12 9,01 8,94 8,89 8,85 8,81 8,79 8,66 8,59 8,57 8,55
4 7,71 6,94 6,59 6,39 6,26 6,16 6,09 6,04 6,00 5,96 5,80 5,72 5,69 5,66
5 6,61 5,79 5,41 5,19 5,05 4,95 4,88 4,82 4,77 4,74 4,56 4,46 4,43 4,40
6 5,99 5,14 4,76 4,53 4,39 4,28 4,21 4,15 4,10 4,06 3,87 3,77 3,74 3,70
7 5,59 4,74 4,35 4,12 3,97 3,87 3,79 3,73 3,68 3,64 3,44 3,34 3,30 3,27
8 5,32 4,46 4,07 3,84 3,69 3,58 3,50 3,44 3,39 3,35 3,15 3,04 3,01 2,97
9 5,12 4,26 3,86 3,63 3,48 3,37 3,29 3,23 3,18 3,14 2,94 2,83 2,79 2,75
10 4,96 4,10 3,71 3,48 3,33 3,22 3,14 3,07 3,02 2,98 2,77 2,66 2,62 2,58
11 4,84 3,98 3,59 3,36 3,20 3,09 3,01 2,95 2,90 2,85 2,65 2,53 2,49 2,45
12 4,75 3,89 3,49 3,26 3,11 3,00 2,91 2,85 2,80 2,75 2,54 2,43 2,38 2,34
13 4,67 3,81 3,41 3,18 3,03 2,92 2,83 2,77 2,71 2,67 2,46 2,34 2,30 2,25
14 4,60 3,74 3,34 3,11 2,96 2,85 2,76 2,70 2,65 2,60 2,39 2,27 2,22 2,18
15 4,54 3,68 3,29 3,06 2,90 2,79 2,71 2,64 2,59 2,54 2,33 2,20 2,16 2,11
16 4,49 3,63 3,24 3,01 2,85 2,74 2,66 2,59 2,54 2,49 2,28 2,15 2,11 2,06
17 4,45 3,59 3,20 2,96 2,81 2,70 2,61 2,55 2,49 2,45 2,23 2,10 2,06 2,01
18 4,41 3,55 3,16 2,93 2,77 2,66 2,58 2,51 2,46 2,41 2,19 2,06 2,02 1,97
19 4,38 3,52 3,13 2,90 2,74 2,63 2,54 2,48 2,42 2,38 2,16 2,03 1,98 1,93
20 4,35 3,49 3,10 2,87 2,71 2,60 2,51 2,45 2,39 2,35 2,12 1,99 1,95 1,90
21 4,32 3,47 3,07 2,84 2,68 2,57 2,49 2,42 2,37 2,32 2,10 1,96 1,92 1,87
22 4,30 3,44 3,05 2,82 2,66 2,55 2,46 2,40 2,34 2,30 2,07 1,94 1,89 1,84
23 4,28 3,42 3,03 2,80 2,64 2,53 2,44 2,37 2,32 2,27 2,05 1,91 1,86 1,81
24 4,26 3,40 3,01 2,78 2,62 2,51 2,42 2,36 2,30 2,25 2,03 1,89 1,84 1,79
25 4,24 3,39 2,99 2,76 2,60 2,49 2,40 2,34 2,28 2,24 2,01 1,87 1,82 1,77
26 4,23 3,37 2,98 2,74 2,59 2,47 2,39 2,32 2,27 2,22 1,99 1,85 1,80 1,75
27 4,21 3,35 2,96 2,73 2,57 2,46 2,37 2,31 2,25 2,20 1,97 1,84 1,79 1,73
28 4,20 3,34 2,95 2,71 2,56 2,45 2,36 2,29 2,24 2,19 1,96 1,82 1,77 1,71
29 4,18 3,33 2,93 2,70 2,55 2,43 2,35 2,28 2,22 2,18 1,94 1,81 1,75 1,70
30 4,17 3,32 2,92 2,69 2,53 2,42 2,33 2,27 2,21 2,16 1,93 1,79 1,74 1,68
40 4,08 3,23 2,84 2,61 2,45 2,34 2,25 2,18 2,12 2,08 1,84 1,69 1,64 1,58
60 4,00 3,15 2,76 2,53 2,37 2,25 2,17 2,10 2,04 1,99 1,75 1,59 1,53 1,47
120 3,92 3,07 2,68 2,45 2,29 2,17 2,09 2,02 1,96 1,91 1,66 1,50 1,43 1,35
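Table V is typically used to compare two sample variances. The following sketch is an illustration only: both samples are hypothetical, the larger variance is placed in the numerator by assumption, and the critical value is read from the table by hand (column 5, row 10).

```python
from statistics import variance

# Hypothetical samples (illustrative values only)
a = [12.1, 14.8, 9.7, 15.9, 11.2, 16.3]                                  # n = 6
b = [13.0, 12.6, 13.4, 12.9, 13.3, 12.7, 13.1, 13.2, 12.8, 13.5, 13.0]  # n = 11

# Put the sample with the larger variance in the numerator
num, den = (a, b) if variance(a) >= variance(b) else (b, a)
df = (len(num) - 1, len(den) - 1)
F = variance(num) / variance(den)

f_crit = 3.33   # Table V (alpha = 0,05): numerator df 5, denominator df 10
reject = F > f_crit

print(f"df = {df}, F = {F:.1f}, reject equality of variances: {reject}")
```

Here the variance ratio is far above 3,33, so the hypothesis of equal variances would be rejected at the 0,05 level.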
Table VI.: Critical values of F-test for α = 0,01
(columns: degrees of freedom of the numerator; rows: degrees of freedom of the denominator)
1 2 3 4 5 6 7 8 9 10 20 40 60 120
1 4050 5000 5400 5620 5760 5860 5930 5980 6020 6060 6210 6290 6310 6340
2 98,5 99 99,2 99,2 99,3 99,3 99,4 99,4 99,4 99,4 99,4 99,5 99,5 99,5
3 34,1 30,8 29,5 28,7 28,2 27,9 27,7 27,5 27,3 27,2 26,7 26,4 26,3 26,2
4 21,2 18 16,7 16 15,5 15,2 15 14,8 14,7 14,5 14 13,7 13,7 13,6
5 16,3 13,3 12,1 11,4 11 10,7 10,5 10,3 10,2 10,1 9,55 9,29 9,2 9,11
6 13,7 10,9 9,78 9,15 8,75 8,47 8,26 8,1 7,98 7,87 7,4 7,14 7,06 6,97
7 12,2 9,55 8,45 7,85 7,46 7,19 6,99 6,84 6,72 6,62 6,16 5,91 5,82 5,74
8 11,3 8,65 7,59 7,01 6,63 6,37 6,18 6,03 5,91 5,81 5,36 5,12 5,03 4,95
9 10,6 8,02 6,99 6,42 6,06 5,8 5,61 5,47 5,35 5,26 4,81 4,57 4,48 4,4
10 10 7,56 6,55 5,99 5,64 5,39 5,2 5,06 4,94 4,85 4,41 4,17 4,08 4
11 9,65 7,21 6,22 5,67 5,32 5,07 4,89 4,74 4,63 4,54 4,1 3,86 3,78 3,69
12 9,33 6,93 5,95 5,41 5,06 4,82 4,64 4,5 4,39 4,3 3,86 3,62 3,54 3,45
13 9,07 6,7 5,74 5,21 4,86 4,62 4,44 4,3 4,19 4,1 3,66 3,43 3,34 3,25
14 8,86 6,51 5,56 5,04 4,69 4,46 4,28 4,14 4,03 3,94 3,51 3,27 3,18 3,09
15 8,68 6,36 5,42 4,89 4,56 4,32 4,14 4 3,89 3,8 3,37 3,13 3,05 2,96
16 8,53 6,23 5,29 4,77 4,44 4,2 4,03 3,89 3,78 3,69 3,26 3,02 2,93 2,84
17 8,4 6,11 5,18 4,67 4,34 4,1 3,93 3,79 3,68 3,59 3,16 2,92 2,83 2,75
18 8,29 6,01 5,09 4,58 4,25 4,01 3,84 3,71 3,6 3,51 3,08 2,84 2,75 2,66
19 8,18 5,93 5,01 4,5 4,17 3,94 3,77 3,63 3,52 3,43 3 2,76 2,67 2,58
20 8,1 5,85 4,94 4,43 4,1 3,87 3,7 3,56 3,46 3,37 2,94 2,69 2,61 2,52
21 8,02 5,78 4,87 4,37 4,04 3,81 3,64 3,51 3,4 3,31 2,88 2,64 2,55 2,46
22 7,95 5,72 4,82 4,31 3,99 3,76 3,59 3,45 3,35 3,26 2,83 2,58 2,5 2,4
23 7,88 5,66 4,76 4,26 3,94 3,71 3,54 3,41 3,3 3,21 2,78 2,54 2,45 2,35
24 7,82 5,61 4,72 4,22 3,9 3,67 3,5 3,36 3,26 3,17 2,74 2,49 2,4 2,31
25 7,77 5,57 4,68 4,18 3,85 3,63 3,46 3,32 3,22 3,13 2,7 2,45 2,36 2,27
26 7,72 5,53 4,64 4,14 3,82 3,59 3,42 3,29 3,18 3,09 2,66 2,42 2,33 2,23
27 7,68 5,49 4,6 4,11 3,78 3,56 3,39 3,26 3,15 3,06 2,63 2,38 2,29 2,2
28 7,64 5,45 4,57 4,07 3,75 3,53 3,36 3,23 3,12 3,03 2,6 2,35 2,26 2,17
29 7,6 5,42 4,54 4,04 3,73 3,5 3,33 3,2 3,09 3 2,57 2,33 2,23 2,14
30 7,56 5,39 4,51 4,02 3,7 3,47 3,3 3,17 3,07 2,98 2,55 2,3 2,21 2,11
40 7,31 5,18 4,31 3,83 3,51 3,29 3,12 2,99 2,89 2,8 2,37 2,11 2,02 1,92
60 7,08 4,98 4,13 3,65 3,34 3,12 2,95 2,82 2,72 2,63 2,2 1,94 1,84 1,73
120 6,85 4,79 3,95 3,48 3,17 2,96 2,79 2,66 2,56 2,47 2,03 1,76 1,66 1,53
CV of Author
Assoc. Prof. RNDr. Přemysl Záškodný, CSc.
Assoc. Prof. RNDr. Přemysl Záškodný, CSc., graduated from the Faculty of Mathematics and
Physics of Charles University. He holds a CSc. degree in physics education and is a docent
(associate professor) of physics education. As a university teacher, he is affiliated with the
University of South Bohemia in České Budějovice and with the University of Finance and
Administration in Prague.
He is active in scientific work in cooperation with the International Institute of
Informatics and Systemics in the U.S.A. and the Curriculum Studies Research Group in
Slovakia. In his scientific work, aimed at science and statistics education, he deals with
structuring and modelling physics and statistics knowledge and systems of knowledge, as well
as with data mining and the curricular process.
In addition to support from his faculty and university, the projects granted to the
author by the Avenira Foundation in Switzerland and the University of Finance and
Administration in the Czech Republic have made a considerable contribution to the results
achieved.
The conception of his latest books (Survey of Principles of Theoretical Physics,
Curricular Process in Physics, Fundaments of Statistics (with co-authors), and From
Financial Derivatives to Option Hedging (with co-author)) and of his latest monographs
(Educational and Didactic Communication 2008, 2009, 2010, 2011) is based on the author's
scientific work. Some of his further published works are listed in the bibliography.
Assoc. Prof. RNDr. Přemysl Záškodný, CSc., is active as general chair of the international
e-conferences OEDM-SERM 2011 and OEDM-SERM 2012 (Optimization, Education and
Data Mining in Science, Engineering and Risk Management).
Bibliography of Author
i) The monographs
Tarabek,P., Zaskodny,P.: Analytical-Synthetic Modelling of Cognitive Structures (volume 1:
New structural methods and their application).
Educational Publisher Didaktis Ltd., Bratislava, London 2001
Tarabek,P., Zaskodny,P.: Analytical-Synthetic Modelling of Cognitive Structures (volume 2:
Didactic communication and educational sciences).
Educational Publisher Didaktis Ltd., Bratislava, New York 2002
Tarabek,P., Zaskodny,P.: Structure, Formation and Design of Textbook (volume 1:
Theoretical basis).
Educational Publisher Didaktis Ltd., Bratislava, London 2003
Tarabek,P., Zaskodny,P.: Structure, Formation and Design of Textbook (volume 2: Theory
and practice).
Educational Publisher Didaktis Ltd., Bratislava, London 2004
Tarabek,P., Zaskodny,P.: Modern Science and Textbook Creation (volume 1: Projection of
scientific systems).
Educational Publisher Didaktis Ltd., Bratislava, Frankfurt a.M. 2005
Tarabek,P., Zaskodny,P.: Modern Science and Textbook Creation (volume 2: Modern
tendencies in textbook creation).
Educational Publisher Didaktis Ltd., Bratislava, Frankfurt a.M. 2006
Tarabek,P., Zaskodny,P.: Educational and Didactic Communication 2007
Educational Publisher Didaktis Ltd., Bratislava, Frankfurt a.M. 2008
Tarabek,P., Zaskodny,P.: Educational and Didactic Communication 2008
Educational Publisher Didaktis Ltd., Bratislava, Frankfurt a.M. 2009
Tarabek,P., Zaskodny,P.: Educational and Didactic Communication 2009
Educational Publisher Didaktis Ltd., Bratislava, 2010
Tarabek,P., Zaskodny,P.: Educational and Didactic Communication 2010
Educational Publisher Didaktis Ltd., Bratislava, 2011
Tarabek,P., Zaskodny,P.: Educational and Didactic Communication 2011
Educational Publisher Didaktis Ltd., Bratislava, 2012
ii) The books
Pavlát,V., Záškodný,P. et al.: Capital Market. First edition, 2003
Záškodný,P.: Survey of Principles of Theoretical Physics (with Application to Radiology)
(in Czech). Didaktis, Bratislava, Slovak Republic 2005
Záškodný,P.: Survey of Principles of Theoretical Physics (with Application to Radiology) (in
English). Avenira, Switzerland, Algoritmus, Ostrava, Czech Republic 2006
Pavlát,V., Záškodný,P. et al.: Capital Market. Second edition, 2006
Záškodný,P.: Curricular Process in Physics (in Czech). Avenira, Switzerland, Algoritmus,
Ostrava, Czech Republic 2009
Záškodný,P. et al.: Fundaments of Statistics (in Czech). Curriculum, Czech Republic 2011
Pavlát,V., Záškodný,P.: From Financial Derivatives to Option Hedging. Curriculum, Czech
Republic 2012
iii) The textbooks
Záškodný,P.: Theoretical Mechanics in Examples I (in Czech). PF, Ostrava, Czech
Republic 1984
Záškodný,P., Sklenák,L.: Theoretical Mechanics in Examples II (in Czech). PF, Ostrava,
Czech Republic 1986
Záškodný,P. et al.: Principles of Economical Statistics (in Czech). VSFS, Praha, Czech
Republic 2004
Budinský,P., Záškodný,P.: Financial and Investment Mathematics. VSFS, Prague 2004
Záškodný,P. et al.: Principles of Health Statistics (in Czech). JU, České Budějovice, Czech
Republic 2005
Kozlovská,D., Skalická,Z., Záškodný,P.: Introduction to Practicum from Radiological
Physics. JCU, České Budějovice, Czech Republic, 2007
Záškodný,P., Pavlát,V., Budík,J.: Financial Derivatives and Their Evaluation. Prague,
University of Finance and Administration, 2009
iv) The papers
Approximately 100 papers
Global References
Dalgaard,P. (2008). Introductory Statistics with R. Second Edition. New York, USA:
Springer. (In English)
ISBN-13: 978-038779-053-4
Field,A. (2009). Discovering Statistics Using SPSS. Third Edition. London, New Delhi,
Singapore: SAGE. (In English)
ISBN-13: 978-184787-907-3
Jorion,P. (2007). Financial Risk Manager. Handbook. Hoboken, New Jersey, USA:
Wiley&Sons. (In English)
ISBN 978-0-470-12630-1
Matloff,N. (2011). The Art of R Programming: A Tour of Statistical Software Design. USA: No
Starch Press. (In English)
ISBN-13: 978-159327-384-2
Pavlát,V., Záškodný,P. (2012). From Financial Derivatives to Option Hedging. Prague, Czech
Republic: Curriculum. (In Czech)
ISBN 978-80-904948-3-1
Tarábek,P., Záškodný,P. (2011). Data Mining Tools in Statistics Education. In:
Educational&Didactic Communication 2010. Bratislava, Slovakia: Didaktis. (In English)
ISBN 978-80-89160-78-5
Záškodný,P. et al. (2007). Principles of Economical Statistics. Prague, Czech Republic:
Eupress. (Partly in English)
ISBN 80-86754-00-6