net/publication/325060898
All content following this page was uploaded by Toni Bakhtiar on 10 May 2018.
Abstract
1. Introduction
In this paper, missing values of the 2016 EPI data are imputed by DFMI,
Gabriel eigen, EM-SVD, and biplot imputation, and the goodness-of-fit of
the imputed data is then determined. Finally, we identify the best
imputation method.
elements of X by the calculations of $x_{ij} = \sum_{k=1}^{r} l_k u_{ik} w_{jk}$, $\forall i, j$. If $x_{ij}$
$\hat{x}_{ij} = \sum_{k=1}^{r} l_k u_{ik} w_{jk}$, in which the $l_k$, $u_{ik}$ and $w_{jk}$ must be estimated from
value. If there are missing values, they are initially imputed by their
respective column means, thereby providing a complete matrix. In the
next step, we replace every imputed value separately by using the DFMI
method.
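The initial column-mean step can be sketched as follows. This is a minimal illustration that assumes missing values are encoded as NaN in a NumPy array; the function name `impute_column_means` and the small example matrix are hypothetical and do not come from the paper.

```python
import numpy as np

def impute_column_means(X):
    """Replace each NaN with the mean of the observed values in its column.

    Returns the completed matrix and the boolean mask of the cells that
    were missing, so later steps know which entries to re-estimate.
    """
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)          # per-column mean, ignoring NaN
    missing = np.isnan(X)                      # True where a value is missing
    filled = X.copy()
    # np.nonzero(missing)[1] lists the column index of every missing cell
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])
    return filled, missing

# Hypothetical 3 x 2 data matrix with two missing cells
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])
X_filled, missing_mask = impute_column_means(X)
```

The returned mask is kept because the subsequent step replaces each imputed value separately, which requires knowing where the original gaps were.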
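set that can be arranged in matrix form. If $x_{ij}$ is a missing value in the $n \times p$
data matrix $X$, we denote the matrix partition by (1),
$$X = \begin{bmatrix} x_{ij} & \mathbf{x}_{i\cdot} \\ \mathbf{x}_{\cdot j} & X_{-i,-j} \end{bmatrix}, \tag{1}$$
where $x_{ij}$ is the missing value, $\mathbf{x}_{i\cdot}$ is the ith row of $X$ with $x_{ij}$ deleted, $\mathbf{x}_{\cdot j}$ is
the jth column of $X$ with $x_{ij}$ deleted, and $X_{-i,-j}$ is obtained from $X$ by
deleting the ith row and the jth column. Furthermore, from (1), we form the
multiple regression model $\mathbf{x}_{\cdot j} = X_{-i,-j}\,\beta + \varepsilon_{\cdot j}$ with $\min_{\beta} \lVert \mathbf{x}_{\cdot j} - X_{-i,-j}\,\beta \rVert$.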
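The regression built from the partition (1) can be sketched as below. This is an illustration only: it assumes that every entry other than the missing one is observed, and the final prediction step (multiplying the ith row by the fitted coefficients) is an assumption about how the fitted model is used, not something stated in the text above. The function name `regression_estimate` and the example data are hypothetical.

```python
import numpy as np

def regression_estimate(X, i, j):
    """Estimate X[i, j] from the partition (1) by ordinary least squares.

    Assumes all entries except X[i, j] are observed.
    """
    X = np.asarray(X, dtype=float)
    x_row = np.delete(X[i, :], j)                    # ith row without x_ij
    x_col = np.delete(X[:, j], i)                    # jth column without x_ij
    X_sub = np.delete(np.delete(X, i, axis=0), j, axis=1)  # drop row i, column j
    # Least-squares solution of min || x_col - X_sub @ beta ||
    beta, *_ = np.linalg.lstsq(X_sub, x_col, rcond=None)
    # Assumed prediction step: apply the fitted coefficients to the ith row
    return float(x_row @ beta)

# Hypothetical data in which column 2 equals column 0 plus twice column 1
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0],
              [1.0, 1.0, 3.0],
              [2.0, 1.0, np.nan]])
estimate = regression_estimate(X, i=3, j=2)
```

Because the example columns are exactly linearly related, the fitted coefficients recover that relation and the estimate for the missing cell follows it.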
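approximated by $\hat{X}^{0} = \sum_{k=1}^{s} l_k^{0}\, \mathbf{u}_k^{0}\, \mathbf{w}_k^{0\top}$ with $s \le r$ and $\sum_{i=1}^{s} l_i \big/ \sum_{i=1}^{r} l_i \ge 0.75$.
by $\hat{X}^{0} = \sum_{k=1}^{s} l_k^{0}\, \mathbf{u}_k^{0}\, \mathbf{w}_k^{0\top}$ with $s = 2$ or $s = 3$. We supersede the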
previous iteration, $x_{ij}$ is the observed (non-missing) value in the ith row
and jth column, and $N$ is the total number of observed values.
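A rank-s approximation of the kind referred to above can be sketched with a truncated SVD. This is a hedged illustration: the rule used here for choosing s (the smallest s whose leading singular values sum to at least 75% of the total) is an assumption about how the 0.75 threshold is applied, and the function name `low_rank_approximation` is hypothetical.

```python
import numpy as np

def low_rank_approximation(X, s=None, threshold=0.75):
    """Return the rank-s SVD approximation of a complete matrix X.

    If s is None, s is chosen as the smallest number of components whose
    singular values reach `threshold` of the total (assumed rule).
    """
    X = np.asarray(X, dtype=float)
    U, l, Wt = np.linalg.svd(X, full_matrices=False)   # X = U diag(l) Wt
    if s is None:
        ratio = np.cumsum(l) / np.sum(l)               # cumulative share
        s = int(np.searchsorted(ratio, threshold) + 1) # first ratio >= threshold
    X_hat = (U[:, :s] * l[:s]) @ Wt[:s, :]             # sum of s rank-one terms
    return X_hat, s

# Hypothetical matrix with singular values 4, 2 and 1
X = np.diag([4.0, 2.0, 1.0])
X_hat_auto, s_auto = low_rank_approximation(X)         # s chosen by the threshold
X_hat_two, s_two = low_rank_approximation(X, s=2)      # fixed s, as with s = 2 or 3
```

Passing `s` explicitly corresponds to fixing the number of components in advance, while leaving it unset applies the threshold rule.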
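$$s_{ij} = s_{ji} = \frac{1}{\sum_{k=1}^{n} w_{ijk} - 1} \sum_{k=1}^{n} \left(y_{kj} - \bar{y}_j\right)\left(y_{ki} - \bar{y}_i\right) w_{ijk}, \quad \forall i, j, \tag{3}$$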
where $s_{ij}$ is the covariance between the ith and jth variables, $n$ is the total
number of objects in the data, $y_{kj}$ is the value of the jth variable on the kth object,
$\bar{y}_j$ is the mean of the non-missing elements of the jth variable, and
$w_{ijk}$ is a weight that is 0 if $y_{kj}$ or $y_{ki}$ is missing and 1 otherwise. Suppose
that $D = (d_{ij})$ is the Euclidean distance matrix used as the proximity matrix of the initial
data. $D$ is obtained by computing the Euclidean
distances by (4),
$$d_{ij} = \sqrt{\sum_{s=1}^{p} \left(x_{is} - x_{js}\right)^{2} m_{ijs}}, \tag{4}$$
where $d_{ij}$ is the Euclidean distance between the ith and the jth objects, $p$ is
the total number of variables in the data, $x_{is}$ is the value of the ith object on the sth
variable, and $m_{ijs}$ is a weight that is 0 if $x_{is}$ or $x_{js}$ is missing and 1
otherwise.
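The weighted covariance and the weighted Euclidean distance described above can be sketched as follows. This is a minimal illustration that assumes missing values are stored as NaN, that every pair of variables shares at least two objects observed on both, and that the distance is the square root of the weighted sum of squared differences. The function names `pairwise_covariance` and `masked_euclidean` and the example arrays are hypothetical.

```python
import numpy as np

def pairwise_covariance(Y):
    """Covariance matrix using only the objects observed on both variables."""
    Y = np.asarray(Y, dtype=float)
    observed = ~np.isnan(Y)
    means = np.nanmean(Y, axis=0)                    # mean of observed values
    # Deviations are set to 0 where a value is missing, so any product
    # involving a missing value contributes nothing (weight 0).
    centered = np.where(observed, Y - means, 0.0)
    obs = observed.astype(float)
    counts = obs.T @ obs                             # number of weights equal to 1
    return (centered.T @ centered) / (counts - 1.0)

def masked_euclidean(X):
    """Distance matrix using only the variables observed on both objects."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    filled = np.where(observed, X, 0.0)
    diff = filled[:, None, :] - filled[None, :, :]   # all pairwise differences
    both = observed[:, None, :] & observed[None, :, :]   # weight is 1 here
    return np.sqrt(np.sum(np.where(both, diff ** 2, 0.0), axis=2))

# Hypothetical data with one missing value each
Y = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [4.0, 8.0]])
S = pairwise_covariance(Y)
X = np.array([[0.0, 0.0], [3.0, 4.0], [np.nan, 4.0]])
D = masked_euclidean(X)
```

On complete data the first function reduces to the usual sample covariance, which gives a simple way to check the sketch.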
Goodness-of-fit of the Imputation Data in Biplot Analysis 1845
In the imputation data, the covariance and the proximity matrix are
obtained by using the biplot analysis provided by [6]. Suppose that $S_i$ is
the covariance matrix and $D_i$ is the proximity matrix of the ith imputation
data $X_i$. In the first step, we decompose $X_i = U L W^{\top}$ by SVD and let
$G = U L^{\alpha}$ and $H^{\top} = L^{1-\alpha} W^{\top}$, so that $X_i = G H^{\top}$. As a result, $S_i$ is
obtained from $H H^{\top}$ (by choosing $\alpha = 0$) because $H H^{\top}$ is proportional to
the covariance matrix of the initial data. $D_i$ is obtained from the Euclidean
where $S_i$ and $S$ are the covariance matrices from the imputation data and
the initial data, respectively, and $r$ and $\sigma_{ii}$ ($i = 1, 2, \ldots, r$) are the rank and the singular
values, respectively, of $S_{iT}^{\top} S_T$ or $S_T^{\top} S_{iT}$. $S_T$ is $S$ after the translation-
normalization procedure, and $S_{iT}$ is $S_i$ after the same
procedure. The measure lies in the interval $[0, 1]$. A value for $(S_i, S)$ near 1
means that $S_i$ is a good approximation of the correlation
among variables in the initial data. Conversely, a value near 0 means
that $S_i$ is a bad approximation. The measure for $(S_i, S)$ can therefore be used to
obtain the goodness-of-fit of the covariance matrix. We must also compute
the measure for $(D_i, D)$, which is the goodness-of-fit of the proximity matrix from the
imputation data.
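The SVD factorization used in the biplot analysis earlier in this section can be sketched as follows. This is an illustration under stated assumptions: the data matrix is column-centered before the decomposition (a common convention that is assumed here, not taken from the text), and the function name `biplot_factors` and the example data are hypothetical.

```python
import numpy as np

def biplot_factors(X, alpha=0.0):
    """Factor the column-centered matrix as Xc = G @ H.T.

    G = U L^alpha and H = W L^(1 - alpha), where Xc = U L W^T is the SVD.
    Column-centering is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # remove column means
    U, l, Wt = np.linalg.svd(Xc, full_matrices=False)
    G = U * l ** alpha                             # scale columns of U by l^alpha
    H = Wt.T * l ** (1.0 - alpha)                  # scale columns of W by l^(1-alpha)
    return G, H, Xc

# Hypothetical complete data matrix: 4 objects, 3 variables
X = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 1.0],
              [4.0, 5.0, 2.0],
              [0.0, 3.0, 5.0]])
G, H, Xc = biplot_factors(X, alpha=0.0)
n = X.shape[0]
S_from_H = (H @ H.T) / (n - 1)                     # covariance recovered from H
```

With `alpha=0.0` and centered data, `H @ H.T` equals `Xc.T @ Xc`, which is the sample covariance matrix multiplied by n - 1; this is the proportionality the text relies on.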
1846 Ridho Ananda, Siswadi and Toni Bakhtiar
Table 1. The goodness-of-fit of the covariance matrices from the first n
principal components

 n    DFMI    Gabriel eigen   EM-SVD   Imputation biplot (s = 3)   Imputation biplot (s = 2)
 2    0.835   0.843           0.834    0.838                       0.837
 3    0.893   0.900           0.888    0.895                       0.893
 4    0.936   0.945           0.933    0.941                       0.937
 32   0.990   0.990           0.987    0.983                       0.983
Table 2. The goodness-of-fit of the proximity matrices from the first n
principal components

 n    DFMI    Gabriel eigen   EM-SVD   Imputation biplot (s = 3)   Imputation biplot (s = 2)
 2    0.832   0.828           0.835    0.822                       0.818
 3    0.891   0.893           0.894    0.885                       0.884
 4    0.943   0.943           0.938    0.937                       0.935
 32   0.987   0.994           0.986    0.985                       0.982
Figure 2 shows that the objects are plotted as points, whereas the variables
are plotted as lines. The interesting property of the biplot when $\alpha = 0$ is that
the lengths of the lines are proportional to the standard deviations of the
variables, and the cosines of the angles between two lines represent the
correlations between the corresponding variables in the 2016 EPI data. The
visualization is satisfactory because the goodness-of-fit of the covariance
matrix in the first two principal components is 0.837. This means that the first
two principal components account for 83.7% of the total information on
the correlation among variables in the 2016 EPI data, so the
two-dimensional representation is reasonably faithful to that
correlation structure. The Euclidean distance
Figure 2. Biplot from the biplot imputation result with (a) $\alpha = 0$ and
(b) $\alpha = 1$.
4. Conclusions
References
Siswadi: siswadimathipb@gmail.com