Multivariate Analysis Applied to Forestry Agricultural Sciences: The Model-Directed
Angela Helena Silva Mendes Stival1, Gilberto de Souza Iris Oliveira1, Jessica Bezerra Bandeira1, Josué Luiz Marinho Junior1, Larissa da Silva Cintra1, Patricia Cardoso Dias1, Debora Portella Biz2, Augustus Caeser Franke
1 Master's students in the Postgraduate Program in Forestry and Environmental Sciences - Federal University of Tocantins.
2 Professional Master's in Urban and Industrial Environment of the Federal University of Paraná (UFPR) - Curitiba-Paraná.
3 Bioprocess Engineering and Biotechnology Division - Federal University of Tocantins (UFT) - Campus of Gurupi, Gurupi-TO, Brazil. Corresponding author:


Abstract— This literature review aimed to find articles that exemplify and describe the use of multivariate analysis in different fields of the Forest Agricultural Sciences, considering effective practices that use multivariate statistical techniques for the simultaneous processing of data. For data collection, 70 technical articles were selected for the meta-analysis, of which 54 were employed in the study, which was directed to the use of multivariate techniques applied in the areas of agricultural sciences. The results showed that studies directed to certain areas within the Forest Agricultural Sciences exhibit some regularity in the use of multivariate analysis, and that the most usual analyses applied were Cluster Analysis (AA) and Principal Component Analysis (PCA). Thus, the use of multivariate analysis in studies and evaluations of experiments in Agricultural Sciences proved to be of great value, allowing greater clarity and better interpretation when dealing with complex phenomena.
Keywords— Multivariate Analysis, Multivariate Methods, Forest Agricultural Sciences.

I. INTRODUCTION
Statistically, data analysis is classified as univariate or multivariate, i.e., it considers variables alone or jointly, respectively. According to Vicini (2005), until the advent of computers data were treated only in isolation, and when a phenomenon depended on many variables such analysis became unfeasible.
Multivariate analysis corresponds to a large number of methods and techniques that use, simultaneously, all variables in the theoretical interpretation of the set of obtained data (Neto, 2004). According to Hair et al. (2009), multivariate techniques are popular because they allow organizations to create knowledge, thereby improving their decision-making. Multivariate analysis refers to all the statistical techniques that simultaneously analyze multiple measurements on individuals or objects under investigation.
For Gerhardt et al. (2001), multivariate analysis approaches the data through a set of statistical techniques that consider measurements of many variables simultaneously. To obtain such results, some multivariate methods are applied to the data depending on the research objectives, since it is known that an exploratory data analysis aims to generate hypotheses, which is exactly the goal of multivariate analysis (VICINI, 2005).
Multivariate analysis is a vast field in which even experienced statisticians move carefully because, this being a new area of science, much is yet to be discovered. The art of using multivariate analysis lies in choosing the most appropriate options to detect the patterns expected in the data (MAGNUSSON, 2003).
The purpose of its application may be to reduce data or simplify structure, to sort and group, to investigate the dependency between variables, to predict, and to develop and test hypotheses (JOHNSON; WICHERN, 1992).
Multivariate techniques can meet the specific interests of a forestry company or a research institution, aiming at a particular interest, apart from a property or set of properties. Thus, this study aims to quantify and clarify which main tools of multivariate analysis are applied in various areas of study of the forest agricultural sciences and how they are used, reviewing a number of literature
articles, recently published, in which various techniques of multivariate analysis are used.

II. REVIEW
The application of multivariate analysis is a combination of the multiple pieces of information contained in the experimental unit, so that selection is based on a complete set of important variables that discriminate between the more promising materials (Maeda et al., 2001). Since multivariate techniques have numerous applications, one needs to know the main ones being applied in the areas of Forest Agricultural Sciences, their functions and objectives. The main examples of multivariate techniques are the multivariate normal distribution, matrices and vectors, quadratic forms, eigenvalues and eigenvectors, multivariate analysis of variance (MANOVA), multivariate linear regression models, simultaneous tests on several variables, multivariate distances, principal component analysis, factor analysis, cluster analysis, discriminant analysis and canonical correlation analysis.
Factor analysis
Factor Analysis (FA) aims to reduce the number of initial variables with the least possible loss of information, using a set of statistical techniques (VICINI, 2005). Carvalho (2013) says that whenever variables are strongly correlated it is conceivable to gather them into a group, since variables in different groups are weakly correlated.
Factor analysis is applied when there is a large number of correlated variables. It includes principal component analysis and common factor analysis, and aims to identify a smaller number of new, alternative, uncorrelated variables that somehow summarize the main information of the original variables, finding the factors or latent variables (Mingoti, 2005).
According to Carvalho (2013), the generic formula for applying a factor analysis is defined by:
$$X = \mu + \Lambda F + \varepsilon \qquad (1)$$
where $X = [X_1\ X_2\ \ldots\ X_p]^T$ is a real random vector of dimension $p$, with mean vector $\mu = [\mu_1\ \mu_2\ \ldots\ \mu_p]^T$ and positive-definite covariance matrix $\Sigma$. In the factor analysis model, each observable variable $X_i$ is expressed as a linear function of $m$ random variables $F_1, F_2, \ldots, F_m$ ($m < p$), called common factors, and of one unique factor or error $\varepsilon_i$, $i = 1, 2, \ldots, p$, which is also a random variable and explains the part of the variance of the respective variable not explained by the common factors. $\Lambda$ is the ($p \times m$) matrix of factor loadings; the $m$ common factors and the $p$ unique factors are not observable.
Kaiser-Meyer-Olkin (KMO) method
When using factor analysis, a very important measure of data adequacy is the one proposed by Kaiser-Meyer-Olkin (KMO). The KMO test is based on the principle that the inverse correlation matrix approaches a diagonal matrix, and it therefore compares the correlations between the observed variables (Solomon et al., 2012).
According to Vicini (2005), KMO can be obtained by the following equation:
$$KMO = \frac{\sum_{i \neq j} r_{ij}^2}{\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} a_{ij}^2} \qquad (2)$$
That is, the sum of the squares of the correlations between all variables is divided by itself plus the sum of the squares of the partial correlations between all variables,
where:
$r_{ij}$ is the correlation coefficient between the observed variables $i$ and $j$;
$a_{ij}$ is the partial correlation coefficient between the same variables. The $a_{ij}$ should be close to zero, because the factors are orthogonal to each other.
For the data to fit factor analysis, the following should be noted regarding the value found with Kaiser's equation:
Table.1: Relationship between the KMO and the use of Factor Analysis
KMO                Recommendation for FA
≥ 0.9              Great
≥ 0.8 and < 0.9    Good
≥ 0.7 and < 0.8    Average
≥ 0.6 and < 0.7    Acceptable
≥ 0.5 and < 0.6    Weak
< 0.5              Unacceptable
Source: CARVALHO, 2013

Bartlett's sphericity test
Another test widely used in factor analysis is Bartlett's test of sphericity (BTS), which tests the following hypothesis: the correlation matrix is an identity matrix, i.e. the values on the main diagonal are equal to 1 and all other values are zero, so that its determinant is equal to 1. Under this hypothesis the variables are uncorrelated. The null hypothesis can be rejected if, for an adopted significance level α of 0.05 (5%), the p-value found is less than α (Pereira, 2001).
Bartlett's test evaluates the overall significance of the correlation matrix, i.e. it tests the null hypothesis that the correlation matrix is an identity matrix (Solomon et al., 2012).
Principal Component Analysis
The goal of principal component analysis (PCA) is to address issues such as the generation, selection and interpretation of the investigated components, with the aim of determining the most
influential variables in the formation of each component (VICINI, 2005).
According to Castro et al. (2013), in PCA the variance and covariance structure of a random vector (composed of p random variables) is explained by constructing linear combinations of the original variables.
The principal components are estimated from the covariance matrix of the data. To apply the analysis, it is necessary to standardize the data so that the whole series has the same magnitude of values. The eigenvectors are then obtained; these are values that represent the weights of each component in each variable, range from -1 to 1, and function as correlation coefficients that represent the contribution of each component to explaining the total variation of the data (Ruhoff et al., 2009).
Cluster analysis
Cluster Analysis (AA) identifies groups of objects in multivariate data. The goal is to form groups with homogeneous properties from large heterogeneous samples. The groups should be as homogeneous as possible, and the differences between them as large as possible (Hair et al., 2005).
AA encompasses a variety of techniques and algorithms, and the goal is to find similar data and place them in the same group, distinct from the data of other groups (VICINI, 2005).
According to Ruhoff et al. (2009), AA seeks to group the data elements that are most alike. The groups are determined so as to obtain homogeneity among the elements within each group and heterogeneity between groups.
Dendrogram
The result of AA is the dendrogram, or phenogram, also known as a tree diagram, which is a chart summarizing the groups obtained by the analysis.

Fig.1: Grouping according to the quality of wood for charcoal production, obtained by the single linkage method using the Euclidean distance.
Source: CASTRO et al., 2013.
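The grouping procedure named in the caption of Fig. 1 (single linkage on Euclidean distances) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names `euclidean` and `single_linkage` and the `samples` values are hypothetical and are not taken from Castro et al. (2013), whose software and data are not described here. The sketch returns the sequence of merges, which is the information a dendrogram displays from bottom to top.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of row indices into `points`.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Single linkage: distance between the closest pair of members.
            d = min(euclidean(points[i], points[j]) for i in c1 for j in c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2, d))
    return merges

# Hypothetical measurements (rows = materials, columns = two traits).
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0), (20.0, 20.0)]
history = single_linkage(samples)
for left, right, dist in history:
    print(left, "+", right, "at distance", round(dist, 3))
```

With these made-up values, the two closest rows merge first and the most dissimilar row joins last, which is the same reading order used to interpret a dendrogram. In practice, the variables would normally be standardized before computing distances, so that no single trait dominates because of its scale.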
It is observed in Figure 1 that genetic materials 11 and 8 have the greatest similarity in the dendrogram, having the smallest Euclidean distance, and so form the first group. Next come variables 10 and 9, then 1 and 5, and so on; the variables are grouped in descending order of similarity, so that variable 12 formed the last
group of the dendrogram, which remained distinct from the other groups formed, because this variable has little resemblance to the others.
Euclidean distance
In Cluster Analysis some distance measures are important, among them the Euclidean distance, also known as a dissimilarity measure. According to Parents; Silva; Ferreira (2012), considering two points A and B, the Euclidean distance can be calculated with the following formula:
$$D_{AB} = \sqrt{\sum_{j=1}^{p} (x_{jA} - x_{jB})^2} \qquad (3)$$
In matrix form, this distance is given by:
$$D_{AB} = \sqrt{(x_A - x_B)'(x_A - x_B)} \qquad (4)$$

Mahalanobis distance
When assessing the similarity between samples (treatments, individuals, populations) with respect to a set of correlated characteristics, and the distance between any pair of sampling units, the degree of dependence between variables must be considered. To quantify the distance between two populations when the data are replicated, it is recommended to use the Mahalanobis distance (d²) (VICINI, 2005).
Canonical Correlation Analysis
The main objective of Canonical Correlation Analysis (CCA) is to study the linear relationship existing between two sets of variables. This analysis summarizes the information of each set of response variables in linear combinations, seeking to maximize the correlation between the two sets (Mingoti, 2005). According to Protasio et al. (2012), CCA is a multivariate statistical technique that aims to check associations between groups with different characteristics.
This multivariate analysis model makes it possible to discover the relationship between two groups or sets of variables by maximizing the correlation between the vectors of independent and dependent variables (Burt, 2015).
Multiple Regression Analysis
Multiple regression predicts the changes in the dependent variable in response to changes in the independent variables. The method is suitable when there is a single metric dependent variable related to two or more independent variables (Hair et al., 2005).

Discriminant analysis
Multiple discriminant analysis consists of a set of tools and methods used to distinguish groups of populations and to classify new observations into given groups; it is used when the groups are known a priori (Mingoti, 2005).

MANOVA / MANCOVA
Multivariate analysis of variance and covariance, also known as MANOVA (multivariate analysis of variance) and MANCOVA (multivariate analysis of covariance), aims to verify the similarity between multivariate groups by simultaneously exploring the relationship between several independent variables and two or more metric dependent variables (Hair et al., 2005).

III. RESULTS AND DISCUSSIONS
In this study we selected 54 articles on subjects within the area of Forest Agricultural Sciences that apply multiple multivariate statistical methods. The selected articles were published between 1990 and 2018; within this range, 2015 was the year with the most publications, 8 in total, followed by 2003 and 2012 with 6 publications each. In contrast, the years 1990, 2002, 2005, 2011, 2016 and 2017 contributed one article each.
Among the multivariate analyses most used in the selected works, the most important were cluster analysis, used 25 times, followed by Principal Component Analysis, 20 times, Factor Analysis, 10 times, Canonical Correspondence Analysis, 9 times, and Discriminant Analysis, 8 times.
Regarding the multivariate analysis used in each study, we observed that a pattern has been established between the multivariate method used and certain lines of research in the area. Knowing this, we sought to verify this pattern by separating the studies by subject and quantifying which types of multivariate method were used most.
Multivariate analysis in studies involving soil management
Of the 11 works found in this area, the studies of Freitas et al. (2015b) and Mantovanelli et al. (2015) used the same multivariate analyses: Cluster Analysis (AA), Principal Component Analysis (PCA) and MANOVA. PCA was recurrent and the most frequent in the works found in this area, as noted in the studies of Silva et al. (2010a), Silva et al. (2010b), Oliveira et al. (2015), (JORDAN, 2018), Silva et al. (2009) and Barreto et al. (2006). In addition to the already recurrent AA, other multivariate techniques were also used, such as Discriminant Analysis (DA), Canonical Correspondence Analysis (CCA) and Factor Analysis, as noted in the articles of Gerhardt et al. (2001), Baretta; Baretta; Cardoso (2008) and Benites et al. (2010).
Multivariate Analysis in environmental studies
In this area of study another 12 works were selected, in which factor analysis was the most recurrent among the multivariate methods, as noted in the studies of Scatena (2005), Campos et al. (2015), Cunha et
al. (2008), Parents; Silva; Ferreira (2012), Pinto; Col. (2014) and Silva; Feather; Souza (2015). Other multivariate analyses, such as Regression Analysis (Calijuri et al., 2009), Discriminant Analysis (Braga et al., 2009) and Cluster Analysis (SILVA, 2003), were also used to a lesser extent in this area of study.
The articles of Bertossi (2013), Bertossi et al. (2013) and Hugo et al. (2012) apply multivariate analysis to data on water quality indicators; all of these studies rely on PCA as the main multivariate analysis to analyze the data.
Multivariate analysis techniques in experiments involving forests and products of forest origin
In this part, 31 studies fit this line of research; they were separated into sub-items for better visualization of the multivariate analyses applied in the aforementioned area of study.
Considering the application of multivariate analysis to floristic data, 7 articles fall into this topic, in which there was no pattern in the use of multivariate methods, their use being well distributed in this type of study. Higuchi et al. (2012) and Higuchi et al. (2013) used PCA and AA to analyze their data; dealing with similar themes applied in different areas of Santa Catarina, the methods adapted perfectly to the proposed studies. In the work of Peixoto (2004) in Rio de Janeiro and of Narvaes; Longhi; Brena (2008) in Rio Grande do Sul, which are likewise similar studies in different areas, the same multivariate technique could be applied to both studies, and AA managed to separate similar data into different groups, which were easily spotted using the dendrogram and the Euclidean distance. Still on floristic analysis, Souza et al. (2003a) and Bertani (2001) used CCA to analyze the floristic diversity in riparian forests. Lastly, Solomon; Junior; Santana (2012) used factor analysis to carry out the floristic analysis of primary forest for the restoration of a mined area.
In studies dealing with the quality of wood for energy purposes, each author used different multivariate statistics for data analysis. Protasio et al. (2012), using only CCA, could verify the associations between the group formed by the characteristics of the Eucalyptus clones and the group formed by the characteristics of the charcoal obtained from them. Castro et al. (2013) used three multivariate analyses, namely CCA, PCA and AA, and through them concluded that the properties of charcoal are strongly correlated with those of the wood, especially the apparent density of the charcoal and the gravimetric yield. Gadelha et al. (2015), whose study focused on this same area, used the multivariate method MANOVA to evaluate which eucalyptus clones are suited to production for energy purposes.
In plant stratification, the 3 selected studies use the same methodology for data processing, AA and DA, as can be seen in the articles of Souza et al. (2003b), Souza et al. (2006) and Souza et al. (2012); multivariate classification of the forest into volumetric stock classes proved to be an efficient method for stratifying homogeneous areas in the three types of forest, which can be constituted by strata, compartments, site classes and annual production units.
In the reforestation of mined areas (CUNHA et al., 2003) and/or degraded areas (LOSCHI et al., 2011), there was a similarity in the use of the multivariate CCA and, despite the distinct areas, a similarity in the results presented by the same analysis. Oliveira et al. (2016) used PCA, which showed the efficiency of multivariate analysis in dealing with high-variance data; used as a tool on an annual basis, it can better reference the ecological patterns of the area and can be used to identify indicators of forest restoration.
In studies aimed at planting, the use of AA was unanimous, as noted in the article by Grigolo et al. (2018). Of the four studies selected, 3 also used PCA to complement the study, as noted in the studies of Neto et al. (2018) and Ruhoff et al. (2009). The use of principal components showed that higher yields are correlated with proper shoot growth under conditions of lower bulk density, which provide high root dry matter production (FREDDI; FERRAUDO; CENTURION, 2008).
In studies involving forests, among the 11 selected articles a similarity in the use of multivariate analysis was found in the studies of Rovedder et al. (2014) and Lúcio et al. (2006), who used PCA to reduce as far as possible the number of variables while still representing most of the variance found. Almeida et al. (2015) and Canuto et al. (2015), in turn, used AA to separate the samples with greater similarity into different groups and thus allow a better analysis of the sampled data. In the remaining studies, Machado (2004) used Rectified Correspondence Analysis; Longhil et al. (2009), Regression Analysis; Silva et al. (2012), Factor Analysis; Martins; Saucer; Oliveira (2002), MANOVA and Canonical Correspondence Analysis (CCA); Trugilho; Lime; Mori (2003), CCA; Oliveira et al. (2017), Discriminant Analysis (DA); and Souza et al. (1990), DA and Cluster Analysis. No pattern in the use of multivariate analysis can be shown here, as each author used a different method to analyze their data.

IV. CONCLUSION
From the results presented, it is seen that studies directed to certain areas within the Forest Agricultural Sciences have
a certain regularity in the use of multivariate analysis, making use of the same techniques to examine their data. Because the use of multivariate analysis requires some knowledge, very complex methods are rarely used in the surveyed articles, in favor of simpler analyses that proved most useful, such as Cluster Analysis (AA) and Principal Component Analysis (PCA), which were the most widely used. The determining factor in the choice of the multivariate analysis applied, however, is the purpose of the analysis, which generally involves the simultaneous analysis of multiple sets of factors when it is necessary to reduce data, identify relationships between variables, or split groups of similar factors, among others.
The use of multivariate analysis in studies and evaluations of experiments in Agricultural Sciences proved to be of great value, allowing greater clarity and better interpretability when dealing with complex phenomena.

REFERENCES
floresta ribeirinha. Revta brasil. Bot., São Paulo, V.24, n.1, p.11-23, mar. 2001
[7] BERTOSSI; Ana Paula Almeida et al. Seleção e agrupamento de indicadores da qualidade de águas utilizando Estatística Multivariada. Ciências Agrárias, Londrina, v. 34, n. 5, p. 2025-2036, set./out. 2013
[8] BRAGA, F. A. et al. Características ambientais determinantes da capacidade produtiva de sítios cultivados com eucalipto. Revista Brasileira de Ciência do Solo, vol. 23, núm. 2, pp. 291-298, 2009.
[9] Canuto, Daniela Silvia de Oliveira, et al. "Genetic characterization of a progeny test of Dipteryx alata Vog., from a forest fragment of Estação Ecológica Paulo de Faria, São Paulo State, Brazil." Hoehnea 42.4 (2015): 641-648.
[10] CALIJURI; Maria Lúcia, SANTIAGO; Aníbal da Fonseca, CAMARGO; Rodrigo de Arruda, NETO;
