Food Chemistry 128 (2011) 555–561

Contents lists available at ScienceDirect

Food Chemistry
journal homepage: www.elsevier.com/locate/foodchem

Analytical Methods

Preliminary study on the application of near infrared spectroscopy and pattern recognition methods to classify different types of apple samples
Weiqi Luo, Shuangyan Huan*, Haiyan Fu, Guoli Wen, Hanwen Cheng, Jingliang Zhou, Hailong Wu, Guoli Shen, Ruqin Yu
State Key Laboratory of Chemo/Biosensing & Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China

Abstract

In this paper, near infrared (NIR) spectroscopy combined with pattern recognition methods was used in an attempt to classify different types of apple samples. Three pattern recognition methods, K-nearest neighbour (KNN), partial least-squares discriminant analysis (PLSDA) and moving window partial least-squares discriminant analysis (MWPLSDA), were used to classify apple samples of different geographical origins, grades and varieties. The results indicate that MWPLSDA is superior to the two conventional pattern recognition methods, because it can select narrow but informative wavelength intervals to construct an efficacious classification model with high prediction accuracy. In conclusion, MWPLSDA coupled with near-infrared fibre-optic technology proved to be an effective method for fruit classification. © 2011 Elsevier Ltd. All rights reserved.

Article history:
Received 22 April 2010
Received in revised form 21 January 2011
Accepted 13 March 2011
Available online 17 March 2011

Keywords: Near-infrared fibre-optic technology; Apple; Pattern recognition; MWPLSDA

1. Introduction

Apple, one of the most frequently consumed fruits, is considered an important part of a healthy diet. In China, commercial apple production in recent years has amounted to 24 million tons per year (Wu et al., 2007). Apple fruit contains several health- and sensory-related constituents, including monosaccharides, minerals, dietary fibre and various biologically active compounds such as vitamin C and phenolic compounds. Polyphenols are among the most important dietary antioxidants, and apple fruit represents a major source of phenolic compounds, since its consumption is widespread in many countries and it is available on the market all year round. About 22% of the fruit phenolics consumed in the United States are from apples, making them the largest source of dietary antioxidants (Vinson, Su, Zubik, & Bose, 2001). Apple intake contributes to improved health and well-being by reducing the risk of diseases such as lung cancer, asthma, type-2 diabetes, thrombotic stroke and ischemic heart disease (Hyson, Studebaker-Hallman, Davis, & Gershwin, 2000; Knekt et al., 2002).

The eating quality and appearance of apples are important to the consumer. Quality is defined by colour, texture, flavour and taste (sweet, sour, salt and bitter sensations), in addition to physical parameters such as size and shape (Karlsen, Aaby, Sivertsen, Baardseth, & Ellekjær, 1999). These factors are affected by the variety of the fruit, the geographical origin, the cultivation conditions and the climate (Bobelyn et al., 2010).
* Corresponding author. Fax: +86 731 88821916.
E-mail address: shuangyanhuan@yahoo.com.cn (S. Huan).
0308-8146/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.foodchem.2011.03.065

Conventional analytical techniques used for quality assessment and classification of apples are time-consuming, usually require complicated sample preparation and incur high running costs. Thus, the development of rapid analytical techniques for fruit analysis is of great interest. Prediction of apple quality has also been investigated extensively in recent years. Xing, Landahl, Lammertyn, Vrindts, and Baerdemaeker (2003) applied vis–NIR spectroscopy to discriminate bruised from non-bruised healthy spots on Golden Delicious apples. Yamada, Takechi, Hoshi, and Amano (2004) compared the water relations in watercored and non-watercored apples induced by fruit temperature treatment and suggested that fruit temperature affects watercore development via its influence on evapotranspiration from the fruit surface and on water relations in the fruit. Transmission NIR spectroscopy was utilised to detect brownheart in apple (Clark, McGlone, & Jordan, 2003). Near-infrared spectroscopy (NIRS) (Han, Tu, Lu, Liu, & Wen, 2006; Kim, Mowat, Poole, & Kasabov, 2000; Kurz, Leitenberger, Carle, & Schieber, 2010; Li, He, & Fang, 2007), as a non-destructive and fast method, has been widely used in the field of agricultural products as well as in food chemistry. The NIR region contains information concerning the relative proportions of C–H, N–H and O–H bonds, which are the primary structural components of organic molecules (Cozzolino et al., 2004; Xie, Ye, Liu, & Ying, 2009). Several reports are available on the use of NIRS techniques for variety discrimination, ranging from the origins of wine (Liu et al., 2008; Yu, Zhou, Fu, Xie, & Ying, 2007), to the brands of instant noodles (Liu & He, 2008), the types of different citrus oils (Steuer, Schulz, & Läger, 2001) and the varieties of tea (He, Li, & Deng,
2007), wheat (Miralbés, 2008) and oranges (Cen, He, & Huang, 2007). However, there have been few papers on the discrimination of apple fruits using near infrared spectroscopy. Therefore, it is of great importance to set up a rapid, sensitive and robust discriminant analysis model for identifying different types of apple samples.

The work undertaken in the current study was designed to examine the feasibility of using near-infrared fibre-optic technology (Pedreschi, Segtnan, & Knutsen, 2010; Yu et al., 2009), in combination with K-nearest neighbour (KNN), partial least-squares discriminant analysis (PLSDA) and moving window partial least-squares discriminant analysis (MWPLSDA) (Fu et al., 2007, 2009; Jiang, Berry, Siesler, & Ozaki, 2002), to distinguish apples of different geographical origins, grades and varieties. The results obtained show that MWPLSDA offers performance superior to that of KNN and PLSDA.

2. Method and principle

2.1. KNN

KNN (Amendolia et al., 2003; Holmström, 2002; Yu, Cui, Wang, & Su, 2007) is a non-parametric classification method that has been widely used in chemistry. An unknown sample of the prediction set is classified by a majority vote of its neighbours in the training set. The value of k has a great influence on the identification rate of the KNN model, and the optimal k is determined by cross-validation on the calibration set: the value of k that gives the lowest error rate is chosen. Typically k is an integer less than 10. The method remains feasible even if the system studied is linearly inseparable.

2.2. PLSDA

Partial least-squares discriminant analysis (PLSDA) (Fu et al., 2009) is a linear regression method applied in the field of classification. For model coding, a vector f_j is used to encode the category of samples in the jth group, in which the element at the jth position is 1 and the other elements are 0. The types of unknown samples can then be identified according to the classification vectors of the classification matrix.
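As a minimal sketch of this coding scheme (not the authors' Matlab program), the snippet below builds the dummy vectors for four groups and assigns each sample to the position of the largest element of its classification vector. The function names and the predicted values are hypothetical and chosen only for illustration; in practice the classification vectors would come from a fitted PLS model.

```python
def dummy_code(group_index, n_groups):
    """Return the dummy vector f_j: 1 at position j (0-based), 0 elsewhere."""
    return [1 if k == group_index else 0 for k in range(n_groups)]


def assign_group(classification_vector):
    """Return the 0-based position of the maximal element of the vector."""
    return max(range(len(classification_vector)),
               key=lambda k: classification_vector[k])


# Dummy codes for four groups, e.g. (1, 0, 0, 0) for the first group.
codes = [dummy_code(j, 4) for j in range(4)]

# Made-up classification vectors for three samples (illustration only).
predicted = [
    [0.91, 0.05, -0.02, 0.06],
    [0.10, 0.22, 0.63, 0.05],
    [-0.04, 0.12, 0.30, 0.62],
]
assigned = [assign_group(v) for v in predicted]

for vector, group in zip(predicted, assigned):
    print(vector, "-> group", group + 1)
```

Because the decision depends only on which element is largest, the predicted values do not need to be close to 0 or 1 for a sample to be assigned.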
If the maximal element of the classification vector of the ith sample appears at the jth position, the sample is classified into the jth group.

2.3. MWPLSDA

Considering the continuity of spectral data measured by near-infrared spectroscopy, MWPLSDA is used to identify the continuous spectral intervals that are highly relevant to the classification. Namely, if a certain wavelength variable contains useful information, the adjacent wavelengths will also contain useful information. Analogously, if a wavelength is contaminated by factors unrelated to chemical composition, the wavelength intervals around that channel will also be affected by these factors. Selection of reasonable wavelength intervals is therefore a precondition for building a stable discriminant model. In MWPLSDA, a window of width H is moved along the whole spectrum to select useful wavelength intervals, and the selected spectral intervals are then used to build PLSDA models. Minimal model complexity and good predictive capacity are the basis of selection. The window starts at the ith wavelength and ends at the (i + H − 1)th wavelength; i is defined as the position of the window. The spectra obtained in the continuously moving spectral window form a series of sub-matrices, each of which contains the ith to the (i + H − 1)th columns of the training matrix. PLS sub-models
with different model dimensions are established to relate the spectra in these windows to the dummy class codes. Subsequently, the sum of squared residues (SSR) is calculated for each sub-model. Then, spectral intervals with lower classification errors and fewer latent variables are selected and combined to develop the final MWPLSDA model.

3. Experimental

3.1. Samples

The apples for the experiment were obtained from six local markets and marked on the same day; a total of 500 apple samples were collected, comprising four varieties (200 Fuji, 200 New Jonagold, 50 Red Star and 50 Ralls Janet samples). The 200 Fuji samples came from four geographical origins (Japan, and Shanxi, Shandong and Hebei in China), with 50 samples from each origin, and all Fuji apples were First grade. The 200 New Jonagold samples, from Shandong in China, were composed of 50 Special grade, 50 First grade, 50 Second grade and 50 Substandard grade samples (according to the national standard GB10651-89, apples are divided into three grades: Special grade, First grade and Second grade; in this paper the remaining, poorer apples were considered Substandard grade). The apples were placed in airtight polyethylene bags and stored in an ice-filled refrigerator to keep them cold (4 ± 1 °C). After storage, the fruit were brought to room temperature (25 °C) before analysis.

3.2. Instrumentation

In this experiment, apple samples were scanned in diffuse reflectance mode using an FT-NIR spectrometer (Thermo Nicolet Nexus 870, Thermo Fisher Scientific Inc., MA, USA) equipped with an InGaAs detector and a Smart Near-IR Fiber Port Accessory. NIR spectra were collected using OMNIC software.

3.3. Acquisition of NIR data

All apple samples were placed at a fixed position and scanned from 4000 to 10,000 cm⁻¹ at a resolution of 8 cm⁻¹. Each spectrum was the average of 50 successive scans.
For each fruit, five reflectance spectra were taken from five equidistant positions, and three spectra obtained at exactly the same position were averaged. The spectra collected from the samples were randomly split into three groups, named A, B and C. Group A comprised the Fuji apples from different geographical origins: Af1 (Japan, 001–050), Af2 (Shanxi, China, 051–100), Af3 (Shandong, China, 101–150) and Af4 (Hebei, China, 151–200). Group B comprised New Jonagold apples of different grades: Bf1 (Special grade, 001–050), Bf2 (First grade, 051–100), Bf3 (Second grade, 101–150) and Bf4 (Substandard grade, 151–200). Group C comprised apple samples of different varieties: Cf1 (Fuji, 001–050), Cf2 (New Jonagold, 051–100), Cf3 (Red Star, 101–150) and Cf4 (Ralls Janet, 151–200).

3.4. Data processing

The KNN, PLSDA and MWPLSDA programs used for spectral data processing and analysis were run in the Matlab environment (MathWorks, Natick, MA, USA). More than 60% of the samples were taken randomly as the training set; the proportion of training samples was chosen according to the best training results. The 200 spectra of Fuji apples in Group A were randomly divided into two sets: 128 spectra were used as the training set (Af1: 29, Af2: 30, Af3: 36, Af4: 33),
and the remaining 72 were used as the prediction set (Af1: 21, Af2: 20, Af3: 14, Af4: 17). Group B was composed of New Jonagold apples of different grades. The 200 spectra were randomly split into two groups: 150 spectra were used as the training set (Bf1: 39, Bf2: 39, Bf3: 35, Bf4: 37) and the remaining 50 as the prediction set (Bf1: 11, Bf2: 11, Bf3: 15, Bf4: 13). Similarly, in Group C, among the 200 spectra of apple samples of different varieties, 148 spectra were used as the training set (Cf1: 38, Cf2: 38, Cf3: 34, Cf4: 38) and the remaining 52 as the prediction set (Cf1: 12, Cf2: 12, Cf3: 16, Cf4: 12).

4. Results and discussion

4.1. Discrimination results of Fuji apple samples from different geographical origins (Japan, Shanxi, Shandong and Hebei)

4.1.1. Near infrared reflectance spectra

Fig. 1 shows the raw diffuse reflectance spectra of the Fuji apples from the four geographical origins in Group A. There are many crossovers and overlaps among these samples. The peaks at about 4400 cm⁻¹ and 4700 cm⁻¹ are likely related to the second overtone of the C–H deformation mode and the combination mode of O–H, N–H and C–H, respectively. The strong band cluster at 5200–6500 cm⁻¹ probably arises from the overtones of C–H, N–H and O–H stretching. The peak located between 7100 cm⁻¹ and 9000 cm⁻¹ might be associated with the second overtone of the C–H, O–H and N–H stretching modes (Ortiz-Somovilla, España-España, Gaitán-Jurado, Pérez-Aparicio, & De Pedro-Sanz, 2007; Zhou et al., 2006). Examination of the raw reflectance spectra of Fuji apple samples from different geographical origins reveals no significant spectral differences; the spectra are very similar. It is therefore difficult to discriminate the geographical origins directly from the raw diffuse reflectance spectra, and it is necessary to construct an efficacious pattern recognition model, such as KNN, PLSDA or MWPLSDA, to discriminate the geographical origins of the samples.

4.1.2. Classification by KNN, PLSDA and MWPLSDA models

K-nearest neighbour (KNN) is the simplest of all non-parametric classification algorithms. The method has several advantages (Berrueta, Alonso-Salces, & Héberger, 2007): it is mathematically simple and free from statistical assumptions, and its effectiveness does not depend on the spatial distribution of the classes. The choice of k is optimised by calculating the prediction ability with different k values. Small k values (3 or 5)
are frequently preferred. In this research, the best k was found to be 3, so this fixed k value was used in the classification test. All samples in the training set were identified correctly; however, 20 samples in the prediction set were wrongly classified. According to Berrueta et al. (2007), KNN cannot work well if the classes of samples present large differences. KNN provides poor information about the structure of the classes and the relative importance of each variable in the classification. Moreover, it does not allow a graphical representation of the results, and with a large number of samples the computation becomes excessively slow. In this paper, since the KNN method captures intra-class sample homogeneity and inter-class sample heterogeneity only vaguely, the classification accuracy is not ideal.

Then PLSDA, a linear regression method based on the information of all wavelengths, was used to relate the dummy group codes to all the spectral variables. Herein, the dummy codes for the four groups are defined as Af1 (1, 0, 0, 0), Af2 (0, 1, 0, 0), Af3 (0, 0, 1, 0) and Af4 (0, 0, 0, 1). First, the optimal number of latent variables has to be determined: too many latent variables lead to over-fitting, while too few lead to lack of fit. The optimal number of latent variables was identified as eight by sixfold cross-validation. The dummy code Af1 (1, 0, 0, 0) represented the 29 training and 21 prediction samples from samples 001–050; Af2 (0, 1, 0, 0) the 30 training and 20 prediction samples from samples 051–100; Af3 (0, 0, 1, 0) the 36 training and 14 prediction samples from samples 101–150; and Af4 (0, 0, 0, 1) the 33 training and 17 prediction samples from samples 151–200. The 128 samples in the training set were divided into six subsets, of which five contained 21 samples each and the remaining one contained 23 samples. All the samples in the training set were identified accurately.
However, two samples in the prediction set were wrongly classified. The unsatisfactory predictive accuracy of PLSDA in this study might be due to the interference and overlapping of some uninformative spectral variables. In addition, the spectra contain thousands of wavelengths, which adds to the computational burden and makes the classification very time-consuming.

Then the MWPLSDA algorithm, based on the principle of minimum sum of squared residues (SSR), was applied in an attempt to extract and visualise the main information in the multivariate data and to examine qualitative differences between the four kinds of samples. The window size was set to 40. The residue lines obtained by MWPLS for the training set are shown in Fig. 2. From the figure, one can see that when the window was located in the spectral intervals of 5100–5500 cm⁻¹ and 7100–7450 cm⁻¹, the PLS models achieved the least complexity and low SSR. Then a

Fig. 1. NIR spectra of 200 samples of four kinds of apples from different origins.

Fig. 2. Residue line obtained by MWPLS for the training sets in near-infrared spectroscopy.

MWPLSDA model was established by combining the selected intervals. MWPLSDA can obtain a simple and optimal model by greatly reducing the number of model variables, which improves both prediction accuracy and computation speed. MWPLSDA associates dummy codes with the selected spectral variables, and it uses classification errors rather than calibration errors to establish the spectral windows; thus spectral intervals with lower classification errors and fewer latent variables are selected and assembled. The optimum number of latent variables for the MWPLSDA model was determined to be eight by sixfold cross-validation. The definition of the dummy codes is the same as for PLSDA. The apple samples from the four producing areas were classified by the position of the maximal element of their dummy codes. The plots of the dummy codes of the training and prediction sets are shown in Fig. 3(a) and (b), respectively. As shown in Fig. 3, the four kinds of samples from different areas in the training set were all correctly classified, and only one sample in
the prediction set was wrongly classified. The total accuracy rate reached 98.61%. MWPLSDA can eliminate the interference of uninformative variables and factors unrelated to composition, so the classification accuracy is improved.

4.2. The classification of apple samples of different grades

KNN, PLSDA and MWPLSDA were used to construct models for the discrimination of New Jonagold apple samples of different grades. This was done in order to investigate whether the different grades of samples have an influence on the discrimination. To estimate properly the prediction ability of the models built, the classification rules proposed by these methods were validated by dividing the data obtained from the 200 samples into a training set and a prediction set. The samples in Group B were assigned randomly to a training set consisting of 150 samples (Bf1: 39, Bf2: 39, Bf3: 35, Bf4: 37) and a test set composed of the remaining 50 samples (Bf1: 11, Bf2: 11, Bf3: 15, Bf4: 13).
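A simple random (non-stratified) division of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function names and the seed are hypothetical, the authors' own Matlab procedure is not described in detail, and the per-grade counts printed here will generally differ from those reported above because they depend on the particular random draw.

```python
import random


def random_split(n_samples, n_train, seed=0):
    """Randomly assign sample indices to a training and a prediction set."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def class_counts(labels, indices):
    """Count how many of the selected samples fall in each class."""
    counts = {}
    for i in indices:
        counts[labels[i]] = counts.get(labels[i], 0) + 1
    return counts


# Group B layout: 50 spectra per grade, 200 in total.
labels = ["Bf1"] * 50 + ["Bf2"] * 50 + ["Bf3"] * 50 + ["Bf4"] * 50
train_idx, test_idx = random_split(len(labels), n_train=150)

print("training:", class_counts(labels, train_idx))
print("prediction:", class_counts(labels, test_idx))
```

Because the draw is not stratified, the number of samples per grade varies between the two sets, which is consistent with the unequal per-grade counts reported for Group B.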

Fig. 3. Dummy vectors (f1, f2, f3, f4) of MWPLSDA for the samples: (a) 128 training samples and (b) 72 prediction samples.

Such a division allows a sufficient number of samples in the training set and a representative number of members in the prediction set. When KNN was applied, the training set of the four grades of apples was identified accurately, but the prediction set suffered from overlapping and confusion: ten samples were wrongly classified. The result indicates that KNN has good learning performance but is less effective at classifying unknown samples if the classes of samples present large differences.

In PLSDA and MWPLSDA classification, the dummy codes for the four groups of samples were defined as Bf1 (1, 0, 0, 0), Bf2 (0, 1, 0, 0), Bf3 (0, 0, 1, 0) and Bf4 (0, 0, 0, 1). The optimum number of latent variables was determined to be eight by fivefold cross-validation. In PLSDA classification, all the samples in the training set were identified correctly; however, four samples in the prediction set were identified inaccurately. PLSDA has several defects (Fu et al., 2007): the computation becomes excessively slow if a large number of samples are put in the training set, especially for data sets composed of thousands of spectra. Furthermore, the classification accuracy might be influenced by interference from wavelengths carrying uninformative data and by overlapping NIR bands.

MWPLSDA was then used for classification. As illustrated in Fig. 4(a) and (b), all the samples of the four grades in the training set were accurately classified, and only two samples in the prediction set were wrongly classified. Spectral regions that are uninformative or overlapping for classification can be eliminated, and useful wavelength intervals are selected to construct a simple and efficacious classification model, so excellent prediction ability can be achieved. The results obtained reveal that MWPLSDA is the method of choice for identifying apple samples of different grades compared to
the traditional pattern recognition methods such as KNN and PLSDA. It is a feasible and promising method for quality control and discrimination analysis of apples.

4.3. Apple varietal distinction

In this work, the three pattern recognition algorithms were also used to classify apple samples of different varieties, and the parameters of these models were again optimised. In Group C, all samples were divided into two subsets: the training set was used to build the model, and the prediction set was used to test its robustness. The training set contained 148 samples (Cf1: 38, Cf2: 38, Cf3: 34, Cf4: 38), and the remaining 52 samples constituted the prediction set (Cf1: 12, Cf2: 12, Cf3: 16, Cf4: 12). The KNN method can work well if the samples in each class present only slight differences. In KNN classification, the four varieties of apple samples in the training set were identified accurately; however, 7 samples in the prediction set were wrongly classified. For PLSDA and MWPLSDA, the four varieties of samples were coded as Cf1 (1, 0, 0, 0), Cf2 (0, 1, 0, 0), Cf3 (0, 0, 1, 0) and Cf4 (0, 0, 0, 1). The 148 training samples were divided into five groups, four of 30 samples each and one of 28 samples. The optimum number of latent variables was determined to be 8 by fivefold cross-validation. For the PLSDA method, all samples in the training set were identified correctly; however, two samples in the prediction set were identified inaccurately. Perhaps the main reason was that PLSDA was disturbed by some uninformative variables in the whole spectral range, so that the effective feature extraction for samples of different categories was affected and the classification accuracy decreased. For MWPLSDA, as shown in Fig. 5(a) and (b), the four kinds of apple samples in the training set were

Fig. 4. Dummy vectors (f1, f2, f3, f4) of MWPLSDA for the samples: (a) 150 training samples and (b) 50 prediction samples.

Fig. 5. Dummy vectors (f1, f2, f3, f4) of MWPLSDA for the samples: (a) 148 training samples and (b) 52 prediction samples.

Table 1
Comparison of forecasting results of KNN, PLSDA and MWPLSDA.

Data sets      Number of wrong discriminations for prediction    Accuracy/%
               KNN     PLSDA    MWPLSDA                           KNN      PLSDA    MWPLSDA
Group A (a)    20      2        1                                 72.22    97.22    98.61
Group B (b)    10      4        2                                 80.00    92.00    96.00
Group C (c)    7       2        1                                 86.54    96.15    98.08

(a) Fuji apples from different geographical origins.
(b) New Jonagold samples of different grades.
(c) Apples of different varieties.

identified correctly, and only one sample in the prediction set was identified inaccurately. MWPLSDA benefits from lower complexity and lower training errors for classification problems involving spectral data with continuous variables: it can not only remove uninformative variables that have no relationship with the classification, but also fit the interference introduced by non-linear components. This result demonstrates that the proposed MWPLSDA method coupled with NIR could be a very promising method for the analysis of apple samples of different varieties.

The prediction results are summarised in Table 1. The total accuracy rates obtained using MWPLSDA were 98.61%, 96.00% and 98.08% for the apple samples in Groups A, B and C, respectively. Compared with the conventional classification methods, MWPLSDA shows superior ability in solving the classification problem and therefore performed better in the discrimination of apple samples in this work. It can be concluded that MWPLSDA is an excellent method that can extract the fingerprint information of different varieties of apples without requiring the determination of single or multiple components, yielding more reliable discriminant results.
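The accuracy rates in Table 1 follow directly from the number of wrongly classified prediction samples and the prediction set sizes (72, 50 and 52 for Groups A, B and C). The short check below recomputes them; the variable and function names are illustrative only.

```python
# Prediction set sizes and misclassification counts as reported in the text.
prediction_set_size = {"A": 72, "B": 50, "C": 52}
wrong = {
    "KNN": {"A": 20, "B": 10, "C": 7},
    "PLSDA": {"A": 2, "B": 4, "C": 2},
    "MWPLSDA": {"A": 1, "B": 2, "C": 1},
}


def accuracy_percent(n_wrong, n_total):
    """Percentage of prediction samples that were classified correctly."""
    return 100.0 * (n_total - n_wrong) / n_total


accuracy = {
    method: {
        group: round(accuracy_percent(n_wrong, prediction_set_size[group]), 2)
        for group, n_wrong in errors.items()
    }
    for method, errors in wrong.items()
}

for method, row in accuracy.items():
    print(method, row)
```

For example, one error among the 72 Group A prediction samples gives 71/72, or 98.61%, matching the value quoted for MWPLSDA.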

5. Conclusions

In this paper, three pattern recognition methods, KNN, PLSDA and MWPLSDA, were applied to identify apples of different geographical origins, grades and varieties. Comparison of the prediction results of KNN, PLSDA and MWPLSDA shows that MWPLSDA is the most suitable for classifying apple fruit. The results show that MWPLSDA is not only able to overcome the interference of many non-component-related factors and uninformative data, but can also extract the essential attributes of the substance category
effectively from chemical measurement data and then identify the material category. This method can therefore give a more robust recognition performance with high identification accuracy and a rapid recognition rate.

Acknowledgement

The work was financially supported by the National Natural Science Foundation of China (Grant Nos. 20605007, 20775025) and the 973 National Key Basic Research Program of China (2007CB310500, 2007CB216404).

References
Amendolia, S. R., Cossu, G., Ganadu, M. L., Golosio, B., Masala, G. L., & Mura, G. M. (2003). A comparative study of K-nearest neighbour, support vector machine and multi-layer perceptron for thalassemia screening. Chemometrics and Intelligent Laboratory Systems, 69, 13–20.
Berrueta, L. A., Alonso-Salces, R. M., & Héberger, K. (2007). Supervised pattern recognition in food analysis. Journal of Chromatography A, 1158, 196–214.
Bobelyn, E., Serban, A. S., Nicu, M., Lammertyn, J., Nicolai, B. M., & Saeys, W. (2010). Postharvest quality of apple predicted by NIR-spectroscopy: Study of the effect of biological variability on spectra and model performance. Postharvest Biology and Technology, 55, 133–143.
Cen, H. Y., He, Y., & Huang, M. (2007). Combination and comparison of multivariate analysis for the identification of orange varieties using visible and near infrared reflectance spectroscopy. European Food Research and Technology, 225, 699–705.
Clark, C. J., McGlone, V. A., & Jordan, R. B. (2003). Detection of brownheart Braeburn apple by transmission NIR spectroscopy. Postharvest Biology and Technology, 28, 87–96.
Cozzolino, D., Kwiatkowski, M. J., Parker, M., Cynkar, W. U., Dambergs, R. G., Gishen, M., et al. (2004). Prediction of phenolic compounds in red wine fermentations by visible and near infrared spectroscopy. Analytica Chimica Acta, 513, 73–80.
Fu, H. Y., Huan, S. Y., Lu, X., Jiang, J. H., Wu, H. L., Shen, G. L., et al. (2009). Construction of an efficacious model for a nondestructive identification of traditional Chinese medicines Liuwei Dihuang Pills from different manufacturers using near-infrared spectroscopy and moving window partial least-squares discriminant analysis. Analytical Sciences, 25, 1143–1148.
Fu, H. Y., Huan, S. Y., Lu, X., Tang, L. J., Jiang, J. H., Wu, H. L., et al. (2007). Moving window partial least-squares discriminant analysis for identification of different kinds of bezoar samples by near infrared spectroscopy and comparison of different pattern recognition methods. Near infrared spectroscopy, 15, 291–297.
Han, D. H., Tu, R. L., Lu, C., Liu, X. X., & Wen, Z. H. (2006). Nondestructive detection of brown core in the Chinese pear Yali by transmission visible–NIR spectroscopy. Food Control, 17, 604–608.
He, Y., Li, X. L., & Deng, X. F. (2007). Discrimination of varieties of tea using near infrared spectroscopy by principal component analysis and BP model. Journal of Food Engineering, 79, 1238–1242.
Holmström, H. (2002). Estimation of single-tree characteristics using the kNN method and plotwise aerial photograph interpretations. Forest Ecology and Management, 167, 303–314.
Hyson, D., Studebaker-Hallman, D., Davis, P. A., & Gershwin, M. E. (2000). Apple juice consumption reduces plasma low-density lipoprotein oxidation in healthy men and women. Journal of Medicinal Food, 3, 159–166.
Jiang, J. H., Berry, R. J., Siesler, H. W., & Ozaki, Y. (2002). Wavelength interval selection in multicomponent spectral analysis by moving window partial least-squares regression with applications to mid-infrared and near-infrared spectroscopic data. Analytical Chemistry, 74, 3555–3565.

Karlsen, A. M., Aaby, K., Sivertsen, H., Baardseth, P., & Ellekjær, M. R. (1999). Instrumental and sensory analysis of fresh Norwegian and imported apples. Food Quality and Preference, 10, 305–314.
Kim, J., Mowat, A., Poole, P., & Kasabov, N. (2000). Linear and non-linear pattern recognition models for classification of fruit from visible-near infrared spectra. Chemometrics and Intelligent Laboratory Systems, 51, 201–216.
Knekt, P., Kumpulainen, J., Jarvinen, R., Rissanen, H., Heliovaara, M., Reunanen, A., et al. (2002). Flavonoid intake and risk of chronic diseases. American Journal of Clinical Nutrition, 76, 560–568.
Kurz, C., Leitenberger, M., Carle, R., & Schieber, A. (2010). Evaluation of fruit authenticity and determination of the fruit content of fruit products using FT-NIR spectroscopy of cell wall components. Food Chemistry, 119, 806–812.
Li, X. L., He, Y., & Fang, H. (2007). Non-destructive discrimination of Chinese bayberry varieties using Vis/NIR spectroscopy. Journal of Food Engineering, 81, 357–363.
Liu, F., & He, Y. (2008). Classification of brands of instant noodles using Vis/NIR spectroscopy and chemometrics. Food Research International, 41, 562–567.
Liu, L., Cozzolino, D., Cynkar, W. U., Dambergs, R. G., Janik, L., O'Neill, B. K., et al. (2008). Preliminary study on the application of visible-near infrared spectroscopy and chemometrics to classify Riesling wines from different countries. Food Chemistry, 106, 781–786.
Miralbés, C. (2008). Discrimination of European wheat varieties using near infrared reflectance spectroscopy. Food Chemistry, 106, 386–389.
Ortiz-Somovilla, V., España-España, F., Gaitán-Jurado, A. J., Pérez-Aparicio, J., & De Pedro-Sanz, E. J. (2007). Proximate analysis of homogenized and minced mass of pork sausages by NIRS. Food Chemistry, 101, 1031–1040.
Pedreschi, F., Segtnan, V. H., & Knutsen, S. H. (2010). On-line monitoring of fat, dry matter and acrylamide contents in potato chips using near infrared interactance and visual reflectance imaging. Food Chemistry, 121, 616–620.
Steuer, B., Schulz, H., & Läger, E. (2001). Classification and analysis of citrus oils by NIR spectroscopy. Food Chemistry, 72, 113–117.
Vinson, J. A., Su, X., Zubik, L., & Bose, P. (2001). Phenol antioxidant quantity and quality in foods: fruits. Journal of Agricultural and Food Chemistry, 49, 5315–5321.
Wu, J. H., Gao, H. Y., Zhao, L., Liao, X. J., Chen, F., Wang, Z. F., et al. (2007). Chemical compositional characterization of some apple cultivars. Food Chemistry, 103, 88–93.
Xie, L. J., Ye, X. Q., Liu, D. H., & Ying, Y. B. (2009). Quantification of glucose, fructose and sucrose in bayberry juice by NIR and PLS. Food Chemistry, 114, 1135–1140.
Xing, J., Landahl, S., Lammertyn, J., Vrindts, E., & Baerdemaeker, J. D. (2003). Effects of bruise type on discrimination of bruised and nonbruised Golden Delicious apples by VIS/NIR spectroscopy. Postharvest Biology and Technology, 30, 249–258.
Yamada, H., Takechi, K., Hoshi, A., & Amano, S. (2004). Comparison of water relations in watercored and non-watercored apples induced by fruit temperature treatment. Scientia Horticulturae, 99, 309–318.
Yu, C., Cui, B., Wang, S. G., & Su, J. W. (2007). Efficient index-based KNN join processing for high-dimensional data. Information and Software Technology, 49, 332–344.
Yu, H. Y., Niu, X. Y., Lin, H. J., Ying, Y. B., Li, B. B., & Pan, X. X. (2009). A feasibility study on on-line determination of rice wine composition by Vis–NIR spectroscopy and least-squares support vector machines. Food Chemistry, 113, 291–296.
Yu, H. Y., Zhou, Y., Fu, X. P., Xie, L. J., & Ying, Y. B. (2007). Discrimination between Chinese rice wines of different geographical origins by NIRS and AAS. European Food Research and Technology, 225, 313–320.
Zhou, Y. P., Jiang, J. H., Wu, H. L., Shen, G. L., Yu, R. Q., & Ozaki, Y. (2006). Dry film method with ytterbium as the internal standard for near infrared spectroscopic plasma glucose assay coupled with boosting support vector regression. Journal of Chemometrics, 20, 13–21.