
Using Decision Trees to Classify the “Undefined” Galaxies from Galaxy Zoo 1


P. H. Barchi (1,a), R. Sautter (1), R. R. Rosa (1), R. R. de Carvalho (1)
(1) National Institute for Space Research (INPE), São José dos Campos, SP, Brazil
(a) paulobarchi@gmail.com

1. OVERVIEW

In observational research, the most basic process is the classification of objects into a taxonomy system, and the challenge is to build a robust methodology that performs this classification reliably. In astrophysics, galaxies can be separated into two groups: early-type galaxies (ETGs), which have a dominant bulge component, and late-type galaxies (LTGs), which have a prominent disk. Galaxy Zoo 1 (GZ1) is a citizen science project, open to the community, that performs this distinction. In this work, we take the GZ1 classification as the true label. We present a methodology to distinguish ETGs from LTGs based on a traditional machine learning approach (Figure 1). CyMorph, a non-parametric galaxy morphology system, extracts the following features from each galaxy: Concentration, Asymmetry, Smoothness, Entropy, and Gradient Pattern Analysis (GPA) (see Section 2). We then apply supervised machine learning methods to these features. Here, we focus on the Decision Tree, a method that infers simple decision rules from the training data and uses them to predict the classification.

Figure 1: General schema proposed to morphologically classify galaxies into early- and late-type.

The data sample consists of r-band images of galaxies from SDSS-DR7 (Sloan Digital Sky Survey Data Release 7) in the redshift range 0.03 < z < 0.1, with r-band Petrosian magnitude brighter than 17.78 (the spectroscopic magnitude limit) and |b| ≥ 30°, where b is the galactic latitude. We also impose a minimum threshold on the area of the galaxy's Petrosian ellipse through the parameter

$$k = \left( \frac{R_P}{\mathrm{FWHM}/2} \right)^2,$$

where $R_P$ is the Petrosian radius and FWHM is the full width at half maximum. For k ≥ 5, there are 239,833 galaxies from SDSS and 104,787 galaxies with a defined GZ1 classification; for k ≥ 10, 175,167 galaxies from SDSS and 89,829 with a defined GZ1 classification; for k ≥ 20, 96,787 galaxies from SDSS and 58,030 with a defined GZ1 classification. We worked with these three samples and achieved an overall accuracy (OA) of 95% for k ≥ 5, 96% for k ≥ 10, and 99% for k ≥ 20.

2. NON-PARAMETRIC GALAXY MORPHOLOGY

The non-parametric galaxy morphology metrics are defined as follows. Concentration: ratio of the circular radii containing 75% and 35% of the Petrosian flux of the galaxy. Asymmetry: correlation between an image and its π-rotated (180°) version. Smoothness: correlation between an image and its smoothed version. Entropy: quantifies the distribution of pixel values in the image. GPA: an estimate of the local gradient properties of a set of points. Beyond the CyMorph metrics, the Geometric Histogram Separation (δGHS) objectively measures the separation of a binomial distribution.

Figure 2: Pre-processing example: from the original field of view (a), the stamp is cut (b) and cleaned (c), and the galaxy is segmented (d).

3. CLASSIFICATION USING DECISION TREE

Galaxies with a defined GZ1 classification were used to train our model. For k ≥ 5, there are 17,101 ETGs and 98,750 LTGs; for k ≥ 10, 11,267 ETGs and 66,023 LTGs; and for k ≥ 20, 7,541 ETGs and 27,365 LTGs. Figure 4 shows a spectroscopic validation of the “undefined” GZ1 galaxies classified by our methodology. The parameters used in these histograms come from spectroscopy, so they are independent of the data used to build the classifications. The numbers of galaxies in the histograms are: for k ≥ 5, 13,373 ETGs and 87,095 LTGs; for k ≥ 10, 9,030 ETGs and 59,096 LTGs; and for k ≥ 20, 6,390 ETGs and 24,988 LTGs. With this approach, we classified 100,832 “undefined” galaxies from GZ1; Figure 5 presents a sample.

Figure 4: Spectroscopic validation for the “undefined” galaxies from Galaxy Zoo 1 classified by this methodology (ETGs in red; LTGs in blue).
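The three samples (k ≥ 5, 10, 20) combine the redshift, magnitude, and latitude cuts with the size criterion k. The following is a minimal sketch of that selection, assuming a hypothetical `Galaxy` record whose field names are illustrative (they are not SDSS catalogue column names) and assuming R_P and FWHM are given in the same angular units. It is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Galaxy:
    # Hypothetical record: field names are illustrative, not SDSS catalogue columns.
    z: float            # redshift
    petro_mag_r: float  # Petrosian magnitude in the r band
    gal_b: float        # galactic latitude b, in degrees
    petro_r: float      # Petrosian radius R_P (arcsec, assumed)
    fwhm: float         # FWHM (arcsec, assumed same units as R_P)


def size_k(g: Galaxy) -> float:
    """Size criterion k = (R_P / (FWHM / 2))**2."""
    return (g.petro_r / (g.fwhm / 2.0)) ** 2


def passes_cuts(g: Galaxy, k_min: float) -> bool:
    """Apply the redshift, magnitude, latitude and size cuts described in the text."""
    return (
        0.03 < g.z < 0.1
        and g.petro_mag_r < 17.78  # "brighter than" means a numerically smaller magnitude
        and abs(g.gal_b) >= 30.0
        and size_k(g) >= k_min
    )


# Made-up example galaxies, for illustration only.
galaxies = [
    Galaxy(z=0.05, petro_mag_r=16.9, gal_b=45.0, petro_r=6.0, fwhm=1.4),
    Galaxy(z=0.05, petro_mag_r=16.9, gal_b=45.0, petro_r=2.0, fwhm=1.4),
    Galaxy(z=0.12, petro_mag_r=16.0, gal_b=60.0, petro_r=8.0, fwhm=1.2),
]

for k_min in (5, 10, 20):
    selected = [g for g in galaxies if passes_cuts(g, k_min)]
    print(f"k >= {k_min}: {len(selected)} of {len(galaxies)} galaxies selected")
```

Rearranging the definition, k ≥ k_min is equivalent to R_P ≥ (√k_min / 2) · FWHM. The three samples therefore require a Petrosian radius of at least about 1.12, 1.58, and 2.24 times the FWHM, which is why the higher-k samples contain fewer but better-resolved galaxies.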

Figure 5: Sample classified as ETGs (2 top rows) and LTGs (2 bottom rows) by this approach.
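To make the Section 2 definitions concrete, the sketch below computes simplified versions of Concentration, Asymmetry, Smoothness, and Entropy on an image stored as a list of rows. It is only an illustration under stated assumptions. The poster does not give CyMorph's exact formulas, so this sketch assumes the galaxy is centred in the stamp, uses the total stamp flux as a stand-in for the Petrosian flux, uses the Pearson coefficient for both correlations, uses a 3x3 box filter as the smoothing kernel, and normalises entropy by the number of histogram bins. CyMorph's actual choices may differ. GPA and δGHS are not sketched.

```python
import math


def _flatten(img):
    return [v for row in img for v in row]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)  # undefined for a constant image


def concentration(img, f_outer=0.75, f_inner=0.35):
    """R(f_outer) / R(f_inner): circular radii enclosing the given flux fractions.

    Assumption: galaxy centred in the stamp; total stamp flux stands in for the
    Petrosian flux.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    pixels = sorted((math.hypot(i - cy, j - cx), img[i][j])
                    for i in range(h) for j in range(w))
    total = sum(f for _, f in pixels)

    def radius(frac):
        acc = 0.0
        for r, f in pixels:  # accumulate flux outwards from the centre
            acc += f
            if acc >= frac * total:
                return r
        return pixels[-1][0]

    r_in = radius(f_inner)
    if r_in == 0.0:
        raise ValueError("inner radius is not resolved by the pixel grid")
    return radius(f_outer) / r_in


def asymmetry_corr(img):
    """Correlation between the image and its pi-rotated (180 degree) copy."""
    rotated = [row[::-1] for row in img[::-1]]
    return pearson(_flatten(img), _flatten(rotated))


def box_smooth(img):
    """3x3 mean filter (assumed kernel); edge pixels average only existing neighbours."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def smoothness_corr(img):
    """Correlation between the image and its smoothed copy."""
    return pearson(_flatten(img), _flatten(box_smooth(img)))


def entropy(img, bins=32):
    """Shannon entropy of the pixel-value histogram, normalised to [0, 1]."""
    vals = _flatten(img)
    lo, hi = min(vals), max(vals)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in vals:
        counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    n = len(vals)
    h = -sum((c / n) * math.log(c / n) for c in counts if c)
    return h / math.log(bins)


def gaussian_blob(size=11, sigma=2.0):
    """Synthetic, perfectly symmetric test 'galaxy'."""
    c = (size - 1) / 2.0
    return [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
             for j in range(size)] for i in range(size)]


blob = gaussian_blob()
print("C =", concentration(blob))
print("A corr =", asymmetry_corr(blob))
print("S corr =", smoothness_corr(blob))
print("H =", entropy(blob))
```

A perfectly symmetric blob gives an asymmetry correlation of 1. A clumpy or lopsided disk would lower both correlations, which is the kind of contrast between ETGs and LTGs that Figure 3 displays.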
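The poster does not state which decision-tree implementation or hyperparameters were used. As a hedged illustration of how a decision tree learns threshold rules from morphology features, and of how overall accuracy is computed, the following is a minimal CART-style tree with Gini impurity written from scratch. It assumes that OA is the fraction of correctly classified galaxies. The feature values and labels are made up. In practice, a library class such as scikit-learn's `DecisionTreeClassifier` (with `sklearn.metrics.accuracy_score`) would normally be used instead.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature_index, threshold) of the best axis-aligned split."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between consecutive distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree recursively; leaves are class labels, internal nodes are dicts."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # all feature vectors identical: cannot split further
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def predict(node, x):
    """Follow the decision rules from the root down to a leaf label."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


def overall_accuracy(y_true, y_pred):
    """Fraction of objects whose predicted class matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up (concentration, asymmetry) values; labels play the role of GZ1 classes.
train_X = [[3.1, 0.05], [2.9, 0.08], [3.3, 0.04], [2.8, 0.10],
           [2.0, 0.30], [1.8, 0.25], [2.2, 0.35], [1.9, 0.40]]
train_y = ["ETG"] * 4 + ["LTG"] * 4
test_X = [[3.0, 0.06], [2.1, 0.28], [2.7, 0.12], [1.7, 0.33]]
test_y = ["ETG", "LTG", "ETG", "LTG"]

tree = build_tree(train_X, train_y)
pred = [predict(tree, x) for x in test_X]
print(tree)
print("OA =", overall_accuracy(test_y, pred))
```

On this toy data, a single threshold on the first feature separates the classes, so the tree stops after one split. With real CyMorph features the classes overlap. The depth limit (`max_depth`, a hypothetical setting here) then controls how many rules the tree may learn before it starts to overfit.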


ACKNOWLEDGEMENT
P.H.B. acknowledges PhD scholarship financial support from CAPES. R.R.dC. and R.R.R.
acknowledge financial support from FAPESP through grant # 2014/11156-4.
Figure 3: Results on non-parametric galaxy morphology features (ETGs in red; LTGs in blue).

XLII Reunião Anual da Sociedade Astronômica Brasileira, July 9th - 12th, 2018, São Paulo - SP, Brazil
