BSCS-1A
The Earth takes about 365 days to revolve around the sun, which makes a
year, and about 24 hours to rotate on its axis, which makes a day. Within these
measures of time, we rarely notice how many bytes of information are sent
and spread throughout the world, and we do not realize how much data we
hold and use. Recent innovations in technology have brought us into a new
world of speed and abundance. With such huge amounts of data, it takes a
long time and considerable effort to extract patterns and gain vital
information that can be used to solve real-life problems. The whole world
needs analysis and sound approaches to manage this data.
Furthermore, with the ever increasing amounts of data becoming
available, there is good reason to believe that smart data analysis will become
even more pervasive as a necessary ingredient for technological progress
(Smola & Vishwanathan, 2008).
Machine learning is not new; it has existed for decades. In the 1950s,
during the era when researchers were building machines meant to be as
intelligent as humans, an IBM researcher named Arthur Lee Samuel
developed a self-learning program for playing checkers. It is widely
considered one of the earliest machine learning programs. The checkers
program played 10,000 games against itself and worked out which board
positions were good and which were bad based on wins and losses. The
program gradually learned to beat itself by using its record of winning and
losing moves as its foundation. Samuel then coined the term machine learning
(Hurwitz and Kirsch, 2018), defined as a field of study that gives computers
the ability to learn without being explicitly programmed (Al Musawi, 2018).
In 1959, he published a paper in the IBM Journal of Research and
Development explaining his approach to machine learning.
Supervised Learning
Practical Application
https://doi.org/10.1016/B978-0-12-801329-8.00016-7
Unsupervised Learning
A. Internet
For example, Google News groups news stories into cohesive clusters.
The goal of unsupervised learning is generally to cluster the data into characteristically different
groups. Unsupervised machine learning is more challenging than supervised learning because the
data has no labels.
The same data can be clustered into different groups depending on how the clustering is done. In
the figure below, 16 animals represented by 13 boolean features (appearance based and activity
based) can be clustered in two ways, depending on whether the appearance-based features or the
activity-based features are given more weight.
The first partitioning clusters them into mammals and birds, while the second clusters them into
predators and prey. Both are equally meaningful, so it is up to the scientist to choose a
representation that yields the desired clustering, which is a challenging task.
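The effect of feature weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the eight animals, the three boolean features, the weight values, and the function name weighted_two_means are all made up for the example and are not the 16-animal, 13-feature dataset mentioned in the text. The clustering method is a bare-bones 2-means with a fixed starting point, chosen so the result is repeatable.

```python
# Hypothetical toy data: (has_feathers, has_fur, hunts) for each animal.
# The first two features describe appearance, the third describes activity.
ANIMALS = {
    "dove": (1, 0, 0), "hen": (1, 0, 0),
    "eagle": (1, 0, 1), "owl": (1, 0, 1),
    "cow": (0, 1, 0), "horse": (0, 1, 0),
    "lion": (0, 1, 1), "wolf": (0, 1, 1),
}


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_two_means(animals, weights, iterations=10):
    """Cluster the animals into two groups after scaling each feature by its weight."""
    names = list(animals)
    scaled = {n: [w * x for w, x in zip(weights, animals[n])] for n in names}

    # Deterministic start: the first animal and the animal farthest from it.
    first = scaled[names[0]]
    farthest = max(names, key=lambda n: dist2(scaled[n], first))
    centers = [first, scaled[farthest]]

    groups = [[], []]
    for _ in range(iterations):
        # Assignment step: each animal joins its nearest center.
        groups = [[], []]
        for n in names:
            nearer = 0 if dist2(scaled[n], centers[0]) <= dist2(scaled[n], centers[1]) else 1
            groups[nearer].append(n)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            [sum(scaled[n][d] for n in g) / len(g) for d in range(len(weights))] if g else c
            for g, c in zip(groups, centers)
        ]
    return [sorted(g) for g in groups]


# Appearance weighted heavily: the groups come out as birds and mammals.
by_appearance = weighted_two_means(ANIMALS, (1.0, 1.0, 0.1))
# Activity weighted heavily: the groups come out as prey and predators.
by_activity = weighted_two_means(ANIMALS, (0.1, 0.1, 1.0))
print(by_appearance)
print(by_activity)
```

The data is identical in both calls; only the weights change, and that alone decides which of the two equally valid groupings the algorithm finds.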
Some widely known applications of unsupervised learning are market segmentation for targeting
appropriate customers, anomaly and fraud detection in the banking sector, image segmentation, gene
clustering for grouping genes with similar expression levels, deriving climate indices from
clustered earth science data, and document clustering based on content.
One application I recently came across is in ecology. Audio recordings are captured by
microphones placed at selected locations in a region of interest. These recordings are then
clustered using unsupervised learning techniques to gauge the biodiversity of the region, for
example the number of bird and animal species present.
Semi-supervised Learning
For a really good algorithm to solve my example above, have a look at the
co-training algorithm, especially its Wikipedia article, which is concise and
provides a good overview. It basically uses two weak learners on a small
amount of hand-labeled data, and then each classifier participates in
expanding the training data by labeling some of the unlabeled set.
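The co-training idea described above can be sketched roughly as follows. This is a simplified illustration, not the full published algorithm: it assumes two classes, gives each example two one-number "views", uses a nearest-class-mean rule as the weak learner, and lets each learner add only its single most confident unlabeled example per turn. All data values and the names class_means, predict, and co_train are hypothetical.

```python
# Hypothetical toy data. Each example is a pair (view_0_value, view_1_value).
LABELED = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]          # small hand-labeled set
UNLABELED = [(0.2, 0.45), (0.55, 0.95), (0.1, 0.3),
             (0.85, 0.6), (0.45, 0.05), (0.7, 1.0)]   # no labels available


def class_means(labeled, view):
    """Weak learner: the mean of one view's value for each class."""
    sums, counts = {}, {}
    for features, label in labeled:
        sums[label] = sums.get(label, 0.0) + features[view]
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(means, value):
    """Return (label, confidence) for one value, assuming two classes.

    The label is the class with the nearest mean; the confidence is the
    distance from the decision boundary halfway between the two means.
    """
    label = min(means, key=lambda c: abs(value - means[c]))
    midpoint = sum(means.values()) / len(means)
    return label, abs(value - midpoint)


def co_train(labeled, unlabeled):
    """Grow the labeled set by letting two view-specific learners take turns."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    while unlabeled:
        for view in (0, 1):
            if not unlabeled:
                break
            # Train this view's learner on everything labeled so far,
            # including examples labeled by the other view's learner.
            means = class_means(labeled, view)
            # Pick the unlabeled example this learner is most confident about.
            best = max(unlabeled, key=lambda ex: predict(means, ex[view])[1])
            label, _ = predict(means, best[view])
            unlabeled.remove(best)
            labeled.append((best, label))
    return labeled


result = dict(co_train(LABELED, UNLABELED))
for example, label in result.items():
    print(example, "->", label)
```

Each learner sees only its own view, so an example that is ambiguous in one view, such as (0.55, 0.95), can still be labeled confidently by the learner working on the other view.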