
Summer of Science -

Machine Learning and Neural Networks

Manan Sharma, Alok Bishoyi (Mentor)


Abstract

“Just as electricity transformed almost everything 100 years ago, today I actually have a hard time
thinking of an industry that I don’t think AI (Artificial Intelligence) will transform in the next
few years.”
- Andrew Ng

Nowadays, large amounts of data are available everywhere. It is therefore
important to analyse this data to extract useful information and to develop
algorithms based on that analysis. This can be achieved through data mining
and machine learning. Machine learning is an integral part of artificial
intelligence, used to design algorithms based on trends and historical
relationships in data. Machine learning is applied in various fields such as
bioinformatics, intrusion detection, information retrieval, game playing,
marketing, malware detection, image deconvolution and so on.

Machine learning has evolved from the study of computational learning theory
and pattern recognition. It is among the most effective methods used in data
analytics to make predictions by devising models and algorithms. These
analytical models allow researchers, engineers, data scientists and analysts to
produce reliable and valid results and decisions, and help discover hidden
patterns or features through historical learning and trends in the data.
Feature selection is one of the most important tasks in machine learning. The
model is built from results gathered on the training data, which is why
machine-learning algorithms are non-interactive: they study past observations
to make precise predictions. Devising an accurate prediction rule on which an
algorithm can be based is a difficult task.
Mathematical Notations Used

Vectors are denoted by lower case bold Roman letters such as x, and all vectors are
assumed to be column vectors. A superscript T denotes the transpose of a matrix or
vector, so that xT will be a row vector. Uppercase bold Roman letters, such as M,
denote matrices. The notation (w1, . . ., wM) denotes a row vector with M elements,
while the corresponding column vector is written as w = (w1, . . ., wM)T.

The notation [a, b] is used to denote the closed interval from a to b, that is the interval
including the values a and b themselves, while (a, b) denotes the corresponding open
interval, that is the interval excluding a and b. Similarly, [a, b) denotes an interval
that includes a but excludes b. For the most part, however, there will be little need to
dwell on such refinements as whether the end points of an interval are included or
not.

The M × M identity matrix (also known as the unit matrix) is denoted IM, which will
be abbreviated to I where there is no ambiguity about its dimensionality. It has
elements Iij that equal 1 if i = j and 0 if i ≠ j.

The notation g(x) = O(f(x)) denotes that |g(x)/f(x)| is bounded as x → ∞. For
instance, if g(x) = 3x^2 + 2, then g(x) = O(x^2).

The expectation of a function f(x, y) with respect to a random variable x is denoted by
Ex[f(x, y)]. In situations where there is no ambiguity as to which variable is being
averaged over, this will be simplified by omitting the suffix, for instance E[x]. If the
distribution of x is conditioned on another variable z, then the corresponding
conditional expectation will be written Ex[f(x)|z]. Similarly, the variance is denoted
var[f(x)], and for vector variables the covariance is written cov[x, y]. We shall also use
cov[x] as a shorthand notation for cov[x, x].
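The sample estimates corresponding to this notation can be sketched in a few lines of Python. The function names mean, var and cov below are our own choices, not part of the text:

```python
def mean(xs):
    """Sample estimate of the expectation E[x]."""
    return sum(xs) / len(xs)

def var(xs):
    """Sample estimate of var[x] = E[(x - E[x])^2]."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cov(xs, ys):
    """Sample estimate of cov[x, y] = E[(x - E[x])(y - E[y])]."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
print(mean(xs))      # 2.5
print(var(xs))       # 1.25
print(cov(xs, xs))   # cov[x] is shorthand for cov[x, x], so this equals var(xs)
```

Note that cov(xs, xs) reproduces var(xs), matching the shorthand cov[x] = cov[x, x] above.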

If we have N values x1, . . . , xN of a D-dimensional vector x = (x1, . . . , xD)T, we can
combine the observations into a data matrix X in which the nth row of X corresponds
to the row vector xnT. Thus, the (n, i) element of X corresponds to the ith element of the
nth observation xn. For the case of one-dimensional variables, we shall denote such
a matrix by x, which is a column vector whose nth element is xn. Note that x (which
has dimensionality N) uses a different typeface to distinguish it
from x (which has dimensionality D).
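As a concrete illustration of this data-matrix convention, plain Python lists of lists can stand in for matrices (the toy numbers below are our own):

```python
# N = 3 observations of a D = 2 dimensional vector x.
observations = [
    [1.0, 2.0],   # x1
    [3.0, 4.0],   # x2
    [5.0, 6.0],   # x3
]

# Stack the observations so that row n of X is the row vector xn^T.
X = [list(xn) for xn in observations]

# X[n-1][i-1] is the ith element of the nth observation (0-based indexing in Python),
# so the (n=2, i=1) element of X is the first element of x2:
print(X[1][0])   # 3.0

# One-dimensional case: the column vector x whose nth element is xn.
# It has dimensionality N, unlike a single observation xn (dimensionality D).
x = [0.5, 1.5, 2.5]
```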
1 Introduction

The science of learning plays a key role in the fields of machine learning, statistics,
data mining and artificial intelligence, intersecting with areas of engineering and other
disciplines. Here are some examples of learning problems:

• Predict whether a patient, hospitalized due to a heart attack, will have a second
heart attack. The prediction is to be based on demographic, diet and clinical
measurements for that patient.
• Predict the price of a stock in 6 months from now, based on company
performance measures and economic data.
• Identify the numbers in a handwritten ZIP code, from a digitized image.
• Estimate the amount of glucose in the blood of a diabetic person, from the
infrared absorption spectrum of that person’s blood.
• Identify the risk factors for prostate cancer, based on clinical and
demographic variables.

In a typical scenario, we have an outcome measurement, usually quantitative (such
as a stock price) or categorical (such as heart attack/no heart attack), that we wish
to predict based on a set of features (such as diet and clinical measurements). We
have a training set of data, in which we observe the outcome and feature
measurements for a set of objects (such as people). Using this data, we build a
prediction model, or learner, which will enable us to predict the outcome for new
unseen objects. A good learner is one that accurately predicts such an outcome.
The examples above describe what is called the supervised learning problem. It is
called “supervised” because of the presence of the outcome variable to guide the
learning process. In the unsupervised learning problem, we observe only the features
and have no measurements of the outcome. Our task is rather to describe how the
data are organized or clustered.

2 Supervised Learning

In supervised learning, we are given a data set and already know what our correct
output should look like, having the idea that there is a relationship between the input
and the output.
Supervised learning problems are categorized into "regression" and "classification"
problems. In a regression problem, we are trying to predict results within a
continuous output, meaning that we are trying to map input variables to some
continuous function. In a classification problem, we are instead trying to predict
results in a discrete output. In other words, we are trying to map input variables into
discrete categories.
Examples:
(a) Regression - given a picture of a person, predict their age on the basis of the
given picture.
(b) Classification - given a patient with a tumor, predict whether the tumor is
malignant or benign.

Example (Email Spam) (Reference 1):

The data for this example consists of information from 4601 email messages, in a study to try to
predict whether the email was junk email, or “spam.” The objective was to design an automatic
spam detector that could filter out spam before clogging the users’ mailboxes. For all 4601 email
messages, the true outcome (email type) email or spam is available, along with the relative frequencies
of 57 of the most commonly occurring words and punctuation marks in the email message. This is
a supervised learning problem, with the outcome the class variable email/spam. It is also called a
classification problem.
Our learning method has to decide which features to use and how: for example, we might use a
rule such as: if (%George < 0.6) & (%you > 1.5) then spam, else email.
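The thresholding rule above can be written directly as a function. The feature names %George and %you (relative word frequencies) and the cutoffs 0.6 and 1.5 come from the text; the function name and scaffolding are our own:

```python
def classify(pct_george: float, pct_you: float) -> str:
    """Classify a message as 'spam' or 'email' using the example rule."""
    if pct_george < 0.6 and pct_you > 1.5:
        return "spam"
    return "email"

print(classify(pct_george=0.1, pct_you=2.0))  # spam
print(classify(pct_george=1.2, pct_you=2.0))  # email
```

A real spam detector would of course learn such thresholds (and which of the 57 features to use) from the training data rather than hard-coding them.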
Now we describe two simple but powerful prediction methods: the linear model
fit by least squares and the k-nearest-neighbour prediction rule. The linear model
makes strong assumptions about structure and yields stable but possibly inaccurate
predictions. The method of k-nearest neighbours makes very mild structural
assumptions: its predictions are often accurate but can be unstable.

2.1 Method of Least Squares


3 Unsupervised Learning
