

INTRODUCTION TO NEURAL NETWORKS

François Duval

How to contact us
If you find any damage, editing issues, or any other problems in this book, please notify our
customer service immediately by email at:
contact@aisciences.net

Our goal is to provide high-quality books for your technical learning in computer science subjects.

Thank you so much for buying this book.

Your honest feedback would be greatly appreciated. It really does make a difference.
Click the link below to write a quick review
https://www.amazon.com/dp/B07DGLYMLX

Preface

“For me, data science is a mix of three things: quantitative analysis (for the rigor necessary to understand your data), programming (so that you can process your
data and act on your insights), and storytelling (to help others understand what the data means).”
―Edwin Chen, Data Scientist and Blogger

The overall aim of this book is to give you an overview of the important concepts, methods and
techniques used in artificial neural networks.

Artificial neural networks are often referred to simply as neural networks. This signal-processing
model is based on biological neural networks. Theories, observations, and later experiments on
the human central nervous system motivated the development of neural networks.

This book is about basic neural network architectures and their learning rules. Every effort has been
made to present the material in a simple and consistent manner so that it can be read and used without
difficulty. The book provides a general idea of artificial neural networks and questions their position
as the preferred tool of practitioners. Because this book focuses specifically on neural networks,
our choice of topics is guided by one criterion: we want to present the most useful picture of neural
networks, from simple to complex structures. Researchers from different disciplines are designing
artificial neural networks to solve problems in pattern recognition, prediction, optimization,
associative memory, and control.

Book Objectives
If you are interested in learning more about machine learning, with practical examples and applications
in Python, then this book is exactly what you need.

This book will help you:


• Gain an appreciation for machine learning and an understanding of its fundamental principles.
• Gain an elementary grasp of machine learning concepts and algorithms.
• Acquire a technical background in machine learning and deep learning.

Who Should Read This?


• Anyone curious about machine learning but with zero programming knowledge
• People who want to demystify machine learning (it’s not magic and it’s probably not the end of
the world)
• Technical people who want to quickly gain knowledge in machine learning

Is this book for me?


If you want to smash machine learning algorithms with Python, this book is for you. Little
programming experience is required. If you have already written a few lines of code and recognize basic
programming statements, you’ll be OK.

© Copyright 2016 by AI Sciences LLC
All rights reserved.
First Printing, 2016

Edited by Davies Company


Ebook Converted and Cover by Pixels Studio
Published by AI Sciences LLC

ISBN-13: 978-1985134560
ISBN-10: 198513456X

The contents of this book may not be reproduced, duplicated or transmitted without the direct written
permission of the author.

Under no circumstances will any legal responsibility or blame be held against the publisher for any reparation,
damages, or monetary loss due to the information herein, either directly or indirectly.

Legal Notice:

You cannot amend, distribute, sell, use, quote or paraphrase any part of the content within this book without
the consent of the author.

Disclaimer Notice:

Please note the information contained within this document is for educational and entertainment purposes
only. No warranties of any kind are expressed or implied. Readers acknowledge that the author is not engaging
in the rendering of legal, financial, medical or professional advice. Please consult a licensed professional before
attempting any techniques outlined in this book.

By reading this document, the reader agrees that under no circumstances is the author responsible for any
losses, direct or indirect, which are incurred as a result of the use of information contained within this
document, including, but not limited to, errors, omissions, or inaccuracies.

To my wife Melanie and my two children Mariane and Thomas
You are my life and I love you so much!

From AI Sciences Publisher

Your honest feedback would be greatly appreciated. It really does make a difference.
Click the link below to write a quick review
https://www.amazon.com/dp/B07DGLYMLX

Author Biography

François Duval is a seasoned data science expert and emerging author who has quickly earned the
reputation as an innovative leader in the information technology and data analysis space. Throughout
the course of the past decade, he has gained extensive firsthand experience in his field.
Currently, François is proudly serving as a scientific data consultant. He is also working on a series of
books in statistics and computer science.

Prior to his present venture, he was on the IBM team for two years and is the founder of Intelligence
Data Consulting, which is a prominent company that provides data analysis consulting solutions in
France and throughout Europe. He has also worked as an engineer across multiple industries,
including but not limited to finance (KPMG within the Financial Advisory Department in London)
as well as environmental and healthcare research (INERIS and Mines ParisTech in France).
No matter what venture he takes on, François is on a lifelong mission to equip data analysis newcomers
with the knowledge they need to better grasp various industry techniques. By effectively simplifying
such methodologies, up-and-coming professionals are empowered to create positive changes in the
world of computer science that will continue to have a ripple effect in the entire field as we know it.
Furthermore, François holds a PhD in Quantitative Analysis and Stochastic Calculus and an
Undergraduate Degree in Statistics & Data Analysis.

When he isn't immersed in his multifaceted career, François Duval enjoys spending quality family time
with his loved ones. Today, he happily resides in London, England and is the proud father of two
beautiful children.

Table of Contents

Preface .......................................................................................................................................4
From AI Sciences Publisher ......................................................................................................9
Author Biography .................................................................................................................... 11
Table of Contents .................................................................................................................... 12
Introduction to Artificial Neural Network .............................................................................. 17
A Brief History of Neural Network ................................................................................................................. 17
Artificial Neural Network vs. Biological Neural Network? ............................................................................ 17
Real – Biological Neurons ..................................................................................................................................... 17
Artificial Neurons ................................................................................................................................................... 18
What Is Artificial Neural Network? ........................................................................................ 21
Let Us Introduce ............................................................................................................................................. 21
Artificial Neural Network Layers .................................................................................................................... 21
a. Input Layer:.......................................................................................................................................................... 22
b. Hidden Layer ....................................................................................................................................................... 22
c. Output Layer ....................................................................................................................................................... 23
Structure of a Neural Network ....................................................................................................................... 23
Learning Process ............................................................................................................................................ 25
Supervised Learning ............................................................................................................................................... 25
Unsupervised Learning .......................................................................................................................................... 25
Reinforcement Learning ........................................................................................................................................ 25
Why Neural Networks? ........................................................................................................... 27
Let Us Introduce ............................................................................................................................................ 28
Fundamentals of Artificial Neural Networks ................................................................................................. 29
Network Topology ................................................................................................................................................. 29
Feed forward Network........................................................................................................................................... 29
Feedback Network.................................................................................................................................................. 31
Single Layer Recurrent Network .......................................................................................................................... 31
MultiLayer Recurrent Network ............................................................................................................................ 31
Activation Functions ...................................................................................................................................... 34
Linear Activation Function ................................................................................................................................... 34
Sigmoid Activation Function ................................................................................................................................ 34
Binary threshold signal function ........................................................................................................................... 35
Bipolar threshold signal function ......................................................................................................................... 35
Linear threshold (RAMP) signal function ........................................................................................................... 35
Adjustments of Weights or Learning.............................................................................................................. 35
Learning Paradigms ....................................................................................................................................... 36
Supervised Learning ............................................................................................................................................... 36
Unsupervised Learning .......................................................................................................................................... 37
Semi-Supervised Machine Learning ..................................................................................................................... 37
Reinforcement Learning ........................................................................................................................................ 38

Major Variants of Artificial Neural Network .......................................................................... 39
Multilayer Perceptron (MLP) ......................................................................................................................... 40
Activation Function ................................................................................................................................................ 41
Layers ........................................................................................................................................................................ 41
Learning .................................................................................................................................................................... 42
Terminology............................................................................................................................................................. 43
Applications ............................................................................................................................................................. 43
Convolutional neural networks....................................................................................................................... 44
The convolutional layer.......................................................................................................................................... 44
The Pooling Layer................................................................................................................................................... 46
The Output Layer ................................................................................................................................................... 47
Recurrent Neural Networks ........................................................................................................................... 47
Recurrent Neural Network Extensions .......................................................................................................... 49
Long Short-Term Memory .............................................................................................................................. 51
Deep Belief Networks .................................................................................................................................... 52
Deep Reservoir Computing ............................................................................................................................ 52

Tools and Technologies .......................................................................................................... 54


Major libraries ................................................................................................................................................ 54
OpenNN – Open Neural Network: .................................................................................................................... 54
Neural Network Libraries by Sony: ..................................................................................................................... 54
Theano – Latest version: Theano 0.7 .................................................................................................................... 54
Torch - Torch | Scientific computing for LuaJIT. ........................................................................................... 54
Caffe – Caffe: A Deep Learning Framework ..................................................................................................... 55
TensorFlow – Link to detailed documentation of tensorflow -> https://www.tensorflow.org/............. 55
MXNet - MXNet with Documentation .............................................................................................................. 55
Keras ......................................................................................................................................................................... 55
Lasagne ..................................................................................................................................................................... 55
Blocks........................................................................................................................................................................ 55
Pylearn2 .................................................................................................................................................................... 55
DeepPy ..................................................................................................................................................................... 56
Deepnet .................................................................................................................................................................... 56
Gensim ..................................................................................................................................................................... 56
nolearn ...................................................................................................................................................................... 56
Passage ...................................................................................................................................................................... 56
The Microsoft Cognitive Toolkit(CNTK) .......................................................................................................... 56
FANN ....................................................................................................................................................................... 56
Programming language support .................................................................................................................... 57
Python....................................................................................................................................................................... 57
Java ............................................................................................................................................................................ 57
Lisp ............................................................................................................................................................................ 57
Prolog........................................................................................................................................................................ 58
C++........................................................................................................................................................................... 58
AIML ........................................................................................................................................................................ 58
Practical implementations ....................................................................................................... 60
Text Classification .......................................................................................................................................... 60

Text Classification Using Neural Networks ....................................................................................................... 60
Image Processing ........................................................................................................................................... 70
Recognizing Objects with Deep Learning .......................................................................................................... 70
Building our Bird Classifier ................................................................................................................................... 83
Testing Our Network ............................................................................................................................................. 87
Major NN projects .................................................................................................................. 90
Recognition of Braille Alphabet Using Neural Networks .............................................................................. 90
Shuttle Landing Control ................................................................................................................................. 90
Music Classification by Genre Using Neural Networks ................................................................................ 90
Face Recognition Using Neural Network...................................................................................................... 90
Concept Learning and Classification - Hayes-Roth Data Set ........................................................................ 90
Predicting Poker Hands with Neural Networks ............................................................................................ 90
Predicting Relative Performance of Computer Processors with Neural Networks......................................... 91
Predicting Survival of Patients Using Habermans Data Set ........................................................................... 91
Predicting the Class of Breast Cancer with Neural Networks ........................................................................ 91
Breast Tissue Classification Using Neural Networks ..................................................................................... 91
Classification of Animal Species Using Neural Networks .............................................................................. 91
Car Evaluation Using Neural Networks ......................................................................................................... 91
Lenses Classification Using Neural Networks .............................................................................................. 92
Balance Scale Classification Using Neural Networks .................................................................................... 92
Blood Transfusion Service Center .................................................................................................................. 92
Predicting the Result of Football Match with Neural Networks ................................................................... 92
Predicting the Workability of High-Performance Concrete ........................................................................... 93
Concrete Compressive Strength Test ............................................................................................................. 93
Glass Identification Using Neural Networks ................................................................................................. 93
Teaching Assistant Evaluation....................................................................................................................... 94
Predicting Protein Localization Sites Using Neural Networks...................................................................... 94
Predicting the Religion of European States Using Neural Networks ............................................................ 94
Predicting the Burned Area of Forest Fires Using Neural Networks ............................................................ 95
Wine Classification Using Neural Networks ................................................................................................. 95
NeurophRM: Integration of the Neuroph Framework into RapidMiner ....................................................... 96

Open source resources ........................................................................................................... 97


Issues and Challenges ............................................................................................................. 98
Uncertainty ..................................................................................................................................................... 98
Lots and Lots of Data ..................................................................................................................................... 98
Overfitting in Neural Networks ..................................................................................................................... 98
Hyperparameter Optimization ....................................................................................................................... 99
Requires High-Performance Hardware ......................................................................................................... 99
Neural Networks Are Essentially a Blackbox ............................................................................................... 100
Lack of Flexibility and Multitasking ............................................................................................................. 100

Applications of ANN ............................................................................................................. 102


Speech Recognition ....................................................................................................................................... 102
Character Recognition .................................................................................................................................. 102
Signature Verification Application ................................................................................................................ 102
Human Face Recognition ............................................................................................................................. 103
Image Compression ...................................................................................................................................... 103
Stock Market Prediction ................................................................................................................................ 103

Traveling Salesman's Problem ....................................................................................................................... 103
Future in NN ................................................................................................................................................ 104

Deep Learning: What & Why? .............................................................................................. 105


Deep Learning and Artificial Neural Networks ............................................................................................ 106
What Is Deep Learning, Exactly? ................................................................................................................. 107
What Is Learned? .......................................................................................................................................... 107
Feature Learning ........................................................................................................................................... 107
Why is it Called “Deep Learning”? ............................................................................................................... 108
History of Deep Learning ............................................................................................................................. 109
Deep Learning Timeline ............................................................................................................................... 110

Our Future with Deep Learning Applied ...............................................................................113


Medical Technology ...................................................................................................................................... 114
Biomechanics ................................................................................................................................................ 114
Fully Automated Smart Homes ..................................................................................................................... 114
Advanced Mobile Technology ...................................................................................................................... 115
Automated Commercial Use Programs......................................................................................................... 115
Partial Artificial Intelligences ........................................................................................................................ 116
Complete Artificial Intelligences ................................................................................................................... 116

Summary .................................................................................................................................119
Thank you ! .............................................................................................................................121

Introduction to Artificial Neural Network
An Artificial Neural Network (ANN) is a computational model based on the structure and
functions of biological neural networks. It processes information in a way loosely modeled on how
the human (or animal) brain does. It consists of a large number of connected processing units, called
neurons, that work together to process information and generate meaningful results from it. In this
book, we will take you through a complete introduction to artificial neural networks: their structure,
layers, applications, algorithms, tools and technologies, practical implementations, and benefits and
limitations.

A Brief History of Neural Network

The history of neural networks arguably began in the late 1800s with scientific efforts to study the
workings of the human brain. In 1890, William James published the first work about brain activity
patterns.
In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on
how neurons might work. To describe how neurons in the brain might function, they modeled a
simple neural network using electrical circuits. However, the technology available at that time did
not allow them to do much more.

Artificial Neural Network vs. Biological Neural Network?

Real – Biological Neurons

The human brain is a neural network. The fundamental element of this network is called a
neuron. The brain has approximately 10^11 neurons, and each of these neurons is connected to
approximately 10^4 other neurons.
The structure of a neuron in the brain comprises four important parts:

Dendrite: It receives signals from surrounding neurons. It may have any number of branches,
where each dendrite branch is connected to one neuron.

Soma (the cell body): It is the body of the neuron and contains the nucleus. It sums all the incoming
signals to generate an input.
Axon: When the sum reaches a certain threshold value, the neuron fires a signal, which travels down
the axon and is transmitted to other neurons via the synaptic terminals.

Synapses: The points of interconnection between one neuron and other neurons. The synapses of a
neuron are connected to the dendrites of neighboring neurons. The amount of signal transmitted
depends upon the strength (synaptic weights) of the connections.

Artificial Neurons

Our basic computational element (model neuron) is often called a node or unit. It receives
input from some other units, or perhaps from an external source. Each input has an associated weight
w, which can be modified in order to model synaptic learning. The unit computes some function f of
the weighted sum of its inputs:

$$y_i = f\Big(\sum_j w_{ij}\, y_j\Big)$$

Its output, in turn, can serve as input to other units.

A Simple Artificial Neuron

• The weighted sum is called the net input to unit i, often written net_i.
• Note that w_ij refers to the weight from unit j to unit i (not the other way around).
• The function f is the unit's activation function. In the simplest case, f is the identity
function, and the unit's output is just its net input. This is called a linear unit.

An artificial neuron is a mathematical function conceived as a model of biological neurons. Artificial
neurons are the elementary units of an artificial neural network. The artificial neuron receives one or
more inputs, each with an associated weight, together with a bias. It sums them to produce an output
(or activation, representing the neuron's action potential, which is transmitted along its axon).
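
As a minimal sketch of the model neuron described above, the Python snippet below computes the net input (the weighted sum of the inputs plus a bias) and passes it through an activation function. The function names, the example weights and the choice of a logistic activation are illustrative assumptions, not taken from the text.

```python
import math


def neuron_output(inputs, weights, bias=0.0, activation=None):
    """Return f(net), where net is the weighted sum of the inputs plus a bias."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Net input: each input multiplied by its weight, summed, plus the bias.
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    # With no activation given, the unit is linear (identity activation).
    return net if activation is None else activation(net)


def logistic(net):
    """Squash the net input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical inputs and weights, chosen only for illustration.
linear_out = neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.1)
squashed_out = neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.1, activation=logistic)
```

Leaving `activation` unset gives the linear unit mentioned above; supplying a function such as `logistic` gives a nonlinear unit with the same net input.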

What Is Artificial Neural Network?

Let Us Introduce

To better understand artificial neural computing, it is important to know first how a conventional
'serial' computer and its software process information. A serial computer has a central processor that
can address an array of memory locations where data and instructions are stored.
Computations are made by the processor reading an instruction, together with any data the
instruction requires, from memory addresses. The instruction is then executed and the results are
saved in a specified memory location as required. In a serial system (and a standard parallel one too)
the computational steps are deterministic, sequential and logical, and the state of a given variable can
be tracked from one operation to the next.

In comparison, ANNs are not sequential or necessarily deterministic. There are no complex central
processors; rather, there are many simple ones, which generally do nothing more than take the
weighted sum of their inputs from other processors. ANNs do not execute programmed
instructions; they respond in parallel (either simulated or actual) to the pattern of inputs presented
to them. There are also no separate memory addresses for storing data. Instead, information
is contained in the overall activation 'state' of the network. 'Knowledge' is thus represented by
the network itself, which is quite literally more than the sum of its individual components.

An artificial neural network is an information processing technique. It works in a way similar to how
the human brain processes information. An ANN comprises a large number of connected processing
units that work together to process information and produce meaningful results from it.

Neural networks find wide application in data mining across sectors such as economics and
forensics, and in pattern recognition. After careful training, they can also be used to classify
large amounts of data. Neural networks are not limited to classification; they can also be applied to
regression of continuous target attributes.

Artificial Neural Network Layers

An artificial neural network is typically organized in layers. Layers are made up of many
interconnected 'nodes', each of which contains an 'activation function'. A neural network may contain
the following three layers:
a. Input Layer
b. Hidden Layer
c. Output Layer
Patterns are presented to the network via the 'input layer', which communicates with one or more
'hidden layers', where the actual processing is done via a system of weighted 'connections'. The hidden
layers then link to an 'output layer', where the answer is output, as shown in the figure below.

a. Input Layer:

The purpose of the input layer is to receive as input the values of the explanatory attributes for each
observation. Usually, the number of input nodes in an input layer is equal to the number of explanatory
variables. The input layer presents the patterns to the network and communicates with one or more
hidden layers.

The nodes of the input layer are passive, meaning they do not change the data. Each receives a single
value on its input and duplicates that value to its many outputs, so that every value from the input
layer is sent to all the hidden nodes.

b. Hidden Layer

The hidden layers apply given transformations to the input values inside the network. Each hidden
node has incoming arcs from input nodes or from other hidden nodes, and outgoing arcs to output
nodes or to other hidden nodes. In the hidden layers, the actual processing is done via a system of
weighted 'connections'. There may be one or more hidden layers. The values entering a hidden node
are multiplied by weights, a set of predetermined numbers stored in the program. The weighted inputs
are then added to produce a single number.

c. Output Layer

The hidden layers then link to an 'output layer'. The output layer receives connections from the hidden
layers or from the input layer. It returns an output value that corresponds to the prediction of the
response variable. In binary classification problems, there is usually only one output node. The active
nodes of the output layer combine and change the data to produce the output values.

The ability of the neural network to provide useful data manipulation lies in the proper selection of
the weights. This is different from conventional information processing.
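
The flow of values through the three kinds of layer can be sketched as below. This is only an illustration under stated assumptions: the weights and biases are hypothetical numbers, every node uses a logistic activation, and the helper names `layer_forward` and `forward` are invented for this example.

```python
import math


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer_forward(values, weights, biases):
    """Compute one layer: each node takes a weighted sum of all incoming values."""
    return [
        logistic(sum(w * x for w, x in zip(node_weights, values)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    # Input nodes are passive: they pass their values on unchanged.
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    # The output layer combines the hidden values into the prediction.
    return layer_forward(hidden, output_w, output_b)


# Hypothetical weights: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden_w = [[0.5, -0.5], [0.25, 0.75]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]
prediction = forward([0.0, 0.0], hidden_w, hidden_b, output_w, output_b)
```

Changing any of the weights changes the prediction, which is the sense in which the usefulness of the network lies in the selection of the weights.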

Structure of a Neural Network

A neural network has at least two physical components, namely the processing elements and the
connections between them. The processing elements are called neurons, and the connections between
the neurons are known as links.

The structure of a neural network is also referred to as its 'architecture' or 'topology'. It consists of
the number of layers, the elementary units, and the interconnections with their weight adjustment
mechanism. The choice of structure determines the results that will be obtained. It is the
most critical part of the implementation of a neural network.

The simplest structure is the one in which units are distributed in two layers: an input layer and an
output layer. Each unit in the input layer has a single input and a single output, which is equal to the
input. The output unit has all the units of the input layer connected to its input, with a combination
function and a transfer function. There may be more than one output unit. With this structure, the
resulting model is a linear or logistic regression, depending on whether the transfer function is linear
or logistic. The weights of the network are the regression coefficients.

Adding one or more hidden layers, with their units, between the input and output layers increases the
predictive power of the neural network. However, the number of hidden layers should be as small as
possible. This ensures that the neural network does not simply store all the information from the
learning set but can generalize from it, avoiding overfitting.

Overfitting occurs when the weights make the system learn the details of the learning set instead of
discovering structures. This happens when the size of the learning set is too small in relation to the
complexity of the model.

Whether or not a hidden layer is present, the output layer of the network can have many units
when there are many classes to predict.

Learning Process

Basically, learning means adapting to change in the environment as and when it occurs. An ANN is a
complex system or, more precisely, a complex adaptive system, which can change its internal structure
based on the information passing through it.

A learning rule or learning process is a method or mathematical logic that improves the artificial
neural network's performance; usually this rule is applied repeatedly over the network. It works
by updating the weights and bias levels of a network when the network is simulated in a specific data
environment. A learning rule may take the existing condition (weights and biases) of the network and
compare the expected result with the actual result of the network to give new and improved values
for the weights and biases. Depending on the complexity of the actual model being simulated, the
learning rule of the network can be as simple as an XOR gate or mean squared error, or it can be the
result of multiple differential equations. The learning rule is one of the factors that decides how fast
and how accurately the artificial network can be developed. Depending on the process used to develop
the network, there are three main models of machine learning:

Supervised Learning

The learning algorithm would fall under this category if the desired output for the network is also
provided with the input while training the network. By providing the neural network with both an
input and output pair it is possible to calculate an error based on its target output and actual output.
It can then use that error to make corrections to the network by updating its weights.

Unsupervised Learning

In this paradigm the neural network is only given a set of inputs and it's the neural network's
responsibility to find some kind of pattern within the inputs provided without any external aid. This
type of learning paradigm is often used in data mining and is also used by many recommendation
algorithms due to their ability to predict a user's preferences based on the preferences of other similar
users it has grouped together.

Reinforcement Learning

Reinforcement learning is similar to supervised learning in that some feedback is given. However,
instead of providing a target output, a reward is given based on how well the system performed. The
aim of reinforcement learning is to maximize the reward the system receives through trial and error.
This paradigm relates strongly to how learning works in nature; for example, an animal might
remember the actions it has previously taken that helped it to find food (the reward).

The possibility of learning has attracted the most interest in neural networks. Given a specific task to
solve, and a class of functions F, learning means using a set of observations to find f* ∈ F which solves
the task in some optimal sense.

This entails defining a cost function C: F → ℝ such that, for the optimal solution f*, C(f*) ≤ C(f)
for all f ∈ F; that is, no solution has a cost less than the cost of the optimal solution.

The cost function C is an important concept in learning, as it is a measure of how far away a particular
solution is from an optimal solution to the problem to be solved. Learning algorithms search through
the solution space to find a function that has the smallest possible cost.
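
The idea of learning as a search for the lowest-cost function can be illustrated with a deliberately tiny example. Everything here is an assumption made for the sketch: the observations, the finite class F of candidate functions f(x) = slope * x, and the use of mean squared error as the cost C.

```python
# Observations (x, target) drawn from the hypothetical relation target = 2x.
observations = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# A small, finite class F of candidate functions f(x) = slope * x.
candidate_slopes = [0.0, 1.0, 2.0, 3.0]


def cost(slope):
    """Mean squared error of f(x) = slope * x over the observations."""
    return sum((slope * x - t) ** 2 for x, t in observations) / len(observations)


# Learning as search: pick the candidate with the smallest cost.
best_slope = min(candidate_slopes, key=cost)
```

Real learning algorithms search a continuous space of weights rather than a short list, but the criterion is the same: no other candidate has a lower cost than the one selected.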

Let us see different learning rules in the neural network:

• Hebbian learning rule – It specifies how to modify the weights of the nodes of a network.
• Perceptron learning rule – The network starts its learning by assigning a random value to each weight.
• Delta learning rule – The modification in the synaptic weight of a node is equal to the product of
the error and the input.
• Correlation learning rule – The correlation rule is a supervised learning rule.
• Outstar learning rule – It can be used when the nodes or neurons in a network are assumed to be
arranged in a layer.
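
Of the rules listed above, the delta rule is the easiest to show in a few lines. The sketch below assumes a single linear unit and adds a learning rate as a scaling factor on the update; the function name, the learning rate of 0.1 and the training example are illustrative choices, not part of the text.

```python
def delta_rule_step(weights, inputs, target, learning_rate=0.1):
    """Return updated weights after one delta-rule step for a linear unit."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    # Each weight changes in proportion to the error times its own input.
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]


# Repeatedly present one hypothetical example: inputs (1, 2) with target 1.
weights = [0.0, 0.0]
for _ in range(50):
    weights = delta_rule_step(weights, [1.0, 2.0], target=1.0)
final_output = sum(w * x for w, x in zip(weights, [1.0, 2.0]))
```

With these numbers the error halves at every step, so after 50 presentations the unit's output is practically equal to the target.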

Why Neural Networks?

Objective of this Chapter


At the end of this chapter, the reader should have learnt:
• Fundamentals of Artificial Neural Networks
• Activation function
• Adjustments of Weights or learning
• Recurrent Neural Network Extensions
• Long Short-Term Memory
• Deep Belief Networks
• Deep Reservoir Computing

Let Us Introduce

Neural networks take a different approach to problem solving than conventional computers.
Conventional computers use an algorithmic approach, i.e. the computer follows a set of
instructions in order to solve a problem. Unless the specific steps that the computer needs to follow
are known, the computer cannot solve the problem. That restricts the problem-solving capability of
conventional computers to problems that we already understand and know how to solve. Computers
would be much more useful, however, if they could do things that we do not exactly know how to do.

Neural networks process information in a way similar to the human brain. The network is composed
of a large number of highly interconnected processing elements (neurons) working in parallel to
solve a specific problem. Neural networks learn by example. They cannot be programmed to
perform a specific task. The examples must be selected carefully; otherwise useful time is
wasted or, even worse, the network might function incorrectly. The problem is that there is no way
of knowing whether the system is faulty or not, unless an error occurs.

Neural nets are widely used in pattern recognition because of their ability to generalize
and to respond to unexpected inputs/patterns. During training, neurons are taught to recognize various
specific patterns and whether or not to fire when each pattern is received. If a pattern is received during
the execution stage that is not associated with an output, the neuron selects the output that
corresponds to the taught pattern that is least different from the input. This is called generalization.

For example:

A 4-input neuron is trained to fire when the input is 1111 and not to fire when the input is 0000. After
applying the generalization rule, the neuron will also fire when the input is 0111, 1011,
1101, 1110 or 1111, but will not fire when the input is 0000, 0001, 0010, 0100 or 1000. Any other input
(like 0011) will produce a random output, since it is equally distant from 0000 and 1111.
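
The generalization rule in this example amounts to choosing the taught pattern with the fewest differing bits. A minimal sketch follows, assuming the difference is measured as the Hamming distance between bit strings; the function names and the label "undecided" for ties are invented for the illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def generalize(pattern, fire_pattern="1111", rest_pattern="0000"):
    """Return 'fire', 'no fire', or 'undecided' for a 4-bit input pattern."""
    to_fire = hamming(pattern, fire_pattern)
    to_rest = hamming(pattern, rest_pattern)
    if to_fire < to_rest:
        return "fire"
    if to_rest < to_fire:
        return "no fire"
    return "undecided"  # equally distant from both taught patterns


responses = {p: generalize(p) for p in ["0111", "1000", "0011"]}
```

Here 0111 is one bit away from 1111, 1000 is one bit away from 0000, and 0011 is two bits away from both, which is why its output cannot be determined by the rule.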

Pattern reconstruction is much more complicated, and is something that is extremely hard to do on
conventional computers. For pattern reconstruction, feed-forward networks are not enough. Feedback
is needed in order to create a dynamic system that will produce the appropriate pattern. The
output of every neuron is connected to the input of the neighboring neurons. These kinds of networks
are called auto-associative networks.

An interesting experiment was carried out involving a neural network controlling a vehicle. The
experiment was designed to compare human driving behavior with neural network driving behavior. The
results showed a striking similarity between them. According to the results, neural nets
can approximate human driving behavior with a maximum error of 5%.

Fundamentals of Artificial Neural Networks

Network Topology

A network topology is the arrangement of a network along with its nodes and connecting lines.
According to the topology, ANNs can be classified into the following kinds:

Feed forward Network

It is a non-recurrent network having processing units/nodes in layers, in which all the nodes in a
layer are connected with the nodes of the previous layer. The connections have different weights upon
them. There is no feedback loop, which means the signal can only flow in one direction, from input to
output. It may be divided into the following two types:

i. Single layer feed forward network:

The concept is of a feed forward ANN having only one weighted layer. In other words, we can
say the input layer is fully connected to the output layer.

The simplest kind of neural network is a single-layer network, which consists of a single
layer of output nodes; the inputs are fed directly to the outputs via a series of
weights. In this way it can be considered the simplest kind of feed-forward network. The
sum of the products of the weights and the inputs is calculated in each node, and if the value is above
some threshold (typically 0) the neuron fires and takes the activated value (typically 1); otherwise it
takes the deactivated value (typically -1). Neurons with this kind of activation function are also
called artificial neurons or linear threshold units.
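
A linear threshold unit of this kind can be sketched in a few lines. The weights and the threshold of 1.5 below are hypothetical values, chosen so that the unit happens to behave like a logical AND on 0/1 inputs; the text itself mentions a typical threshold of 0.

```python
def threshold_unit(inputs, weights, threshold=0.0):
    """Fire (+1) if the weighted sum exceeds the threshold, otherwise return -1."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net > threshold else -1


# Hypothetical weights and threshold that make the unit act like a logical AND.
and_weights = [1.0, 1.0]
outputs = [threshold_unit([a, b], and_weights, threshold=1.5)
           for a in (0, 1) for b in (0, 1)]
```

Only the input (1, 1) pushes the weighted sum above the threshold, so the unit fires for that input alone.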

ii. Multilayer feed forward network:

The concept is of a feed forward ANN having more than one weighted layer. As this network has one
or more layers between the input and the output layer, these are called hidden layers.

This class of networks consists of multiple layers of computational units, interconnected in a
feed-forward way. Each neuron in one layer has directed connections to the neurons of the subsequent
layer.
The first layer is the input layer, the last layer is the output layer, and the layers between the input
and output layers are hidden layers. A hidden layer is internal to the network and has no direct
connection with the external environment. There can be more than one hidden layer; however,
theoretical work has shown that one hidden layer is sufficient to approximate any complex nonlinear
function. The complexity of the network increases with the number of hidden layers. When the number
of hidden layers is large, the effectiveness of the output response can increase.
Multi-layer networks use a variety of learning techniques, the most popular being
back-propagation. Here, the output values are compared with the correct answer to compute the value
of some predefined error function. The algorithm adjusts the weights of each connection in
order to reduce the value of the error function by some small amount. After repeating
this process for a sufficiently large number of training cycles, the network will usually
converge to some state where the error of the calculations is small. In this case, one would say
that the network has learned a certain target function. To adjust weights properly, one applies a
general method for non-linear optimization that is called gradient descent.

Feedback Network

As the name suggests, a feedback network has feedback paths, which means the signal can flow
in both directions using loops. This makes it a non-linear dynamic system, which changes
continuously until it reaches a state of equilibrium. It may be divided into the
following kinds:

Recurrent networks: These are feedback networks with closed loops. The following are the two kinds
of recurrent networks.

Single Layer Recurrent Network

If the output of the processing elements is fed back as input to the processing elements in the same
layer, this is called lateral feedback. In a recurrent network, the output of a processing element can
be directed back to the processing element itself, to other processing elements, or to both.

MultiLayer Recurrent Network

Generally, a recurrent multilayer network consists of multiple layers of nodes. Each of these layers
is feed-forward except for the last layer, which can have feedback connections. Here the output of a
processing element can be directed back to the nodes in the preceding layer, the same layer, or both.
This forms a multilayer recurrent network.

Some variants of the recurrent network are:

i. Fully recurrent network:

It is the simplest neural network architecture because all nodes are connected to all other nodes and
each node works as both input and output.

ii. Jordan network:

It is a closed-loop network in which the output goes back to the input as feedback, as shown in the
following diagram.

iii. Hopfield Network

A Hopfield neural network consists of a single layer which contains one or more fully connected
recurrent neurons. The Hopfield network is commonly used for auto-association and optimization
tasks.
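
To make the auto-association idea concrete, here is a small Hopfield-style sketch. It rests on assumptions the text does not spell out: patterns are bipolar (+1/-1), weights are stored with the usual Hebbian outer-product rule without self-connections, and neurons are updated one at a time. The function names and the six-bit pattern are hypothetical.

```python
def train_hopfield(patterns):
    """Build a symmetric weight matrix from bipolar (+1/-1) patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, sweeps=5):
    """Update neurons one at a time until the state stops changing."""
    state = list(state)
    for _ in range(sweeps):
        previous = list(state)
        for i in range(len(state)):
            net = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if net >= 0 else -1
        if state == previous:
            break
    return state


stored = [1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])
noisy = [1, -1, 1, -1, 1, 1]  # last bit flipped
restored = recall(weights, noisy)
```

Starting from the corrupted pattern, the feedback connections pull the state back to the stored pattern, which is the auto-associative behavior described above.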

iv. Long Short Term Memory

Long Short Term Memory networks, usually just called "LSTMs", are a special kind of RNN,
capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber
(1997), and were refined and popularized by many people in following work. They work
tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering
information for long periods of time is practically their default behavior, not something they struggle
to learn!

v. Elman Network

vi. Hierarchical RNN, etc.

Activation Functions

An activation function may be defined as the extra force or effort applied over the input to obtain an
exact output. In an ANN, we apply activation functions over the input to get the exact output.
The following are some activation functions of interest:

Linear Activation Function

It is also called the identity function as it performs no input editing. It can be defined as

F(x) = x

This activation function is unbounded. It is a generalized model to use signal functions other than
threshold function. The output remains the same as the input.

Sigmoid Activation Function

It is of two types, as follows:

i. Binary sigmoidal function: This activation function maps the input to a value between 0 and
1. It is positive in nature. It is always bounded, which means its output cannot be less than 0 or more
than 1. It is also strictly increasing in nature, which means the greater the input, the higher the
output. It can be defined as

$$F(x) = \frac{1}{1 + e^{-x}}$$

ii. Bipolar sigmoidal function: This activation function maps the input to a value between -1 and 1. It
can be positive or negative in nature. It is always bounded, which means its output cannot be less than
-1 or more than 1. It is also strictly increasing in nature, like the binary sigmoid function. It can be
defined as

$$F(x) = \frac{2}{1 + e^{-x}} - 1 = \frac{1 - e^{-x}}{1 + e^{-x}}$$

Binary threshold signal function

The function is defined as

$$f(x) = \begin{cases} 1 & \text{if } x \ge x_{th} \\ 0 & \text{if } x < x_{th} \end{cases}$$

where x_th represents the threshold value. The output of this function is binary, i.e. either 0 or 1.

Bipolar threshold signal function

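The function is defined as

$$f(x) = \begin{cases} +1 & \text{if } x \ge x_{th} \\ -1 & \text{if } x < x_{th} \end{cases}$$

where x_th represents the threshold value. The output of this function is bipolar, i.e. either +1 or -1.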

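Linear threshold (RAMP) signal function

This is a bounded version of the linear function. A common form, with the output clipped to the
range 0 to 1, is:

$$f(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \le x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$$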
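
The activation functions in this section can be compared side by side with a short Python sketch. The exact forms used here are the usual textbook ones and should be read as assumptions: in particular, the threshold functions treat an input equal to the threshold as "on", and the ramp is clipped to the range 0 to 1.

```python
import math


def identity(x):
    return x


def binary_sigmoid(x):
    """Logistic curve, output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def bipolar_sigmoid(x):
    """Rescaled logistic curve, output strictly between -1 and 1."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0


def binary_threshold(x, x_th=0.0):
    return 1 if x >= x_th else 0


def bipolar_threshold(x, x_th=0.0):
    return 1 if x >= x_th else -1


def ramp(x):
    """Linear between 0 and 1, clipped outside that range (one common form)."""
    return max(0.0, min(1.0, x))


samples = [-2.0, 0.0, 2.0]
table = {f.__name__: [f(x) for x in samples]
         for f in (identity, binary_sigmoid, bipolar_sigmoid,
                   binary_threshold, bipolar_threshold, ramp)}
```

Printing `table` shows how the same three inputs are left unchanged, squashed smoothly, or cut off sharply, depending on the function chosen.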

Adjustments of Weights or Learning

Learning, in an artificial neural network, is the method of modifying the weights of connections between
the neurons of a specified network. Learning in ANNs can be classified into three categories,
namely supervised learning, unsupervised learning, and reinforcement learning.

Learning Paradigms

Supervised Learning

Supervised learning is the task of inferring a function from labelled training data. The training
data consist of a set of training examples. In supervised learning, each example is a pair
consisting of an input object (typically a vector) and the desired output value (also called the
supervisory signal).

A supervised learning algorithm analyzes the training data and produces an inferred function, which
can be used for mapping new examples. An optimal scenario will allow the algorithm to correctly
determine the class labels for unseen instances. This requires the learning algorithm to generalize
from the training data to unseen situations in a "reasonable" way.

The majority of practical machine learning uses supervised learning.

Supervised learning is where you have input variables (X) and an output variable (Y) and you use an
algorithm to learn the mapping function from the input to the output.

Y = f(X)

The goal is to approximate the mapping function so well that when you have new input data
(X) you can predict the output variable (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the
training dataset can be thought of as a teacher supervising the learning process. We know the correct
answers; the algorithm iteratively makes predictions on the training data and is corrected by the
teacher. Learning stops when the algorithm achieves an acceptable level of performance.

Examples of supervised learning algorithms

1. Logistic Regression
2. Decision trees
3. Support vector machine (SVM)
4. K-Nearest Neighbors
5. Naive Bayes
6. Random forest
7. Linear regression
8. Polynomial regression
9. SVM for regression

Unsupervised Learning

In data science, the problem of unsupervised learning is that of trying to find hidden
structure in unlabeled data. Since the examples given to the learner are unlabeled, there is no error or
reward signal to evaluate a potential solution.

Unsupervised learning is where you only have input data (X) and no corresponding output variables.
The goal of unsupervised learning is to model the underlying structure or distribution in the data
in order to learn more about the data.

This is called unsupervised learning because, unlike supervised learning above,
there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover
and present the interesting structure in the data.

Unsupervised learning problems can be further grouped into clustering and association problems.

Clustering: A clustering problem is where you want to discover the inherent groupings in the data,
such as grouping customers by purchasing behavior.

Association: An association rule learning problem is where you want to discover rules that describe
large portions of your data, such as people that buy X also tend to buy Y.

Some prominent examples of unsupervised learning algorithms are:

1. K-means for clustering problems.
2. Apriori algorithm for association rule learning problems.
3. Hierarchical clustering
4. Hidden Markov models
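
As an illustration of the first algorithm in the list, the sketch below runs k-means on one-dimensional data. It is a simplified version under stated assumptions: the data are hypothetical customer spend values, the two starting centres are picked by hand, and the loop runs for a fixed number of iterations rather than testing for convergence.

```python
def k_means_1d(points, centers, iterations=10):
    """Cluster 1-D points by alternating assignment and centre update."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest centre.
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    return centers, clusters


spend = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical customer spend values
centers, clusters = k_means_1d(spend, centers=[0.0, 5.0])
```

No labels are supplied at any point; the two groups emerge only from the distances between the values, which is what distinguishes this from the supervised setting.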

Semi-Supervised Machine Learning

Problems where you have a large amount of input data (X) and only some of the data is labelled (Y) are
called semi-supervised learning problems.

These problems sit in between supervised and unsupervised learning.

A good example is a photo archive where only some of the images are labeled (e.g. dog,
cat, person) and the majority are unlabeled.

Many real-world machine learning problems fall into this area. This is because it can be
expensive or time-consuming to label data, as it may require access to domain experts, whereas
unlabeled data is cheap and easy to collect and store.

You can use unsupervised learning techniques to discover and learn the structure in the input variables.

You can also use supervised learning techniques to make best-guess predictions for the
unlabeled data, feed that data back into the supervised learning algorithm as training data, and
use the resulting model to make predictions.

Reinforcement Learning

In reinforcement learning, data x is usually not given, but generated by an agent's interactions with the
environment. At each point in time t, the agent performs an action y_t and the environment generates
an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The
aim is to discover a policy for selecting actions that minimizes some measure of long-term cost, i.e.
the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are
usually unknown, but can be estimated. ANNs are frequently used in reinforcement learning as part
of the overall algorithm. Tasks that fall within the paradigm of reinforcement learning are control
problems, games and other sequential decision-making tasks.

Major Variants of Artificial Neural Network

Objective of this Chapter


At the end of this chapter, the reader should have learnt:
• Multilayer perceptron
• Convolutional Neural Networks
• Recurrent Neural Networks
• Recurrent Neural Network Extensions
• Long Short-Term Memory
• Deep Belief Networks
• Deep Reservoir Computing

Multilayer Perceptron (MLP)

The field of artificial neural networks is often just called neural networks or multi-layer
perceptrons, after perhaps the most useful type of neural network. A perceptron is a single neuron
model that was a precursor to larger neural networks.
A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists
of at least three layers of nodes. Except for the input nodes, each node is a neuron
that uses a nonlinear activation function. An MLP uses a supervised learning technique called
backpropagation for training. Its multiple layers and non-linear activation distinguish an MLP from a
linear perceptron. It can distinguish data that is not linearly separable.
An MLP is a network of simple neurons called perceptrons. The basic concept of a single perceptron
was introduced by Rosenblatt in 1958. The perceptron computes a single output from multiple
real-valued inputs by forming a linear combination according to its input weights and then possibly
putting the output through some nonlinear activation function.

Mathematically this can be written as

$$y = \varphi\Big(\sum_{i=1}^{n} w_i x_i + b\Big) = \varphi(\mathbf{w}^{T}\mathbf{x} + b)$$

where w denotes the vector of weights, x is the vector of inputs, b is the bias and φ is the activation
function.

An MLP (or Artificial Neural Network - ANN) with a single hidden layer can be represented
graphically as follows:

Multilayer perceptrons are sometimes colloquially referred to as "vanilla" neural networks, especially
when they have a single hidden layer.

Activation Function

If a multilayer perceptron has a linear activation function in all neurons, that is,
a linear function that maps the weighted inputs to the output of each neuron, then linear algebra
shows that any number of layers can be reduced to a two-layer input-output model. In multilayer
perceptrons, some of the artificial neurons use a nonlinear activation function. This nonlinear
activation function was developed to model the frequency of action potentials, or firing, of biological
neurons.
The two common activation functions, both sigmoids, are described by:

$$y(v_i) = \tanh(v_i) \quad \text{and} \quad y(v_i) = \left(1 + e^{-v_i}\right)^{-1}$$

The first is a hyperbolic tangent that ranges from -1 to 1, while the other is the logistic function, which
is similar in shape but ranges from 0 to 1. Here y_i is the output of the ith node (neuron) and v_i is the
weighted sum of the input connections. Alternative activation functions have been proposed, including
the rectifier and softplus functions. More specialized activation functions include radial basis
functions (used in radial basis networks, another class of supervised neural network
models).
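
The remark that purely linear layers can be reduced to a single input-output mapping can be checked numerically. The sketch below uses hypothetical weight matrices and omits biases for brevity; it compares passing an input through two linear layers with passing it through one layer whose weights are the matrix product of the two.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


w1 = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]  # 2 inputs -> 3 hidden units
w2 = [[1.0, -1.0, 2.0]]                     # 3 hidden units -> 1 output
x = [2.0, 3.0]

two_layers = mat_vec(w2, mat_vec(w1, x))    # pass through both linear layers
one_layer = mat_vec(mat_mul(w2, w1), x)     # single equivalent layer
```

Both computations give the same output, so the hidden layer adds nothing unless its units apply a nonlinear activation such as the tanh or logistic functions above.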

Layers

The multilayer perceptron consists of three or more layers of nonlinearly-activating nodes:

i. an input layer
ii. an output layer
iii. one or more hidden layers

This is the reason it is regarded as a deep neural network (DNN). Since MLPs are fully connected, each
node in one layer connects with a certain weight w_ij to every node in the following layer.

Learning

Learning happens in the perceptron by changing connection weights after each piece of data is
processed, based on the amount of error in the output compared with the expected result. This is an
example of supervised learning, and is carried out through backpropagation, a generalization of the
least mean squares algorithm in the linear perceptron.

We represent the error in output node j for the nth data point (training example) by

$$e_j(n) = d_j(n) - y_j(n)$$

where d is the target value and y is the value produced by the perceptron. The node weights are
adjusted based on corrections that minimize the error in the entire output, given by

$$E(n) = \frac{1}{2} \sum_j e_j^2(n)$$

Using gradient descent, the change in each weight is

$$\Delta w_{ji}(n) = -\eta \frac{\partial E(n)}{\partial v_j(n)}\, y_i(n)$$

where y_i is the output of the previous neuron and η is the learning rate, which is chosen to ensure that
the weights quickly converge to a response, without oscillations.

The derivative to be calculated depends on the induced local field v_j, which itself varies. It is easy
to show that for an output node this derivative can be simplified to

$$-\frac{\partial \mathcal{E}(n)}{\partial v_j(n)} = e_j(n) \phi'(v_j(n))$$

where ɸ’ is the derivative of the activation function described above, which itself does not vary. The
analysis is more difficult for the change in weights to a hidden node, but it can be shown that the
relevant derivative is

$$-\frac{\partial \mathcal{E}(n)}{\partial v_j(n)} = \phi'(v_j(n)) \sum_k -\frac{\partial \mathcal{E}(n)}{\partial v_k(n)} w_{kj}(n)$$

This depends on the change in weights of the kth nodes, which represent the output layer. So to
change the hidden layer weights, the output layer weights change according to the derivative of the
activation function, and so this algorithm represents a back propagation of the activation function.
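The update rule can be tried out on a very small network. The sketch below is an illustration under our own assumptions rather than a reference implementation: it uses two inputs, two tanh hidden nodes and one logistic output node, has no bias terms, and starts from arbitrary weights. The name train_step is hypothetical.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, d, w_hidden, w_out, eta):
    """One back propagation update; returns the error measured before the update."""
    # Forward pass: local fields v and outputs y of each layer.
    v_h = [sum(w * xi for w, xi in zip(row, x)) for row in w_hidden]
    y_h = [math.tanh(v) for v in v_h]
    v_o = [sum(w * yh for w, yh in zip(row, y_h)) for row in w_out]
    y_o = [logistic(v) for v in v_o]

    # Error of each output node and total error: e_j = d_j - y_j, E = 1/2 sum e_j^2.
    e = [dj - yj for dj, yj in zip(d, y_o)]
    error = 0.5 * sum(ej * ej for ej in e)

    # Local gradients (minus dE/dv_j). For the logistic, phi'(v) = y (1 - y).
    delta_o = [ej * yj * (1.0 - yj) for ej, yj in zip(e, y_o)]
    # Hidden nodes: phi'(v_j) times the weighted sum of output gradients.
    # For tanh, phi'(v) = 1 - y^2.
    delta_h = [
        (1.0 - y_h[j] ** 2) * sum(delta_o[k] * w_out[k][j] for k in range(len(w_out)))
        for j in range(len(w_hidden))
    ]

    # Weight changes: eta * local gradient * output of the previous neuron.
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += eta * delta_o[k] * y_h[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_h[j] * x[i]
    return error

w_hidden = [[0.1, -0.2], [0.4, 0.3]]   # arbitrary starting weights
w_out = [[0.2, -0.5]]
errors = [train_step([1.0, 0.5], [1.0], w_hidden, w_out, eta=0.5) for _ in range(50)]
print(errors[0], errors[-1])
```

Repeating the step on the same training example should make the printed error shrink, which is the behaviour the learning rate is chosen to achieve.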

Terminology

The term "multilayer perceptron" does not refer to a single perceptron that has multiple layers. Rather,
it contains many perceptrons that are organized into layers. An alternative is "multilayer perceptron
network". Moreover, MLP "perceptrons" are not perceptrons in the strictest possible sense.
True perceptrons are formally a special case of artificial neurons that use a threshold
activation function such as the Heaviside step function. MLP perceptrons can employ arbitrary
activation functions. A true perceptron performs binary classification (one of two available
options), whereas an MLP neuron is free to perform either classification or regression, depending on
its activation function. The term "multilayer perceptron" was later applied without respect to the
nature of the nodes/layers, which can be composed of arbitrarily defined artificial neurons, and not
perceptrons specifically. This interpretation avoids the loosening of the definition of "perceptron" to
mean an artificial neuron in general.

Applications

MLPs are useful in research for their ability to solve problems stochastically, which often
allows approximate solutions for extremely complex problems like fitness approximation.

MLPs are universal function approximators, as shown by Cybenko's theorem, so they can
be used to create mathematical models by regression analysis. As classification is a particular case of
regression when the response variable is categorical, MLPs make good classifier algorithms.

MLPs were a popular machine learning solution in the 1980s, finding applications in
diverse fields such as speech recognition, image recognition, and machine translation software,
but subsequently faced strong competition from much simpler (and related) support vector machines.
Interest in back propagation networks returned due to the successes of deep learning.

Convolutional neural networks

Three basic components are needed to define a basic convolutional network:

1. The convolutional layer
2. The pooling layer (optional)
3. The output layer

Let’s look at each of these in a little more detail.

The convolutional layer

Assume we have an image of dimension 6*6. We define a weight matrix which extracts certain
features from the image.

We have initialized the weight as a 3*3 matrix. This weight now runs across the image
such that all the pixels are covered at least once, to give a convolved output. The value 429 shown
above is obtained by adding the values obtained by element-wise multiplication of the weight
matrix and the highlighted 3*3 part of the input image. Similarly, this weight matrix is passed over all
positions and a new 4*4 matrix is created (there are 6 - 3 + 1 = 4 positions along each axis).

The 6*6 image is now converted into a 4*4 image. Think of the weight matrix as a paint brush painting
a wall. The brush first paints the wall horizontally and then comes down and paints the next row
horizontally. Pixel values are used again when the weight matrix moves along the image. This
essentially enables parameter sharing in a convolutional neural network.
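The sliding-window computation described above can be sketched as follows. The pixel values and the weight matrix are hypothetical, not the ones from the figure, and, as is usual in convolutional networks, the weight matrix is applied without flipping (strictly speaking a cross-correlation).

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    Each output value is the sum of the element-wise products of the kernel
    and the image patch it currently covers.
    """
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    output = []
    for r in range(image_h - kernel_h + 1):
        row = []
        for c in range(image_w - kernel_w + 1):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            )
            row.append(total)
        output.append(row)
    return output

# Hypothetical 6*6 image and 3*3 weight matrix.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The same nine weights are reused at all sixteen positions, which is what parameter sharing refers to.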

Let’s see how this looks on a real image.

The Pooling Layer

Sometimes when the images are too large, we need to reduce the number of trainable
parameters. It is then desirable to periodically introduce pooling layers between successive
convolution layers. Pooling is done for the sole purpose of reducing the spatial size of the image.
Pooling is done independently on each depth dimension, so the depth of the image remains
unchanged. The most common form of pooling layer is max pooling.

Here we have taken the stride as 2 and the pooling size also as 2. The max operation is applied
to each depth dimension of the convolved output. As you can see, the 4*4 convolved output has
become 2*2 after the max pooling operation.
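A minimal sketch of max pooling with a pooling size of 2 and a stride of 2, applied to a single depth slice. The 4*4 values below are invented for the example and are not the ones from the figure.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size*size window of one depth slice."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, width - size + 1, stride)
        ]
        for r in range(0, height - size + 1, stride)
    ]

# Hypothetical 4*4 convolved output.
convolved = [[1, 3, 2, 4],
             [5, 6, 7, 8],
             [3, 2, 1, 0],
             [1, 2, 3, 4]]

pooled = max_pool2d(convolved)
print(pooled)  # [[6, 8], [3, 4]]
```

For an input with several depth slices, the same function would be applied to each slice separately, so the depth stays the same.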

Let’s see how max-pooling looks on a real image.

As you can see, I have taken the convolved image and applied max pooling to it. The max-pooled
image still retains the information that it’s a car on a street. If you look carefully, the
dimensions of the image have been halved. This helps to reduce the parameters to a great
extent.

Similarly, other forms of pooling can also be applied, such as average pooling or L2-norm pooling.

The Output Layer

After multiple layers of convolution and padding, we need the output in the form of a class.
The convolution and pooling layers are only able to extract features and reduce the
number of parameters from the original images. To generate the final output, however, we
need to apply a fully connected layer whose number of outputs equals the number of classes we want.
It becomes difficult to reach that number with just the convolution layers. Convolution layers
generate 3D activation maps, whereas we just need the output as whether or not an image belongs to
a particular class. The output layer has a loss function, such as categorical cross-entropy, to compute
the error in prediction. Once the forward pass is complete, back propagation begins to update the
weights and biases to reduce the error and loss.
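To make the loss computation concrete, here is a small sketch of categorical cross-entropy for a single image. It assumes, as is common but not stated above, that the scores of the fully connected layer are turned into class probabilities with a softmax; the scores themselves are made-up numbers.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probabilities, target_index):
    """Loss for one example: minus the log of the probability of the true class."""
    return -math.log(probabilities[target_index])

scores = [2.0, 1.0, 0.1]   # hypothetical outputs of the fully connected layer
probabilities = softmax(scores)
loss = categorical_cross_entropy(probabilities, target_index=0)
print(probabilities, loss)
```

The loss is small when the network assigns a high probability to the correct class and grows as that probability falls, which is the error signal that back propagation then reduces.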

Recurrent Neural Networks

A recurrent neural network (RNN) is a type of artificial neural network where connections between
units (neurons) form a directed cycle. This allows it to exhibit dynamic temporal behaviour.
Unlike feed forward neural networks, RNNs can use their internal memory to process
arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented, connected
handwriting recognition or speech recognition.

The idea behind RNNs is to make use of sequential information. In a traditional neural network we
assume that all inputs (and outputs) are independent of each other. But for many tasks that’s a very
bad idea. If you want to predict the next word in a sentence, you had better know which words came
before it. RNNs are called recurrent because they perform the same task for every element of a
sequence, with the output depending on the previous computations. Another way to think
about RNNs is that they have a “memory” which captures information about what has been computed
so far. In theory RNNs can make use of information in arbitrarily long sequences, but in
practice they are limited to looking back only a few steps. Here is what a typical RNN looks
like:

The figure above shows an RNN being unrolled (or unfolded) into a full network. By
unrolling we simply mean that we write out the network for the complete sequence. For example, if
the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer
neural network, one layer for each word. The formulas that govern the computation
happening in an RNN are as follows:

• x_t is the input at time step t. For example, x_1 could be a one-hot vector corresponding to the
second word of a sentence (time steps are counted from 0).

• s_t is the hidden state at time step t. It’s the “memory” of the network. s_t is calculated based on
the previous hidden state and the input at the current step: s_t = f(Ux_t + Ws_(t-1)). The function
f is usually a nonlinearity such as tanh or ReLU. s_-1, which is required to calculate the first
hidden state, is typically initialized to all zeroes.

• o_t is the output at step t. For example, if we wanted to predict the next word in a sentence, it
would be a vector of probabilities across our vocabulary: o_t = softmax(Vs_t).
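The three formulas can be put together in a short forward pass. This is a sketch under our own assumptions: f is taken to be tanh, the vocabulary has 3 words, the hidden state has 2 units, and the matrices U, W and V hold arbitrary illustrative values.

```python
import math

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_forward(xs, U, W, V):
    """Apply s_t = tanh(U x_t + W s_(t-1)) and o_t = softmax(V s_t) to a sequence."""
    s = [0.0] * len(W)  # s_-1 is initialized to all zeroes
    states, outputs = [], []
    for x in xs:
        s = [math.tanh(a + b) for a, b in zip(matvec(U, x), matvec(W, s))]
        states.append(s)
        outputs.append(softmax(matvec(V, s)))
    return states, outputs

U = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]]      # hidden size x vocabulary size
W = [[0.1, 0.2], [-0.2, 0.3]]                 # hidden size x hidden size
V = [[0.3, -0.1], [0.0, 0.5], [-0.4, 0.2]]    # vocabulary size x hidden size

# A 3-word "sentence" given as one-hot vectors x_0, x_1, x_2.
sentence = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
states, outputs = rnn_forward(sentence, U, W, V)
for o in outputs:
    print(o)
```

Note that the same U, W and V are used at every time step; only the input and the hidden state change.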

There are a few things to note here:

• You can think of the hidden state s_t as the memory of the network. s_t captures information
about what happened in all the previous time steps. The output o_t at step t is calculated solely
based on the memory at time t. As briefly mentioned above, it’s a bit more complicated in
practice because s_t typically can’t capture information from too many time steps ago.

• Unlike a traditional deep neural network, which uses different parameters at each
layer, an RNN shares the same parameters (U, V, W above) across all steps. This reflects the
fact that we are performing the same task at each step, just with different inputs. This greatly reduces
the total number of parameters we need to learn.

• The figure above has outputs at each time step, but depending on the task this may not be
necessary. For example, when predicting the sentiment of a sentence we may only care about the
final output, not the sentiment after each word. Similarly, we may not need inputs at each
time step. The main feature of an RNN is its hidden state, which captures some information
about a sequence.

RNNs have shown great success in many NLP tasks. At this point I should mention that the
most commonly used type of RNNs are LSTMs, which are much better at capturing long-term
dependencies than vanilla RNNs are.

Recurrent Neural Network Extensions

Over the years researchers have developed more sophisticated types of RNNs to deal
with some of the shortcomings of the vanilla RNN model. We will cover them in more detail in a later
topic, but I want this section to serve as a brief overview so that you are familiar with the taxonomy
of models.

Bidirectional RNNs are based on the idea that the output at time t may depend not only on the
previous elements in the sequence, but also on future elements. For example, to predict a missing word
in a sequence you want to look at both the left and the right context. Bidirectional RNNs are quite
simple. They are just two RNNs stacked on top of each other. The output is then
computed based on the hidden state of both RNNs.

Deep (Bidirectional) RNNs are similar to Bidirectional RNNs, only that we now have multiple layers
per time step. In practice this gives us a higher learning capacity (but we also need a lot of training
data).

LSTM networks are quite popular these days and we briefly talked about them in the section
above. LSTMs don’t have a fundamentally different architecture from RNNs, but they use a different
function to compute the hidden state. The memory units in LSTMs are called cells, and you can think
of them as black boxes that take as input the previous state h_{t-1} and the current input x_t. Internally
these cells decide what to keep in (and what to erase from) memory. They then combine the
previous state, the current memory, and the input. It turns out that these types of units are very
efficient at capturing long-term dependencies.

Long Short-Term Memory

Long short-term memory (LSTM) units (or blocks) are building units for the layers of a
recurrent neural network (RNN). An RNN composed of LSTM units is often called an "LSTM
network". A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate.
The cell is responsible for "remembering" values over arbitrary time intervals; hence the word
"memory" in LSTM. Each of the three gates can be thought of as a "conventional" artificial neuron, as
in a multi-layer (or feed forward) neural network: that is, they compute an activation (using
an activation function) of a weighted sum. Intuitively, they can be thought of as regulators of the flow
of values that goes through the connections of the LSTM; hence the name "gate". There are
connections between these gates and the cell.

The expression long short-term refers to the fact that LSTM is a model for short-term memory which
can last for a long period of time. An LSTM is well suited to classify, process and predict time series
given time lags of unknown size and duration between important events. LSTMs were developed to deal
with the exploding and vanishing gradient problem when training traditional RNNs. Relative
insensitivity to gap length gives LSTM an advantage over other RNNs, hidden Markov models (HMM)
and other sequence learning methods in several applications.

There are several architectures of LSTM units. A common architecture is composed of a memory cell,
an input gate, an output gate and a forget gate.

An LSTM (memory) cell stores a value (or state), for either long or short time periods. This is
achieved by using an identity (or no) activation function for the memory cell. In this way, when an
LSTM network (that is, an RNN composed of LSTM units) is trained with back propagation through
time, the gradient does not tend to vanish.

The LSTM gates compute an activation, often using the logistic function. Intuitively, the input gate
controls the extent to which a new value flows into the cell, the forget gate controls the extent to
which a value remains in the cell, and the output gate controls the extent to which the value in the cell
is used to compute the output activation of the LSTM unit.
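The role of the three gates can be illustrated with a single LSTM unit that receives a scalar input. The sketch follows one commonly used formulation (logistic gates, a tanh candidate value and a tanh on the cell output); other LSTM variants exist, and the parameter names in the dictionary are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single LSTM unit with scalar input x."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    # The cell keeps a fraction f of its old value and admits a fraction i of
    # the new one; no squashing function is applied to the stored state itself.
    c = f * c_prev + i * g
    h = o * math.tanh(c)  # output activation of the unit
    return h, c

# Arbitrary illustrative parameters.
names = [k + "_" + gate for k in ("w", "u", "b") for gate in ("i", "f", "o", "g")]
params = dict.fromkeys(names, 0.5)

h, c = 0.0, 0.0
for x in (1.0, -0.5, 0.25):
    h, c = lstm_step(x, h, c, params)
    print(h, c)
```

With a forget gate close to 1 and an input gate close to 0, the cell value would pass through many steps almost unchanged, which is how the unit can remember over long gaps.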

There are connections into and out of these gates. A few connections are recurrent. The
weights of these connections, which need to be learned during training, are
used to direct the operation of the gates. Each of the gates has its own parameters, that is, weights
and biases, possibly from other units outside the LSTM unit.

Deep Belief Networks

A deep belief network (DBN) is a generative graphical model in machine learning, or alternatively
a class of deep neural network, composed of multiple layers of latent variables ("hidden units"),
with connections between the layers but not between units within each layer.

When trained on a set of examples without supervision, a DBN can learn to probabilistically
reconstruct its inputs. The layers then act as feature detectors. After this learning step, a DBN can be
further trained with supervision to perform classification.

DBNs can be viewed as a composition of simple, unsupervised networks such as restricted
Boltzmann machines (RBMs) or autoencoders, where each sub-network's hidden layer serves
as the visible layer for the next. An RBM is an undirected, generative energy-based model
with a "visible" input layer and a hidden layer, and connections between but not within layers. This
composition leads to a fast, layer-by-layer unsupervised training procedure, where contrastive
divergence is applied to each sub-network in turn, starting from the "lowest" pair of layers (the
lowest visible layer is a training set).

Teh's observation that DBNs can be trained greedily, one layer at a time, led to one of the first
effective deep learning algorithms. Overall, there are many attractive implementations and
uses of DBNs in real-life applications and scenarios (e.g., electroencephalography, drug discovery).

Deep Reservoir Computing

Reservoir computing is a framework for computation that may be viewed as an extension of neural
networks. Typically an input signal is fed into a fixed (random) dynamical system called a reservoir,
and the dynamics of the reservoir map the input to a higher dimension. Then a simple readout
mechanism is trained to read the state of the reservoir and map it to the desired output.
The main benefit is that training is performed only at the readout stage and the reservoir is fixed.
Liquid-state machines and echo state networks are two major types of reservoir computing.
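The division of labour between a fixed reservoir and a trained readout can be sketched with a tiny echo-state-style network. Everything here is an illustrative assumption: the reservoir has 10 tanh units with small random weights, the task is predicting the next value of a sine wave, and the readout is fitted with the simple least mean squares (LMS) rule rather than the regression methods that are more usual in practice.

```python
import math
import random

def run_reservoir(inputs, w_in, w_res):
    """Drive the fixed reservoir with a scalar input sequence; return every state."""
    n = len(w_res)
    state = [0.0] * n
    states = []
    for u in inputs:
        state = [
            math.tanh(w_in[j] * u + sum(w_res[j][k] * state[k] for k in range(n)))
            for j in range(n)
        ]
        states.append(state)
    return states

def readout(w_out, state):
    return sum(w * s for w, s in zip(w_out, state))

def train_readout(states, targets, eta=0.05, epochs=50):
    """Fit only the readout weights with the LMS rule; the reservoir is untouched."""
    w_out = [0.0] * len(states[0])
    for _ in range(epochs):
        for s, d in zip(states, targets):
            err = d - readout(w_out, s)
            w_out = [w + eta * err * x for w, x in zip(w_out, s)]
    return w_out

def mean_squared_error(w_out, states, targets):
    return sum((d - readout(w_out, s)) ** 2 for s, d in zip(states, targets)) / len(targets)

rng = random.Random(0)
n_units = 10
w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]    # fixed input weights
w_res = [[rng.uniform(-0.1, 0.1) for _ in range(n_units)]  # fixed reservoir weights
         for _ in range(n_units)]

signal = [math.sin(0.2 * t) for t in range(301)]
inputs, targets = signal[:-1], signal[1:]  # task: predict the next value

states = run_reservoir(inputs, w_in, w_res)
w_out = train_readout(states, targets)
baseline = mean_squared_error([0.0] * n_units, states, targets)
trained = mean_squared_error(w_out, states, targets)
print(baseline, trained)
```

Only w_out is ever modified; w_in and w_res are generated once and left alone, which is the main point of the approach.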

The extension of the reservoir computing framework towards deep learning, with the introduction
of deep reservoir computing and of the deep Echo State Network (deepESN) model, makes it possible
to develop efficiently trained models for hierarchical processing of temporal data, while also
enabling investigation of the inherent role of layered composition in recurrent neural networks.

Results show that a single CA (cellular automata) reservoir system yields similar results to
recent work. A system consisting of two layered reservoirs shows a noticeable improvement
compared to a single CA reservoir. This indicates potential for further research and provides valuable
insight on how to design CA reservoir systems.

Tools and Technologies

Major libraries

Some of the major libraries used in the implementation of neural networks are:

OpenNN – Open Neural Network:

OpenNN is an open source class library written in the C++ programming language which implements
neural networks, a key area of machine learning research. The library implements any number of layers
of non-linear processing units for supervised learning. This deep architecture allows the design of
neural networks with universal approximation properties. The main advantage of OpenNN is its high
performance. It is written in C++ for better memory management and higher processing speed,
and implements CPU parallelization by means of OpenMP and GPU acceleration with CUDA.

Neural Network Libraries by Sony:

Neural Network Libraries allows you to define a computation graph (neural network) intuitively
with less code. Dynamic computation graph support enables flexible runtime network
construction. The library can use both the static and dynamic graph paradigms. It is designed
with portability in mind, and continuous integration is run for Linux and Windows. Most of the
library code is written in C++11. By using the C++11 core API, you can deploy it onto embedded
devices. It offers a function abstraction as well as a code template generator for writing a new
function, which allow developers to write a new function with less coding. New
device code can be added as a plugin without any modification of the library code. CUDA is in
fact implemented as a plugin extension.

Theano – Latest version: Theano 0.7

This is a very flexible neural network library for use with Python. It is capable of running
on CPU and GPU. It is considered to have some of the best documentation among the available neural
network libraries.

Torch – Scientific computing for LuaJIT

This is also a very flexible library. It often has comparable capabilities and performance to Theano.
However, it is used with the Lua language, which is not widely known and lacks many of the
standard data processing libraries that languages like Python have.

Caffe – A Deep Learning Framework

Written for users of C++ with CUDA, this library is particularly optimized for vision tasks. I believe
it is frequently among the fastest libraries when benchmarked on vision tasks.

TensorFlow – https://www.tensorflow.org/

Very recently open-sourced by Google, TensorFlow can be thought of as more or less one of these
neural net libraries with a modular GUI on top of it. Some have criticized it for not being as fast
as some of the other heavily optimized libraries, but that is not the whole picture.

MXNet

This library offers a simple and modular way of building and training a neural network.
It is also often among the fastest libraries available. However, I have found it to be lacking
in flexibility and short on documentation.

Keras

Keras is a prominent open source library written in Python for building neural networks.
It is capable of running on top of MXNet, Deeplearning4j, TensorFlow, Microsoft Cognitive Toolkit
(CNTK) or Theano. The library contains numerous implementations of commonly used neural network
building blocks such as layers, objectives, activation functions, optimizers, and a host of tools to make
working with image and text data easier.

Lasagne

Lasagne is a lightweight library to build and train artificial neural networks in Theano. It supports
convolutional neural networks (CNNs) and recurrent networks, including long
short-term memory (LSTM). It provides transparent support for CPUs and GPUs through Theano’s
expression compiler. You can use it if you want the flexibility of Theano but don’t want to always
write neural network layers from scratch.

Blocks

Blocks is a framework that aids you in building neural network models on top of Theano.

Pylearn2

Pylearn2 is a library that contains many models and training algorithms, such as stochastic
gradient descent, that are commonly used in deep learning. Its functionality is built on top of
Theano.

DeepPy

DeepPy is an alternative Python deep learning framework built on top of NumPy.

Deepnet

This is a GPU-based implementation of deep learning algorithms written in Python. It includes
feed-forward neural nets, restricted Boltzmann machines (RBM), deep belief nets (DBN),
autoencoders, deep Boltzmann machines and convolutional neural nets.

Gensim

Gensim is a text-processing toolkit written in the Python programming language, often listed
alongside deep learning tools. It was designed for handling large text collections using efficient
algorithms.

nolearn

This library contains a number of wrappers and abstractions around existing neural network libraries.
Just as Keras wraps Theano and TensorFlow to provide a friendly API, nolearn provides wrappers and
abstractions for Lasagne, along with a few machine learning utility modules.

Passage

Passage is a library well suited to text analysis with RNNs.

The Microsoft Cognitive Toolkit (CNTK)

The Microsoft Cognitive Toolkit (CNTK) is likewise a deep-learning toolkit that describes
neural networks as a series of computational steps via a directed graph. CNTK makes it
easy to realize and combine popular model types such as feed-forward DNNs, convolutional nets
(CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error
back propagation) learning with automatic differentiation and parallelization across multiple GPUs and
servers. CNTK has been available under an open-source license since April 2015.

FANN

The Fast Artificial Neural Network Library (FANN) is a free, open source neural network library which
implements multilayer artificial neural networks in C, with support for both fully connected and
sparsely connected networks. Cross-platform execution in both fixed and floating point is supported. It
includes a framework for easy handling of training data sets. It is easy to use, versatile, well
documented, and fast. Bindings to more than 20 programming languages are available. An easy-to-read
introduction article and a reference manual accompany the library, with examples and recommendations
on how to use it. Several graphical user interfaces are also available for the library.

Programming language support

Python

Python is one of the most widely used programming languages in the field of artificial
intelligence, thanks to its simplicity. It can be used seamlessly with the data structures and other
frequently used AI algorithms.

The choice of Python for AI projects also stems from the fact that there are plenty of useful
libraries that can be used in AI. For instance, NumPy provides scientific computation capability, SciPy
advanced computing and PyBrain machine learning in Python.

You will also have little difficulty learning Python for AI, as there are plenty of resources available
online.

Java

Java is also a good choice. It is an object-oriented programming language that focuses on providing
the features needed to work on AI projects; it is portable, and it offers built-in garbage
collection. The Java community is also a plus point, as there will be someone to help you
with your queries and problems.

Java also offers an easy way to code algorithms, and AI is full of algorithms,
be they search algorithms, natural language processing algorithms or neural network algorithms. In
addition, Java allows for scalability, which is a must-have feature for AI projects.

Lisp

Lisp fares well in the AI field because of its excellent prototyping capabilities and its
support for symbolic expressions. It is a powerful programming language and is used in major AI
projects, such as Macsyma, DART, and CYC.

The Lisp language is mostly used in the machine learning/ILP sub-field because of its usability
and symbolic structure. Peter Norvig, the well-known computer scientist who works extensively in the
AI field, and co-author of the famous AI book “Artificial Intelligence: A Modern Approach,”
explains why Lisp is one of the top programming languages for AI development in a Quora answer.

Prolog

Prolog stands alongside Lisp when it comes to usefulness and usability. According to the book
Prolog Programming for Artificial Intelligence, Prolog is one of those programming languages that
offer some basic mechanisms which can be extremely useful for AI programming. For example, it offers
pattern matching, automatic backtracking, and tree-based data structuring mechanisms. Combining
these mechanisms provides a flexible framework to work with.

Prolog is widely used in expert systems for AI and is also useful for working on medical projects.

C++

C++ is among the fastest programming languages in the domain. Its ability to work at the hardware
level enables developers to improve their program execution time. C++ is extremely useful for AI
projects that are time-sensitive. Search engines, for example, can use C++ extensively.

In AI, C++ can be used for statistical AI techniques like those found in neural networks. Algorithms
can also be written extensively in C++ for speed of execution, and AI in games is mostly coded in
C++ for faster execution and response time.

AIML

AIML (meaning "Artificial Intelligence Markup Language") is an XML dialect for use
with Artificial Linguistic Internet Computer Entity (A.L.I.C.E.)-type chatterbots. The language has
categories representing a unit of knowledge, patterns of possible utterances addressed to a chatbot, and
templates of possible answers.

Information Processing Language (IPL) was the first language developed for artificial intelligence. It
includes features intended to support programs that could perform general problem solving, such
as lists, associations, schemas (frames), dynamic memory allocation, data types, recursion, associative
retrieval, functions as arguments, generators (streams), and cooperative multitasking.

Smalltalk has been used extensively for simulations, neural networks, machine learning and
genetic algorithms. It implements a pure and elegant form of object-oriented programming using
message passing.

Stanford Research Institute Problem Solver (STRIPS) is a language for expressing automated planning
problem instances. It expresses an initial state, the goal states, and a set of actions. For each action,
preconditions (what must be established before the action is performed) and postconditions (what is
established after the action is performed) are specified.
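The structure of a STRIPS-style problem can be mimicked with ordinary Python sets. This is only a sketch of the idea, not STRIPS syntax: the facts and the action are invented for the example, and the postconditions are split into an "add" set and a "delete" set, which is one common way of writing them.

```python
def applicable(state, action):
    """An action can be performed when all its preconditions hold in the state."""
    return action["preconditions"] <= state

def apply_action(state, action):
    """Return the state after the action: delete some facts, then add others."""
    if not applicable(state, action):
        raise ValueError("preconditions of %s are not satisfied" % action["name"])
    return (state - action["delete"]) | action["add"]

# Hypothetical planning problem.
initial_state = frozenset({"at_home", "has_bread"})
goal = frozenset({"has_sandwich"})

make_sandwich = {
    "name": "make_sandwich",
    "preconditions": frozenset({"at_home", "has_bread"}),
    "add": frozenset({"has_sandwich"}),   # established after the action
    "delete": frozenset({"has_bread"}),   # no longer true after the action
}

new_state = apply_action(initial_state, make_sandwich)
print(sorted(new_state), goal <= new_state)
```

A planner would search for a sequence of such actions that leads from the initial state to a state containing every goal fact.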

Planner is a hybrid between procedural and logical languages. It gives a procedural
interpretation to logical sentences where implications are interpreted with pattern-directed inference.

POP-11 is a reflective, incrementally compiled programming language with many of the features of an
interpreted language. It is the core language of the Poplog programming environment developed
originally by the University of Sussex, and more recently in the School of Computer Science at the
University of Birmingham, which hosts the Poplog website. It is often used to introduce symbolic
programming techniques to programmers of more conventional languages like Pascal, who find POP
syntax more familiar than that of Lisp. One of POP-11's features is that it supports first-class functions.

Haskell is also a very good programming language for AI. Lazy evaluation and the list and LogicT
monads make it easy to express non-deterministic algorithms, which is often the case. Infinite
data structures are great for search trees. The language's features enable a compositional way of
expressing the algorithms. The only drawback is that working with graphs is a bit harder at first
because of purity.

Wolfram Language contains a wide range of integrated machine learning capabilities, from highly
automated functions like Predict and Classify to functions based on specific methods and diagnostics.
The functions work on many types of data, including numerical, categorical, time series, textual, and
image.

Other programming languages that can be used for artificial neural networks:
MATLAB
Perl
Julia

Practical implementations
Text Classification

Text Classification Using Neural Networks

Understanding how chatbots work is important. A fundamental piece of machinery inside a
chatbot is the text classifier. Let’s look at the inner workings of an artificial neural network
(ANN) for text classification.

We’ll use 2 layers of neurons (1 hidden layer) and a “bag of words” approach to organizing our
training data.

Text classification comes in 3 flavors: pattern matching, algorithms, neural nets.

Although the algorithmic approach using Multinomial Naive Bayes is surprisingly effective, it suffers
from 3 fundamental flaws:
• The algorithm produces a score rather than a probability. We want a probability so that we can
ignore predictions below some threshold. This is akin to a ‘squelch’ dial on a VHF radio.
• The algorithm ‘learns’ from examples of what is in a class, but not what isn’t. This learning of
patterns of what does not belong to a class is often very important.
• Classes with disproportionately large training sets can create distorted classification scores, forcing
the algorithm to adjust scores relative to class size. This is not ideal.

As with its ‘Naive’ counterpart, this classifier isn’t attempting to understand the meaning of a
sentence, it’s trying to classify it. In fact so-called “AI chatbots” do not understand language, but
that’s not the topic for now.

Let’s examine our text classifier one section at a time. We will take the following steps:
1. refer to the libraries we need
2. provide training data
3. organize our data
4. iterate: code + test the results + tune the model
5. abstract

The code is below, and we’re using an IPython notebook, which is a super productive way of working
on data science projects. The code syntax is Python.

We begin by importing our natural language toolkit, nltk. We need a way to reliably tokenize
sentences into words and a way to stem words.

# use natural language toolkit

import nltk
from nltk.stem.lancaster import LancasterStemmer
import os
import json
import datetime
stemmer = LancasterStemmer()

And our training data, 12 sentences belonging to 3 classes (‘intents’).

# 3 classes of training data
training_data = []
training_data.append({"class":"greeting", "sentence":"how are you?"})
training_data.append({"class":"greeting", "sentence":"how is your day?"})
training_data.append({"class":"greeting", "sentence":"good day"})
training_data.append({"class":"greeting", "sentence":"how is it going today?"})

training_data.append({"class":"goodbye", "sentence":"have a nice day"})
training_data.append({"class":"goodbye", "sentence":"see you later"})
training_data.append({"class":"goodbye", "sentence":"have a nice day"})
training_data.append({"class":"goodbye", "sentence":"talk to you soon"})

training_data.append({"class":"sandwich", "sentence":"make me a sandwich"})
training_data.append({"class":"sandwich", "sentence":"can you make a sandwich?"})
training_data.append({"class":"sandwich", "sentence":"having a sandwich today?"})
training_data.append({"class":"sandwich", "sentence":"what's for lunch?"})
print("%s sentences in training data" % len(training_data))

12 sentences in training data


We can now organize our data structure for documents, classes and words.
words = []
classes = []
documents = []
ignore_words = ['?']
# loop through each sentence in our training data
for pattern in training_data:
    # tokenize each word in the sentence
    w = nltk.word_tokenize(pattern['sentence'])
    # add to our words list
    words.extend(w)
    # add to documents in our corpus
    documents.append((w, pattern['class']))
    # add to our classes list
    if pattern['class'] not in classes:
        classes.append(pattern['class'])

# stem and lower each word and remove duplicates
words = [stemmer.stem(w.lower()) for w in words if w not in ignore_words]
words = list(set(words))

# remove duplicates
classes = list(set(classes))

print(len(documents), "documents")
print(len(classes), "classes", classes)
print(len(words), "unique stemmed words", words)

12 documents
3 classes ['greeting', 'goodbye', 'sandwich']
26 unique stemmed words ['sandwich', 'hav', 'a', 'how', 'for', 'ar', 'good',
'mak', 'me', 'it', 'day', 'soon', 'nic', 'lat', 'going', 'you', 'today',
'can', 'lunch', 'is', "'s", 'see', 'to', 'talk', 'yo', 'what']

Observe that each word is stemmed and lower-cased. Stemming helps the machine equate words
like “have” and “having”. We don’t care about case.

Our training data is transformed into “bag of words” for each sentence.

# create our training data
training = []
output = []
# create an empty array for our output
output_empty = [0] * len(classes)

# training set, bag of words for each sentence
for doc in documents:
    # initialize our bag of words
    bag = []
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # stem each word
    pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]
    # create our bag of words array
    for w in words:
        bag.append(1 if w in pattern_words else 0)

    training.append(bag)
    # output is a '0' for each tag and '1' for current tag
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    output.append(output_row)

# sample training/output
i = 0
w = documents[i][0]
print([stemmer.stem(word.lower()) for word in w])
print(training[i])
print(output[i])

['how', 'ar', 'you', '?']
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 0, 0]

The above step is a classic in text classification: each training sentence is reduced to an array of 0’s
and 1’s against the array of unique words in the corpus.

['how', 'are', 'you', '?']

is stemmed:

['how', 'ar', 'you', '?']

then transformed to input: a 1 for each word in the bag (the ? is ignored)

[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

and output: the first class

[1, 0, 0]

Note that a sentence could be given multiple classes, or none.
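
The transformation described above can be tried in isolation. The following is a minimal sketch in
plain Python, with no stemming or tokenizing. The function name `bag_of_words` and the five-word
vocabulary are assumptions made for illustration; they are not the 26-word list built from the
training data.

```python
# Minimal bag-of-words sketch (illustrative only; plain Python, no stemming).
def bag_of_words(tokens, vocabulary):
    """Return a 0/1 list with one slot per vocabulary word."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

# Hypothetical toy vocabulary, much smaller than the one built from the training data.
vocabulary = ["how", "ar", "you", "day", "sandwich"]
tokens = ["how", "ar", "you", "?"]   # '?' is not in the vocabulary, so it is dropped

bag = bag_of_words(tokens, vocabulary)
print(bag)  # [1, 1, 1, 0, 0]
```

The position of each 1 depends only on the order of the vocabulary, which is why the same vocabulary
must be used for training and for classification.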


Make sure the above makes sense and play with the code until you grok it.

Your initial step in machine learning is to have clean data.

Next we have the core functions for our 2-layer neural network.

We use the numpy library because it makes our matrix multiplication fast.

We use a sigmoid function to normalize values and its derivative to measure the error rate, iterating
and adjusting until the error rate is acceptably small.

Below we also implement our bag-of-words function, which transforms an input sentence into an
array of 0’s and 1’s. This must match our transform for the training data exactly. It is always crucial
to get this right.

import numpy as np
import time

# compute sigmoid nonlinearity
def sigmoid(x):
    output = 1/(1+np.exp(-x))
    return output

# convert output of sigmoid function to its derivative
def sigmoid_output_to_derivative(output):
    return output*(1-output)

def clean_up_sentence(sentence):
    # tokenize the pattern
    sentence_words = nltk.word_tokenize(sentence)
    # stem each word
    sentence_words = [stemmer.stem(word.lower()) for word in sentence_words]
    return sentence_words

# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=False):
    # tokenize the pattern
    sentence_words = clean_up_sentence(sentence)
    # bag of words
    bag = [0]*len(words)
    for s in sentence_words:
        for i, w in enumerate(words):
            if w == s:
                bag[i] = 1
                if show_details:
                    print("found in bag: %s" % w)

    return(np.array(bag))

def think(sentence, show_details=False):
    x = bow(sentence.lower(), words, show_details)
    if show_details:
        print("sentence:", sentence, "\n bow:", x)
    # input layer is our bag of words
    l0 = x
    # matrix multiplication of input and hidden layer
    l1 = sigmoid(np.dot(l0, synapse_0))
    # output layer
    l2 = sigmoid(np.dot(l1, synapse_1))
    return l2
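
One detail in the helpers above is easy to miss: sigmoid_output_to_derivative() expects the output of
the sigmoid, not its input. The sketch below is only a sanity check, not part of the classifier. It
compares that formula against a numerical slope estimate; the test points and the step size `h` are
arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_output_to_derivative(output):
    # 'output' must already be sigmoid(x), not x itself
    return output * (1 - output)

x = np.array([-2.0, 0.0, 3.0])
analytic = sigmoid_output_to_derivative(sigmoid(x))

# Central finite difference as an independent estimate of the slope.
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(np.allclose(analytic, numeric))  # True
```

The slope is largest at x = 0 (where it equals 0.25) and shrinks as the sigmoid saturates, which is why
confident outputs receive small corrections during training.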

And now we code our neural network training function to create synaptic weights. Don’t get too
excited: this is mostly matrix multiplication, from middle-school math class.

def train(X, y, hidden_neurons=10, alpha=1, epochs=50000, dropout=False,
          dropout_percent=0.5):

    print("Training with %s neurons, alpha:%s, dropout:%s %s" %
          (hidden_neurons, str(alpha), dropout, dropout_percent if dropout else ''))
    print("Input matrix: %sx%s Output matrix: %sx%s" %
          (len(X), len(X[0]), 1, len(classes)))
    np.random.seed(1)

    last_mean_error = 1
    # randomly initialize our weights with mean 0
    synapse_0 = 2*np.random.random((len(X[0]), hidden_neurons)) - 1
    synapse_1 = 2*np.random.random((hidden_neurons, len(classes))) - 1

    prev_synapse_0_weight_update = np.zeros_like(synapse_0)
    prev_synapse_1_weight_update = np.zeros_like(synapse_1)

    synapse_0_direction_count = np.zeros_like(synapse_0)
    synapse_1_direction_count = np.zeros_like(synapse_1)

    for j in iter(range(epochs+1)):

        # Feed forward through layers 0, 1, and 2
        layer_0 = X
        layer_1 = sigmoid(np.dot(layer_0, synapse_0))

        if(dropout):
            layer_1 *= np.random.binomial([np.ones((len(X), hidden_neurons))],
                                          1-dropout_percent)[0] * (1.0/(1-dropout_percent))

        layer_2 = sigmoid(np.dot(layer_1, synapse_1))

        # how much did we miss the target value?
        layer_2_error = y - layer_2

        if (j % 10000) == 0 and j > 5000:
            # if this 10k iteration's error is greater than the last iteration, break out
            if np.mean(np.abs(layer_2_error)) < last_mean_error:
                print("delta after "+str(j)+" iterations:" +
                      str(np.mean(np.abs(layer_2_error))))
                last_mean_error = np.mean(np.abs(layer_2_error))
            else:
                print("break:", np.mean(np.abs(layer_2_error)), ">", last_mean_error)
                break

        # in what direction is the target value?
        # were we really sure? if so, don't change too much.
        layer_2_delta = layer_2_error * sigmoid_output_to_derivative(layer_2)

        # how much did each l1 value contribute to the l2 error (according to the weights)?
        layer_1_error = layer_2_delta.dot(synapse_1.T)

        # in what direction is the target l1?
        # were we really sure? if so, don't change too much.
        layer_1_delta = layer_1_error * sigmoid_output_to_derivative(layer_1)

        synapse_1_weight_update = (layer_1.T.dot(layer_2_delta))
        synapse_0_weight_update = (layer_0.T.dot(layer_1_delta))

        if(j > 0):
            synapse_0_direction_count += np.abs(((synapse_0_weight_update > 0)+0) -
                                                ((prev_synapse_0_weight_update > 0) + 0))
            synapse_1_direction_count += np.abs(((synapse_1_weight_update > 0)+0) -
                                                ((prev_synapse_1_weight_update > 0) + 0))

        synapse_1 += alpha * synapse_1_weight_update
        synapse_0 += alpha * synapse_0_weight_update

        prev_synapse_0_weight_update = synapse_0_weight_update
        prev_synapse_1_weight_update = synapse_1_weight_update

    now = datetime.datetime.now()

    # persist synapses
    synapse = {'synapse0': synapse_0.tolist(), 'synapse1': synapse_1.tolist(),
               'datetime': now.strftime("%Y-%m-%d %H:%M"),
               'words': words,
               'classes': classes
              }
    synapse_file = "synapses.json"

    with open(synapse_file, 'w') as outfile:
        json.dump(synapse, outfile, indent=4, sort_keys=True)
    print("saved synapses to:", synapse_file)

We are now ready to build our neural network model. We will save it as a JSON structure that
represents our synaptic weights.
You should experiment with different values of ‘alpha’ (the gradient descent step size) and see how it
affects the error rate. This parameter helps our error adjustment find the lowest error rate:

synapse_0 += alpha * synapse_0_weight_update
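
To get a feel for what alpha does without retraining the whole network, the sketch below runs plain
gradient descent on a one-variable problem. It is a toy stand-in, not the training loop above: the
function `descend`, the quadratic error and the three alpha values are all assumptions chosen for
illustration.

```python
def descend(alpha, steps=50, start=0.0, target=3.0):
    """Minimise (w - target)**2 by repeatedly stepping against the gradient."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - target)
        w -= alpha * gradient
    return abs(w - target)   # remaining error after the last step

for alpha in (0.01, 0.1, 1.1):
    print(alpha, descend(alpha))
```

A very small alpha creeps toward the answer and is still far away after 50 steps, a moderate alpha
converges quickly, and an alpha that is too large overshoots further on every step, so the error grows
instead of shrinking.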

We use 20 neurons in our hidden layer; you can adjust this easily. These parameters will vary
depending on the dimensions and shape of your training data. Tune them until the error rate comes
down to about 10^-3, which is a reasonable target.

X = np.array(training)
y = np.array(output)

start_time = time.time()

train(X, y, hidden_neurons=20, alpha=0.1, epochs=100000, dropout=False,
      dropout_percent=0.2)

elapsed_time = time.time() - start_time
print("processing time:", elapsed_time, "seconds")

Training with 20 neurons, alpha:0.1, dropout:False
Input matrix: 12x26 Output matrix: 1x3
delta after 10000 iterations:0.0062613597435
delta after 20000 iterations:0.00428296074919
delta after 30000 iterations:0.00343930779307
delta after 40000 iterations:0.00294648034566
delta after 50000 iterations:0.00261467859609
delta after 60000 iterations:0.00237219554105
delta after 70000 iterations:0.00218521899378
delta after 80000 iterations:0.00203547284581
delta after 90000 iterations:0.00191211022401
delta after 100000 iterations:0.00180823798397
saved synapses to: synapses.json
processing time: 6.501226902008057 seconds

The synapses.json file contains all of our synaptic weights. This is our model.

The classify() function below is all that’s needed for classification once the synapse weights have
been calculated: about 15 lines of code.
The catch: if the training data changes, our model needs to be recalculated. For a very large dataset
this could take a significant amount of time.
We can now generate the probability of a sentence belonging to one (or more) of our classes. This is
very fast because it is just a dot-product calculation in our previously defined think() function.

# probability threshold
ERROR_THRESHOLD = 0.2
# load our calculated synapse values
synapse_file = 'synapses.json'
with open(synapse_file) as data_file:
    synapse = json.load(data_file)
    synapse_0 = np.asarray(synapse['synapse0'])
    synapse_1 = np.asarray(synapse['synapse1'])

def classify(sentence, show_details=False):
    results = think(sentence, show_details)

    results = [[i, r] for i, r in enumerate(results) if r > ERROR_THRESHOLD]
    results.sort(key=lambda x: x[1], reverse=True)
    return_results = [[classes[r[0]], r[1]] for r in results]
    print("%s \n classification: %s" % (sentence, return_results))
    return return_results

classify("sudo make me a sandwich")
classify("how are you today?")
classify("talk to you tomorrow")
classify("who are you?")
classify("make me some lunch")
classify("how was your lunch today?")
print()
classify("good day", show_details=True)

sudo make me a sandwich
[['sandwich', 0.99917711814437993]]
how are you today?
[['greeting', 0.99864563257858363]]
talk to you tomorrow
[['goodbye', 0.95647479275905511]]
who are you?
[['greeting', 0.8964283843977312]]
make me some lunch
[['sandwich', 0.95371924052636048]]
how was your lunch today?
[['greeting', 0.99120883810944971], ['sandwich', 0.31626066870883057]]

Experiment with other sentences and see how the probabilities differ. You can then add training data
to improve and expand the model. Notice the solid predictions despite the limited training data.

Certain sentences will produce multiple predictions (above a threshold). You will want to establish the
correct threshold level for your application. Not all text classification scenarios are the same: some
predictive situations require more confidence than others.
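
The effect of the threshold can be seen with a small sketch. The greeting and sandwich scores below
are rounded from the “how was your lunch today?” output above; the goodbye score is a made-up
value, since scores below the threshold were never printed. The helper name `filter_by_threshold` is
also just an illustration.

```python
def filter_by_threshold(scores, threshold):
    """Keep (label, score) pairs above the threshold, best first."""
    kept = [(label, s) for label, s in scores.items() if s > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# 'goodbye' is a hypothetical low score added for the example.
scores = {"greeting": 0.991, "goodbye": 0.02, "sandwich": 0.316}

print(filter_by_threshold(scores, 0.2))  # [('greeting', 0.991), ('sandwich', 0.316)]
print(filter_by_threshold(scores, 0.5))  # [('greeting', 0.991)]
```

Raising the threshold from 0.2 to 0.5 removes the weaker second prediction, which is the trade-off an
application has to decide on.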

The last classification shows some internal details:

found in bag: good
found in bag: day
sentence: good day
 bow: [0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
good day
[['greeting', 0.99664077655648697]]

Notice the bag-of-words (bow) for the sentence, 2 words matched our corpus. The neural-net also
learns from the 0’s, the non-matching words.

A low-probability classification is easily shown by providing a sentence where ‘a’ (a common word) is
the only match, for example:

found in bag: a
sentence: a burrito!
 bow: [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
a burrito!
[['sandwich', 0.61776860634647834]]

Image Processing

Here, we will learn how to develop programs that identify objects in photos using deep learning. In
other words, we’re going to explain the black magic that allows Google Photos to search your photos
based on what is in the picture:

Recognizing Objects with Deep Learning

Any 3-year-old child can recognize a photograph of a bird, but figuring out how to make a computer
recognize objects has puzzled the very best computer scientists for over 50 years.

In the last few years, we’ve finally found a good approach to object recognition using deep
convolutional neural networks. The concepts are completely understandable if you break them down
one by one.
So let’s do it. Let’s write a program that can recognize birds!

Starting Simple

Before we learn how to recognize pictures of birds, let’s learn how to recognize something much
simpler: the handwritten number “8”.
We have seen how neural networks can solve complex problems by chaining together lots of simple
neurons.

We have also seen that the idea of machine learning is that the same generic algorithms can be reused
with different data to solve different problems. So let’s modify this same neural network to recognize
handwritten text. But to make the task really simple, we’ll only try to recognize one character: the
numeral “8”.

Machine learning only works when you have data, preferably a lot of data. So we need lots and lots
of handwritten “8”s to get started. Fortunately, researchers have created the MNIST data set of
handwritten numbers for this very purpose. MNIST provides 60,000 images of handwritten digits.
This walkthrough treats each one as an 18x18 image (the standard MNIST files are 28x28 pixels).
Here are some “8”s from the data set:

Some 8s from the MNIST data set

If you think about it, everything is just numbers.


Now we need to process images with our neural network. How in the world do we feed images into
a neural network instead of just numbers?
The answer is incredibly simple. A neural network takes numbers as input. To a computer, an image
is really just a grid of numbers that represent how dark each pixel is:

To feed an image into our neural network, we just treat the 18x18 pixel image as an array of 324
numbers:

To handle 324 inputs, we’ll just widen our neural network to have 324 input nodes:
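
The flattening step can be sketched with numpy. The image here is random stand-in data rather than a
real digit, and scaling the pixel values to the 0..1 range is a common convention assumed for the
example, not something the text prescribes.

```python
import numpy as np

# Stand-in for one grayscale image: 18 rows x 18 columns of pixel darkness values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(18, 18))

# Row-major flattening: row 0 first, then row 1, and so on.
inputs = image.reshape(-1) / 255.0   # scale to the 0..1 range

print(inputs.shape)  # (324,)
```

Each of the 324 values feeds one input node. Note that the flattened array no longer records which
pixels were neighbours in the grid.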

Notice that our neural network also has two outputs now (instead of just one). The first output will
predict the probability that the image is an “8” and the second output will predict the probability
that it is not an “8”. By having a separate output for each type of object we want to recognize, we
can use a neural network to classify objects into groups.
Our neural network is a lot bigger than last time (324 inputs instead of 3!). But any modern
computer can handle a neural network with a few hundred nodes without blinking. This would even
work fine on your cell phone.
All that is left is to train the neural network with images of “8”s and not-“8”s so it learns to tell
them apart. When we feed in an “8”, we’ll tell it the probability the image is an “8” is 100% and the
probability it’s not an “8” is 0%, and vice versa for the counter-example images.

Here are some selected images from our training data:

We can train this kind of neural network in a few minutes on a modern laptop. When it’s done,
we’ll have a neural network that can recognize pictures of “8”s with pretty high accuracy.

Tunnel Vision

The good news is that our “8” recognizer really does work well on simple images where the digit is
right in the middle of the image:

But now the really bad news:

Our “8” recognizer totally fails to work when the digit isn’t perfectly centered in the image. Just the
slightest position change ruins everything:

This is because our network only learned the pattern of a perfectly centered “8”. It has absolutely no
idea what an off-center “8” is. It knows exactly one pattern and one pattern only.
That’s not very useful in the real world. Real-world problems are never that clean and simple. So we
need to figure out how to make our neural network work in cases where the “8” isn’t perfectly
centered.

Brute Force Idea #1: Searching with a Sliding Window

We already created a really good program for finding an “8” centered in an image. What if we just
scan all around the image for possible “8”s in smaller sections, one section at a time, until we find
one?

This approach is called a sliding window. It’s the brute force solution. It works well in some limited
cases, but it’s really inefficient. You have to check the same image over and over looking for objects
of different sizes. We can do better than this!
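
A sliding window is little more than two nested loops. The sketch below only generates the window
positions; in a real scan each position would be cropped out and passed to the “8” recognizer. The
image size, window size and stride are hypothetical numbers picked for the example.

```python
def sliding_windows(height, width, window, stride):
    """Yield the (top, left) corner of every window that fits inside the image."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield top, left

# Hypothetical sizes: a 36x36 image scanned with an 18x18 window, 6 pixels at a time.
positions = list(sliding_windows(36, 36, window=18, stride=6))
print(len(positions))  # 16
```

Even this tiny example needs 16 separate classifier runs for a single window size, which shows where
the inefficiency comes from once several window sizes are tried.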

Brute Force Idea #2: More Data and a Deep Neural Net

When we trained our network, we only showed it “8”s that were perfectly centered. What happens
if we train it with more data, including “8”s in all different positions and sizes all around the image?
We don’t even need to collect new training data. We can just write a script to generate new images
with the “8”s in all kinds of different positions in the image:

We created synthetic training data by generating different versions of the training images we already
had. This is a very useful technique!
Using this technique, we can easily create an endless supply of training data.
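
One simple way to produce such variations is to translate each image by a few pixels in every
direction. The sketch below is an assumption about how such a script might look, not the script used
for the figures; the function `shifted_copies` and the small blob standing in for an “8” are
illustrative.

```python
import numpy as np

def shifted_copies(image, max_shift):
    """Return copies of `image` translated by every offset up to max_shift pixels."""
    height, width = image.shape
    copies = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.zeros_like(image)
            # Source and destination slices that stay inside the frame.
            src_rows = slice(max(0, -dy), min(height, height - dy))
            dst_rows = slice(max(0, dy), min(height, height + dy))
            src_cols = slice(max(0, -dx), min(width, width - dx))
            dst_cols = slice(max(0, dx), min(width, width + dx))
            shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
            copies.append(shifted)
    return copies

image = np.zeros((18, 18), dtype=int)
image[8:10, 8:10] = 1          # a small blob in the middle stands in for an "8"
augmented = shifted_copies(image, max_shift=2)
print(len(augmented))  # 25
```

One original image becomes 25 training images. Scaling and rotating the digit would multiply that
number again.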

More data makes the problem harder for our neural network to solve, but we can compensate for
that by making our network bigger and thus able to learn more complicated patterns.
To make the network bigger, we just stack up layer upon layer of nodes:

We call this a “deep neural network” because it has more layers than a traditional neural network.
This idea has been around since the late 1960s. But until recently, training a neural network this
large was just too slow to be useful. Once we figured out how to use 3D graphics cards (which were
designed to do matrix multiplication really fast) instead of normal computer processors, working with
large neural networks suddenly became practical. In fact, the exact same NVIDIA GeForce GTX
1080 video card that you use to play Overwatch can be used to train neural networks incredibly quickly.

But even though we can make our neural network really big and train it quickly with a 3D graphics
card, that still isn’t going to get us all the way to a solution. We need to be smarter about how we
feed images into our neural network.
Think about it. It doesn’t make sense to train a network to recognize an “8” at the top of a picture
separately from training it to recognize an “8” at the bottom of a picture as if those were two totally
different objects.
There should be some way to make the neural network smart enough to know that an “8” anywhere
in the picture is the same thing without all that extra training. Luckily, there is!

The Solution is Convolution


As a human, you instinctively know that pictures have a hierarchy or conceptual structure. Consider
this picture:

As a human, you instantly recognize the hierarchy in this picture:
• The ground is covered in grass and concrete
• There is a child
• The child is sitting on a bouncy horse
• The bouncy horse is on top of the grass

Most importantly, we recognize the idea of a child no matter what surface the child is on. We don’t
have to re-learn the idea of a child for every possible surface it could appear on.
But right now, our neural network can’t do this. It thinks that an “8” in a different part of the
image is an entirely different thing. It doesn’t understand that moving an object around in the
picture doesn’t make it something different. This means it has to re-learn the identity of each
object in every possible position, which is very wasteful.
We need to give our neural network an understanding of translation invariance: an “8” is an “8” no
matter where in the picture it shows up.
We’ll do this using a technique called convolution. The idea of convolution is inspired partly by
computer science and partly by biology (i.e. mad scientists literally poking cat brains with weird probes
to figure out how cats process images).

How Convolution Works

Instead of feeding entire images into our neural network as one grid of numbers, we’re going to do
something a lot smarter that takes advantage of the idea that an object is the same no matter where
it appears in a picture.
Here is how the process works, step by step:

Step 1: Break the image into overlapping image tiles

Similar to our sliding window search above, let’s pass a sliding window over the entire original image
and save each result as a separate, tiny picture tile:

This process turns our original image into 77 equally-sized tiny image tiles.

Step 2: Feed each image tile into a small neural network


Previously, we fed a single image into a neural network to see if it was an “8”. We’ll do the exact
same thing here, but we’ll do it for each individual image tile:

Repeat this process 77 times, once for each tile.
However, there’s one big twist: we’ll keep the same neural network weights for every single tile in
the same original image. In other words, we treat every image tile equally. If something interesting
appears in any given tile, we’ll mark that tile as interesting.

Step 3: Save the results from each tile into a new array

We don’t want to lose track of the arrangement of the original tiles. So we save the result from
processing each tile into a grid in the same arrangement as the original image. It looks like this:

In other words, we started with a large image and ended with a slightly smaller array that records
which sections of our original image were the most interesting.
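
Steps 1 to 3 can be sketched in a few lines of numpy. This is a deliberately reduced illustration: a
single weighted sum stands in for the small neural network applied to each tile, and the function
name `convolve`, the 6x6 image and the 2x2 weights are all assumptions made for the example.

```python
import numpy as np

def convolve(image, weights, stride=1):
    """Score every tile of `image` with the same weights; return the grid of scores."""
    tile_h, tile_w = weights.shape
    rows = (image.shape[0] - tile_h) // stride + 1
    cols = (image.shape[1] - tile_w) // stride + 1
    result = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            top, left = r * stride, c * stride
            tile = image[top:top + tile_h, left:left + tile_w]   # Step 1: cut out a tile
            result[r, c] = np.sum(tile * weights)                # Step 2: same weights every time
    return result                                                # Step 3: scores keep the tile layout

image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0                 # one bright 2x2 patch
weights = np.ones((2, 2))             # hypothetical "detector" for bright patches
scores = convolve(image, weights)
print(scores.shape)   # (5, 5)
```

The highest score lands at the grid position of the tile that exactly covers the bright patch. If the
patch moves, the peak moves with it, but the weights that detect it stay the same.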

Step 4: Downsampling

The result of Step 3 was an array that maps out which parts of the original image are the most
interesting. But that array is still pretty big:

To reduce the size of the array, we downsample it using an algorithm called max pooling. It sounds
fancy, but it isn’t at all.
We’ll just look at each 2x2 square of the array and keep the biggest number:

The idea here is that if we found something interesting in any of the four input tiles that make up
each 2x2 grid square, we’ll just keep the most interesting bit. This reduces the size of our array
while keeping the most important bits.
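
Max pooling over 2x2 squares can be written compactly with a numpy reshape. The sketch below is
one possible implementation under the assumption of non-overlapping squares; the function name
`max_pool_2x2` and the sample grid are illustrative.

```python
import numpy as np

def max_pool_2x2(grid):
    """Keep the largest value in each non-overlapping 2x2 square."""
    rows, cols = grid.shape[0] // 2, grid.shape[1] // 2
    trimmed = grid[:rows * 2, :cols * 2]          # drop an odd trailing row/column
    return trimmed.reshape(rows, 2, cols, 2).max(axis=(1, 3))

grid = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]])
print(max_pool_2x2(grid))
# [[4 2]
#  [2 8]]
```

The 4x4 grid shrinks to 2x2, a quarter of the original size, while the strongest response in each
square survives.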

Final step: Make a prediction

So far, we’ve reduced a giant image down into a fairly small array.
That array is just a bunch of numbers, so we can use it as input to another neural network. This
final neural network will decide whether the image is or isn’t a match. To distinguish it from the
convolution step, we call it a “fully connected” network.
So from start to finish, our whole five-step pipeline looks like this:

Adding More Steps

Our image processing pipeline is a series of steps: convolution, max pooling, and finally a fully
connected network.
When solving problems in the real world, these steps can be combined and stacked as many times as
you want! You can have two, three or even ten convolution layers. You can throw in max pooling
wherever you want to reduce the size of your data.
The basic idea is to start with a large image and continually boil it down, step by step, until you
finally have a single result. The more convolution steps you have, the more complicated the features
your network will be able to learn to recognize.
For example, the first convolution step might learn to recognize sharp edges, the second convolution
step might recognize beaks using its knowledge of sharp edges, the third step might recognize entire
birds using its knowledge of beaks, and so on.
Here’s what a more realistic deep convolutional network (like you would find in a research paper)
looks like:

In this case, they start with a 224 x 224 pixel image, apply convolution and max pooling twice,
apply convolution 3 more times, apply max pooling and then have two fully connected layers. The
end result is that the image is classified into one of 1000 categories!

Constructing the Right Network


How do you know which steps you need to combine to make your image classifier work?
Honestly, you have to answer this by doing a lot of experimentation and testing. You might have to
train 100 networks before you find the optimal structure and parameters for the problem you are
solving. Machine learning involves a lot of trial and error!

Building our Bird Classifier

Now we finally know enough to write a program that can decide if a picture is a bird or not.
As always, we need some data to get started. The free CIFAR10 data set contains 6,000 pictures of
birds and 54,000 pictures of things that are not birds. But to get even more data we’ll also add in the
Caltech-UCSD Birds-200-2011 data set that has another 12,000 bird pictures.

Here are a few of the birds from our combined data set:

And here are some of the 54,000 non-bird images:

This data set will work fine for our purposes, but 72,000 low-res images is still pretty small for
real-world applications. If you want Google-level performance, you need millions of large images. In
machine learning, having more data is almost always more important than having better algorithms.
Now you know why Google is so happy to offer you unlimited photo storage. They want all of your
image data.
To build our classifier, we’ll use TFLearn. TFLearn is a wrapper around Google’s TensorFlow deep
learning library that exposes a simplified API. It makes building convolutional neural networks as
easy as writing a few lines of code to define the layers of our network.
Here’s the code that defines and trains the network, followed by a short snippet that takes a single
image file and predicts whether it is a bird or not.

# -*- coding: utf-8 -*-

"""
Based on the tflearn example located here:
https://github.com/tflearn/tflearn/blob/master/examples/images/convnet_cifar10.py
"""
from __future__ import division, print_function, absolute_import

# Import tflearn and some helpers
import tflearn
from tflearn.data_utils import shuffle
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
import pickle

# Load the data set
X, Y, X_test, Y_test = pickle.load(open("full_dataset.pkl", "rb"))

# Shuffle the data
X, Y = shuffle(X, Y)

# Make sure the data is normalized
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()

# Create extra synthetic training data by flipping, rotating and blurring the
# images on our data set.
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_rotation(max_angle=25.)
img_aug.add_random_blur(sigma_max=3.)

# Define our network architecture:

# Input is a 32x32 image with 3 color channels (red, green and blue)
network = input_data(shape=[None, 32, 32, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)

# Step 1: Convolution
network = conv_2d(network, 32, 3, activation='relu')

# Step 2: Max pooling
network = max_pool_2d(network, 2)

# Step 3: Convolution again
network = conv_2d(network, 64, 3, activation='relu')

# Step 4: Convolution yet again
network = conv_2d(network, 64, 3, activation='relu')

# Step 5: Max pooling again
network = max_pool_2d(network, 2)

# Step 6: Fully-connected 512 node neural network
network = fully_connected(network, 512, activation='relu')

# Step 7: Dropout - throw away some data randomly during training to prevent over-fitting
network = dropout(network, 0.5)

# Step 8: Fully-connected neural network with two outputs (0=isn't a bird, 1=is a bird)
# to make the final prediction
network = fully_connected(network, 2, activation='softmax')

# Tell tflearn how we want to train the network
network = regression(network, optimizer='adam',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)

# Wrap the network in a model object
model = tflearn.DNN(network, tensorboard_verbose=0,
                    checkpoint_path='bird-classifier.tfl.ckpt')

# Train it! We'll do 100 training passes and monitor it as it goes.
model.fit(X, Y, n_epoch=100, shuffle=True, validation_set=(X_test, Y_test),
          show_metric=True, batch_size=96,
          snapshot_epoch=True,
          run_id='bird-classifier')

# Save model when training is complete to a file
model.save("bird-classifier.tfl")
print("Network trained and saved as bird-classifier.tfl!")

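# Use the trained model on a single image file.
import argparse
import numpy as np
import scipy.ndimage
import scipy.misc

# Assumption: 'args' comes from the command line; the listing does not show how it is built.
parser = argparse.ArgumentParser(description="Decide if an image is a picture of a bird")
parser.add_argument("image", type=str, help="The image file to check")
args = parser.parse_args()

# Load the image file
# (imread and imresize were removed from newer SciPy releases; this code
# assumes an older SciPy version that still provides them)
img = scipy.ndimage.imread(args.image, mode="RGB")

# Scale it to 32x32
img = scipy.misc.imresize(img, (32, 32), interp="bicubic").astype(np.float32,
                                                                  casting='unsafe')

# Predict
prediction = model.predict([img])

# Check the result.
is_bird = np.argmax(prediction[0]) == 1

if is_bird:
    print("That's a bird!")
else:
    print("That's not a bird!")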

If you are training with a good video card with enough RAM (like an Nvidia GeForce GTX 980 Ti or
better), this will be done in less than an hour. If you are training with a normal CPU, it might take a
lot longer.

As it trains, the accuracy will increase. After the first pass, I got 75.4% accuracy. After just 10 passes,
it was already up to 91.7%. After 50 or so passes, it capped out around 95.5% accuracy and additional
training didn’t help, so I stopped it there.
Congrats! Our program can now recognize birds in images!

Testing Our Network

But to really see how effective our network is, we need to test it with lots of images. The data set
I created held back 15,000 images for validation. When I ran those 15,000 images through the
network, it predicted the correct answer 95% of the time.

How accurate is 95% accurate?

Our network claims to be 95% accurate. But the devil is in the details. That could mean all sorts of
different things.
For example, what if 5% of our training images were birds and the other 95% were not birds? A
program that guessed “not a bird” every single time would be 95% accurate! But it would also be
100% useless.
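
The arithmetic behind that claim is easy to check. The labels below are hypothetical, chosen only to
match the 5% / 95% split in the example, and the “classifier” is simply a list of constant answers.

```python
# Hypothetical data set: 5% birds (label 1), 95% not birds (label 0).
labels = [1] * 50 + [0] * 950

# A "classifier" that ignores its input and always answers "not a bird".
predictions = [0] * len(labels)

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)
birds_found = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)

print(accuracy)      # 0.95
print(birds_found)   # 0
```

The accuracy looks excellent even though not a single bird was found.
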
We need to look more closely at the numbers than just the overall accuracy. To judge how good a
classification system really is, we need to look closely at how it failed, not just the percentage of
the time that it failed.

Instead of thinking about our predictions as “right” and “wrong”, let’s break them down into four
separate categories:
• First, here are some of the birds that our network correctly identified as birds. We call
these True Positives:

Wow! Our network can recognize lots of different kinds of birds successfully!
• Second, here are images that our network correctly identified as “not a bird”. These are
called True Negatives:

Horses and trucks don’t fool us!
• Third, here are some images that we thought were birds but were not really birds at all.
These are our False Positives:

Lots of planes were mistaken for birds! That makes sense.
• And finally, here are some images of birds that we didn’t correctly recognize as birds. These
are our False Negatives:

Using our validation set of 15,000 images, here’s how many times our predictions fell into each
category:

Why do we break our results down like this? Because not all mistakes are created equal.
Imagine if we were writing a program to detect cancer from an MRI image. If we were detecting
cancer, we’d rather have false positives than false negatives. False negatives would be the worst
possible case — that’s when the program told someone they definitely didn’t have cancer but they
actually did.
Instead of just looking at overall accuracy, we calculate Precision and Recall metrics, which give us
a clearer picture of how well we did:

This tells us that 97% of the time we guessed “Bird”, we were right! But it also tells us that we only
found 90% of the actual birds in the data set. In other words, we might not find every bird but we are
pretty sure about it when we do find one!
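
Precision and recall are computed directly from the four categories above. The sketch below uses hypothetical counts, chosen only so that they add up to 15,000 images and give roughly the percentages quoted in the text. They are not the actual counts from this experiment.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive
    and false negative counts."""
    precision = tp / (tp + fp)  # of everything we called "bird", how much was a bird?
    recall = tp / (tp + fn)     # of all actual birds, how many did we find?
    return precision, recall


# Hypothetical counts for a 15,000-image validation set (illustration only).
tp, tn, fp, fn = 5220, 9040, 160, 580

precision, recall = precision_recall(tp, fp, fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision: {precision:.1%}")
print(f"recall:    {recall:.1%}")
print(f"accuracy:  {accuracy:.1%}")
```

Note that true negatives do not appear in either formula: precision and recall only describe how the network handles the "bird" class.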

Major NN projects

Recognition of Braille Alphabet Using Neural Networks

The main goal is to train a neural network to recognize which character of the Braille alphabet is
given as input. For testing we used the Serbian Cyrillic Braille alphabet.
As each letter is represented by six dots, we have a matrix of dimension 3x2. Each component of
the matrix is one input, so we have 6 inputs, one for each dot.
The number of outputs varies depending on the architecture.
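
As a rough illustration of the input encoding, a 3x2 cell can be flattened into the 6 network inputs as follows. The function name `cell_to_inputs` is hypothetical, and the two sample cells are the standard English Braille patterns for "a" and "b", used here only for illustration (the project itself used Serbian Cyrillic Braille).

```python
def cell_to_inputs(cell):
    """Flatten a 3x2 Braille cell (rows of 0/1 values) into a list of 6 inputs."""
    if len(cell) != 3 or any(len(row) != 2 for row in cell):
        raise ValueError("a Braille cell must have 3 rows and 2 columns")
    # Read the cell row by row: 1 means a raised dot, 0 means no dot.
    return [dot for row in cell for dot in row]


# Sample cells (standard English Braille, for illustration only).
letter_a = [[1, 0],
            [0, 0],
            [0, 0]]
letter_b = [[1, 0],
            [1, 0],
            [0, 0]]

print(cell_to_inputs(letter_a))
print(cell_to_inputs(letter_b))
```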
Shuttle Landing Control

This project implements a Space Shuttle landing control mechanism with the Neuroph framework by
training a neural network on the Shuttle Landing Control data set.
The central goal of this experiment is to train a neural network to predict the conditions under which
an automatic landing would be preferable to manual control of the spacecraft.
Music Classification by Genre Using Neural Networks

Music classification is a pattern recognition problem which involves extracting features and
building a classifier. Artificial neural networks have had considerable success in the area of pattern
recognition. A network can be trained to learn the criteria used to classify, and can do so in a
generalized manner, by repeatedly showing it inputs that have been classified into groups. Neural
networks offer a fresh solution for music classification, so in this experiment a new music
classification method is proposed based on a BP (backpropagation) neural network.
Face Recognition Using Neural Network

The main goal is to train the neural network to identify a face from any picture. The neural network
takes some parameters of the image as input and tries to predict the person who has the corresponding
characteristics.
Concept Learning and Classification - Hayes-Roth Data Set

This is an example of a multivariate classification problem solved with Neuroph. In this assignment we
test Neuroph 2.4 with the Hayes-Roth data set. Several architectures are tried out to determine which
ones represent a good solution to the problem, and which ones do not.

Predicting Poker Hands with Neural Networks

The core goal is to train the neural network to predict which poker hand we have on the basis of the
cards we give as input attributes. The database was acquired from the Carleton University, Department
of Computer Science Intelligent Systems Research Unit in Canada.
The data set comprises more than 25000 instances, but because of software limitations the project
worked with a shorter version of 1003 instances.
Predicting Relative Performance of Computer Processors with Neural Networks

The goal is to train the neural network to predict the relative performance of a CPU from some
features used as input, and then to compare that result with the published performance and with the
relative performance estimated using a linear regression method.
Predicting Survival of Patients Using Haberman's Data Set

This project predicts the survival of patients who had undergone surgery for breast cancer. The
objective is to train the neural network to predict whether a patient survived after breast cancer
surgery, when it is given other characteristics as input.
Predicting the Class of Breast Cancer with Neural Networks

The main goal is to train the neural network to predict whether a breast cancer is malignant or benign,
when it is given other attributes as input.
Breast Tissue Classification Using Neural Networks

The objective is to train the neural network to predict which of six classes a sample of freshly
excised breast tissue belongs to, when it is given other characteristics as input.
Classification of Animal Species Using Neural Networks

The purpose of this experiment is to study the feasibility of classifying animal species using neural
networks. An animal class is made up of animals that are all alike in important ways. Hence we need to
train a neural network to predict which class a particular species belongs to. Once we have
decided on a problem to solve using neural networks, we need to gather data for training purposes.
The training data set includes a wide variety of cases, each comprising values for a range of input
and output variables.
Another variant of this type of project is the classification of animal species on the basis of 17
Boolean-valued attributes.

Car Evaluation Using Neural Networks

This project is for testing Neuroph with Car Dataset which can be found here:
http://archive.ics.uci.edu/ml/datasets/Car+Evaluation.

Several architectures will be tried out, and it will be determined which ones represent a good solution
to the problem, and which ones do not. The Car Evaluation data set was derived from a simple
hierarchical decision model.

The model evaluates cars according to the following concept structure:

o car acceptability
   o overall price
      o buying price
      o price of the maintenance
   o comfort
      o number of doors
      o capacity in terms of persons to carry
      o the size of luggage boot
   o safety of the car
Lenses Classification Using Neural Networks

The Neuroph framework is used to train a neural network on the Database for fitting contact
lenses (Lenses data set). The data set is taken from a paper by Cendrowska (1988) on the inductive
analysis of a set of ophthalmic data. The task is to predict whether a person will need soft contact
lenses, hard contact lenses or no contacts, based on relevant features of the client.
The data set has 4 features (age of the patient, spectacle prescription, whether the patient is
astigmatic, and tear production rate) along with an associated three-valued class that gives the
suitable lens prescription for the patient (hard contact lenses, soft contact lenses, no lenses).
Balance Scale Classification Using Neural Networks

This project uses the Neuroph framework to train a neural network on the Balance Scale data set. The
Balance Scale data set was generated to model psychological experimental results. Each example is
classified as having the balance scale tip to the right, tip to the left, or be balanced.
The attributes are the left weight, the left distance, the right weight, and the right distance. The
correct way to find the class is to compare (left-distance * left-weight) with (right-distance *
right-weight): the scale tips to the side with the greater product. If they are equal then it is
balanced.
The main objective of this experiment is to train a neural network to classify these 3 types of balance
scale.
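
The rule that generates the class labels is simple enough to write down directly. The sketch below implements it as described above. The function name and the label strings are illustrative choices, not part of the data set.

```python
def balance_class(left_weight, left_distance, right_weight, right_distance):
    """Classify a balance scale by comparing the two weight * distance products."""
    left = left_weight * left_distance
    right = right_weight * right_distance
    if left > right:
        return "tip to the left"
    if right > left:
        return "tip to the right"
    return "balanced"


print(balance_class(3, 2, 1, 4))  # left product 6, right product 4
print(balance_class(1, 2, 2, 5))  # left product 2, right product 10
print(balance_class(2, 3, 3, 2))  # both products are 6
```

A neural network trained on this data has to rediscover this rule from examples alone, which makes the data set a convenient test case.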
Blood Transfusion Service Center

Teach the neural network to predict whether a blood donor gave blood in March 2007 based on
features that are provided as input parameters.

Predicting the Result of Football Match with Neural Networks

The main goal of this problem is to create and train a neural network to predict whether the home team
wins, the visiting team wins, or the match ends in a draw in the Barclays Premier League, given some
characteristics as input. First we need a data set. For this problem we picked results from the 2011/12
Premier League season. Because of the great number of matches, we randomly sampled 106 results. Each
result has 8 input and 3 output attributes.

Input attributes are:


1. Home team goalkeeper rating
2. Home team defence rating
3. Home team midfield rating
4. Home team attack rating
5. Visitor team goalkeeper rating
6. Visitor team defence rating
7. Visitor team midfield rating
8. Visitor team attack rating

Output attributes are:


1. Home team wins
2. Draw
3. Visitor team wins
Predicting the Workability of High-Performance Concrete

These days, the mix design of high-performance concrete is more complex because it involves many
variables and includes various mineral and chemical admixtures. Until now, the construction industry
has had to rely on relatively few human experts for advice on high-performance concrete mix design
problems, and such expertise is usually costly. However, the situation may be improved by applying
artificial intelligence that imitates the way the human brain thinks and makes suggestions. The
usefulness of artificial intelligence in solving difficult problems has become widely recognized, and
it is being developed in many fields.
Concrete Compressive Strength Test

Concrete is the most important material in civil engineering. The concrete’s compressive strength is
a highly nonlinear function of age and ingredients.
For this task, we will use the Neuroph framework and the Concrete Compressive Strength data set.
Glass Identification Using Neural Networks

The goal of this project is to use the Neuroph framework to train a neural network on the Glass
Identification data set so that it can assign glass to one of the predefined classes. The Glass
Identification data set was created to help in criminological investigation. Glass left at the scene
of a crime can be used as evidence, but only if it is correctly identified. Each example is classified
as one of the following:
building_windows_float_processed, building_windows_non_float_processed,
vehicle_windows_float_processed, vehicle_windows_non_float_processed, containers, tableware
and headlamps.

The features are RI: refractive index, Na: Sodium (unit measurement: weight percent in corresponding
oxide, as are attributes 4-10), Mg: Magnesium, Al: Aluminum, Si: Silicon, K: Potassium, Ca: Calcium,
Ba: Barium, Fe: Iron.
The main aim of this experiment is to train a neural network to classify glass into these 7 types.
Teaching Assistant Evaluation

The main goal is to train the neural network with data, which can be found online, to classify the
quality of the teachers’ performance. The data set consists of evaluations of teaching performance
over three regular semesters and two summer semesters of 164 teaching assistant (TA) assignments
at the Mathematics Department of the University of Wisconsin-Madison. The scores were divided
into 3 roughly equal-sized classes ("low", "medium", and "high") to form the class variable.
Predicting Protein Localization Sites Using Neural Networks

The main aim of this project is to create and train a neural network to predict protein localization
sites. The first step in any machine learning approach is to obtain a data set. Here we use the data
set for predicting protein localization sites in eukaryotic cells.
Predicting the Religion of European States Using Neural Networks

The aim of this ML problem is to create and train a neural network to predict the religion of European
countries, given some features as input. As usual we require a data set. The data that we use in
this experiment can be found at Europe Data Center. The data cover 49 European countries. Each
country has 26 input features and 1 output attribute, which is the religion.

Input features are:


1. Region of Europe where the country is
2. Total Area that the country covers (in thousands of square km)
3. Population (in round millions)
4. Language of the country
5. Number of vertical bars in the flag
6. Number of horizontal stripes in the flag
7. Number of different colors in the flag
8. If red is present or not in the flag
9. If green is present or not in the flag
10. If blue is present or not in the flag
11. If gold is present or not in the flag
12. If yellow is present or not in the flag
13. If white is present or not in the flag
14. If black is present or not in the flag
15. If orange is present or not in the flag

16. Major color in the flag (tie-breaks decided by taking the topmost shade, if that fails then the
most central shade, and if that fails the leftmost shade)
17. Number of circles in the flag
18. Number of upright crosses
19. Number of diagonal crosses
20. Number of sun or star symbols
21. If a crescent moon symbol is present or not
22. If any triangles are present or not
23. If an animate image (e.g., an eagle, a tree, a human hand) is present or not
24. If any letters or writing on the flag (e.g., a motto or slogan) is present or not
25. Which color is in the top-left corner (moving right to decide tie-breaks)
26. Which color is in the bottom-right corner (moving left to decide tie-breaks)

Output attribute is: the religion of each country

Predicting the Burned Area of Forest Fires Using Neural Networks

Our main goal here is to use the twelve input features (in the original data set) to predict the burned
area of forest fires. The output "area" was first transformed with a ln(x+1) function. Then, several
data mining methods were applied. After fitting the models, the outputs were post-processed with the
inverse of the ln(x+1) transform. Four different input setups were used. The experiments were
performed using 30 runs of 10-fold cross-validation. Two regression metrics were measured: MAD and
RMSE. A Gaussian support vector machine (SVM) fed with only 4 direct weather conditions (temp, RH,
wind and rain) obtained the best MAD value: 12.71 +- 0.01 (mean and 95% confidence interval using a
Student's t-distribution). The best RMSE was attained by the naive mean predictor. An analysis of the
regression error characteristic (REC) curve indicates that the SVM model predicts more examples within
a lower error. In effect, the SVM model is better at predicting small fires, which are the majority.
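
The transform and the two metrics can be sketched in a few lines. The burned-area values below are made up for illustration, and the "model" is simply a mean predictor in the transformed space, standing in for whatever regression method is actually used. MAD is taken here to be the mean absolute deviation between actual and predicted values.

```python
import math


def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical burned areas: mostly small fires and a few large ones.
areas = [0.0, 0.0, 0.5, 2.0, 10.0, 50.0]

# Step 1: transform the output with ln(x + 1) to reduce the skew.
targets = [math.log1p(a) for a in areas]

# Step 2: fit a model in the transformed space (stand-in: predict the mean).
mean_target = sum(targets) / len(targets)
predictions_log = [mean_target] * len(areas)

# Step 3: post-process the outputs with the inverse transform, e^y - 1.
predictions = [math.expm1(p) for p in predictions_log]

print(f"MAD:  {mad(areas, predictions):.2f}")
print(f"RMSE: {rmse(areas, predictions):.2f}")
```

RMSE squares the errors, so it is dominated by the few large fires, while MAD weights every example equally. This is one reason the two metrics can favor different models.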

Wine Classification Using Neural Networks

In this project we try to build a neural network that can classify wines from three wineries by thirteen
attributes:
1. Alcohol
2. Malic Acid
3. Ash
4. Alcalinity of Ash
5. Magnesium
6. Total Phenols
7. Flavanoids
8. Nonflavanoid Phenols

9. Proanthocyanins
10. Color Intensity
11. Hue
12. OD280/OD315 of diluted wines
13. Proline

This is a pattern recognition problem, where inputs are associated with different classes, and
we would like to construct a neural network that not only classifies the known wines properly, but
can also generalize to accurately classify wines that were not used to design the solution. The
thirteen attributes act as inputs to the neural network, and the respective output for each wine is
a 3-element row vector with a 1 in the position of the associated winery, #1, #2 or #3.
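
The 3-element target vector described above is often called a one-hot encoding. A minimal sketch follows. The function name `one_hot` is an illustrative choice.

```python
def one_hot(winery, num_classes=3):
    """Return a row vector with a 1 in the position of the given winery (1-based)."""
    if not 1 <= winery <= num_classes:
        raise ValueError(f"winery must be between 1 and {num_classes}")
    return [1 if position == winery else 0 for position in range(1, num_classes + 1)]


for winery in (1, 2, 3):
    print(winery, one_hot(winery))
```

With this encoding the network has one output per winery, and the predicted winery is the position of the largest output.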
NeurophRM: Integration of the Neuroph Framework into RapidMiner

The study of artificial neural networks (NN) is ubiquitous in the research literature, and covers
applications and interest in many research fields, including computer science, artificial intelligence,
optimization, data mining, statistics, bioinformatics, medicine, and many more.

Despite some shortcomings of NNs, such as the lack of interpretability of the built model, the NN
is still a broadly used technique and is included in most data analytics frameworks. Since the neural
network model is hard to understand, software packages, especially commercial ones, typically simplify
the NN model, reducing it to several parameters that users can modify. Only a few software
products offer a full range of customizable neural network models, and they require proficiency in
understanding the neural network paradigm. In the open-source community, there are currently several
stable neural network frameworks that offer experts the tools for full customization of NN models.

Since RapidMiner is an open-source framework, connecting it to one of these NN frameworks would
draw the attention of more users by offering a more customizable and powerful NN tool for
various data mining tasks. This is especially true for NN experts, who would certainly find RapidMiner
a useful tool for overall data analysis and for all the logistic support for using NN models, including
preprocessing, evaluation, comparison with different algorithms, etc.

Open Source Resources

Here are some resources for learning about artificial neural networks. Most of them are free and
open, although some may cost you a bit:

• Coursera: Machine Learning (Andrew Ng)
• Coursera: Neural Networks for Machine Learning (Geoffrey Hinton)
• Udacity: Intro to Machine Learning (Sebastian Thrun)
• Udacity: Machine Learning (Georgia Tech)
• Udacity: Deep Learning (Vincent Vanhoucke)
• Machine Learning (mathematicalmonk)
• Practical Deep Learning For Coders (Jeremy Howard & Rachel Thomas)
• Stanford CS231n: Convolutional Neural Networks for Visual Recognition (Winter 2016)
• Stanford CS224n: Natural Language Processing with Deep Learning (Winter 2017)
• Oxford Deep NLP 2017 (Phil Blunsom et al.)
• Reinforcement Learning (David Silver)
• Practical Machine Learning Tutorial with Python (sentdex)

Issues and Challenges

Uncertainty

A drawback of Artificial Neural Networks is that the uncertainty in the predictions generated is rarely
computed. Failure to account for such uncertainty makes it impossible to measure the quality of ANN
predictions, which severely limits their usefulness. In an effort to address this, a few researchers
have applied Bayesian techniques to ANN training.

Lots and Lots of Data

Deep learning algorithms are trained to learn progressively from data. Large data sets are required to
make sure that the machine delivers the expected results. Just as the human brain requires a lot of
experience to learn and infer information, the analogous artificial neural network needs an abundant
amount of data. The more powerful the abstraction you want, the more parameters need to be tuned, and
more parameters require more data.
For instance, a speech recognition program would call for data from multiple languages, demographics
and time scales. Researchers feed terabytes of data to the algorithm to learn a single dialect. This is
a time-consuming process and requires tremendous data processing capabilities. To some extent, the
scope for solving a problem through Deep Learning depends on the availability of a huge corpus of data
to train on.
The complexity of a neural network can be expressed through its number of parameters. In the case
of deep neural networks, this number can be in the range of millions, tens of millions and in some
cases even hundreds of millions. Let’s call this number P. Since you want to be certain of the model’s
ability to generalize, a good rule of thumb for the number of data points is at least P*P.
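
To get a feel for these numbers, P can be counted for a small fully connected network. The sketch below assumes every neuron has one weight per input plus one bias. The layer sizes are a made-up example (13 inputs, 8 hidden neurons, 3 outputs), and the last line simply applies the P*P rule of thumb stated above.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, inputs first.
    """
    return sum(
        (n_in + 1) * n_out  # n_in weights plus 1 bias for each of the n_out neurons
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


layer_sizes = [13, 8, 3]  # hypothetical small network
p = count_parameters(layer_sizes)

print(f"P = {p}")
print(f"P*P = {p * p} data points suggested by the rule of thumb")
```

Even for this tiny network the rule asks for thousands of examples, which shows how quickly the data requirement grows with model size.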
Overfitting in Neural Networks

At times, there is a sharp difference between the error on the training data set and the error on
a new, unseen data set. This happens in complex models, such as those with too many parameters relative
to the number of observations. The quality of a model is judged by its ability to perform well on
an unseen data set and not by its performance on the training data fed to it.

The training error (in blue) and the validation error (in red) are shown as a function of the number
of cycles; the widening gap between them indicates overfitting. In general, a model is trained by
maximizing its performance on a particular training data set. An overfitted model thus memorizes the
training examples but does not learn to generalize to new situations or unseen observations.
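
One simple way to spot this behaviour is to look for the cycle at which the validation error stops improving. The sketch below does this on two made-up error curves. The `patience` parameter and the function name are illustrative assumptions, and the approach is the idea behind the technique usually called early stopping, which this chapter does not otherwise cover.

```python
def best_cycle(val_errors, patience=2):
    """Return the index of the cycle with the lowest validation error,
    stopping the scan once there has been no improvement for `patience` cycles."""
    best_error, best_index = float("inf"), -1
    for cycle, error in enumerate(val_errors):
        if error < best_error:
            best_error, best_index = error, cycle
        elif cycle - best_index >= patience:
            break  # validation error has not improved for `patience` cycles
    return best_index


# Hypothetical error curves: training error keeps falling,
# validation error falls at first and then rises again.
train_error = [0.90, 0.60, 0.42, 0.30, 0.22, 0.16, 0.11, 0.08, 0.05, 0.03]
val_error = [0.92, 0.66, 0.50, 0.41, 0.37, 0.36, 0.38, 0.42, 0.47, 0.53]

stop = best_cycle(val_error)
final_gap = val_error[-1] - train_error[-1]

print(f"lowest validation error at cycle {stop}")
print(f"gap between validation and training error at the end: {final_gap:.2f}")
```

Training beyond the reported cycle keeps lowering the training error but makes the model worse on unseen data.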

Hyperparameter Optimization

Hyperparameters are the parameters whose values are set before the learning process begins.
Changing the value of such a parameter by a small amount can cause a large change in the
performance of your model.
Relying on the default parameters and not performing hyperparameter optimization can have a
substantial impact on model performance. Likewise, considering too few hyperparameters, or tuning
them by hand rather than optimizing them through proven techniques, also affects performance.
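
Grid search is one common technique for this: try every combination from a small set of candidate values and keep the best one. In the sketch below, `validation_error` is a hypothetical stand-in for "train a network with these settings and measure its validation error"; a toy formula replaces the actual training so that the example runs on its own.

```python
import itertools
import math


def validation_error(learning_rate, hidden_units):
    """Hypothetical stand-in for training a network and measuring validation error.

    The toy formula is lowest at learning_rate = 0.01 and hidden_units = 16.
    """
    return (math.log10(learning_rate) + 2) ** 2 + ((hidden_units - 16) / 16) ** 2


# Candidate values for each hyperparameter (illustrative choices).
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_units": [4, 16, 64],
}

# Evaluate every combination and keep the one with the lowest validation error.
candidates = itertools.product(grid["learning_rate"], grid["hidden_units"])
best = min(candidates, key=lambda combo: validation_error(*combo))

print(f"best learning rate: {best[0]}, best hidden units: {best[1]}")
```

The cost grows with the product of the list lengths, so in practice the grid is kept small or replaced by random search.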

Requires High-Performance Hardware

Training a Deep Learning solution requires processing a lot of data. To solve real-world problems,
the machine needs to be equipped with adequate processing power. To ensure better efficiency and
less time consumption, data scientists switch to multi-core, high-performance GPUs and similar
processing units. These processing units are expensive and consume a lot of power.

Facebook’s Oregon Data Center is shown in the figure above. Industry-level Deep Learning systems
require high-end data centers, while smart devices such as drones, robots and other mobile devices
need small but efficient processing units. Deploying a Deep Learning solution in the real world thus
becomes a costly and power-consuming affair.

Neural Networks Are Essentially a Blackbox

We know our model parameters, we feed labeled data to the neural network, and we know how its
components are put together. But we usually do not understand how it arrives at a particular solution.
Neural networks are essentially black boxes, and researchers have had a hard time understanding how
they reach their conclusions. The inability of neural networks to reason on an abstract level makes
it difficult to implement high-level cognitive functions. Also, their operation is mostly invisible to
humans, rendering them unsuitable for domains in which verification of the process is important.
However, Murray Shanahan, Professor of Cognitive Robotics at Imperial College London, has
produced a paper with his team on Deep Symbolic Reinforcement Learning, which makes progress
toward overcoming the hurdles mentioned above.
Lack of Flexibility and Multitasking
Deep Learning models, once trained, can deliver an extremely efficient and accurate solution to a
particular problem. However, at present, neural network architectures are highly specialized for
particular areas of application.

Most of our systems work this way: they are extremely good at solving one problem. Even
solving a closely related problem requires retraining and reevaluation. Researchers are working hard
to develop Deep Learning models that can multitask without the need to rework
the whole architecture.
There have been small advances in this area using Progressive Neural Networks. Also,
there is substantial progress towards Multi Task Learning (MTL). Researchers from the Google Brain
Team and the University of Toronto presented a paper on MultiModel, a neural network architecture that
draws on the success of vision, language and audio networks to solve a number of problems from
multiple domains at the same time, including image recognition, translation and speech recognition.

Deep Learning may be one of the chief research domains for Artificial Intelligence, but it is
definitely not flawless. While exploring new and less familiar territories of cognitive technology, it
is quite usual to come across hurdles and complications. Such is the case with any technological
progress. Only the future will answer the question “Is Deep Learning our best solution
towards real AI?”

Applications of ANN
Speech Recognition

Speech occupies a prominent role in human-human interaction. Therefore, it is natural for people to
expect speech interfaces with computers. At present, to communicate with machines,
humans still need sophisticated languages which are hard to learn and use. To ease this
communication barrier, a simple solution could be communication in a spoken language that the
machine is able to understand.

Great progress has been made in this field. However, such systems still face the
problem of limited vocabulary or grammar, along with the need to retrain the system for
different speakers in different conditions. ANN is playing a key role in this domain. The following
ANNs have been used for speech recognition:

• Multilayer networks
• Multilayer networks with recurrent connections
• Kohonen self-organizing feature map

The most useful network for this is the Kohonen self-organizing feature map, which takes short
segments of the speech waveform as its input. It maps similar phonemes to the same region of the
output array, a technique called feature extraction. After the features are extracted, acoustic
models used as back-end processing identify the utterance.

Character Recognition

It is a fascinating problem which comes under the general domain of Pattern Recognition. Many
neural networks have been developed for automatic recognition of handwritten characters, either
letters or digits. The following are some ANNs which have been used for character recognition:

• Multilayer neural networks such as Backpropagation neural networks
• Neocognitron

Although back-propagation neural networks have several hidden layers, the pattern of connection
from one layer to the next is localized. Similarly, the neocognitron also has several hidden layers,
and its training is done layer by layer for this type of application.

Signature Verification Application

Signatures are one of the most useful ways to authorize and authenticate a person in legal
transactions. Signature verification is a non-vision-based technique.

The first step in this application is to extract the feature set, or rather the geometrical feature
set, representing the signature. With these feature sets, we train the neural network using an
efficient neural network algorithm. The trained neural network then classifies the signature as
genuine or forged during the verification stage.

Human Face Recognition

It is one of the biometric approaches for identifying a given face. It is a challenging task because
“non-face” images are hard to characterize. However, if a neural network is well trained, it can
divide images into two classes, namely images having faces and images that do not have faces.

First, each input image must be preprocessed. Then, the dimensionality of the image must
be reduced. Finally, it must be classified using a neural network training algorithm. The following
neural network is trained on the preprocessed images:

Fully-connected multilayer feed-forward neural network trained with the help of the back-propagation
algorithm.

For dimensionality reduction, Principal Component Analysis (PCA) is used.


Image Compression

Neural networks can accept and process massive amounts of information at once, making them
useful in image compression. With the explosion of the Internet and more sites using more images in
their content, using neural networks for image compression is worth a look.
Stock Market Prediction

The everyday business of the stock market is really complicated. Many factors determine whether a
given stock will go up or down on any particular day. Since neural networks can examine a lot of
information quickly and sort it all out, they can be used to help predict stock prices.

Traveling Salesman Problem

Interestingly enough, neural networks can solve the traveling salesman problem, but only to a certain
degree of approximation.
Medicine, Electronic Nose, Security, and Loan Applications - These are applications that are
in their proof-of-concept stage, with the exception of a neural network that decides whether or
not to grant a loan, something that has already been used more successfully than many humans.

Future in NN

All current NN technologies will most likely be vastly improved upon in the future. Everything from
handwriting and speech recognition to stock market prediction will become more sophisticated as
researchers develop better training methods and network architectures.
NNs might, in the future, allow:

• Robots that can see, feel, and predict the world around them
• Improved stock prediction
• Common usage of self-driving cars
• Composition of music
• Handwritten documents to be automatically transformed into formatted word processing
documents
• Trends found in the human genome to aid in the understanding of the data compiled by the
human genome project
• Self-diagnosis of medical problems using neural networks
• And much more!

Deep Learning: What & Why?

Chapter Introduction

If you have ever heard of machine learning or artificial intelligence, then you have a general
understanding of what deep learning is. Essentially, deep learning is a subfield of machine learning.
This particular learning system is unlike other programs, where “learning” is completed using factual
algorithms and set representations. Instead, a deep learning network comprises algorithms that have
been inspired by the structure and function of the human brain. As a result, deep learning machines
have the ability to learn new information and to use it.

The networks on which deep learning runs are called “artificial neural networks” because
they are loosely modeled on the neural pathways in the human brain. This might sound confusing right
now, but as we progress through this chapter and this book you will begin to understand exactly what
deep learning is, how it works, and why it is an important system in today’s world. If you continue to
feel confused, however, fret not. The processes of deep learning are still being studied
by scientists, so not everything is completely understood yet. However, they have done a
lot to shed light on what deep learning truly is, and so we can learn a great deal about it
and how it works.

Deep Learning and Artificial Neural Networks

Artificial neural networks (ANNs) are the basis for deep learning, but the development of deep
learning networks goes far beyond artificial neural networks. Since the development of artificial neural
networks, scientists and researchers have been focusing on developing and using deep neural
networks. These systems are extremely similar to artificial neural networks; however, they have some
key differences. They are the future of deep learning, and as the deep learning system progresses it is
expected that deep neural networks will eventually replace artificial neural networks entirely.
As you know, artificial neural networks are shallow networks that have been developed in an attempt
to mimic the animal brain. However, they do have many disadvantages, part of which is that they are
large and require many systems in order to work, and that they can only complete a small number of
specialized tasks. As far as deep learning goes, the goal is to eventually create an entirely pure artificial
intelligence that requires little to no input from a human in order to function effectively and learn and
grow on its own. In order to attempt to draw us closer to that, the deep neural network was created.
The deep neural network is an advanced version of the artificial neural network. It features smaller
units and less “stuff”, but has the ability to perform the same complex functions as artificial neural
networks. The architects of these networks have focused on a few basic approaches as an opportunity
to develop a newer and more useable program that allows the deep learning network to perform
effectively and efficiently whilst taking up significantly less space.

To understand the benefits and purpose of this activity, one might consider the evolution of
computers themselves. A few decades ago, computers were massive machines that took up entire
rooms and operated extremely slowly. Now, however, they are small and speedy devices that can fit
in a pocket. Just as computers have evolved in this way, scientists and researchers are
seeking to make artificial neural networks more compact and efficient so that they are more useable.
Naturally, this requires them to refine the techniques and develop new systems that will allow the
network to complete the same functions with higher quality results while using fewer resources.
This is one reason why the deep neural network was created.

What Is Deep Learning, Exactly?

Deep learning is essentially the use of many-layered neural networks to make learning algorithms
both easier to apply and more capable. These networks are also used to make revolutionary advances
in artificial intelligence, as well as machine learning, and ultimately to create what may eventually
be a true and complete artificial intelligence.
In the past, computing systems were too slow and we didn’t have enough information to create the
networks required to make a true artificial intelligence. What that means is that we were not advanced
enough to create an artificial intelligence that would be able to completely act, think, and live on its
own. Until now, we have only been able to create partial artificial intelligences that are able to do
certain things on their own based on a set of metrics that have been programmed into their intelligence
network.
With the introduction of deep learning, however, we have been able to begin developing large neural
network systems that are capable of taking in information and responding to it on their own. The end
goal will be to create neural networks that mimic the human brain and operate as their own
intelligence, completely independent and capable of learning new things and advancing all on their
own without human manipulation. Essentially, we would create an artificial species and introduce it
into the world.
At the very foundation of all of these advances and progressions in artificial intelligence lies deep
learning. In the past, the algorithms used were slow, and while they could process a large quantity of
data, their performance tended to level off no matter how much more data they were given. Deep
learning networks tend to keep improving as they are given more data and more computing power,
which is why they out-compete the systems we used before on many tasks.

What Is Learned?

Currently, deep learning is mostly used in practice under conditions of supervised learning. This means
that the deep learning machines learn from labelled data that they have been exposed to: each example
comes paired with the correct answer. They are rarely left entirely unsupervised, and likely won’t be
for quite some time. At the very least, they won’t be unsupervised until they are reliable and entirely
understood.
Over time, the goal is to open up the deep learning networks and artificial intelligences to unsupervised
learning, that is, learning from data that carries no labels, where they will be able to find structure in
whatever data they can access without a human first marking the right answers. When they are able to
learn from data in this way, artificial intelligences will come much closer to being an artificial
intelligence in the truest sense of the word.
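As a rough illustration of what learning from labelled data means, the sketch below trains a single artificial neuron with the classic perceptron rule. It is not taken from this book: the data set (the logical AND of two inputs), the function names `predict` and `train_perceptron`, and the learning rate are all assumptions chosen to keep the example small.

```python
# Labelled training data: each input pair comes with its correct answer (logical AND).
LABELLED_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(data, learning_rate=1.0, epochs=20):
    """Perceptron rule: adjust the weights whenever a prediction disagrees with its label."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, label in data:
            error = label - predict(weights, bias, inputs)  # 0 when the guess was right
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


weights, bias = train_perceptron(LABELLED_DATA)
predictions = [predict(weights, bias, inputs) for inputs, _ in LABELLED_DATA]
print(predictions)
```

The labels are what make this supervised: the network is only corrected because every example states the answer it should have given.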
Feature Learning

Because deep learning is based on large neural networks, it has the ability to be scaled to incredible
sizes. These networks can also be trained with “feature learning”, which means that they can
automatically extract features from raw data. Essentially, instead of relying on features that a human
has designed and programmed in advance, the artificial intelligences will be able to take in raw data
and work out for themselves which features help them to respond efficiently and correctly. This
means that they will be able to take in data and then turn it into their own learning process, growing
away from human-inputted features so that they can operate on their own with far less human
involvement or manipulation.
When we learn about things as humans, we often build on concepts, allowing us to learn more and
more. That is how we are able to learn virtually everything: we start small and grow from there. The
idea with artificial intelligences is to teach hierarchical feature learning, which means that the artificial
intelligences will be able to take something they’ve learned, such as human-inputted information, and
build on that concept to create their own concept. They will also be able to advance similarly to how
humans advance: by building on concepts and moving forward through the natural
learning process.
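The idea of building a new concept out of simpler ones can be sketched with a tiny two-layer network. In this illustration the weights are set by hand rather than learned, and the function names `step` and `xor_network` are hypothetical. The first layer computes two simple features of the raw inputs, and the second layer combines them into a concept (exclusive or) that no single unit could compute from the raw inputs alone.

```python
def step(value):
    """Threshold activation: fire (1) when the weighted input is positive."""
    return 1 if value > 0 else 0


def xor_network(a, b):
    # Layer 1: two simple features computed directly from the raw inputs.
    either = step(a + b - 0.5)  # fires when at least one input is 1 (OR)
    both = step(a + b - 1.5)    # fires only when both inputs are 1 (AND)
    # Layer 2: a more abstract concept built from the layer-1 features.
    return step(either - both - 0.5)  # "either, but not both" (XOR)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

In a trained deep network the features in each layer are discovered from data instead of being written down by hand, but the layering works the same way: later layers operate on what earlier layers produce.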
Why is it Called “Deep Learning”?

Geoffrey Hinton, who appears in some of the earliest literature on deep learning, was a co-author of an
influential early paper on the backpropagation algorithm. In that paper, he was said to have used the word
“deep” to represent the development of large artificial neural networks. It is likely that he meant that
they were large and went much “deeper” than the networks in use to date in the field of
artificial intelligence.
Later, in 2006 he co-authored a paper where he stated: “Using complementary priors, we derive a fast,
greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top
two layers form an undirected associative memory.”
He has also written more papers where he uses the word deep to describe the abilities of the algorithm
and the learning network and how they grow and advance from the things they are taught
in the beginning. He has been cited hundreds, if not thousands, of times, using deep as a descriptive
term for how the network operates and its abilities. As a result, the term has been promoted into the
title and the field as a whole is now commonly called “deep learning” instead of “artificial neural networks”.
Essentially, “deep learning” is the name of the approach and “artificial neural networks” are what its
systems are composed of.

Deep learning allows computational models that are composed of multiple processing layers to learn
representations of data with multiple levels of abstraction. These techniques have dramatically improved
the state of the art in speech recognition, visual recognition, and object detection, as well as in
numerous other fields, such as drug discovery and genomics.

History of Deep Learning

Deep learning is a popular topic today, as organizations attempt to discover how to use advanced
computational techniques to find useful information inside enormous volumes of data. While the
field of artificial intelligence is decades old, breakthroughs in the field of artificial neural networks are
driving the sudden increase in interest in deep learning.
Attempts to create artificial intelligence go back decades. In the wake of World War II, the English
mathematician and code breaker Alan Turing penned a test for genuine artificial intelligence.
It became known as the Turing Test, whereby a conversational machine would need to persuade a
human that they were conversing with another human.
After more than 60 years, a computer was reported to have passed the Turing Test in 2014, when a
chatbot named "Eugene Goostman" persuaded 33% of the judges at an event organized by the University
of Reading at the Royal Society in London that it was human. It was reported as the first occasion that
the 30% threshold had been surpassed.

From that point forward, the field of deep learning and AI has kept expanding as computers
move closer to delivering human-level capabilities. Consumers interact with a variety of bots
like Apple's Siri, Amazon's Alexa, and Microsoft's Cortana, which use natural language processing and
machine learning in order to work out how to answer questions.
Presently, many organizations are hoping to use their huge data sets as a training
ground to create better AI programs that can interact with the world in natural ways, while
extracting useful information in ways that have not yet been determined. Scientists have discovered that
a mix of cutting-edge neural networks, ready availability of immense amounts of data, and
extremely powerful distributed GPU-based systems can give us the means to make smart,
self-learning machines that can begin to rival people in their cognition.
The ultimate source of deep learning's power is the manner in which it enables computers to
break their input down into basic pieces and then build an abstraction layer above them that
corresponds to our perception of reality.
Deep Learning Timeline

Below is a timeline that will help you see how deep learning has been established and the growth it
has accumulated along the way. This timeline will give you insight into some of the biggest developments
in deep learning technology and the people who have played a key role in establishing these
developments. This will help you understand how long deep learning has been around, and
just how far it has come in a few short decades.

1965: Alexey Ivakhnenko and Lapa publish a working learning algorithm for supervised, deep,
feedforward, multilayer perceptrons.

1971: A paper is published that describes an 8-layer-deep network trained by
the group method of data handling algorithm.

1980: A deep learning algorithm is published as the “Neocognitron” by Kunihiko Fukushima.

1986: Rina Dechter introduces the phrase “Deep Learning”, and the concept becomes an official topic
of interest for computer scientists.

1989: The standard backpropagation algorithm is applied to a deep neural network by Yann LeCun,
who used it as a system to read and recognize handwritten ZIP codes on pieces of mail.

1991: By this time, systems like the one used to read ZIP codes on mail were recognizing isolated
2-D handwritten digits, while 3-D objects were recognized by matching 2-D images with
handcrafted 3-D object models.

1992: The Cresceptron was introduced by Weng. This was a method for performing 3-D object
recognition that could work even in a cluttered scene where there wasn’t a single isolated object to
recognize. Similar to the Neocognitron, which was introduced in 1980, the Cresceptron was developed
using multiple layers. Unlike the Neocognitron, it did not require a human to merge features by hand:
it learned the features in each layer without supervision. In other words, it could perform many
functions and operations on its own.

1994: A multi-layer Boolean neural network, also known as a weightless neural network, was introduced
by Andre C. P. L. F. de Carvalho, Fairhurst and Bisset. It was composed of a self-organizing feature
extraction module followed by a classification module, and the modules were trained independently.

1995: A network comprising six fully connected layers was trained in two days by Brendan Frey. The
network had several hundred hidden units within it, which were each trained during this period. He
demonstrated that a network of this size could be trained using the wake-sleep algorithm.
At the time, however, the speed was rather slow.

1997: Over the years, deep learning networks had struggled with what was called the “vanishing
gradient problem”. In gradient-based learning methods, the error signal shrinks as it is passed back
through many layers or time steps, which prevented these systems from learning to retain information
over long spans. In 1997, Hochreiter and Schmidhuber published the long short-term memory
(LSTM) network, which was capable of avoiding the vanishing gradient problem. It did so by
maintaining a memory of events that had happened many time steps earlier. This was important in the
development of speech recognition.

2000: Igor Aizenberg and colleagues introduce the term “deep learning” to the field of artificial
neural networks.

2006: Hinton and Salakhutdinov successfully showed how many-layer feedforward neural network
systems were capable of being pre-trained one layer at a time. Each layer was treated as an
unsupervised restricted Boltzmann machine. Afterward, they used supervised backpropagation to fine-tune
the system so that it worked the way they desired it to.

2009: Hinton and Li Deng worked together to apply deep learning algorithms to speech recognition.
Together, they co-organized a workshop that took place in late 2009 called “Workshop on Deep
Learning for Speech Recognition”. Here, they explored the limitations of deep generative models
of speech and the possibility that more capable hardware and large-scale datasets would make deep
neural nets practical.

2010: Computer scientists and researchers extend deep learning to encompass large-vocabulary
speech recognition.

2012: The “Merck Molecular Activity Challenge” was won by Dahl and his team when they used a
multi-task deep neural network to predict the biomolecular target of a drug.

2014: The “Tox21 Data Challenge” was won by a team led by Hochreiter, who used deep learning
as a system to detect off-target and toxic effects of environmental chemicals that can be found in
nutrients, drugs, and various household products.
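The vanishing gradient problem named in the 1997 entry can be illustrated with a little arithmetic. The sketch below is a simplified assumption, not the analysis from the original papers: it imagines a chain of layers that each use the sigmoid activation, fixes every weight at 1 and every pre-activation at 0, and multiplies together the per-layer derivatives that backpropagation would multiply. The function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25, reached at x = 0


def gradient_scale(num_layers, pre_activation=0.0):
    """Product of the sigmoid derivatives along a chain of `num_layers` layers."""
    return sigmoid_derivative(pre_activation) ** num_layers


for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Because each sigmoid layer scales the error signal by at most 0.25, the signal reaching the earliest layers shrinks geometrically with depth, so those layers learn very slowly. The LSTM avoids this for sequences by carrying its memory forward along a path that is not repeatedly squashed in this way.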

There have been many discoveries and advances in deep learning over the years. The concept was
originally introduced in 1965, and since then scientists and researchers have developed it to the extent
that there is now a real and impending possibility that a true and complete artificial intelligence will
soon exist. At this point, we are no longer concerned with “if” it will happen, but rather are wondering
“when” it will. It is clear that research has shown that deep learning has the power to complete many
incredible actions, each of which will contribute to an eventual artificial intelligence that will be able
to completely function, learn, and advance on its own without requiring human input, interaction, or
manipulation.

Our Future with Deep Learning Applied

Chapter Introduction

While the future of deep learning itself is rather incredible, it may be beneficial to further consider
what our future could look like with the integration of deep learning machines. Deep learning
technology has already been integrated into our world to a degree, and it is rapidly growing. As a result,
we can expect to continue seeing the integration of deep learning-based technologies in our society.
In this chapter we are going to explore some of those technologies and what they may look like given
what we already know about deep learning and deep neural networks.

Medical Technology

As you have learned, deep learning technology has the power to perform many incredible tasks in the
medical community. The more researchers explore deep learning machinery, the further these technologies
will go. There is the potential that we could end up seeing every lab in the world being outfitted with
advanced machines that are capable of performing many functions that humans are presently
performing, only better. For example, we may see the introduction of machines that can read blood
tests and DNA samples to help doctors provide more accurate diagnoses of patients’ medical
concerns. We may also be able to choose more effective drugs for each unique patient, as the machine
can factor in many pieces of data, such as the individual’s DNA sequencing, the illness, and the
parameters of individual drugs. The machines may be able to predict negative or severe reactions to
certain drugs in particular individuals, meaning that doctors can choose more effective ones that will
treat the illness with fewer negative or harsh side effects for the patient.
Another type of machine that may be developed is one that can help with the development of new
drugs. Drugs are constantly being created to help treat existing illnesses more efficiently and to treat
new illnesses as they emerge. If it were created properly, a deep learning machine could potentially
provide new ideas for drug creations and provide predictions as to what the effects and adverse
reactions to these new drugs could be. In essence, it could play a key role in the development of new
medications.
Biomechanics

Elaborating further on the medical industry, we can address the idea of biomechanics. Already, there
are many incredible biomechanical limbs and parts that are being tested and used on patients. As we
continue to explore the capabilities of deep learning, however, we may be able to discover how to
create biomechanical organs and limbs that essentially learn to function exactly as they should in order
for the person to operate as though they had never lost a limb or an organ in the first place. These
biomechanical pieces could be taught to not only help the individual accomplish basic tasks but also
relearn how to accomplish fine motor skills they may have once had or been interested in developing
prior to the loss of their limb.
Fully Automated Smart Homes

Smart homes are already in existence, but as deep learning machines and systems develop, we learn
about more and more that can be done within the home. Already we can accomplish tasks, such as
turning on lights, changing the music or the channels on the TV, ordering clothes from websites,
ordering food from the store, and more. There are many incredible tasks that can be accomplished
with smart homes.
However, there is a lot more that could be done using deep learning technology. Many developers
from large organizations such as Facebook are already attempting to create a fully automated smart
home that could literally do everything for you. It would essentially become a self-sustaining smart
home that could do everything for itself. You would never have to cook, clean, pick up groceries, or
do virtually anything else with a fully automated smart home. Furthermore, they are attempting to
create one that can not only do all of this and take commands, but also carry an intelligent conversation
and essentially build a “relationship” with the people who live in the home. It would be able to recall
important information about you and do things such as wish you a happy birthday, ask how your day
was, and interact with you based on your response, as well as otherwise hold intelligent and unique
conversations that were not driven by pre-recorded responses on the device.
Having a smart home could be incredible in that you would no longer have to do much in order to
manage your home. Instead, you could cast aside such tasks as cleaning and cooking and spend more
time enjoying your home. This particular level of smart home is likely not going to be introduced for
quite some time, and will only be available to those who are wealthy to begin with, but it is not unlikely
that we may eventually have smart homes like this as an average everyday home for consumers.
Advanced Mobile Technology

If you have watched the evolution of cellphones, then you have seen them go from simple phones in
your pocket that could merely be used to call people to mini-computers that are capable of performing
many incredible functions. Social media, texting, managing your calendar, taking high quality photos,
and many other things can be completed using current smart phones.
As researchers continue developing deep learning technology and refining deep neural networks, it is likely
that we will continue to see the smart phone become even smarter. Over time, these devices may
become personal managers that we carry around in our pockets. We would be able to accomplish
many of the same things we already do without having to take them out or interact with them
in any way beyond speaking. We would be able to communicate with the device and it would do
everything from making or cancelling dinner reservations to ordering new outfits or products straight to
our homes. It could communicate with our friends for us to establish plans or help us communicate
with clients if it were to be used as a business device.
Essentially, anything that is typically done by a personal manager could be done by your phone one
day. Having a personal manager would no longer be reserved for those who are rich and famous, as
we would all have one in the form of a mobile device.
Automated Commercial Use Programs

In addition to managing our personal lives, deep learning machines could be used to manage
businesses and organizations as well. They could organize and arrange business meetings, reserve
tables or spaces for the meetings to be held, and send out automated reminders to everyone who is
required to show up without anyone ever having to be present for the arranging. One person could
communicate something with the device and every other person would have that information and
would be able to automatically perform any activities that needed to be accomplished in advance.
These devices could also be used to manage advertisements. They could be responsible for collecting
data and creating advertisement campaigns for various products or businesses based on what was
presently taking place in the company. For example, they could determine the target audience and
how to target them, what parameters would be included in the campaign to ensure that they were
reached, and what graphics should be used in order to appeal to the consumer. They would also be
able to measure any metrics and data that were returned by the campaign, including how well it
performed, how much money each lead cost, the value of each lead, and what could be done to
make the campaigns even more effective going forward. Everything would be done entirely by a device
and nothing would ever have to be done by an individual person.

There are many other ways that deep learning machines could be great for businesses as well. They
could be used to oversee the payments made to and from companies, order products or services,
schedule employees, hire and fire employees, and more. There are truly limitless opportunities when
it comes to learning about what could be done in a corporate or commercial world using deep learning
machinery.
Partial Artificial Intelligences

Although the focus often remains on complete artificial intelligence, it is worth paying attention to
the idea of partial artificial intelligence as well. This would be a system that is more advanced than
specialized machines but not quite a complete artificial intelligence. For example, it could be a human-
sized and shaped figure that was your personal manager and was capable of taking commands and
completing many functions but was not entirely independent. Instead, it still relied somewhat on
human interaction to complete many functions. This device could be responsible for helping you plan
your days, schedule commitments, order stuff for your house or yourself, and otherwise perform daily
activities. It could also help you manage other partial artificial intelligences you may have.
Yes, in addition to a personal manager-style artificial intelligence, you could have one that was
responsible for physically taking care of your house, one that was responsible for managing your
employees, one that was responsible for driving your car, and other such things. These would be
partial robots capable of performing complex and advanced tasks, but not completely capable of
operating on their own with zero assistance from a human. They would be an incredible addition to
help people perform daily activities without having to do them on their own, but they would be
manageable and programmable by humans.

One reason why a partial intelligence like this is useful, even one that could perform all the
aforementioned tasks, is that it helps reduce or even eliminate the fear of having a complete
artificial intelligence that could take over the world. Partial intelligences would be manageable and
could be stopped the instant they attempted to cause any harm. They would not be completely independent
and therefore they could be prevented from wreaking havoc on society the way a complete artificial
intelligence might be able to. This would help to reduce many of the fears that people have around
having a complete artificial intelligence while still offering all of the benefits and convenience of an
artificial intelligence that was capable of performing complex tasks for its owners.
Complete Artificial Intelligences

Once again, we discuss the potential of creating a complete artificial intelligence. Complete artificial
intelligences would be literal superhumans. They would be capable of doing anything a human could
do. They could fulfill and execute complex actions and functions, carry on intelligent conversations,
learn, grow, and do virtually anything a human could.

While the idea of creating a complete artificial intelligence is staggering, this is one aspect that needs
to be further considered by developers. A complete artificial intelligence may be beneficial overall,
but it could also pose many threats and dangers to society. For example, such intelligences could create
their own artificially intelligent replicas and develop their own society with as many or as few as they
wanted. They would be able to create armies, potentially destroy society, and even wipe humans out
entirely. If we were to have complete artificial intelligences, it could even be that one was hacked or
otherwise turned “evil” and could thus create a large amount of damage in society. Since they would
be created to be strong and virtually perfect, the “imperfect” humans would be incapable of stopping
them. Humans would have to outsmart them, and since the intelligence would be created to be smarter
than us, that would be virtually impossible. We would not have the strength to stop them and would
require the involvement of other artificial intelligences, which could ultimately lead to a war between
superhuman machines and humans.
Although the idea of creating a complete artificial intelligence is incredible, it is also highly unlikely,
at least at any time in the near future. Having a complete artificial intelligence would require us
to know basically everything there is to know about the human brain in order to recreate it in a machine
form. We would also have to know how to recreate it in machine form to not only function like our
own but to function better. This is all highly unlikely, or potentially impossible. If it were to happen,
it’s likely that it wouldn’t happen for another few generations at least. And when it did, these machines
would have to be created in such a way that they could be outsmarted, which would then take away
from them being entirely complete artificial intelligences. So, unless developers and the government
are willing to jeopardize society as a whole, it is extremely unlikely that we will ever see a true and
complete artificial intelligence.

The future of deep learning is vast and incredible. There are so many things that could take place that
it is hard to dream up every single one. There are truly so many routes that this technology could go
and the only way we will ever know where it ends up is by watching and waiting. Deep learning
networks are essentially our key to creating the sci-fi world that we have all dreamt about at one time
or another, or watched or read about in pop culture. There are so many potentially incredible, and
devastating, things that could happen as a result of deep learning. In order for us to continue advancing
as a society without having a significantly negative impact from deep learning machines, we must take
our time and roll things out slowly.
Additionally, major developments must be monitored and thoroughly understood before they are ever
exposed to the general public. If something were not fully comprehended and were to be released too
soon, it could wreak serious havoc on society. The reality is, even a human with bad intentions could
turn basic deep learning against society. It is important that we take our time and really investigate
each part of the process and continue learning more and more about deep learning and deep neural
networks before we ever plan to use them as a large and important part of society.
Still, it is extremely fun to dream about where we may end up with the inventions that are based in
deep learning. There are so many possibilities and the idea of having machines that are capable of
fulfilling many of the functions that deep learning machines could potentially fulfill one day is
incredible. Could you imagine coming home, asking your kitchen to cook you lobster and steak for
dinner and then sitting on your couch and having a conversation with your home while it helps you
pick out a show on TV? For some it may seem over the top, but for others it sounds like a great piece
of luxury and perhaps the best relationship yet.

Summary

A brief history of neural networks gave us a basic understanding of how it all started from the
1980s. We then saw how the concept was developed into a practical model in its early stages.

The differences between the Artificial Neural Network and the Biological Neural Network were covered
to give a gist of how real biological neurons work and how they differ from artificial neurons.

These topics were followed by a detailed study of the Artificial Neural Network. Any beginner in the
field of artificial neural networks will understand concepts such as the different layers of an artificial
neural network and how it is structured.
The Learning process includes the different types of learning mechanisms used in the ANN.

The reasons for using Artificial Neural Networks can be seen in the following topics:

Fundamentals of ANN
Network Topology – The type of connections used by neurons to form an Artificial Neural
network.
The two topologies of ANN are feedforward networks and feedback networks.
Weight tuning and learning let us improve the network performance.
Activation functions allow us to refine the networks.

Learning paradigms include four types of learning mechanisms as follows:

Supervised learning, unsupervised learning, semi-supervised learning, and reinforcement
learning.
Major variants of ANN are the multilayer perceptron (MLP), convolutional neural networks,
recurrent neural networks, long short-term memory, deep reservoir computing, deep belief networks,
etc.

Tools and technologies used for ANNs are mentioned in the following sections:
Major libraries and programming language support
Practical implementations show us various fields in which ANNs are widely used, like Text
Classification, Image Processing, etc.
Major NN projects show us real-life implementations of ANNs in different sectors.
We can learn about ANNs from some of the open-source resources mentioned in the document.

ANN users face many issues and challenges nowadays; a brief section outlines these.

The applications of ANNs are large in number and variety. Some are mentioned in the document.

Thank you !
Thank you for buying this book! It is intended to help you understand machine learning. If you
enjoyed this book and felt that it added value to your life, we ask that you please take the time to
review it.