
Introduction to Deep Learning: What Is Deep Learning?

Deep learning is getting lots of attention lately, and for good reason. It's making a big
impact in areas such as computer vision and natural language processing.

In this video series, we'll help you understand why it's become so popular, and we'll
address three concepts:
What deep learning is
How it is used in the real world
How you can get started

So, what is deep learning?

Deep learning is a machine learning technique that learns features and tasks directly
from data. Data can be images, text, or sound. In this video, I'll be using images, but
these concepts can be used for other types of data, too.

Machine learning teaches computers to do what comes naturally to humans: learn
from experience. Most machine learning workflows use training images that represent
examples of what the system is trying to learn.

Computers learn from these images through feature extraction. This is where
information like edges, texture, and color is extracted to form a reduced representation
that easily describes the object. These features are then used by a machine learning
classifier to learn a task. Deep learning is often referred to as end-to-end learning.

Let's look at an example:

Say I have a set of images, and I want to recognize which category of objects each
image belongs to: cars, trucks, or boats.

I start with a labeled set of images, or training data. The labels correspond to the
desired output of the task. The deep learning algorithm needs these labels, as they tell
the algorithm about the specific features and objects in the image.

The deep learning algorithm then learns how to classify input images into the desired
categories. We use the term "end-to-end learning" because the task is learned directly
from the data.

Another example is a robot learning how to control the movement of its arm to pick
up a specific object. In this case, the task being learned is how to pick up an object
given an input image.

Many of the techniques used in deep learning today have been around for decades.
For example, deep learning has been used to recognize handwritten postal codes in the
mail service since the 1990s.

The use of deep learning has surged over the last 5 years. This is primarily due to
three factors:
1. Deep learning methods can now be more accurate than people at classifying
images.
2. GPUs now enable us to train deep networks in less time.
3. The large amounts of labeled data required for deep learning have become accessible
over the last few years.

Most deep learning methods use neural network architectures. This is why you often
hear deep learning models referred to as deep neural networks. One of the most
popular types of deep neural networks is known as a convolutional neural network. It
is especially well suited for working with image data. The term "deep" usually refers
to the number of hidden layers in the neural network. Traditional neural networks
only contain 2-3 hidden layers, while some recent deep networks have as many as
150.

Here are a few examples that you can try with MATLAB:
Recognizing or classifying objects into categories as seen here, where a deep
network classifies objects on my desk.
Detecting or locating objects of interest in an image like in this example, where we
use deep learning to detect a stop sign in an image.

I hope you found this overview helpful. To find out more, you can visit our website at
mathworks.com/deep-learning.

Introduction to Deep Learning: Machine Learning vs. Deep Learning

Deep learning and machine learning both offer ways to train models and classify data.
This video compares the two, and it offers ways to help you decide which one to use.

Let's start by discussing the classic example of distinguishing cats from dogs. Now, in
this picture, do you see a cat or a dog? How were you able to answer that? The
chances are you've seen many cats and dogs over time, and so you've learned how to
identify them. This is essentially what we're trying to get a computer to do: learn
from, and recognize, examples. Also, keep in mind that sometimes even humans can
get identification wrong. So we might expect a computer to make similar errors.

To have a computer do classification using a standard machine learning approach,
we'd manually select the relevant features of an image, such as edges or corners, in
order to train the machine learning model. The model then references those features
when analyzing and classifying new objects.

This is an example of object recognition. However, these techniques can also be used
for scene recognition and object detection.

When solving a machine learning problem, you follow a specific workflow. You start
with an image, and then extract relevant features from it. Then you create a model that
describes or predicts the object. On the other hand, with deep learning, you skip the
manual step of extracting features from images. Instead, you feed images directly into
the deep learning algorithm, which then predicts the objects.
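The classic workflow described above (extract features by hand, then fit a model) can be sketched in a few lines of Python. This is only an illustrative toy, not the method used in the video: the two features, the nearest-centroid classifier, the tiny made-up images, and the names `extract_features`, `train_centroids`, and `classify` are all assumptions chosen for brevity.

```python
# Toy version of the classic workflow: hand-picked features, then a classifier.
# Images are tiny grayscale grids (lists of rows); all data here is made up.

def extract_features(image):
    """Return two hand-crafted features: mean brightness and horizontal edge strength."""
    height, width = len(image), len(image[0])
    mean = sum(sum(row) for row in image) / (height * width)
    # Edge strength: average absolute difference between horizontally adjacent pixels.
    edge = sum(abs(row[x + 1] - row[x]) for row in image for x in range(width - 1))
    edge /= height * (width - 1)
    return (mean, edge)

def train_centroids(images, labels):
    """Nearest-centroid 'training': average the feature vectors of each class."""
    sums, counts = {}, {}
    for image, label in zip(images, labels):
        features = extract_features(image)
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in total)
            for label, total in sums.items()}

def classify(image, centroids):
    """Predict the label whose centroid is closest in feature space."""
    features = extract_features(image)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(features, centroids[label])))

training_images = [
    [[0, 1, 0, 1], [0, 1, 0, 1]],  # striped
    [[1, 0, 1, 0], [1, 0, 1, 0]],  # striped
    [[1, 1, 1, 1], [1, 1, 1, 1]],  # flat
    [[0, 0, 0, 0], [0, 0, 0, 0]],  # flat
]
training_labels = ["striped", "striped", "flat", "flat"]

centroids = train_centroids(training_images, training_labels)
prediction_a = classify([[0, 1, 1, 0], [1, 0, 0, 1]], centroids)
prediction_b = classify([[1, 1, 1, 0], [1, 1, 1, 1]], centroids)
print(prediction_a, prediction_b)
```

In a deep learning workflow, the `extract_features` step would not be written by hand; the network would learn its own features from the labeled images.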

So, deep learning is a subtype of machine learning. It deals directly with images, and
it is often more complex. For the rest of the video, when I mention machine learning,
I mean anything not in the deep learning category.

When choosing between machine learning and deep learning, you should ask yourself
whether you have a high-performance GPU and lots of labeled data. If you don't have
either of those things, you'll have better luck using machine learning over deep
learning. This is because deep learning is generally more complex, so you'll need at
least a few thousand images to get reliable results. You'll also need a
high-performance GPU so the model spends less time analyzing all those images.

If you choose machine learning, you have the option to train your model with many
different classifiers. You may also know which features to extract that will produce
the best results. Plus, with machine learning, you have the flexibility to choose a
combination of approaches. Use different classifiers and features to see which
arrangement works best for your data. You can use MATLAB to try these
combinations quickly. Also, keep in mind that if you are looking to do things like face
detection, you can use out-of-the-box MATLAB examples.

As we mentioned before, you need less data with machine learning than with deep
learning. And you can get to a trained model faster, too. However, deep learning has
become very popular recently because it can be highly accurate. You don't have to
understand which features are the best representation of the object. These are learned
for you. But in a deep learning model, you need a large amount of data, which means
the model can take a long time to train. You are also responsible for many of the
parameters. And because the model is a black box, if something isn't working
correctly, it may be hard to debug.

So, in general, deep learning is more computationally intensive, while machine
learning techniques are often simpler to apply.

One last point to keep in mind: there is a way to combine these two approaches. By
using deep learning as a feature extractor and machine learning to classify the features,
you can get an accurate and flexible model.

So, in summary, the choice between machine learning and deep learning depends on
your data and the problem you're trying to solve. MATLAB can help you with both of
these techniques - either separately or as a combined approach.

To find out more, visit mathworks.com/deep-learning


Introduction to Deep Learning: What Are Convolutional Neural Networks?

A convolutional neural network, or CNN, is a network architecture for deep learning.
It learns directly from images.

A CNN is made up of several layers that process and transform an input to produce an
output. You can train a CNN to do image analysis tasks including scene classification,
object detection and segmentation, and image processing.

In order to understand how CNNs work, we'll cover three concepts:

1) Local receptive fields
2) Shared weights and biases
3) Activation and pooling

Finally, we'll briefly discuss the three ways to train CNNs for image analysis.

Let's start with the concept of local receptive fields.

In a typical neural network, each neuron in the input layer is connected to every neuron
in the hidden layer. However, in a CNN, only a small region of input layer neurons
connects to each neuron in the hidden layer. These regions are referred to as local
receptive fields. The local receptive field is translated across an image to create a
feature map from the input layer to the hidden layer neurons. You can use convolution to
implement this process efficiently. That's why it is called a convolutional neural
network.
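To make the idea of a local receptive field concrete, here is a minimal pure-Python sketch that slides a small kernel across an image and builds a feature map. The function name `feature_map`, the stride of 1, the lack of padding, and the example image and kernel are assumptions made for illustration. As in most deep learning software, the kernel is applied without flipping, which is strictly a cross-correlation.

```python
def feature_map(image, kernel, bias=0.0):
    """Slide a kernel over the image (stride 1, no padding) and return the feature map."""
    kernel_height, kernel_width = len(kernel), len(kernel[0])
    out_height = len(image) - kernel_height + 1
    out_width = len(image[0]) - kernel_width + 1
    result = []
    for top in range(out_height):
        row = []
        for left in range(out_width):
            # The local receptive field is the patch whose top-left corner is (top, left).
            total = bias
            for i in range(kernel_height):
                for j in range(kernel_width):
                    total += kernel[i][j] * image[top + i][left + j]
            row.append(total)
        result.append(row)
    return result

# A 4x4 image with a vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 kernel that responds to dark-to-bright transitions from left to right.
vertical_edge_kernel = [
    [-1, 1],
    [-1, 1],
]

edges = feature_map(image, vertical_edge_kernel)
for row in edges:
    print(row)  # each row is [0.0, 2.0, 0.0]
```

Each output value depends only on one 2x2 patch of the input, and the strongest responses line up with the position of the edge.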

The second concept we'll discuss is shared weights and biases.

Like a typical neural network, a CNN has neurons with weights and biases. The
model learns these values during the training process, and it continually updates them
with each new training example. However, in the case of CNNs, the weight and bias
values are the same for all the hidden neurons in a given layer. This means that all the
hidden neurons are detecting the same feature, such as an edge or a blob, in different
regions of the image. This makes the network tolerant to the translation of objects in
an image. For example, a network trained to recognize cats will be able to do so
wherever the cat is in the image.
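A rough sketch of why shared weights matter, using assumed sizes that do not come from the video (a 28x28 input and a 5x5 receptive field): the first function compares parameter counts with and without sharing for a single feature map, and the second applies one shared detector to a 1-D signal to show that the same feature is found wherever it appears. The names `count_parameters` and `respond` are hypothetical.

```python
def count_parameters(input_size, field_size):
    """Compare parameter counts for one hidden feature map (stride 1, no padding)."""
    hidden_size = input_size - field_size + 1
    hidden_neurons = hidden_size * hidden_size
    # Fully connected: every hidden neuron has its own weight per input pixel, plus a bias.
    fully_connected = input_size * input_size * hidden_neurons + hidden_neurons
    # Shared: one set of field_size x field_size weights and one bias for all neurons.
    shared = field_size * field_size + 1
    return fully_connected, shared

def respond(signal, weights, bias):
    """Apply one shared set of weights at every position of a 1-D signal."""
    n = len(weights)
    return [sum(w * s for w, s in zip(weights, signal[i:i + n])) + bias
            for i in range(len(signal) - n + 1)]

fully_connected, shared = count_parameters(input_size=28, field_size=5)
print(fully_connected, shared)  # 452160 26

edge_detector = [-1, 1]  # responds to a step from dark to bright
left = respond([0, 1, 1, 1, 1, 1], edge_detector, 0)   # step near the left
right = respond([0, 0, 0, 0, 1, 1], edge_detector, 0)  # same step, shifted right
print(left)   # [1, 0, 0, 0, 0]
print(right)  # [0, 0, 0, 1, 0]
```

The detector gives the same peak response in both cases; only the position of the peak moves with the feature.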

Our third and final concept is activation and pooling. The activation step applies a
transformation to the output of each neuron by using activation functions. Rectified
Linear Unit, or ReLU, is an example of a commonly used activation function. It takes
the output of a neuron and, if that output is positive, passes it through unchanged. Or,
if the output is negative, the function maps it to zero.
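As a minimal sketch, ReLU can be written in Python as the maximum of zero and the input: positive values pass through unchanged, and negative values become zero. The helper `relu_map` and the sample values are assumptions added for illustration.

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, map negative values to zero."""
    return max(0.0, x)

def relu_map(feature_map):
    """Apply ReLU to every value in a 2-D feature map (a list of rows)."""
    return [[relu(value) for value in row] for row in feature_map]

activated = relu_map([
    [-2.0, 0.5],
    [3.0, -0.1],
])
print(activated)  # [[0.0, 0.5], [3.0, 0.0]]
```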

You can further transform the output of the activation step by applying a pooling step.
Pooling reduces the dimensionality of the feature map by condensing the output of
small regions of neurons into a single output. This helps simplify the following layers
and reduces the number of parameters that the model needs to learn.
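The pooling step can be sketched the same way. The example below assumes max pooling over non-overlapping 2x2 regions, which is one common choice; the video does not say which pooling variant it has in mind. The function name `max_pool` and the sample feature map are hypothetical.

```python
def max_pool(feature_map, size=2):
    """Condense each non-overlapping size x size region into its maximum value."""
    pooled = []
    for top in range(0, len(feature_map) - size + 1, size):
        row = []
        for left in range(0, len(feature_map[0]) - size + 1, size):
            region = [feature_map[top + i][left + j]
                      for i in range(size) for j in range(size)]
            row.append(max(region))
        pooled.append(row)
    return pooled

activations = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool(activations)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, so the layers that follow have a quarter as many inputs to process.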

Now, let's put it all together. Using these three concepts, we can configure the layers
in a CNN. A CNN can have tens or hundreds of hidden layers that each learn to detect
different features of an image.

In this feature map, we can see that every hidden layer increases the complexity of the
learned image features. For example, the first hidden layer learns how to detect edges,
and the last learns how to detect more complex shapes.

Just like in a typical neural network, the final layer connects every neuron from the
last hidden layer to the output neurons. This produces the final output.

There are three ways to use CNNs for image analysis. The first method is to train the
CNN from scratch. This method is highly accurate, although it is also the most
challenging, as you might need hundreds of thousands of labeled images and significant
computational resources.

The second method relies on transfer learning, which is based on the idea that you can
use knowledge of one type of problem to solve a similar problem. For example, you
could use a CNN model that has been trained to recognize animals to initialize and
train a new model that differentiates between cars and trucks. This method requires
less data and fewer computational resources than the first.

With the third method, you can use a pre-trained CNN to extract features for training
a machine learning model.

For example, a hidden layer that has learned how to detect edges in an image is
broadly relevant to images from many different domains. This method requires the
least amount of data and computational resources.

For more information, see mathworks.com/deep-learning
