
Convolutional neural network

A convolutional neural network, or CNN, is a network architecture for deep learning
that learns directly from images.

A CNN is made up of several layers that process and transform an input to produce
an output. You can train a CNN to do image analysis tasks including scene
classification, object detection and segmentation, and image processing.

In order to understand how CNNs work, we’ll cover three concepts:

1) Local receptive fields

2) Shared weights and biases, and

3) Activation and pooling

Finally, we’ll briefly discuss the three ways to train CNNs for image analysis.

Let’s start with the concept of local receptive fields.

In a typical neural network, every neuron in the input layer is connected to every
neuron in the hidden layer. However, in a CNN, each hidden neuron connects to only a
small region of input layer neurons. These regions are referred to as local
receptive fields. The local receptive field is translated across an image to create a
feature map from the input layer to the hidden layer neurons. You can use
convolution to implement this process efficiently. That’s why it is called a
convolutional neural network.
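
To make the sliding idea concrete, here is a minimal sketch in plain Python, written for clarity rather than speed. The function name feature_map, the 4-by-4 image, and the 2-by-2 kernel values are all made up for illustration. It assumes a stride of one and no padding. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def feature_map(image, kernel, bias=0.0):
    """Slide a small kernel over a 2-D image (stride 1, no padding)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # The local receptive field is the k_rows x k_cols patch at (r, c).
            total = bias
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Illustrative 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 kernel that responds to a dark-to-bright step.
kernel = [
    [-1, 1],
    [-1, 1],
]

fmap = feature_map(image, kernel)
for row in fmap:
    print(row)
```

Only the middle column of the resulting 3-by-3 feature map is nonzero, because that is the only place where the receptive field straddles the edge.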

The second concept we’ll discuss is shared weights and biases.

Like a typical neural network, a CNN has neurons with weights and biases. The
model learns these values during the training process, and it continually updates
them with each new training example. However, in the case of CNNs, the weight and
bias values are the same for all the hidden neurons that belong to the same feature
map in a given layer. This means that all of these hidden neurons are detecting the
same feature, such as an edge or a blob, in different regions of the image. This
makes the network tolerant to the translation of
objects in an image. For example, a network trained to recognize cats will be able to
do so wherever the cat is in the image.
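
A small one-dimensional sketch can illustrate why sharing helps. The detect function, the two-weight kernel, and the two signals below are hypothetical; the only point is that the same weights and bias are reused at every position, so the response moves along with the feature.

```python
def detect(signal, kernel, bias=0.0):
    """Apply one shared kernel and bias at every position of a 1-D signal."""
    k = len(kernel)
    return [
        bias + sum(signal[p + i] * kernel[i] for i in range(k))
        for p in range(len(signal) - k + 1)
    ]


kernel = [-1, 1]  # hypothetical detector for a step from dark to bright

step_on_left = [0, 1, 1, 1, 1, 1]
step_on_right = [0, 0, 0, 0, 1, 1]

resp_left = detect(step_on_left, kernel)
resp_right = detect(step_on_right, kernel)

# Position of the strongest response in each case.
peak_left = resp_left.index(max(resp_left))
peak_right = resp_right.index(max(resp_right))

# The whole detector has only len(kernel) weights plus one bias.
n_params = len(kernel) + 1

print(resp_left, peak_left)
print(resp_right, peak_right)
print(n_params)
```

The peak has the same height in both cases and only its position changes, even though the detector has just three learnable values in total.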

Our third and final concept is activation and pooling. The activation step applies a
transformation to the output of each neuron by using activation functions. Rectified
Linear Unit, or ReLU, is an example of a commonly used activation function. It takes
the output of a neuron and, if the output is positive, passes it through unchanged.
If the output is negative, the function maps it to zero.
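
ReLU is usually written as max(0, x): positive values pass through as they are, and negative values become zero. The sample values below are arbitrary and only serve as a quick check.

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, clamp negatives to zero."""
    return max(0.0, x)


# Arbitrary example neuron outputs.
outputs = [-2.5, -0.1, 0.0, 0.7, 3.0]
activated = [relu(v) for v in outputs]
print(activated)
```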

You can further transform the output of the activation step by applying a pooling step.
Pooling reduces the dimensionality of the feature map by condensing the output of
small regions of neurons into a single output. This helps simplify the following layers
and reduces the number of parameters that the model needs to learn.
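
As a rough sketch, here is one common form of pooling, max pooling, which keeps the largest value in each small region. The choice of max pooling, the 2-by-2 region size, and the sample feature map are assumptions made for this example. Other variants, such as average pooling, follow the same pattern.

```python
def max_pool(fmap, size=2):
    """Condense each non-overlapping size x size region into its maximum."""
    rows = len(fmap) // size
    cols = len(fmap[0]) // size
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            region = [
                fmap[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(region))
        pooled.append(row)
    return pooled


# Arbitrary 4x4 feature map.
fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

pooled = max_pool(fmap)
for row in pooled:
    print(row)
```

The 4-by-4 map shrinks to 2-by-2, so the next layer has a quarter as many inputs to deal with.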

Now, let’s put it all together. Using these three concepts, we can configure the layers
in a CNN. A CNN can have tens or hundreds of hidden layers that each learn to
detect different features of an image.
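
One way to get a feel for how stacked layers fit together is to trace the size of the feature map through a small, made-up configuration. The sketch below assumes square inputs, convolution with a stride of one and no padding, and non-overlapping pooling. The layer list and the 28-pixel input size are illustrative, not a recommended design.

```python
def trace_sizes(input_size, layers):
    """Return the feature map side length after each layer."""
    sizes = [input_size]
    size = input_size
    for kind, k in layers:
        if kind == "conv":
            size = size - k + 1  # k x k receptive field, stride 1, no padding
        elif kind == "pool":
            size = size // k     # non-overlapping k x k pooling regions
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        sizes.append(size)
    return sizes


# Hypothetical stack: two rounds of convolution followed by pooling.
layers = [("conv", 5), ("pool", 2), ("conv", 5), ("pool", 2)]
sizes = trace_sizes(28, layers)
print(sizes)
```

Each convolution trims the border and each pooling step halves the side length, so deeper layers work on smaller maps that summarize larger areas of the original image.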

In this feature map, we can see that every hidden layer increases the complexity of
the learned image features. For example, the first hidden layer learns how to detect
edges, and the last learns how to detect more complex shapes.

Just like in a typical neural network, the final layer connects every neuron from the
last hidden layer to the output neurons. This produces the final output.

There are three ways to use CNNs for image analysis. The first method is to train the
CNN from scratch. This method can be highly accurate, although it is also the most
challenging, as you might need hundreds of thousands of labelled images and
significant computational resources.

The second method relies on transfer learning, which is based on the idea that you
can use knowledge of one type of problem to solve a similar problem. For example,
you could use a CNN model that has been trained to recognize animals to initialize
and train a new model that differentiates between cars and trucks. This method
requires less data and fewer computational resources than the first.

With the third method, you can use a pre-trained CNN to extract features for training
a machine learning model. For example, a hidden layer that has learned how to detect
edges in an image is broadly relevant to images from many different domains. This
method requires the least amount of data and computational resources.
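
The feature extraction idea can be sketched in plain Python. Everything here is a stand-in: the two fixed kernels play the role of filters a pre-trained layer might have learned, the tiny striped images are synthetic, and the nearest-centroid classifier is just one simple choice of machine learning model. The key point is that the kernels are never updated; only the small classifier is fitted to the new data.

```python
# Stand-ins for filters from a pre-trained layer (hypothetical values, kept frozen).
FROZEN_KERNELS = [
    [[-1, 1], [-1, 1]],   # responds to vertical edges
    [[-1, -1], [1, 1]],   # responds to horizontal edges
]


def respond(image, kernel):
    """Total absolute response of one fixed kernel over all valid positions."""
    k = len(kernel)
    total = 0.0
    for r in range(len(image) - k + 1):
        for c in range(len(image[0]) - k + 1):
            s = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            total += abs(s)
    return total


def extract_features(image):
    """One number per frozen kernel."""
    return [respond(image, kernel) for kernel in FROZEN_KERNELS]


def train_centroids(images, labels):
    """Fit a nearest-centroid classifier on the extracted features."""
    sums, counts = {}, {}
    for img, label in zip(images, labels):
        feats = extract_features(img)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, image):
    """Return the label whose centroid is closest to the image's features."""
    feats = extract_features(image)

    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(feats, centroids[label]))

    return min(centroids, key=dist)


# Tiny synthetic training set: one image per class.
vertical = [[0, 1, 0, 1]] * 4
horizontal = [[0] * 4, [1] * 4, [0] * 4, [1] * 4]
centroids = train_centroids([vertical, horizontal], ["vertical", "horizontal"])

# A shifted version of the vertical stripes that was not in the training set.
new_image = [[1, 0, 1, 0]] * 4
prediction = predict(centroids, new_image)
print(prediction)
```

Because the edge detectors are reused as they are, the only thing fitted to the new data is a pair of feature averages, which is why this approach can get by with very little data.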
