Machine Learning - Intuitive Understanding of 1D, 2D, and 3D Convolutions in Convolutional Neural Networks - Stack Overflow

1/7/2019 machine learning - Intuitive understanding of 1D, 2D, and 3D Convolutions in Convolutional Neural Networks - Stack Overflow
Intuitive understanding of 1D, 2D, and 3D Convolutions in Convolutional Neural Networks

Can anyone please clearly explain the difference between 1D, 2D, and 3D convolutions in CNN (Deep Learning) with examples?

machine-learning deep-learning signal-processing conv-neural-network convolution
asked Mar 19 '17 at 6:20 by xlax, edited Oct 12 '18 at 2:13 by kmario23

1 Answer
I want to explain with the picture from C3D.
In a nutshell, the convolution direction and the output shape are what matter!
↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑
just 1-direction (time-axis) to calculate conv
input = [W], filter = [k], output = [W]
ex) input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (interior values; with zero 'SAME' padding the two edge values come out as 0.75)
output-shape is 1D array
example) graph smoothing
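The smoothing example above can be checked by hand with a few lines of plain Python. This is only a sketch: the helper name conv1d_same is made up for illustration, it assumes an odd-length kernel, and it computes cross-correlation with zero padding (what deep-learning libraries usually call "convolution"), not a library API.

```python
def conv1d_same(signal, kernel):
    """Slide `kernel` along `signal` in one direction only.

    Zero padding keeps the output the same length as the input
    (assumes an odd-length kernel).
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


smoothed = conv1d_same([1, 1, 1, 1, 1], [0.25, 0.5, 0.25])
print(smoothed)  # [0.75, 1.0, 1.0, 1.0, 0.75]
```

The interior values stay at 1, and only the two edge values drop because the padded zeros enter the weighted sum there.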
tf.nn.conv1d - Toy Example
# Written for the TensorFlow 1.x graph/session API
import tensorflow as tf
import numpy as np
sess = tf.Session()
ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1
in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)
in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])
# conv1d expects input [batch, width, channels] and kernel [width, in_channels, out_channels]
input_1d = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))
↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑
2-direction (x,y) to calculate conv
output-shape is 2D Matrix
input = [W, H], filter = [k,k] output = [W,H]
example) Sobel Edge Filter
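To make the Sobel example concrete, here is a small dependency-free sketch. The helper conv2d_same and the tiny test image are illustrative assumptions, not part of the original answer. The filter slides in two directions (down the rows, across the columns) and zero padding keeps the output the same size as the input.

```python
def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding; output is H x W like the input."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):          # direction 1: down the rows
        for x in range(w):      # direction 2: across the columns
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - ph, x + dx - pw
                    if 0 <= yy < h and 0 <= xx < w:  # outside = zero padding
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
# 4x4 image with a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1] for _ in range(4)]
edges = conv2d_same(image, sobel_x)
print(edges[1])  # [0.0, 4.0, 4.0, -4.0]
```

The two 4.0 responses mark the vertical edge. The -4.0 in the last column is an artifact of the zero padding, which looks like a second edge at the image border.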
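tf.nn.conv2d - Toy Example
ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]
in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)
in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])
filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])
# conv2d expects input [batch, height, width, channels] and kernel [height, width, in_channels, out_channels]
input_2d = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])
output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))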
↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑
3-direction (x,y,z) to calculate conv
output-shape is 3D Volume
input = [W,H,L], filter = [k,k,d] output = [W,H,M]
d < L is important for making a volume output: the filter must have room to slide along the depth axis too
example) C3D
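The role of d < L shows up clearly when the output shape is computed without padding. The sketch below is an illustration only: conv3d_valid, ones and shape are hypothetical helpers, and volumes are nested lists indexed [z][y][x]. With d < L the filter has several positions along the depth axis, so the output is a volume. With d = L it has a single position and the depth collapses to 1.

```python
def conv3d_valid(volume, kernel):
    """3D cross-correlation without padding over nested lists indexed [z][y][x]."""
    L, H, W = len(volume), len(volume[0]), len(volume[0][0])
    d, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [
        [
            [
                sum(
                    volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                    for dz in range(d)
                    for dy in range(kh)
                    for dx in range(kw)
                )
                for x in range(W - kw + 1)
            ]
            for y in range(H - kh + 1)
        ]
        for z in range(L - d + 1)   # M = L - d + 1 positions along depth
    ]


def ones(depth, height, width):
    return [[[1.0] * width for _ in range(height)] for _ in range(depth)]


def shape(v):
    return (len(v), len(v[0]), len(v[0][0]))


volume = ones(5, 5, 5)
out_volume = conv3d_valid(volume, ones(3, 3, 3))  # d = 3 < L = 5
out_flat = conv3d_valid(volume, ones(5, 3, 3))    # d = 5 = L
print(shape(out_volume))  # (3, 3, 3)
print(shape(out_flat))    # (1, 3, 3)
```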
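tf.nn.conv3d - Toy Example
ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])
filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])
# conv3d expects input [batch, depth, height, width, channels]
input_3d = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])
output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print(sess.run(output_3d))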
↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ..., ↑↑↑↑↑
Even though input is 3D, ex) 224x224x3, 112x112x32
output-shape is not 3D Volume, but 2D Matrix
because filter depth = L must be matched with input channels = L
2-direction (x,y) to calculate conv! not 3D
input = [W,H,L], filter = [k,k,L] output = [W,H]
what if we want to train N filters (N is number of filters)
then output shape is (stacked 2D) 3D = 2D x N matrix.
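A plain-Python sketch of the same idea follows. The helper conv2d_multichannel is hypothetical, and for readable loops it stores filters as [N][k][k][C], which differs from TensorFlow's [k, k, C, N] kernel layout. Each filter spans all input channels, so it can only move in x and y. Every filter yields one 2D map, and N maps stacked give the 3D output.

```python
def conv2d_multichannel(image, filters):
    """image: [H][W][C]; filters: [N][k][k][C] (layout chosen for this sketch).

    Returns [H-k+1][W-k+1][N]: one 2D map per filter, no padding.
    """
    H, W, C = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0])
    out = []
    for y in range(H - k + 1):
        row = []
        for x in range(W - k + 1):
            # each filter sums over its k x k window AND over all C channels
            row.append([
                sum(
                    image[y + dy][x + dx][c] * f[dy][dx][c]
                    for dy in range(k)
                    for dx in range(k)
                    for c in range(C)
                )
                for f in filters
            ])
        out.append(row)
    return out


H, W, C, N, k = 5, 5, 3, 2, 3
image = [[[1.0] * C for _ in range(W)] for _ in range(H)]
filters = [[[[1.0] * C for _ in range(k)] for _ in range(k)] for _ in range(N)]
stacked = conv2d_multichannel(image, filters)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # 3 3 2
print(stacked[0][0])  # [27.0, 27.0]
```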
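conv2d - LeNet, VGG, ... for 1 filter
in_channels = 32 # 3 for RGB, 32, 64, 128, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3D, in_channels = 32
# filter must have 3D shape with in_channels
weight_3d = np.ones((3,3,in_channels))
strides_2d = [1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])
output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))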
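conv2d - LeNet, VGG, ... for N filters
in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3D, in_channels = 32
# filter must have 3D shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])
input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])
# output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))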
↑↑↑↑↑ Bonus 1x1 conv in CNN - GoogLeNet, ..., ↑↑↑↑↑
1x1 conv is confusing when you think of it as a 2D image filter like Sobel
for 1x1 conv in CNN, input is 3D shape as in the picture above
it filters along the depth (channel) axis: each output value is a weighted sum of all input channels at one pixel
input = [W,H,L], filter = [1,1,L] output = [W,H]
output stacked shape is 3D = 2D x N matrix.
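Seen this way, a 1x1 conv is just a per-pixel dot product across channels. The following sketch uses a made-up helper conv1x1 and a tiny hand-picked input, with weights stored as [N][L] for readability rather than in a library's kernel layout.

```python
def conv1x1(image, weights):
    """image: [H][W][L]; weights: [N][L] (one weight vector per output channel).

    No spatial neighbourhood is involved: each output value mixes the
    L input channels of a single pixel.
    """
    return [
        [
            [sum(p * w for p, w in zip(pixel, wn)) for wn in weights]
            for pixel in row
        ]
        for row in image
    ]


image = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]   # H=1, W=2, L=3
weights = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]   # N=2 filters
mixed = conv1x1(image, weights)
print(mixed)  # [[[1.0, 1.5], [4.0, 4.5]]]
```

The spatial size is unchanged (1x2), and only the channel count changes from L=3 to N=2, which is why 1x1 convs are commonly used to reduce or expand the number of channels.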
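tf.nn.conv2d - special case 1x1 conv
in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3D, in_channels = 32
# filter must have 3D shape x number of filters = 4D; spatial size is 1x1
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])
input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])
# output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))
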
Animation (2D Conv with 3D-inputs)
- Original Link : LINK

- The author: Martin Görner
- Twitter: @martin_gorner
- Google +: plus.google.com/+MartinGorne
Bonus 1D Convolutions with 2D input
↑↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑↑
↑↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑↑
Even though input is 2D, ex) 20x14
output-shape is not 2D, but 1D Matrix
because filter height = L must be matched with input height = L
1-direction (x) to calculate conv! not 2D
input = [W,L], filter = [k,L] output = [W]
what if we want to train N filters (N is number of filters)
then output shape is (stacked 1D) 2D = 1D x N matrix.
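The 20x14 case can be sketched without any library. The helper conv1d_multichannel and the [N][k][L] filter layout are assumptions made for this illustration. Because each filter covers the full height L, it slides along the width only, and N filters produce an output of shape [W-k+1][N] when no padding is used.

```python
def conv1d_multichannel(seq, filters):
    """seq: [W][L]; filters: [N][k][L] (layout chosen for this sketch).

    Returns [W-k+1][N]: one 1D output per filter, no padding.
    """
    k = len(filters[0])
    L = len(seq[0])
    return [
        [
            sum(seq[t + dt][c] * f[dt][c] for dt in range(k) for c in range(L))
            for f in filters
        ]
        for t in range(len(seq) - k + 1)   # the only sliding direction
    ]


seq = [[1.0] * 14 for _ in range(20)]                         # W=20, L=14
filters = [[[1.0] * 14 for _ in range(3)] for _ in range(4)]  # N=4, k=3
stacked_1d = conv1d_multichannel(seq, filters)
print(len(stacked_1d), len(stacked_1d[0]))  # 18 4
```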
Bonus C3D
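in_channels = 32 # 3, 32, 64, 128, ...
out_channels = 64 # 3, 32, 64, 128, ...
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]
in_4d = tf.constant(ones_4d, dtype=tf.float32)
filter_5d = tf.constant(weight_5d, dtype=tf.float32)
in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])
filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])
input_4d = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])
output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
print(sess.run(output_4d))
sess.close()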
Input & Output in Tensorflow
Summary
answered Jun 19 '17 at 10:22 by runhani, edited Nov 23 '17 at 4:27

Considering your labor and clarity in the explanations, upvotes of 8 are too less. – user3282777 Sep 19 '17 at 13:21
The 2d conv with 3d input is a nice touch. I would suggest an edit to include 1d conv with 2d input (e.g. a multi-channel array) and compare the difference thereof with a 2d conv with 2d input. – SumNeuron Nov 12 '17 at 18:24
Thank you for your comment. ^^ I updated! – runhani Nov 23 '17 at 4:28
Amazing answer! – Ben Jan 30 '18 at 13:49
Why is the conv direction in 2d ↲. I have seen sources that claim that the direction is → for row 1, then → for row 1+stride. Convolution itself is shift invariant, so why does the direction of convolution matter? – Minh Triet Mar 19 '18 at 14:11
