Deep neural networks and Deep Learning are powerful and popular algorithms, and much of their success lies in the careful design of the neural network architecture. I wanted to revisit the history of neural network design over the last few years, in the context of Deep Learning.
For a more in-depth analysis and comparison of all the networks reported here, please see our recent article. One representative figure from that article is shown here:
Figure: top-1 one-crop accuracy versus the number of operations required for a single forward pass, for several popular neural network architectures.
LeNet5
It is the year 1994, and this is one of the very first convolutional neural networks, the one that propelled the field of Deep Learning. This pioneering work by Yann LeCun was named LeNet5, after many previous successful iterations dating back to the year 1988!
The LeNet5 architecture was fundamental, in particular for the insight that image features are distributed across the entire image, and that convolutions with learnable parameters are an effective way to extract similar features at multiple locations with few parameters. At the time there were no GPUs to help training, and even CPUs were slow, so being able to save parameters and computation was a key advantage. This is in contrast to using each pixel as a separate input to a large multi-layer neural network. LeNet5 made the case that individual pixels should not be used as input features in the first layer, because images are highly spatially correlated, and treating each pixel as a separate input would not take advantage of these correlations.
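As a back-of-the-envelope illustration of this parameter saving (the image size and layer widths below are a hypothetical example, not figures from the paper), compare the parameter count of a small convolutional layer with that of a fully connected layer applied to the raw pixels:

```python
# Hypothetical example: parameter count of a convolutional layer
# versus a fully connected layer on a 32x32 grayscale image.

image_h, image_w = 32, 32     # input image size
k = 5                         # 5x5 convolution kernel
out_channels = 6              # number of feature maps
hidden_units = 6 * 28 * 28    # fully connected layer producing the same number of outputs

# A convolution reuses the same k*k weights at every spatial location.
conv_params = out_channels * (k * k + 1)            # weights + one bias per filter

# A fully connected layer needs one weight per input pixel per output unit.
fc_params = hidden_units * (image_h * image_w + 1)

print(f"convolutional layer: {conv_params:,} parameters")    # 156
print(f"fully connected layer: {fc_params:,} parameters")    # ~4.8 million
```

The convolution reuses the same 5x5 kernel everywhere in the image, so its cost does not grow with image size, while the fully connected layer pays for every pixel in every output unit.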
LeNet5 features can be summarized as:
convolutional neural networks use a sequence of 3 layers: convolution, pooling, non-linearity → this may be the key feature of Deep Learning for images since this paper!
use convolution to extract spatial features
subsample using spatial average of maps
non-linearity in the form of tanh or sigmoids
multi-layer neural network (MLP) as final classifier
sparse connection matrix between layers to avoid large computational cost
Overall, this network was the origin of much of the recent architectures, and a true inspiration for many people in the field.
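For readers who prefer code, here is a minimal PyTorch-style sketch of the layer sequence summarized above. This is a modern reconstruction for illustration only, not LeCun's original implementation: the sparse connection table between layers is omitted, and the layer sizes are the commonly cited ones.

```python
import torch
import torch.nn as nn

# A LeNet5-style network, reconstructed for illustration:
# convolution -> pooling -> non-linearity, repeated, then an MLP classifier.
class LeNet5Like(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # 32x32 -> 28x28, 6 feature maps
            nn.AvgPool2d(2),                   # subsample with spatial averaging
            nn.Tanh(),                         # tanh non-linearity
            nn.Conv2d(6, 16, kernel_size=5),   # 14x14 -> 10x10, 16 feature maps
            nn.AvgPool2d(2),
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(       # multi-layer perceptron as final classifier
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

# Usage: a single 32x32 grayscale image.
logits = LeNet5Like()(torch.randn(1, 1, 32, 32))
```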
The gap
In the years from 1998 to 2010 neural networks were in incubation. Most people did not notice their increasing power, while many other researchers slowly made progress. More and more data became available because of the rise of cell-phone cameras and cheap digital cameras, and computing power was also rising: CPUs were becoming faster, and GPUs became a general-purpose computing tool. Both of these trends let neural networks progress, albeit at a slow rate, and both data and computing power made the tasks that neural networks tackled more and more interesting. And then it became clear…
Dan Ciresan Net
In 2010 Dan Claudiu Ciresan and Jurgen Schmidhuber published one of the very first implementations of GPU neural nets. This implementation had both the forward and backward passes running on an NVIDIA GTX 280 graphics processor, for a neural network with up to 9 layers.
AlexNet
In 2012, Alex Krizhevsky released AlexNet, a deeper and much wider version of LeNet that won the difficult ImageNet competition by a large margin.
AlexNet scaled the insights of LeNet into a much larger neural network that could be used to learn much more complex objects and object hierarchies. The contributions of this work were (see the sketch after this list):
use of rectified linear units (ReLU) as non-linearities
use of dropout technique to selectively ignore single neurons during training, a way to avoid overfitting of the model
overlapping max pooling, avoiding the averaging effects of average pooling
use of NVIDIA GTX 580 GPUs to reduce training time
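Below is a minimal PyTorch-style sketch of how the first three ideas combine in a convolutional block. It is illustrative only: the layer sizes are placeholders rather than the exact AlexNet configuration, and in the original paper dropout was applied to the fully connected layers rather than the convolutional features.

```python
import torch
import torch.nn as nn

# Illustrative building block combining the AlexNet ideas listed above.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4),
    nn.ReLU(inplace=True),                    # ReLU instead of tanh/sigmoid
    nn.MaxPool2d(kernel_size=3, stride=2),    # overlapping max pooling (kernel > stride)
    nn.Dropout(p=0.5),                        # dropout: randomly zero activations during training
)

# Usage: one 224x224 RGB image; move block and input to a GPU with .cuda() if available.
features = block(torch.randn(1, 3, 224, 224))
```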
At the time GPUs offered a much larger number of cores than CPUs and roughly 10x faster training, which in turn made it possible to use larger datasets and also bigger images.
The success of AlexNet started a small revolution. Convolutional neural networks were now the workhorse of Deep Learning, which became the new name for “large neural networks that can now solve useful tasks”.
Overfeat
In December 2013 Yann LeCun's lab at NYU came up with Overfeat, a derivative of AlexNet. The article also proposed learning bounding boxes, which later gave rise to many other papers on the same topic. I believe it is better to learn to segment objects than to learn artificial bounding boxes.