In 2014, Ian Goodfellow and his colleagues at the University of Montreal published a stunning paper introducing the world to GANs, or generative adversarial networks.
Through an innovative combination of computational graphs and game theory, they showed that, given enough modeling power, two models fighting against each other would be able to co-train through plain old backpropagation.
The models play two distinct (literally, adversarial) roles. Given some real data set R, G is the generator, trying to create fake data that looks just like the genuine data, while D is the discriminator, getting data from either the real set or G and labeling each sample as real or fake. Goodfellow’s metaphor (and a fine one it is) was that G was like a team of forgers trying to match real paintings with their output, while D was the team of detectives trying to tell the difference. (Except that in this case, the forgers G never get to see the original data, only the judgments of D. They’re like blind forgers.)
In the ideal case, both D and G would get better over time until G had essentially become a “master forger” of the genuine article and D was at a loss, “unable to differentiate between the two distributions.”
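For reference, Goodfellow et al.'s paper formalizes this game as a two-player minimax objective (first line below). Reading p_data as the distribution behind R and p_z as the noise distribution I is our own mapping onto this article's letters; p_g denotes the distribution of G's output. The second line is the paper's result for the best discriminator against a fixed G, and it shows why D ends up "at a loss": once G has become the master forger, the best D can do is output 1/2 everywhere.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
  \quad\Longrightarrow\quad
  D^*_G(x) = \tfrac{1}{2} \ \text{ when } p_g = p_{\text{data}}
```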
In practice, what Goodfellow had shown was that G would be able to perform a form of unsupervised learning on the original data set, finding some way of representing that data in a (possibly) much lower-dimensional manner. And as Yann LeCun famously stated, unsupervised learning is the “cake” of true AI.
This powerful technique seems like it must require a metric ton of code just to get started, right? Nope. Using PyTorch, we can actually create a very simple GAN in under 50 lines of code. There are really only 5 components to think about:
R: The original, genuine data set
I: The random noise that goes into the generator as a source of entropy
G: The generator which tries to copy/mimic the original data set
D: The discriminator which tries to tell apart G’s output from R
The actual ‘training’ loop where we teach G to trick D and D to beware G.
1.) R: In our case, we’ll start with the simplest possible R: a bell curve. This function takes a mean and a standard deviation and returns a function which provides the right shape of sample data from a Gaussian with those parameters. In our sample code, we’ll use a mean of 4.0 and a standard deviation of 1.25.
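A minimal sketch of such a sampler factory, assuming PyTorch. The helper name `get_distribution_sampler` and the `(1, n)` output shape are our own illustrative choices, not necessarily the article's exact code.

```python
import torch

def get_distribution_sampler(mu, sigma):
    """Return a function that draws n samples from a Gaussian with mean mu and std sigma.

    Hypothetical helper: samples come back as a (1, n) tensor, one row of n draws.
    """
    def sample(n):
        # Scalar mean/std plus an explicit size draws n i.i.d. normal samples.
        return torch.normal(mu, sigma, size=(1, n))
    return sample

# The parameters from the text: mean 4.0, standard deviation 1.25.
d_sampler = get_distribution_sampler(4.0, 1.25)
real = d_sampler(10000)
print(real.shape)                              # torch.Size([1, 10000])
print(real.mean().item(), real.std().item())   # close to 4.0 and 1.25
```

Returning a closure rather than a tensor keeps the distribution's parameters fixed in one place while letting the training loop ask for whatever batch size it needs.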
2.) I: The input into the generator is also random, but to make our job a little bit harder, let’s use a uniform distribution rather than a normal one. This means that our model G can’t simply shift/scale the input to copy R, but has to reshape the data in a non-linear way.
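A sketch of the noise source I, again assuming PyTorch; `get_generator_input_sampler` is a hypothetical name chosen to mirror the sampler for R.

```python
import torch

def get_generator_input_sampler():
    """Return a function that draws an (m, n) tensor of uniform noise in [0, 1).

    Hypothetical helper for the noise source I.
    """
    def sample(m, n):
        return torch.rand(m, n)  # i.i.d. samples from U[0, 1)
    return sample

gi_sampler = get_generator_input_sampler()
noise = gi_sampler(16, 1)   # 16 noise values, one per generated sample
print(noise.shape)          # torch.Size([16, 1])
```

The reason shift/scale is not enough: any affine map a*u + b of a uniform variable is still uniform (on a different interval), so no purely linear G can turn this flat input into a bell curve.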
3.) G: The generator is a standard feedforward graph: two hidden layers, three linear maps. We’re using ELUs (exponential linear units) because they’re the new black, yo. G is going to get the uniformly distributed data samples from I and somehow mimic the normally distributed samples from R.
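A minimal sketch of G as a PyTorch `nn.Module` with two hidden layers, three linear maps and ELU activations. The class name, the attribute names `map1`, `map2`, `map3`, and the sizes used below (1 in, 50 hidden, 1 out) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Feedforward generator: two hidden layers, three linear maps, ELU activations."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.map1 = nn.Linear(input_size, hidden_size)
        self.map2 = nn.Linear(hidden_size, hidden_size)
        self.map3 = nn.Linear(hidden_size, output_size)
        self.act = nn.ELU()

    def forward(self, x):
        x = self.act(self.map1(x))   # hidden layer 1
        x = self.act(self.map2(x))   # hidden layer 2
        return self.map3(x)          # no output activation: values are unbounded

G = Generator(input_size=1, hidden_size=50, output_size=1)
z = torch.rand(16, 1)    # a batch of uniform noise from I
fake = G(z)
print(fake.shape)        # torch.Size([16, 1])
```

Leaving the last layer linear is a deliberate choice in this sketch: a Gaussian has unbounded support, so squashing G's output into a fixed range would limit how far into the tails it can reach.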
4.) D: The discriminator code is very similar to the generator code: a feedforward graph with two hidden layers and three linear maps. It’s going to get samples from either R or G and will output a single scalar between 0 and 1, interpreted as ‘fake’ vs. ‘real’. This is about as milquetoast as a neural net can get.
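A sketch of D that mirrors the generator above, assuming PyTorch. The class name and sizes are hypothetical, and the convention that scores near 1 mean "real" is our assumption (the text only says the output is read as fake vs. real).

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Feedforward discriminator: two hidden layers, three linear maps, ELU
    activations, and a sigmoid so each sample gets one score in [0, 1]."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.map1 = nn.Linear(input_size, hidden_size)
        self.map2 = nn.Linear(hidden_size, hidden_size)
        self.map3 = nn.Linear(hidden_size, 1)   # a single scalar per sample
        self.act = nn.ELU()

    def forward(self, x):
        x = self.act(self.map1(x))
        x = self.act(self.map2(x))
        return torch.sigmoid(self.map3(x))       # near 1 = "real", near 0 = "fake"

D = Discriminator(input_size=1, hidden_size=50)
samples = torch.randn(16, 1) * 1.25 + 4.0   # 16 scalar samples shaped like R's
scores = D(samples)
print(scores.shape)                          # torch.Size([16, 1])
```

The sigmoid is what lets the score be read as a probability of "real", which pairs naturally with a binary cross-entropy loss during training.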
5.) Finally, the training loop alternates between two modes: first training D on real data vs. fake data, with accurate labels (think of this as Police Academy); and then training G to fool D, with inaccurate labels (this is more like those preparation montages from Ocean’s Eleven). It’s a fight between good and evil, people.
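The two alternating modes can be sketched as the loop below. This is a self-contained illustration, not the article's code: the compact `nn.Sequential` models, binary cross-entropy loss, Adam optimizers and all hyperparameters (`BATCH`, `HIDDEN`, `LR`, `STEPS`) are our assumptions. The "inaccurate labels" of the second mode appear as G's fakes being scored against the "real" label.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical hyperparameters for this sketch.
MU, SIGMA = 4.0, 1.25
BATCH, HIDDEN, LR, STEPS = 64, 32, 1e-3, 2000

def mlp(out_act=None):
    """Two hidden layers, three linear maps, ELU activations, 1-d in and out."""
    layers = [nn.Linear(1, HIDDEN), nn.ELU(),
              nn.Linear(HIDDEN, HIDDEN), nn.ELU(),
              nn.Linear(HIDDEN, 1)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

G = mlp()                 # generator: uniform noise -> sample
D = mlp(nn.Sigmoid())     # discriminator: sample -> score in [0, 1]
bce = nn.BCELoss()
d_opt = torch.optim.Adam(D.parameters(), lr=LR)
g_opt = torch.optim.Adam(G.parameters(), lr=LR)
real_label = torch.ones(BATCH, 1)
fake_label = torch.zeros(BATCH, 1)

for step in range(STEPS):
    # Mode 1: train D on real vs. fake data with accurate labels.
    d_opt.zero_grad()
    real = torch.normal(MU, SIGMA, size=(BATCH, 1))
    fake = G(torch.rand(BATCH, 1)).detach()    # detach: this step must not touch G
    d_loss = bce(D(real), real_label) + bce(D(fake), fake_label)
    d_loss.backward()
    d_opt.step()

    # Mode 2: train G to fool D by scoring its fakes against the "real" label.
    g_opt.zero_grad()
    fake = G(torch.rand(BATCH, 1))
    g_loss = bce(D(fake), real_label)          # gradients flow through D into G
    g_loss.backward()
    g_opt.step()                               # only G's parameters are updated

samples = G(torch.rand(1000, 1)).detach()
print(samples.mean().item(), samples.std().item())
```

In the second mode the gradient does pass through D, but only `g_opt` steps, so D's weights stay put; the stray gradients left on D are cleared by `d_opt.zero_grad()` at the start of the next iteration. How close the printed mean and standard deviation get to 4.0 and 1.25 depends on the seed, step count and learning rates.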