Column: 机器学习研究会 (Machine Learning Research Society)
The Machine Learning Research Society (机器学习研究会) is a student organization under the Innovation Center for Big Data and Machine Learning at Peking University, built as a platform for machine learning practitioners to exchange ideas. Besides sharing timely news from the field, the society hosts lectures by industry and academic leaders, salon-style sharing sessions with prominent researchers, real-data innovation competitions, and other events.

[Learning] Recursive Neural Networks in PyTorch

机器学习研究会 · WeChat official account · AI · 2017-04-12 19:16

Main text



Summary
 

Reposted from: 爱可可-爱生活

From Siri to Google Translate, deep neural networks have enabled breakthroughs in machine understanding of natural language. Most of these models treat language as a flat sequence of words or characters, and use a kind of model called a recurrent neural network (RNN) to process this sequence. But many linguists think that language is best understood as a hierarchical tree of phrases, so a significant amount of research has gone into deep learning models known as recursive neural networks that take this structure into account. While these models are notoriously hard to implement and inefficient to run, a brand new deep learning framework called PyTorch makes these and other complex natural language processing models a lot easier.


While recursive neural networks are a good demonstration of PyTorch’s flexibility, it is also a fully-featured framework for all kinds of deep learning with particularly strong support for computer vision. The work of developers at Facebook AI Research and several other labs, the framework combines the efficient and flexible GPU-accelerated backend libraries from Torch7 with an intuitive Python frontend that focuses on rapid prototyping, readable code, and support for the widest possible variety of deep learning models.


SPINNing Up

This post walks through the PyTorch implementation of a recursive neural network with a recurrent tracker and TreeLSTM nodes, also known as SPINN—an example of a deep learning model from natural language processing that is difficult to build in many popular frameworks. The implementation I describe is also partially batched, so it’s able to take advantage of GPU acceleration to run significantly faster than versions that don’t use batching.


The model's name stands for Stack-augmented Parser-Interpreter Neural Network; it was introduced in Bowman et al. (2016) as a way of tackling the task of natural language inference using Stanford's SNLI dataset.


The task is to classify pairs of sentences into three categories: assuming that sentence one is an accurate caption for an unseen image, then is sentence two (a) definitely, (b) possibly, or (c) definitely not also an accurate caption? (These classes are called entailment, neutral, and contradiction, respectively). For example, suppose sentence one is “two dogs are running through a field.” Then a sentence that would make the pair an entailment might be “there are animals outdoors,” one that would make the pair neutral might be “some puppies are running to catch a stick,” and one that would make it a contradiction could be “the pets are sitting on a couch.”


In particular, the goal of the research that led to SPINN was to do this by encoding each sentence into a fixed-length vector representation before determining their relationship (there are other ways, such as attentional models that compare individual parts of each sentence with each other using a kind of soft focus).
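
For concreteness, here is a minimal PyTorch sketch of how two fixed-length sentence vectors might be combined for the three-way classification. The concatenation/difference/product feature set and the layer sizes are illustrative assumptions, not necessarily the exact classifier from the paper:

    import torch
    import torch.nn as nn

    class EntailmentClassifier(nn.Module):
        """Map a (premise, hypothesis) pair of sentence vectors to
        scores for entailment / neutral / contradiction."""
        def __init__(self, d_sentence=300, d_hidden=1024, n_classes=3):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(4 * d_sentence, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, n_classes),
            )

        def forward(self, premise, hypothesis):
            # Combine the two encodings with their difference and
            # elementwise product, a common feature set for this task.
            features = torch.cat(
                [premise, hypothesis, premise - hypothesis, premise * hypothesis],
                dim=-1)
            return self.mlp(features)  # unnormalized class scores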


The dataset comes with machine-generated syntactic parse trees, which group the words in each sentence into phrases and clauses that all have independent meaning and are each composed of two words or sub-phrases. Many linguists believe that humans understand language by combining meanings in a hierarchical way as described by trees like these, so it might be worth trying to build a neural network that works the same way. Here’s an example of a sentence from the dataset, with its parse tree represented by nested parentheses:


    ( ( The church ) ( ( has ( cracks ( in ( the ceiling ) ) ) ) . ) )
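
That string can be read into an ordinary nested data structure with a few lines of Python. This helper is an illustrative preprocessing assumption, not part of SPINN itself:

    def parse(tokens):
        """Turn a token list like ['(', 'The', 'church', ')'] into
        nested tuples: ('The', 'church')."""
        stack = [[]]
        for tok in tokens:
            if tok == '(':
                stack.append([])          # start a new subtree
            elif tok == ')':
                subtree = tuple(stack.pop())
                stack[-1].append(subtree) # close it and attach to the parent
            else:
                stack[-1].append(tok)     # a word is a leaf
        return stack[0][0]

    tree = parse("( ( The church ) ( ( has ( cracks ( in ( the ceiling ) ) ) ) . ) )".split())
    # -> (('The', 'church'), (('has', ('cracks', ('in', ('the', 'ceiling')))), '.'))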

One way to encode this sentence using a neural network that takes the parse tree into account would be to build a neural network layer Reduce that combines pairs of words (represented by word embeddings like GloVe) and/or phrases, then apply this layer recursively, taking the result of the last Reduce operation as the encoding of the sentence:


    X = Reduce("the", "ceiling")
    Y = Reduce("in", X)
    ... etc.
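
A minimal PyTorch version of such a Reduce layer might look like the sketch below. The single tanh layer and 300-dimensional vectors are simplifying assumptions (SPINN's actual Reduce is a TreeLSTM-style composition), and embed() is a hypothetical stand-in for a GloVe embedding lookup:

    import torch
    import torch.nn as nn

    # Hypothetical stand-in for a pretrained GloVe lookup table.
    vocab = {}
    embedding = nn.Embedding(100, 300)

    def embed(word):
        idx = vocab.setdefault(word, len(vocab))  # assign an index on first sight
        return embedding(torch.tensor(idx))

    class Reduce(nn.Module):
        """Combine a left child and a right child vector into one parent vector."""
        def __init__(self, d=300):
            super().__init__()
            self.linear = nn.Linear(2 * d, d)

        def forward(self, left, right):
            # Concatenate the two children and project back down to d dimensions.
            return torch.tanh(self.linear(torch.cat([left, right], dim=-1)))

    reduce_layer = Reduce()
    x = reduce_layer(embed("the"), embed("ceiling"))
    y = reduce_layer(embed("in"), x)
    # ... and so on, up to the root of the tree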

But what if I want the network to work in an even more humanlike way, reading from left to right and maintaining sentence context while still combining phrases using the parse tree? Or, what if I want to train a network to construct its own parse tree as it reads the sentence, based on the words it sees? Here’s the same parse tree written a slightly different way:

    The church ) has cracks in the ceiling ) ) ) ) . ) )

Or a third way, again equivalent:

WORDS:  The church   has cracks in the ceiling         .
PARSES: S   S      R S   S      S  S   S       R R R R S R R

All I did was remove open parentheses, then tag words with “S” for “shift” and replace close parentheses with “R” for “reduce.” But now the information can be read from left to right as a set of instructions for manipulating a stack and a stack-like buffer, with exactly the same results as the recursive method described above (a code sketch follows the numbered steps):

  1. Place the words into the buffer.

  2. Pop “The” from the front of the buffer and push it onto the stack, followed by “church”.

  3. Pop top two stack values, apply Reduce, then push the result back to the stack.

  4. Pop “has” from buffer and push to stack, then “cracks”, then “in”, then “the”, then “ceiling”.

  5. Repeat four times: pop top two stack values, apply Reduce, then push the result.

  6. Pop “.” from the buffer and push it onto the stack.

  7. Repeat two times: pop top two stack values, apply Reduce, then push the result.

  8. Pop the remaining stack value and return it as the sentence encoding.
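
Put together, the whole procedure is a short loop over the transition sequence. The sketch below reuses the hypothetical embed() and Reduce pieces from above and leaves the Tracker out; it illustrates the stack/buffer logic rather than reproducing the paper's implementation:

    def to_transitions(tokens):
        """Drop '(' and map words to SHIFT and ')' to REDUCE, as above."""
        return ['REDUCE' if tok == ')' else 'SHIFT'
                for tok in tokens if tok != '(']

    def encode(words, transitions):
        """Run the shift/reduce manipulation and return the sentence encoding."""
        buffer = [embed(w) for w in words]            # buffer[0] is the front
        stack = []
        for op in transitions:
            if op == 'SHIFT':
                stack.append(buffer.pop(0))           # move the next word onto the stack
            else:
                right, left = stack.pop(), stack.pop()
                stack.append(reduce_layer(left, right))  # combine the top two entries
        assert len(stack) == 1 and not buffer
        return stack.pop()                            # the root is the sentence encoding

    tokens = "( ( The church ) ( ( has ( cracks ( in ( the ceiling ) ) ) ) . ) )".split()
    words = [t for t in tokens if t not in ('(', ')')]
    sentence_vector = encode(words, to_transitions(tokens))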

I also want to maintain sentence context to take into account information about the parts of the sentence the system has already read when performing Reduce operations on later parts of the sentence. So I’ll replace the two-argument Reduce function with a three-argument function that takes a left child phrase, a right child phrase, and the current sentence context state. This state is created by a second neural network layer, a recurrent unit called the Tracker. The Tracker produces a new state at every step of the stack manipulation (i.e., after reading each word or close parenthesis) given the current sentence context state, the top entry b in the buffer, and the top two entries s1, s2 in the stack:

context[t+1] = Tracker(context[t], b, s1, s2)
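
Under the plausible but assumed reading that the Tracker is a single LSTM step whose input concatenates the buffer head and the top two stack entries, it might be sketched like this:

    import torch
    import torch.nn as nn

    class Tracker(nn.Module):
        """Maintain a running sentence-context state over stack operations."""
        def __init__(self, d_phrase=300, d_context=256):
            super().__init__()
            # One recurrent step per stack operation; the input is the
            # concatenation of the buffer head and the top two stack entries.
            self.cell = nn.LSTMCell(3 * d_phrase, d_context)

        def forward(self, context, b, s1, s2):
            h, c = context  # previous (hidden, cell) state
            return self.cell(torch.cat([b, s1, s2], dim=-1), (h, c))

    tracker = Tracker()
    h, c = torch.zeros(256), torch.zeros(256)
    h, c = tracker((h, c), embed("has"), embed("church"), embed("The"))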


You could easily imagine writing code to do these things in your favorite programming language. For each sentence to be processed it would load the next word from the buffer, run the Tracker, check whether to push onto the stack or perform a Reduce, do that operation, then repeat until the sentence is complete. Applied to a single sentence, this process constitutes a large and complex deep neural network with two trainable layers applied over and over in ways determined by the stack manipulation. But if you're familiar with traditional deep learning frameworks like TensorFlow or Theano, you'll know that it's difficult to implement a dynamic procedure like this in them. It's worth stepping back and spending a little while exploring why that's the case, and what PyTorch does differently.
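
The difference is easiest to see in miniature: PyTorch builds the computation graph as the Python code executes, so ordinary data-dependent control flow, exactly what the shift/reduce loop needs, just works. A toy illustration:

    import torch

    def dynamic_step(x, threshold=0.0):
        # The branch taken depends on the data; autograd records only
        # the operations that actually ran.
        if x.sum().item() > threshold:
            return x * 2
        return x - 1

    x = torch.randn(5, requires_grad=True)
    dynamic_step(x).sum().backward()  # gradients flow through whichever branch executed
    print(x.grad)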


Link:

https://devblogs.nvidia.com/parallelforall/recursive-neural-networks-pytorch/ 


Original link:

http://weibo.com/1402400261/EE3vF3vQr?ref=home&rid=11_0_8_2676203304825082947&type=comment#_rnd1491990107465
