Column: 机器学习研究会
机器学习研究会 (Machine Learning Research Society) is a student organization under the Peking University Center for Big Data and Machine Learning Innovation, aiming to build a platform where machine learning practitioners can exchange ideas. Besides sharing timely news from the field, the society also hosts lectures by industry and academic leaders, salon-style sharing sessions with prominent researchers, real-data innovation competitions, and other events.

[Recommended] Generating Text via LSTM/CNN Adversarial Training

机器学习研究会 · WeChat Official Account · AI · 2017-01-17 19:21

Main text


Abstract

Reposted from: 爱可可-爱生活

There was a really cute paper at the GAN workshop this year, Generating Text via Adversarial Training by Zhang, Gan, and Carin.  In particular, they make a couple of unusual choices that appear important.  (Warning: if you are not familiar with GANs, this post will not make a lot of sense.)

  1. They use a convolutional neural network (CNN) as a discriminator, rather than an RNN.  In retrospect this seems like a good choice, e.g. Tong Zhang has been crushing it in text classification with CNNs.  CNNs are a bit easier to train than RNNs, so the net result is a powerful discriminator with a relatively easy optimization problem.  (A sketch of such a text CNN appears after this list.)

  2. They use a smooth approximation to the LSTM output in their generator, but this kind of trick appears everywhere, so it isn't especially remarkable in isolation.  (A sketch of the approximation appears after this list.)

  3. They use a pure moment matching criterion for the saddle point optimization (estimated over a mini-batch).  GANs started with a pointwise discrimination loss, and more recent work has augmented this loss with moment matching style penalties, but here the saddle point optimization is pure moment matching.  (So technically the discriminator isn't a discriminator; they refer to it interchangeably as a discriminator or an encoder in the text, which explains why.)  A simplified version of such a loss is sketched after this list.

  4. They are very smart about initialization.  In particular the discriminator is pre-trained to distinguish between a true sentence and the same sentence with two words swapped in position.  (During initialization, the discriminator is trained using a pointwise classification loss.)  This is interesting because swapping two words preserves many of the n-gram statistics of the input, i.e., many of the convolutional filters will compute the exact same value.  (I've had good luck recently using permuted sentences as negatives for other models; now I'm going to try swapping two words.  A sketch of this negative-sampling trick appears after this list.)

  5. They update the generator more frequently than the discriminator, which is counter to the standard folklore that says you want the discriminator to move faster than the generator.  Perhaps this is because the CNN optimization problem is much easier than the LSTM one; the use of a purely moment matching loss might also be relevant.  (The update schedule is sketched after this list.)
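
To make the first point concrete, here is a minimal sketch of a CNN text encoder/discriminator in the usual "convolutions over word embeddings plus max-over-time pooling" style.  The layer sizes and filter widths are illustrative assumptions, not the architecture from the paper.

```python
# A minimal sketch (PyTorch, illustrative sizes) of a CNN text encoder/discriminator:
# 1-D convolutions of a few widths over word embeddings, then max-over-time pooling.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, embed_dim=300, num_filters=100, widths=(3, 4, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, w) for w in widths])

    def forward(self, x):                   # x: (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)               # Conv1d wants (batch, channels, seq_len)
        feats = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)      # (batch, num_filters * len(widths))
```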
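
For the second point, the smooth approximation amounts to replacing the hard argmax over the vocabulary with a low-temperature softmax, so the generator stays differentiable end to end.  The function name and temperature value below are assumptions for illustration.

```python
# A minimal sketch of the smooth approximation: replace the hard argmax over the
# vocabulary with a low-temperature softmax, so gradients flow from the CNN back
# into the LSTM generator.  Names and the temperature value are assumptions.
import torch
import torch.nn.functional as F

def soft_word_embedding(logits, embedding_matrix, temperature=0.01):
    """logits: (batch, vocab); embedding_matrix: (vocab, embed_dim).

    As temperature -> 0 the softmax approaches one-hot, so the output approaches
    the argmax word's embedding while staying differentiable.
    """
    weights = F.softmax(logits / temperature, dim=-1)
    return weights @ embedding_matrix       # (batch, embed_dim)
```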
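
For the third point, here is one simplified reading of a mini-batch moment matching objective: match the mean and covariance of encoder features on real versus generated sentences.  The paper's actual criterion is a kernel-based feature matching loss, so treat this as a sketch of the idea rather than their exact objective.

```python
# A simplified mini-batch moment matching loss (an assumption for illustration):
# match mean and covariance of CNN encoder features on real vs. generated batches.
import torch

def moment_matching_loss(real_feats, fake_feats):
    """real_feats, fake_feats: (batch, d) encoder features; batch must be > 1."""
    mean_gap = real_feats.mean(dim=0) - fake_feats.mean(dim=0)
    cov_gap = torch.cov(real_feats.T) - torch.cov(fake_feats.T)
    return mean_gap.pow(2).sum() + cov_gap.pow(2).sum()
```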
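
For the fourth point, the initialization trick only needs a way to manufacture negatives by exchanging two word positions; the helper below (the name and usage line are mine) shows the idea.

```python
# A minimal sketch of the initialization negatives: a real sentence with two
# word positions exchanged.  Helper name and usage line are mine, not the paper's.
import random

def swap_two_words(tokens):
    """Return a copy of the token list with two random positions exchanged."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

# Usage: real sentences are positives, their swapped copies negatives, and the
# CNN is pre-trained with an ordinary pointwise classification loss.
print(swap_two_words(["the", "cat", "sat", "on", "the", "mat"]))
```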
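
And for the fifth point, the update schedule is just a loop that takes several generator steps per discriminator step.  The 5:1 ratio below is illustrative, not taken from the paper.

```python
# A minimal sketch of the update schedule: several generator steps per
# discriminator step, the reverse of the usual GAN folklore.
def adversarial_training(generator_step, discriminator_step, iterations, g_steps_per_d=5):
    """generator_step / discriminator_step: callables performing one optimizer step."""
    for _ in range(iterations):
        for _ in range(g_steps_per_d):
            generator_step()
        discriminator_step()
```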

The old complaint about neural network papers was that you couldn't replicate them.  Nowadays it is often easier to replicate neural network papers than other papers, because you can just fork their code on github and run the experiment.  However, I still find it difficult to ascertain the relative importance of the various choices that were made.  For the choices enumerated above: what is the sensitivity of the final result to these choices?  Hard to say, but I've started to assume the sensitivity is high, because when I have tried to tweak a result after replicating it, it usually goes to shit.  (I haven't tried to replicate this particular result yet.)

Anyway this paper has some cool ideas and hopefully it can be extended to generating realistic-looking dialog.






