Column: VALSE
The main purpose of the annual VALSE (Vision and Learning Seminar) workshop is to provide a platform for in-depth academic exchange among young Chinese scholars working in computer vision, image processing, pattern recognition, and machine learning.

【VALSE Webinar Session 17-13】

VALSE · Official Account · 2017-06-30 17:36




 

Speaker 1: Yinpeng Dong (Tsinghua University)

Time: Wednesday, July 5, 2017, 20:00-20:20 (Beijing time)

Title: Improving Interpretability of Deep Neural Networks with Semantic Information

Host: Hang Su (Tsinghua University)


Abstract:

Interpretability plays a vital role in the development of deep neural networks: interpretable learning lets us understand a model's strengths and weaknesses, anticipate its future behavior, and repair its potential flaws. However, the black-box nature of deep learning makes it difficult to understand the inner workings of deep neural networks. To address this, we propose to exploit the rich semantic information in human descriptions to improve the interpretability of deep neural networks. In the video captioning task, we first extract topic representations from the descriptions as meaningful semantic information, and then incorporate them into network training through an interpretive loss. We further propose a prediction difference maximization method to interpret the features learned by individual neurons. Experimental results show that this approach yields interpretable features while also improving task performance. With a thorough understanding of the internal feature representations, users can easily fix the model's errors through a human-in-the-loop procedure.
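As a rough illustration of how a topic-based interpretive loss could enter training, here is a minimal NumPy sketch. The mean pooling, the squared-L2 form, and the trade-off weight `lam` are assumptions made for illustration, not the talk's exact formulation.

```python
import numpy as np

def interpretive_loss(hidden_features, topic_vector):
    """Minimal sketch: pull a layer's pooled features toward a topic
    representation extracted from the human description.  The pooling
    scheme and squared-L2 form are assumptions, not the paper's loss.

    hidden_features: (T, D) per-frame features from an intermediate layer
    topic_vector:    (D,)   topic representation of the ground-truth caption
    """
    pooled = hidden_features.mean(axis=0)        # aggregate over time steps
    return np.sum((pooled - topic_vector) ** 2)  # distance to the topic vector

# Hypothetical use during training (lam is an assumed trade-off weight):
# total_loss = captioning_loss + lam * interpretive_loss(h, topic)
```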


Speaker bio:

Yinpeng Dong is a first-year Ph.D. student in the TSAIL Group at the Institute for Artificial Intelligence, Department of Computer Science and Technology, Tsinghua University, advised by Associate Professor Jun Zhu. His research interests include machine learning, deep learning, and their applications in computer vision, with a current focus on the interpretability and robustness of deep neural networks. From June to September 2016 he was a visiting student at the Robotics Institute, Carnegie Mellon University, advised by Prof. Fernando De la Torre.


Homepage: http://ml.cs.tsinghua.edu.cn/~yinpeng


 

Speaker 2: Kui Jia (School of Electronic and Information Engineering, South China University of Technology)

Time: Wednesday, July 5, 2017, 20:20-20:40 (Beijing time)

Title: An Exploration of Optimizing Convolutional Kernels in Deep Neural Networks (Improving Training of Deep Neural Networks via Singular Value Bounding)

Host: Hang Su (Tsinghua University)


Abstract:

In this work, we investigate properties of network solutions that can potentially lead to good performance. Our research is inspired by theoretical and empirical results that use orthogonal matrices to initialize networks, but we are interested in how orthogonal weight matrices perform when network training converges. To this end, we propose to constrain the solutions of weight matrices to the orthogonal feasible set during the whole process of network training, and we achieve this by a simple yet effective method called Singular Value Bounding (SVB). In SVB, all singular values of each weight matrix are simply bounded in a narrow band around the value of 1. Based on the same motivation, we also propose Bounded Batch Normalization (BBN), which improves Batch Normalization by removing its potential risk of ill-conditioned layer transforms. We present both theoretical and empirical results to justify the proposed methods. Experiments on benchmark image classification datasets show the efficacy of SVB and BBN. In particular, we achieve state-of-the-art error rates of 3.06% on CIFAR10 and 16.90% on CIFAR100 using off-the-shelf network architectures (Wide ResNets). Our preliminary results on ImageNet also show promise for large-scale learning.
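To make the SVB step concrete, here is a minimal NumPy sketch of the projection the abstract describes: every few SGD iterations, each weight matrix's singular values are clipped into a narrow band around 1. The band parametrization `[1/(1+eps), 1+eps]` and the value of `eps` are assumptions for illustration; the paper's exact schedule and band may differ.

```python
import numpy as np

def singular_value_bounding(W, eps=0.05):
    """Project W so that all its singular values lie in a narrow band
    around 1 (the band [1/(1+eps), 1+eps] and eps are assumptions)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.clip(s, 1.0 / (1.0 + eps), 1.0 + eps)  # bound the spectrum
    return (U * s) @ Vt                            # re-assemble the matrix

# A conv kernel of shape (C_out, C_in, k, k) would first be reshaped to
# a 2-D matrix (C_out, C_in * k * k), bounded, then reshaped back.
```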


Speaker bio:

Kui Jia, Ph.D., is a Professor and doctoral supervisor in the School of Electronic and Information Engineering, South China University of Technology, where he leads the Active Perception and Structured Intelligence group; he is a recipient of China's national Thousand Young Talents award. He received his bachelor's degree from Northwestern Polytechnical University in 2001, his master's degree from the National University of Singapore in 2004, and his Ph.D. in Computer Science from Queen Mary, University of London in 2007. After his Ph.D. he held teaching and research positions at the Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), the Chinese University of Hong Kong, the Singapore-based research institute of the University of Illinois at Urbana-Champaign, and the University of Macau. His main research interests are computer vision, machine learning, image processing, and pattern recognition. He has published more than 50 papers in top journals and conferences in computer vision and pattern recognition, including TPAMI, IJCV, TSP, TIP, ICCV, and CVPR.


 

Speaker 3: Bolei Zhou (MIT)

Time: Wednesday, July 5, 2017, 20:40-21:00 (Beijing time)

Title: Network Dissection: Quantifying Interpretability of Deep Visual Representations

Host: Hang Su (Tsinghua University)


Abstract:

We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.
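The unit-concept alignment at the heart of this framework can be sketched as an intersection-over-union score between a thresholded activation map and a concept's segmentation mask. The high-quantile threshold below is an assumption chosen only to illustrate the idea; the paper's exact scoring procedure may differ.

```python
import numpy as np

def unit_concept_iou(activations, concept_masks, quantile=0.995):
    """Score how well one hidden unit aligns with one visual concept.

    activations:   (N, H, W) the unit's activation maps, upsampled to
                   the resolution of the concept annotations
    concept_masks: (N, H, W) boolean ground-truth masks of the concept
    The quantile threshold is an illustrative assumption.
    """
    t = np.quantile(activations, quantile)   # per-unit activation threshold
    binary = activations > t                 # regions where the unit fires
    inter = np.logical_and(binary, concept_masks).sum()
    union = np.logical_or(binary, concept_masks).sum()
    return inter / max(union, 1)             # IoU; a unit would be labeled
                                             # with its highest-scoring concept
```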


Speaker bio:

Bolei Zhou is a fifth-year Ph.D. candidate in the Computer Science and Artificial Intelligence Laboratory at MIT, working with Prof. Antonio Torralba. His research is on computer vision and machine learning, with particular interest in visual scene understanding and network interpretability. He is a recipient of the Facebook Fellowship, the Microsoft Research Asia Fellowship, and the MIT Greater China Fellowship. More details about his research are available at his homepage: http://people.csail.mit.edu/bzhou/.



 

Speaker 4: Hongteng Xu (Ph.D. graduate, School of Electrical and Computer Engineering, Georgia Tech)

Time: Wednesday, July 5, 2017, 21:00-21:20 (Beijing time)

Title: Fractal Dimension Invariant Filtering and Its CNN-based Implementation

Host: Junchi Yan (IBM; East China Normal University)


Abstract:

Fractal image models and analysis techniques have been highly successful in computer vision, particularly for texture synthesis and analysis. Their core idea is to exploit the bi-Lipschitz invariance of the fractal dimension to build robust features for representing and describing images. However, the invariance of the fractal dimension is generally not preserved under convolution or filtering, which has limited the use of this technique in the era of deep learning.


To address this problem, we design a nonlinear filtering technique with fractal dimension invariance (FDIF). The filter first applies anisotropic, adaptive filtering to preserve the local fractal dimension in a statistical sense, and then adjusts the measure of the filtered image through a nonlinear transform, so that the local fractal dimension remains invariant across measures (i.e., before and after filtering). The technique can be approximately implemented with existing convolutional neural network architectures, establishing a connection between certain network structures and fractal image analysis and providing a fractal-based geometric interpretation of such networks. We apply the technique to the analysis of low-SNR material images and achieve robust extraction of complex material structures.
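For intuition about the invariant that FDIF preserves, the sketch below estimates the box-counting fractal dimension of a binary image by fitting log N(s) against log s. This is a textbook estimator shown for illustration, not the paper's exact procedure.

```python
import numpy as np

def box_counting_dimension(mask, sizes=(2, 4, 8, 16, 32)):
    """Estimate the fractal dimension of a binary image via box counting:
    count the occupied s-by-s boxes N(s) at several scales and fit the
    slope of log N(s) vs. log s, since N(s) ~ s^(-D)."""
    H, W = mask.shape
    counts = []
    for s in sizes:
        h, w = H // s, W // s                    # grid of s-by-s boxes
        boxes = mask[:h * s, :w * s].reshape(h, s, w, s)
        occupied = boxes.any(axis=(1, 3)).sum()  # boxes touching the set
        counts.append(max(occupied, 1))          # avoid log(0)
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope
```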


Speaker bio:

Hongteng Xu is a Ph.D. candidate in the School of Electrical and Computer Engineering, Georgia Tech, graduating in August 2017. His primary research interest is machine learning and its applications in signal processing, data mining, and computer vision, especially point process models, manifold learning, and learning algorithms for the analysis and prediction of synchronous/asynchronous event sequences. His research achievements include 6 patents and nearly 30 publications in top conferences (ICML, IJCAI, AAAI, CVPR, ICCV) and journals (TKDE, TIP, TMM, TPAMI); he received travel awards at ICCV 2013 and ICML 2016, and was a finalist for the 2016 Baidu Fellowship.


Homepage: https://sites.google.com/site/htxu313/


 

Speaker 5: Zhou Ren (Snap Research)

Time: Wednesday, July 5, 2017, 21:20-21:40 (Beijing time)

Title: Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Host: Junchi Yan (IBM; East China Normal University)


Abstract:

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a “policy network” and a “value network” to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
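The way the two networks cooperate at decoding time can be sketched as a mixed score over all one-word extensions of a partial caption: the policy supplies local word probabilities and the value network supplies lookahead estimates. The linear combination and the weight `beta` are assumptions for illustration, not the paper's exact decoding rule.

```python
import numpy as np

def score_extensions(log_probs, values, beta=0.4):
    """Mix local and global guidance when extending a partial caption.

    log_probs: (V,) log pi(word | state) from the policy network
    values:    (V,) value-network estimates of each extended state
    The linear mix and beta are illustrative assumptions."""
    return beta * log_probs + (1.0 - beta) * values

# e.g., keep the top-k extensions in a beam search:
# best = np.argsort(-score_extensions(lp, v))[:k]
```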


Speaker bio:

Zhou Ren is a Research Scientist at Snap Research. He received his Ph.D. from UCLA in 2016, advised by Prof. Alan Yuille, and his M.Eng. (by research) from Nanyang Technological University in 2012, advised by Prof. Junsong Yuan. His research interests include joint image-text representation learning, multi-modal content understanding, image classification, hand gesture recognition, and shape analysis. He has published nearly 15 papers in international journals and conferences (including CVPR, ICCV, ACM Multimedia, BMVC, TPAMI, and TMM), holds 4 patents, and his work has been cited over 1,100 times. He received the 2016 IEEE Transactions on Multimedia Prize Paper Award (Best Paper Award) and was nominated for the CVPR 2017 Best Student Paper Award.


Homepage: http://web.cs.ucla.edu/~zhou.ren/



Special thanks to the main organizers of this Webinar:

VOOC committee members in charge: Hang Su (Tsinghua University), Chenqiang Gao (Chongqing University of Posts and Telecommunications)

VODB coordinating director: Xun Cao (Nanjing University)


How to participate:

1. All VALSE Webinar sessions are held online through the "group video" feature of the VALSE QQ groups. During a session the speaker uploads slides or shares the screen; attendees can see the slides, hear the speaker, and interact with the speaker by text or voice.

2. To participate, you need to join a VALSE QQ group. Groups A, B, C, D, and E are currently full, so apart from speakers and other invited guests, new members can only apply to join VALSE group F (group number: 594312623). Applications must include your name, affiliation, and status; all three are required. After joining, please set your nickname to your real name, status, and affiliation. Status codes: T for university and research-institute staff; I for industry R&D; D for Ph.D. students; M for Master's students.

3. Please download and install the latest Windows version of QQ. Group video is not supported on non-Windows systems such as Mac or Linux; mobile QQ can play the audio but cannot show the slides.

4. About 10 minutes before the session starts, the host will open the group video and send each group a link inviting members to join; simply click the link to enter.

5. During the session, please do not send flowers, lollipops, or other virtual gifts, and avoid off-topic messages, so that the session can proceed smoothly.

6. If you cannot hear the audio or see the video during the session, leaving and rejoining the group video usually solves the problem.

7. We recommend joining from a fast network, preferably over a wired connection.