Column: 机器学习研究会 (Machine Learning Research Society)
机器学习研究会 is a student organization under Peking University's Big Data and Machine Learning Innovation Center, which aims to build a platform for machine learning practitioners to exchange ideas. Besides sharing timely news from the field, the society hosts talks by industry and academic leaders, salon-style sharing sessions with senior researchers, real-data innovation competitions, and other activities.

[Learning] Real-Time Adaptive Image Compression

机器学习研究会 · WeChat official account · AI · 2017-05-19 19:53

Main text



Abstract

Reposted from: 网路冷眼

Image compression is an important step towards our long-term goals at WaveOne. We are excited to share some early results of our research, which will appear at ICML 2017. You can find the paper on arXiv.

As of today, WaveOne image compression outperforms all commercial codecs and research approaches known to us on standard datasets where comparison is available. Furthermore, with access to a GPU, our codec runs orders of magnitude faster than other recent ML-based solutions: for example, we typically encode or decode the Kodak dataset at over 100 images per second.

Here is how different image codecs compress an example 480×480 image to a file size of 2.3 kB (~0.08 bits per pixel):

[Figure: the same 480×480 image compressed to ~0.08 bpp by JPEG, JPEG 2000, WebP, and WaveOne]

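As a quick sanity check on the numbers above: bits per pixel is just total file bits divided by the pixel count, and a two-line calculation (plain Python; assuming 1 kB = 1000 bytes) reproduces the ~0.08 bpp figure:

```python
# 2.3 kB compressed file for a 480x480 image
file_size_bits = 2300 * 8          # bytes -> bits
pixels = 480 * 480

print(file_size_bits / pixels)     # -> 0.0799 bits per pixel (~0.08 bpp)
```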

Now is the time to rethink compression

Even though over 70% of internet traffic today is digital media, the way images and video are represented and transmitted has not evolved much in the past 20 years (apart from Pied Piper's Middle-Out algorithm). It has been challenging for existing commercial compression algorithms to adapt to the growing demand and the changing landscape of transmission settings and requirements. The available codecs are "one-size-fits-all": they are hard-coded, and cannot be customized to particular use cases beyond high-level hyperparameter tuning.

At the same time, in the last few years, deep learning has revolutionized a variety of fields, such as machine translation, object recognition/detection, speech recognition, and photo-realistic image generation. Even though the world of compression seems a natural domain for deep learning, it has not yet fully benefited from these advancements, for two main reasons:

  1. Our deep learning primitives, in their raw forms, are not well-suited for constructing compact representations. Vanilla autoencoder architectures cannot compete with the carefully engineered and highly tuned traditional codecs (a minimal sketch of such a vanilla architecture follows below this list).

  2. It has been particularly challenging to develop a deep learning approach which can run in real-time in environments constrained by computation power, memory footprint and battery life.
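
For concreteness, here is a minimal sketch (in PyTorch, with made-up layer sizes) of the kind of vanilla autoencoder point 1 refers to. It is not WaveOne's architecture; it just shows why compact representations are awkward for raw deep learning primitives: the bottleneck must be rounded to integers before entropy coding, and that rounding has no useful gradient.

```python
import torch
import torch.nn as nn

class VanillaCompressionAE(nn.Module):
    """Illustrative autoencoder with a quantized bottleneck.

    This is NOT the WaveOne architecture, just the kind of 'vanilla'
    baseline the post says cannot match engineered codecs."""

    def __init__(self, bottleneck_channels=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, bottleneck_channels, 4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(bottleneck_channels, 128, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)
        # Rounding the code to integers makes it entropy-codable, but
        # round() has zero gradient; the straight-through trick rounds
        # in the forward pass and treats rounding as identity in backward.
        quantized = code + (torch.round(code) - code).detach()
        return self.decoder(quantized)

# A 480x480 RGB image becomes a 60x60x8 integer code before entropy coding.
x = torch.rand(1, 3, 480, 480)
reconstruction = VanillaCompressionAE()(x)
```

The straight-through rounding used here is a common workaround, but even with it, such an unstructured bottleneck lacks the adaptive bit allocation that makes engineered codecs strong.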

Fortunately, the ubiquity of deep learning has been catalyzing the development of hardware architectures for neural network acceleration. In the years ahead, we foresee dramatic improvements in the speed, power consumption, and availability of neural network hardware, and this will enable the proliferation of codecs based on deep learning.

In the paper, we present progress on both the performance and the computational feasibility of ML-based image compression.


WaveOne compression performance

Here are some more detailed results from our paper:


Performance on the Kodak PhotoCD dataset, measured in the RGB domain (dataset and colorspace chosen to enable comparison with other approaches; see the paper for more extensive results). The plot on the left presents average reconstruction quality as a function of the number of bits per pixel fixed for each image. The plot on the right shows average compressed file sizes relative to ours for different target MS-SSIM values.
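
MS-SSIM (multi-scale structural similarity), the quality metric in these plots, can be computed with, for example, TensorFlow's tf.image.ssim_multiscale. A small self-contained check on stand-in images (not WaveOne code) looks like this:

```python
import tensorflow as tf

# Two stand-in "images" in [0, 1]; in a real evaluation these would be
# the original image and the codec's reconstruction.
original = tf.random.uniform([1, 512, 768, 3])
reconstruction = tf.clip_by_value(
    original + tf.random.normal([1, 512, 768, 3], stddev=0.02), 0.0, 1.0)

# Multi-scale SSIM over the default dyadic scales; 1.0 means identical.
ms_ssim = tf.image.ssim_multiscale(original, reconstruction, max_val=1.0)
print(float(ms_ssim[0]))
```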


Real-time performance is critically important to us, and we carefully designed and optimized our architecture to make things fast. Here's the raw encode/decode performance:

Average times to encode and decode images from the RAISE-1k 512×768 dataset using the WaveOne codec on an NVIDIA GTX 980 Ti GPU, at batch size 1.

While we are slightly faster than JPEG (libjpeg) and significantly faster than JPEG 2000, WebP, and BPG, our codec runs on a GPU while the traditional codecs do not, so we do not show this comparison. We also do not have access to the full code of recent ML-based approaches for a timing comparison.
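
For readers reproducing this kind of benchmark: timing at batch size 1 is usually done as wall-clock time averaged over many runs, with warm-up and GPU synchronization. A minimal harness (PyTorch assumed; `encode` is a hypothetical stand-in, not WaveOne's API) might look like:

```python
import time
import torch

def benchmark(fn, x, warmup=10, iters=100):
    """Average wall-clock seconds per call of fn(x)."""
    for _ in range(warmup):                 # warm up kernels and caches
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()            # don't time pending GPU work
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# `encode` stands in for a codec's encode step:
# image = torch.rand(1, 3, 512, 768, device="cuda")  # one RAISE-1k-sized image
# print(1.0 / benchmark(encode, image), "images/sec")
```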


Domain-adaptive compression

Here is an interesting tidbit not discussed in the paper. Since our compression algorithm is learned rather than hard-coded, we can easily train codecs custom-tailored to specific domains. This makes it possible to capture domain-specific structure that the traditional one-size-fits-all codecs cannot characterize.

To illustrate this, we created a custom codec specifically for compressing aerial views. To do this, we changed only the training dataset, preserving exactly the same architecture and training procedure we followed for our generic image codec; a schematic version of this swap is sketched below. The aerial codec achieved a further 20% boost in compression over our generic codec on our aerial-view test set.
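
Schematically, "replacing only the training dataset" means pointing an unchanged training loop at a different image folder. The sketch below uses a toy model, a plain MSE loss, and a hypothetical data path; it stands in for, but is not, the WaveOne training procedure:

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# The architecture and training loop stay fixed; only the data changes.
# This tiny model and MSE loss are illustrative stand-ins.
model = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Swapping a generic image folder for aerial photos is the entire
# domain-adaptation step. (The path below is a hypothetical example.)
transform = transforms.Compose([transforms.RandomCrop(128),
                                transforms.ToTensor()])
aerial = datasets.ImageFolder("data/aerial_views", transform=transform)
loader = torch.utils.data.DataLoader(aerial, batch_size=32, shuffle=True)

for images, _ in loader:
    recon = model(images)
    loss = loss_fn(recon, images)
    opt.zero_grad()
    loss.backward()
    opt.step()
```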

[Figure: an aerial image compressed by JPEG, JPEG 2000, and the WaveOne aerial codec]






