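Column: Machine Learning Research Society (机器学习研究会)

The Machine Learning Research Society is a student organization under the Peking University Big Data and Machine Learning Innovation Center. It aims to build a platform where machine learning practitioners can exchange ideas. Besides promptly sharing news from the field, the society also hosts talks by industry leaders and leading academics, academic salon sessions, real-data innovation competitions, and other events.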

[Recommended] Deep Learning for Satellite Image Segmentation (4th Place in a Kaggle Competition)

Machine Learning Research Society · WeChat official account · AI · 2017-04-16 19:05

Summary

Reposted from: 爱可可-爱生活

In the recent Kaggle competition Dstl Satellite Imagery Feature Detection, our deepsense.io team won 4th place among 419 teams. We applied a modified U-Net, an artificial neural network for image segmentation. In this blog post we present our deep learning solution and share the lessons we learnt in the process.

Competition

The challenge was organized on the Kaggle platform by the Defence Science and Technology Laboratory (Dstl), an Executive Agency of the United Kingdom’s Ministry of Defence. As a training set, they provided 25 high-resolution satellite images, each representing a 1 km² area. The task was to locate 10 different types of objects:

Buildings
Miscellaneous manmade structures
Roads
Tracks
Trees
Crops
Waterway
Standing water
Large vehicles
Small vehicles

Sample image from the training set with labels.

These objects were not completely disjoint: you can find examples with vehicles on roads or trees within crops. The distribution of classes was uneven, ranging from very common classes, such as crops (28% of the total area) and trees (10%), to much rarer ones, such as roads (0.8%) or vehicles (0.02%). Moreover, most images contained only a subset of the classes.

Correctness of prediction was measured using Intersection over Union (IoU, also known as the Jaccard index) between the predictions and the ground truth. A score of 0 meant a complete mismatch, whereas 1 meant complete overlap. The score was calculated for each class separately and then averaged. For our solution the average IoU was 0.46, whereas for the winning solution it was 0.49.
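To make the metric concrete, here is a minimal sketch of per-class IoU followed by averaging. It assumes binary pixel masks stored as nested lists, which is a simplification for illustration. The function names `jaccard` and `mean_iou` are hypothetical, and the handling of a class that is absent from both masks is an assumption, not the competition's stated rule.

```python
def jaccard(pred, truth):
    """IoU between two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: an empty union (class absent in both masks) scores 0.0;
    # the exact convention used by the competition is not stated in the post.
    return intersection / union if union else 0.0


def mean_iou(pred_by_class, truth_by_class):
    """Compute IoU for each class separately, then average over the classes."""
    scores = [jaccard(pred_by_class[c], truth_by_class[c]) for c in truth_by_class]
    return sum(scores) / len(scores)


# Toy 2 x 3 masks for two classes.
truth = {"buildings": [[1, 1, 0], [0, 0, 0]], "trees": [[0, 0, 0], [0, 1, 1]]}
pred = {"buildings": [[1, 0, 0], [0, 0, 0]], "trees": [[0, 0, 0], [0, 1, 1]]}
print(mean_iou(pred, truth))  # buildings: 1/2, trees: 2/2 -> 0.75
```

Because every class contributes equally to the average, a rare class such as vehicles weighs as much as a common one such as crops.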

Preprocessing

For each image we were given three versions: grayscale, 3-band and 16-band. Details are presented in the table below:

Type	Wavebands	Pixel resolution	Channels	Size (pixels)
grayscale	Panchromatic	0.31 m	1	3348 x 3392
3-band	RGB	0.31 m	3	3348 x 3392
16-band	Multispectral	1.24 m	8	837 x 848
16-band	Short-wave infrared	7.5 m	8	134 x 136

We resized and aligned the 16-band channels to match the 3-band channels. Alignment was necessary to remove shifts between channels. Finally, all channels (1 + 3 + 8 + 8) were concatenated into a single 20-channel input image.
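The resize-and-concatenate step can be sketched as follows. This is only an illustration under stated assumptions: nearest-neighbour sampling stands in for whatever interpolation was actually used, the alignment (shift removal) step is not shown, and the helpers `resize_nearest`, `stack_channels` and `blank` are hypothetical names. Toy sizes replace the real image dimensions.

```python
def resize_nearest(band, out_h, out_w):
    """Resize one 2-D band (a list of rows) to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(band), len(band[0])
    return [
        [band[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def stack_channels(groups, out_h, out_w):
    """Resize every band of every group to a common grid and concatenate them."""
    stacked = []
    for bands in groups:
        for band in bands:
            stacked.append(resize_nearest(band, out_h, out_w))
    return stacked  # channels-first layout: stacked[c][y][x]


def blank(n_bands, h, w, value):
    """Build n_bands constant-valued bands of size h x w (dummy data)."""
    return [[[value] * w for _ in range(h)] for _ in range(n_bands)]


# Toy sizes standing in for the real ones (3348 x 3392, 837 x 848, 134 x 136).
pan = blank(1, 8, 8, 0.1)            # grayscale (panchromatic)
rgb = blank(3, 8, 8, 0.2)            # 3-band
multispectral = blank(8, 2, 2, 0.3)  # 16-band, multispectral part
swir = blank(8, 1, 1, 0.4)           # 16-band, short-wave infrared part

image = stack_channels([pan, rgb, multispectral, swir], 8, 8)
print(len(image), len(image[0]), len(image[0][0]))  # 20 8 8
```

In practice an array library would replace the nested lists; the point here is only that all bands end up on the 3-band pixel grid before being stacked.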

Model

Our fully convolutional model was inspired by the family of U-Net architectures, in which low-level feature maps are combined with higher-level ones to enable precise localization. This type of network architecture was designed specifically for image segmentation problems, and U-Net was the default choice for us and for other competitors. For more insight into the architecture, we suggest reading the original paper. Our final architecture is depicted below: