Column | [Learning YOLOv3 from Scratch] 7. Adding Attention Mechanisms to the YOLOv3 Model

AI开发者 · WeChat official account · AI · 2020-02-29 00:02

This article comes from @BBuf's community column GiantPandaCV; scan the QR code at the end of the article to subscribe to the column.

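Preface: The [Learning YOLOv3 from Scratch] series keeps growing. The original plan covered less material, but reading the code kept turning up new points worth discussing, so they have been added to the series along the way. The earlier posts read through the YOLOv3 code and covered the cfg file, model construction, and related topics. Building on that, this post modifies the model code to integrate the SE and CBAM modules from the earlier Attention series into YOLOv3.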

1. Defining the format

Just as layers such as [convolutional], [maxpool], [net], and [route] are defined in the cfg file, a new module needs an agreed cfg format before it can be added. The conventions are as follows:

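The SE module (explained in detail in: [Attention Mechanisms in CV] The Simplest, Easiest-to-Implement SE Module) has one parameter, reduction, which defaults to 16. Its block in the cfg file is therefore written as follows: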

[se]
reduction=16

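The CBAM module (explained in detail in: [Attention Mechanisms in CV] ECCV 2018 Convolutional Block Attention Module) has two parameters in total across its spatial attention and channel attention parts: ratio and kernel_size. The CBAM format in the cfg file is therefore defined as follows (note that the cfg key is written kernelsize, without the underscore):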

[cbam]
ratio=16
kernelsize=7
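
To make the two formats above concrete, here is a minimal sketch of how such blocks turn into dictionaries. The parse_blocks helper and the CFG_TEXT string are hypothetical names used only for illustration; the actual parsing in the project is done by parse_model_cfg in parse_config.py.

```python
# Hypothetical mini-parser for illustration only; not the project's parse_model_cfg.
CFG_TEXT = """
[se]
reduction=16

[cbam]
ratio=16
kernelsize=7
"""


def parse_blocks(text):
    """Turn cfg text into a list of dicts, one dict per [block]."""
    blocks = []
    for line in text.split("\n"):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("["):
            # A bracketed name starts a new block
            blocks.append({"type": line[1:-1].strip()})
        else:
            key, val = line.split("=")
            blocks[-1][key.strip()] = val.strip()
    return blocks


blocks = parse_blocks(CFG_TEXT)
print(blocks)

# Parsed values are strings, so convert them before building a layer.
reduction = int(blocks[0]["reduction"])
kernel_size = int(blocks[1]["kernelsize"])
```

Because every value comes back as a string, any code that later builds a layer from these blocks would need an explicit int conversion.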

2. Modifying the parser

Because these parameters are all custom, the function that parses the cfg file needs to change. As covered earlier, the part to modify is in parse_config.py:

import os

import numpy as np


def parse_model_cfg(path):
    # Example path: cfg/yolov3-tiny.cfg
    if not path.endswith('.cfg'):
        path += '.cfg'
    if not os.path.exists(path) and \
            os.path.exists('cfg' + os.sep + path):
        path = 'cfg' + os.sep + path

    with open(path, 'r') as f:
        lines = f.read().split('\n')

    # Drop empty lines and comment lines that start with '#'
    lines = [x for x in lines if x and not x.startswith('#')]
    lines = [x.rstrip().lstrip() for x in lines]
    mdefs = []  # module definitions
    for line in lines:
        if line.startswith('['):  # marks the start of a new module
            '''
            eg:
            [shortcut]
            from=-3
            activation=linear
            '''
            mdefs.append({})
            mdefs[-1]['type'] = line[1:-1].rstrip()
            if mdefs[-1]['type'] == 'convolutional':
                mdefs[-1]['batch_normalize'] = 0
        else:
            key, val = line.split("=")
            key = key.rstrip()

            if 'anchors' in key:
                mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2))
            else:
                mdefs[-1][key] = val.strip()

    # Check all fields are supported
    supported = ['type', 'batch_normalize', 'filters', 'size',
                 'stride', 'pad', 'activation', 'layers',
                 'groups', 'from', 'mask', 'anchors',
                 'classes', 'num', 'jitter', 'ignore_thresh',
                 'truth_thresh', 'random',
                 'stride_x', 'stride_y']

    f = []  # fields
    for x in mdefs[1:]:
        [f.append(k) for k in x if k not in f]
    u = [x for x in f if x not in supported]  # unsupported fields
    assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)

    return mdefs

In the code above, the only thing to change is the supported list; add the new fields to it:

supported = ['type', 'batch_normalize', 'filters', 'size',\
            'stride', 'pad', 'activation', 'layers', \
            'groups','from', 'mask', 'anchors', \
            'classes', 'num', 'jitter', 'ignore_thresh',\
            'truth_thresh', 'random',\
            'stride_x', 'stride_y',\
            'ratio', 'reduction', 'kernelsize']
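
The effect of extending the list can be checked in isolation. The sketch below reproduces only the field check, using a hypothetical helper named unsupported_fields and shortened field lists. Unlike the real function, it inspects every block rather than skipping the first [net] block.

```python
def unsupported_fields(mdefs, supported):
    """Return keys used in mdefs that are missing from supported, in first-seen order."""
    seen = []
    for block in mdefs:
        for key in block:
            if key not in seen:
                seen.append(key)
    return [key for key in seen if key not in supported]


# Shortened lists for illustration; the real list contains more entries.
old_supported = ['type', 'batch_normalize', 'filters', 'size',
                 'stride', 'pad', 'activation']
new_supported = old_supported + ['ratio', 'reduction', 'kernelsize']

# What the parser would produce for the [se] and [cbam] blocks
mdefs = [
    {'type': 'se', 'reduction': '16'},
    {'type': 'cbam', 'ratio': '16', 'kernelsize': '7'},
]

missing_before = unsupported_fields(mdefs, old_supported)
missing_after = unsupported_fields(mdefs, new_supported)
print(missing_before, missing_after)
```

With the old list, the three new keys are reported as unsupported, which is the condition that makes the assert in parse_model_cfg fail. With the extended list, nothing is reported.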

3. Implementing SE and CBAM

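For the underlying principles, see the two articles [Attention Mechanisms in CV] The Simplest, Easiest-to-Implement SE Module and [Attention Mechanisms in CV] ECCV 2018 Convolutional Block Attention Module. The code from those two articles is used directly below: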

import torch
import torch.nn as nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)
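
To get a feel for what reduction controls, the following sketch counts the weights in the two bias-free Linear layers of the SE block. The helper names se_fc_shapes and se_param_count are hypothetical and used only for this illustration. The calculation assumes the layer layout shown above (channel -> channel // reduction -> channel, no bias).

```python
def se_fc_shapes(channel, reduction=16):
    """Weight shapes of the two Linear layers, as (out_features, in_features)."""
    hidden = channel // reduction  # bottleneck width
    return [(hidden, channel), (channel, hidden)]


def se_param_count(channel, reduction=16):
    """Total number of weights; there are no bias terms because bias=False."""
    return sum(out_f * in_f for out_f, in_f in se_fc_shapes(channel, reduction))


for channel in (256, 512):
    print(channel, se_fc_shapes(channel), se_param_count(channel))
```

A larger reduction shrinks the bottleneck and the parameter count. Note that integer division gives a bottleneck width of 0 when channel is smaller than reduction, so the value needs care on narrow layers.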

CBAM

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3,7), "kernel size must be 3 or 7"
        padding = 3 if kernel_size == 7 else 1

        self.conv = nn.Conv2d(2,1,kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = torch.mean(x, dim=1, keepdim=True)
        maxout, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avgout, maxout], dim=1)
        x = self.conv(x)
        return self.sigmoid(x)
    
class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # Shared MLP built from 1x1 convolutions; ratio sets the bottleneck width
        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False), nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = self.sharedMLP(self.avg_pool(x))
        maxout = self.sharedMLP(self.max_pool(x))
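        # The source text is cut off after "return"; the line below is an assumption
        # that follows the CBAM channel-attention definition (sigmoid of the summed
        # MLP outputs).
        return self.sigmoid(avgout + maxout)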
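
The assert in SpatialAttention allows only kernel sizes 3 and 7, paired with padding 1 and 3. A quick check suggests why: with the standard convolution output-size formula (assuming stride 1 and no dilation, which are the nn.Conv2d defaults), both pairs leave the feature map size unchanged. The helper names conv_out_size and same_padding are hypothetical.

```python
def conv_out_size(size, kernel_size, padding, stride=1):
    """Output length along one spatial dimension of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def same_padding(kernel_size):
    """Padding rule used in SpatialAttention: 3 for a 7x7 kernel, otherwise 1."""
    return 3 if kernel_size == 7 else 1


# 13, 26 and 52 are the YOLOv3 feature map sizes for a 416x416 input
for size in (13, 26, 52):
    for k in (3, 7):
        print(size, k, conv_out_size(size, k, same_padding(k)))
```

Keeping the spatial size fixed means the resulting attention map has the same height and width as the input feature map, so it can be multiplied with it element-wise.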
