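Column: 新机器视觉 (New Machine Vision)

Cutting-edge machine vision and computer vision technology

Hyperspectral Image Classification with HybridSN

新机器视觉 · WeChat official account · AI tech media · 2024-12-08 23:43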

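Summary of main points

The article introduces the hybrid network HybridSN for hyperspectral image (HSI) classification. The model combines 3D and 2D convolutions: it extracts spatial-spectral features and then learns more abstract spatial features to improve classification performance. The article also covers the characteristics of hyperspectral images, such as their many channels and rich target information, and the wide use of hyperspectral imaging. It then examines how attention mechanisms and Batch Normalization affect model performance and concludes that, although each can improve performance on its own, combining them may cause them to interfere with each other and reduce performance. Finally, it analyzes the HybridSN model in depth and suggests directions for further experiments.

Key points

Key point 1: Applying the hybrid network (HybridSN) to hyperspectral image classification

HybridSN combines 3D and 2D convolutions, improving classification performance while reducing computational complexity.

Key point 2: Characteristics of hyperspectral images

Hyperspectral images have many channels and carry rich target information, and hyperspectral imaging is widely used.

Key point 3: Effect of attention mechanisms and Batch Normalization on model performance

Attention mechanisms and Batch Normalization can each improve model performance, but combining them may cause them to interfere with each other and reduce performance.

Key point 4: Further analysis and discussion of the HybridSN model

HybridSN performs HSI classification by mixing 3D and 2D convolutions, and experiments confirm its performance and advantages.

Key point 5: Directions for experiments

Future work can try reordering the modules and changing parameter settings to find more efficient HSI classification models.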

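Main text

1. Introduction

Hyperspectral image (HSI) classification is widely used in remote sensing image analysis. With the rise of deep learning and neural networks, more and more work applies 2D CNNs to HSI classification. However, classification performance depends heavily on both spatial and spectral information, and because of the added computational cost, 3D CNNs have rarely been applied to HSI classification.

The paper Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification builds a hybrid network (HybridSN) to address these problems. It first uses a 3D CNN to extract joint spatial-spectral features, then applies a 2D CNN on top of the 3D CNN to learn more abstract spatial features. Compared with a 3D CNN alone, the hybrid model is less complex and performs better. The experiments show that HybridSN achieves very good results on HSI classification.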

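2. Hyperspectral Images

Before classifying hyperspectral images, it helps to understand what they are. From a computer's point of view, a hyperspectral image is an image made of a multi-channel array (tens or even hundreds of channels). Every pixel is described by many numbers, and the "gray value" in a single channel reflects how strongly the imaged object reflects light in one wavelength band.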
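As a rough illustration of this array view, the sketch below builds a synthetic cube with NumPy. The values are random, not real sensor data, and the 145×145×200 shape is simply borrowed from the Indian Pines dataset used later in the article. It reads out one pixel's spectrum and one band's grayscale image.

```python
import numpy as np

# Synthetic stand-in for a hyperspectral cube: height x width x bands.
rng = np.random.default_rng(0)
cube = rng.random((145, 145, 200))

# One pixel is a whole spectrum: one reflectance-like value per band.
pixel_spectrum = cube[10, 20, :]

# One band is an ordinary grayscale image.
band_image = cube[:, :, 50]

print(pixel_spectrum.shape)  # (200,)
print(band_image.shape)      # (145, 145)
```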

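An ordinary RGB color image has only three channels, whereas a hyperspectral image has tens or even hundreds. A hyperspectral image therefore carries much more information about the target, and classifying or recognizing targets from it can be more accurate than from an RGB image.

As the figure below shows, a hyperspectral camera captures a spatial cube made up of images at different wavelengths (the hyperspectral image). A pixel can be displayed as a spectral curve, with wavelength on the horizontal axis and reflectance on the vertical axis. Because the same object reflects different wavelengths to different degrees, hyperspectral images bring out the differences between objects more clearly.

Hyperspectral imaging has in fact been in wide use for a long time. Remote sensing satellites, for example, capture hyperspectral images. By analyzing the spectral curve of every pixel, the pixels belonging to different ground targets can be classified, so that ground, buildings, lawns, rivers, and so on can be told apart in the image.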

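3. The HybridSN Model

For HSI classification, we want to capture the spectral information at different wavelengths as well as the spatial information. A 2D CNN alone cannot convolve along the spectral dimension, so it struggles to extract the more discriminative feature maps.

Fortunately, a 3D CNN can extract spectral and spatial features at the same time, at the cost of higher computational complexity. To combine the strengths of 2D and 3D CNNs, Swalpa Kumar Roy et al. proposed the HSI classification model HybridSN, shown in the figure below. It consists of three 3D convolutions, one 2D convolution, and three fully connected layers.

The detailed layer configuration is given in the table below. The first FC layer (dense1) has the most parameters, and the last fully connected layer (dense3) has 16 outputs because the Indian Pines (IP) dataset has 16 classes. HybridSN has 5,122,176 trainable weights in total. All of them are randomly initialized, and the model is trained with the Adam optimizer and cross-entropy loss at a learning rate of 0.001 and a batch size of 128 for 100 epochs.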
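The parameter total can be checked with a little arithmetic. The sketch below is plain Python and makes one assumption: the input patch is 25×25 pixels with 30 spectral components and a single channel. That size is not stated in the text; it is inferred from the `32 * 18` and `64 * 17 * 17` sizes in the implementation that follows. Every convolution is unpadded with stride 1, so each axis shrinks by `kernel - 1`.

```python
def valid_conv(size, kernel):
    """Output length of a stride-1, unpadded convolution along one axis."""
    return size - kernel + 1

# Assumed input patch: 30 spectral components, 25 x 25 pixels, 1 channel.
depth, height, width = 30, 25, 25
channels = 1
params = 0

# Three 3D convolutions: (out_channels, (kd, kh, kw)).
for out_ch, (kd, kh, kw) in [(8, (7, 3, 3)), (16, (5, 3, 3)), (32, (3, 3, 3))]:
    params += channels * out_ch * kd * kh * kw + out_ch  # weights + biases
    depth = valid_conv(depth, kd)
    height = valid_conv(height, kh)
    width = valid_conv(width, kw)
    channels = out_ch

# Fold the spectral axis into the channel axis before the 2D convolution.
channels_2d = channels * depth  # 32 * 18
params += channels_2d * 64 * 3 * 3 + 64
height, width = valid_conv(height, 3), valid_conv(width, 3)

# Three fully connected layers.
flat = 64 * height * width  # 64 * 17 * 17
dense1 = flat * 256 + 256
params += dense1 + (256 * 128 + 128) + (128 * 16 + 16)

print(channels_2d, height, width, flat)  # 576 17 17 18496
print(dense1, params)                    # 4735232 5122176
```

Under this assumption the first dense layer alone accounts for about 4.7 million of the weights, which is why it dominates the parameter count.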

Below is an implementation of the HybridSN model:

import torch
import torch.nn as nn

class_num = 16
class HybridSN(nn.Module):  
  def __init__(self, in_channels=1, out_channels=class_num):
    super(HybridSN, self).__init__()
    self.conv3d_features = nn.Sequential(
        nn.Conv3d(in_channels,out_channels=8,kernel_size=(7,3,3)),
        nn.ReLU(),
        nn.Conv3d(in_channels=8,out_channels=16,kernel_size=(5,3,3)),
        nn.ReLU(),
        nn.Conv3d(in_channels=16,out_channels=32,kernel_size=(3,3,3)),
        nn.ReLU()
    )

    self.conv2d_features = nn.Sequential(
        nn.Conv2d(in_channels=32 * 18, out_channels=64, kernel_size=(3,3)),
        nn.ReLU()
    )

    self.classifier = nn.Sequential(
        nn.Linear(64 * 17 * 17, 256),
        nn.ReLU(),
        nn.Dropout(p=0.4),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(p=0.4),
        nn.Linear(128, 16)
    )
 
  def forward(self, x):
    x = self.conv3d_features(x)
    x = x.view(x.size()[0],x.size()[1]*x.size()[2],x.size()[3],x.size()[4])
    x = self.conv2d_features(x)
    x = x.view(x.size()[0],-1)
    x = self.classifier(x)
    return x

The HybridSN model with Batch Normalization:

class HybridSN_BN(nn.Module):  
  def __init__(self, in_channels=1, out_channels=class_num):
    super(HybridSN_BN, self).__init__()
    self.conv3d_features = nn.Sequential(
        nn.Conv3d(in_channels,out_channels=8,kernel_size=(7,3,3)),
        nn.BatchNorm3d(8),
        nn.ReLU(),
        nn.Conv3d(in_channels=8,out_channels=16,kernel_size=(5,3,3)),
        nn.BatchNorm3d(16),
        nn.ReLU(),
        nn.Conv3d(in_channels=16,out_channels=32,kernel_size=(3,3,3)),
        nn.BatchNorm3d(32),
        nn.ReLU()
    )

    self.conv2d_features = nn.Sequential(
        nn.Conv2d(in_channels=32 * 18, out_channels=64, kernel_size=(3,3)),
        nn.BatchNorm2d(64),
        nn.ReLU()
    )

    self.classifier = nn.Sequential(
        nn.Linear(64 * 17 * 17, 256),
        nn.ReLU(),
        nn.Dropout(p=0.4),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(p=0.4),
        nn.Linear(128, 16)
    )
 
  def forward(self, x):
    x = self.conv3d_features(x)
    x = x.view(x.size()[0],x.size()[1]*x.size()[2],x.size()[3],x.size()[4])
    x = self.conv2d_features(x)
    x = x.view(x.size()[0],-1)
    x = self.classifier(x)
    return x

So far I have implemented two models: the original HybridSN and HybridSN with Batch Normalization. Two more models follow below.

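4. Attention Mechanisms

To improve the HSI classification model, I also implemented and trained HybridSN variants with attention, using the spatial and channel attention mechanisms from CBAM: Convolutional Block Attention Module.

(1) Channel attention module

The channel attention module answers the question of what to look at. It explores the relationships between the feature maps of different channels and reweights the convolutional channels so that the model knows which features deserve more attention. The process is as follows:

First, aggregate each feature map over its two spatial dimensions with MaxPool and AvgPool. In the implementation, AdaptiveAvgPool2d(1) and AdaptiveMaxPool2d(1) do this, producing a 1×1 output per channel regardless of the input size.
Then pass both pooled results through a shared MLP (FC + ReLU + FC) to learn a weight for each channel, add the two outputs, and apply a sigmoid.
Finally, multiply the result with the original input (before channel attention) to obtain the new feature map.

The channel attention module is implemented as follows:

# Reference: https://github.com/luuuyi/CBAM.PyTorch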
class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # Use the ratio argument instead of a hard-coded 16 (same result for the default)
        self.fc1   = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2   = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)
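To make the shapes in the three steps above concrete, here is a minimal NumPy sketch of the same computation on a single feature map. It is not the CBAM reference code: the weights `w1` and `w2` are random stand-ins for the shared MLP, and the sizes (32 channels, 5×5 maps, reduction ratio 16) are chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (C // r, C); w2: (C, C // r). Returns weights (C, 1, 1)."""
    avg = x.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))    # global max pooling -> (C,)

    def shared_mlp(v):
        # FC + ReLU + FC, with the same weights for both pooled vectors
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(shared_mlp(avg) + shared_mlp(mx))[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 32, 5, 5, 16
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

weights = channel_attention(x, w1, w2)
refined = weights * x  # broadcasting applies one scalar weight per channel

print(weights.shape, refined.shape)  # (32, 1, 1) (32, 5, 5)
```

The output has one weight per channel, which is why the module's result can be multiplied directly with the input feature map.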

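(2) Spatial attention module

The spatial attention module answers the question of where to look. It applies a two-dimensional (attention) reweighting to every position of the feature map so that the model focuses on the regions that deserve more attention. The process is as follows:

First, apply max pooling and average pooling across the channel dimension at each spatial position, which gives two single-channel spatial maps.
Concatenate these two maps, convolve the result with a 7×7 kernel, and apply a sigmoid.
Finally, multiply the resulting spatial attention map with the original input (before spatial attention) to obtain the new feature map.

The spatial attention module is implemented as follows:

# Reference: https://github.com/luuuyi/CBAM.PyTorch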
class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()

        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1

        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)
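The same steps can be traced in NumPy to see why the attention map keeps the spatial size of its input. This is only an illustrative sketch: the kernel is random rather than learned, the convolution is a plain loop, and the 32×19×19 input size is an assumption chosen for the example.

```python
import numpy as np

def spatial_attention(x, kernel):
    """x: (C, H, W); kernel: (2, k, k) with odd k. Returns weights (1, H, W)."""
    k = kernel.shape[-1]
    pad = k // 2
    # Pool across channels: the mean map and the max map, stacked to (2, H, W).
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])
    # Zero-pad the spatial axes so the output keeps the input's H x W.
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return (1.0 / (1.0 + np.exp(-out)))[None, :, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 19, 19))
kernel = rng.standard_normal((2, 7, 7)) * 0.1

weights = spatial_attention(x, kernel)
refined = weights * x  # broadcasting applies one weight per spatial position

print(weights.shape, refined.shape)  # (1, 19, 19) (32, 19, 19)
```

With a 7×7 kernel the padding is 3 on each side, matching the `padding = 3 if kernel_size == 7 else 1` line in the module above.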

The HybridSN model with attention added:

class_num = 16
class HybridSN_Attention(nn.Module):  
  def __init__(self, in_channels=1, out_channels=class_num):
    super(HybridSN_Attention, self).__init__()
    self.conv3d_features = nn.Sequential(
        nn.Conv3d(in_channels,out_channels=8,kernel_size=(7,3,3)),
        nn.ReLU(),
        nn.Conv3d(in_channels=8,out_channels=16,kernel_size=(5,3,3)),
        nn.ReLU(),
        nn.Conv3d(in_channels=16,out_channels=32,kernel_size=(3,3,3)),
        nn.ReLU()
    )
    # Channel and spatial attention
    self.ca = ChannelAttention(32 * 18)
    self.sa = SpatialAttention()

    self.conv2d_features = nn.Sequential(
        nn.Conv2d(in_channels=32 * 18, out_channels=64, kernel_size=(3,3)),
        nn.ReLU()
    )

    self.classifier = nn.Sequential(
        nn.Linear(64 * 17 * 17, 256),
        nn.ReLU(),
        nn.Dropout(p=0.4),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(p=0.4),
        nn.Linear(128, 16)
    )
 
  def forward(self, x):
    x = self.conv3d_features(x)
    x = x.view(x.size()[0],x.size()[1]*x.size()[2],x.size()[3],x.size()[4])

    x = self.ca(x) * x
    x = self.sa(x) * x

    x = self.conv2d_features(x)
    x = x.view(x.size()[0],-1)
    x = self.classifier(x)
    return x

The HybridSN model with both Batch Normalization and attention:

class HybridSN_BN_Attention(nn.Module):  
  def __init__(self, in_channels=1, out_channels=class_num):
    super(HybridSN_BN_Attention, self).__init__()
    self.conv3d_features = nn.Sequential(
        nn.Conv3d(in_channels,out_channels=8,kernel_size=(7,3,3)),
        nn.BatchNorm3d(8),
        nn.ReLU(),
        nn.Conv3d(in_channels=8,out_channels=16,kernel_size=(5,3,3)),
        nn.BatchNorm3d(16),
        nn.ReLU(),
        nn.Conv3d(in_channels=16,out_channels=32,kernel_size=(3,3,3)),
        nn.BatchNorm3d(32),
        nn.ReLU()
    )

    self.ca = ChannelAttention(32 * 18)
    self.sa = SpatialAttention()

    self.conv2d_features = nn.Sequential(
        nn.Conv2d(in_channels=32 * 18, out_channels=64, kernel_size=(3,3)),
        nn.BatchNorm2d(64),
        nn.ReLU()
    )


    self.classifier = nn.Sequential(
        nn.Linear(64 * 17 * 17, 256),
        nn.ReLU(),
        nn.Dropout(p=0.4),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(p=0.4),
        nn.Linear(128, 16)
    )
 
  def forward(self, x):
    x = self.conv3d_features(x)
    x = x.view(x.size()[0],x.size()[1]*x.size()[2],x.size()[3],x.size()[4])

    x = self.ca(x) * x
    x = self.sa(x) * x

    x = self.conv2d_features(x)
    x = x.view(x.size()[0],-1)
    x = self.classifier(x)
    return x

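5. Experiments

Four HybridSN models have been implemented above:

The original HybridSN model
HybridSN with Batch Normalization: HybridSN_BN
HybridSN with channel and spatial attention: HybridSN_Attention
HybridSN with Batch Normalization plus channel and spatial attention: HybridSN_BN_Attention

Below, each of the four models is trained and tested on the Indian Pines dataset, and the results are analyzed.

5.1 Downloading the dataset

Indian Pines is one of the earliest datasets used for HSI classification. The scene is 145×145 pixels with 224 spectral reflectance bands covering wavelengths from 400 to 2500 nm. Bands 104 to 108, 150 to 163, and 220 fall in water-absorption regions, so the corrected version that is normally used drops them and keeps 200 bands. The scene is labeled with 16 land-cover classes, mostly crops, each marked with a different color. The dataset can be downloaded as follows: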

! wget http://www.ehu.eus/ccwintco/uploads/6/67/Indian_pines_corrected.mat
! wget http://www.ehu.eus/ccwintco/uploads/c/c4/Indian_pines_gt.mat
! pip install spectral

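5.2 PCA dimensionality reduction

Below is the implementation of PCA dimensionality reduction and 3D patch extraction:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Apply a PCA transform to the hyperspectral data X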
def applyPCA(X, numComponents):
    newX = np.reshape(X, (-1, X.shape[2]))
    pca = PCA(n_components=numComponents, whiten=True)
    newX = pca.fit_transform(newX)
    newX = np.reshape(newX, (X.shape[0], X.shape[1], numComponents))
    return newX

# A patch cannot be taken around pixels on the border, so zero-pad the image first
def padWithZeros(X, margin=2):
    newX = np.zeros((X.shape[0] + 2 * margin, X.shape[1] + 2* margin, X.shape[2]))
    x_offset = margin
    y_offset = margin
    newX[x_offset:X.shape[0] + x_offset, y_offset:X.shape[1] + y_offset, :] = X
    return newX

# Extract a patch around every pixel and pack the patches in a Keras-style layout
def createImageCubes(X, y, windowSize=5, removeZeroLabels = True):
    # Pad X
    margin = int((windowSize - 1) / 2)
    zeroPaddedX = padWithZeros(X, margin=margin)
    # split patches
    patchesData = np.zeros((X.shape[0] * X.shape[1], windowSize, windowSize, X.shape[2]))
    patchesLabels = np.zeros((X.shape[0] * X.shape[1]))
    patchIndex = 0
    for r in range(margin, zeroPaddedX.shape[0] - margin):
        for c in range(margin, zeroPaddedX.shape[1] - margin):
            patch = zeroPaddedX[r - margin:r + margin + 1, c - margin:c + margin + 1]   
            patchesData[patchIndex, :, :, :] = patch
            patchesLabels[patchIndex] = y[r-margin, c-margin]
            patchIndex = patchIndex + 1
    if removeZeroLabels:
        patchesData = patchesData[patchesLabels>0,:,:,:]
        patchesLabels = patchesLabels[patchesLabels>0]
        patchesLabels -= 1
    return patchesData, patchesLabels

# Stratified split into training and test sets
def splitTrainTestSet(X, y, testRatio, randomState=345):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=testRatio, random_state=randomState, stratify=y)
    return X_train, X_test, y_train, y_test
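The padding and indexing logic in `padWithZeros` and `createImageCubes` is easy to check on a toy array. The sketch below uses `np.pad` in place of the hand-written padding and a hypothetical helper `patch_at`; the 4×4×3 image and the window size of 3 are arbitrary choices for illustration.

```python
import numpy as np

# Toy image: 4 x 4 pixels, 3 channels, every value distinct.
X = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
window = 3
margin = (window - 1) // 2

# Zero-pad only the two spatial axes, as padWithZeros does.
padded = np.pad(X, ((margin, margin), (margin, margin), (0, 0)))

def patch_at(r, c):
    """Patch centered on original pixel (r, c) (hypothetical helper)."""
    return padded[r:r + window, c:c + window, :]

corner = patch_at(0, 0)
print(corner.shape)                                     # (3, 3, 3)
print(np.array_equal(corner[margin, margin], X[0, 0]))  # True
print(corner[0].sum())                                  # 0.0
```

The center of the corner patch is the original pixel, and its first row lies entirely in the zero padding, which is how border pixels still get a full-size patch.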

Then, create the dataset loader classes:

''' Training dataset'''
class TrainDS(torch.utils.data.Dataset): 
    def __init__(self):
        self.len = Xtrain.shape[0]
        self.x_data = torch.FloatTensor(Xtrain)
        self.y_data = torch.LongTensor(ytrain)        
    def __getitem__(self, index):
        # Return the sample and its label at the given index
        return self.x_data[index], self.y_data[index]
    def __len__(self):
        # Return the number of samples
        return self.len

''' Testing dataset'''
class TestDS(torch.utils.data.Dataset): 
    def __init__(self):
        self.len = Xtest.shape[0]
        self.x_data = torch.FloatTensor(Xtest)
        self.y_data = torch.LongTensor(ytest)
    def __getitem__(self, index):
        # Return the sample and its label at the given index
        return self.x_data[index], self.y_data[index]
    def __len__(self):
        # Return the number of samples
        return self.len
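The dataset classes read the globals `Xtrain` and `Xtest`, and PyTorch's `nn.Conv3d` expects input laid out as (N, C, D, H, W). The article does not show how the patches from `createImageCubes` are brought into that layout, so the following NumPy sketch is an assumption about that step: it takes patches of shape (N, window, window, components), with window 25 and 30 components inferred from the layer sizes in the models, and moves the spectral axis into the depth position.

```python
import numpy as np

# Hypothetical stand-in for the output of createImageCubes:
# (N, window, window, components), with assumed window=25 and components=30.
rng = np.random.default_rng(0)
patches = rng.random((4, 25, 25, 30)).astype(np.float32)

# Add a singleton channel axis, then reorder the axes to (N, C, D, H, W):
# the spectral components become the depth axis that Conv3d slides over.
n, h, w, d = patches.shape
X5d = patches.reshape(n, h, w, d, 1).transpose(0, 4, 3, 1, 2)

print(X5d.shape)  # (4, 1, 30, 25, 25)
# Same values, different layout: X5d[i, 0, k, r, c] == patches[i, r, c, k]
print(X5d[2, 0, 7, 3, 11] == patches[2, 3, 11, 7])  # True
```

Arrays in this layout could then serve as `Xtrain` and `Xtest`, and the two datasets could be wrapped in `torch.utils.data.DataLoader` objects with the batch size of 128 mentioned earlier.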

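5.3 Training the models

To analyze the training results more efficiently, I wrote one training function and one test function, and then passed each of the four models above to them.

(1) Training function

import copy
import torch.nn as nn
import torch.optim as optim

# Relies on the globals train_loader and device
def train(net):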
  current_loss_his = []
  current_Acc_his = []

  best_net_wts = copy.deepcopy(net.state_dict())
  best_acc = 0.0

  criterion = nn.CrossEntropyLoss()
  optimizer = optim.Adam(net.parameters(), lr=0.001)

  # Start training
  total_loss = 0
  for epoch in range(100):
      net.train()  # switch the model to training mode
      for i, (inputs, labels) in enumerate(train_loader):
          inputs = inputs.to(device)
          labels = labels.to(device)
          # Reset the optimizer's gradients
          optimizer.zero_grad()
          # Forward pass + backward pass + optimizer step
          outputs = net(inputs)
          loss = criterion(outputs, labels)
          loss.backward()
          optimizer.step()
          total_loss += loss.item()

      net.eval()   # switch the model to evaluation mode
      current_acc = test_acc(net)
      current_Acc_his.append(current_acc)

      if current_acc > best_acc:
        best_acc = current_acc
        best_net_wts = copy.deepcopy(net.state_dict())

      print('[Epoch: %d]   [loss avg: %.4f]   [current loss: %.4f]  [current acc: %.4f]' %(epoch + 1, total_loss/(epoch+1), loss.item(), current_acc))
      current_loss_his.append(loss.item())

  print('Finished Training')
  print('Best Acc:%.4f' %(best_acc))

  # load best model weights
  net.load_state_dict(best_net_wts)

  return net,current_loss_his,current_Acc_his

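(2) Test function

import numpy as np
import torch
from sklearn.metrics import classification_report

# Relies on the globals test_loader, ytest and device
def test_acc(net):
  predictions = []
  # Run the model over the test set without tracking gradients
  with torch.no_grad():
      for inputs, _ in test_loader:
          inputs = inputs.to(device)
          outputs = net(inputs)
          predictions.append(np.argmax(outputs.cpu().numpy(), axis=1))
  y_pred_test = np.concatenate(predictions)

  # Build the classification report and read the score from its text
  classification = classification_report(ytest, y_pred_test, digits=4)
  index_acc = classification.find('weighted avg')
  # These 6 characters are the first number in the 'weighted avg' row,
  # i.e. the support-weighted precision
  accuracy = classification[index_acc+17:index_acc+23]
  return float(accuracy)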
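Note that `test_acc` reads its score from the 'weighted avg' row of the report text, and the first number in that row is the support-weighted precision, which is not always equal to the overall accuracy. The NumPy sketch below computes both quantities directly on a made-up label vector to show the difference; the function names are hypothetical and are not part of the article's code.

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return float(np.mean(y_true == y_pred))

def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with the class supports as weights."""
    total = 0.0
    for cls in np.unique(y_true):
        predicted = y_pred == cls
        support = np.sum(y_true == cls)
        if predicted.any():
            precision = np.sum((y_true == cls) & predicted) / predicted.sum()
        else:
            precision = 0.0  # class never predicted
        total += precision * support
    return float(total / len(y_true))

# Made-up labels: one sample of class 0 is misclassified as class 1.
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])

print(round(overall_accuracy(y_true, y_pred), 4))    # 0.8333
print(round(weighted_precision(y_true, y_pred), 4))  # 0.8889
```

Computing the metric from the label arrays also avoids depending on the exact column layout of the report string.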

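5.4 Visualizing the results

The training results of HybridSN, HybridSN_BN, HybridSN_Attention, and HybridSN_BN_Attention are as follows:

(1) Loss curves of the four models

(2) Accuracy curves of the four models

(3) Best Precision, Recall, and F1-Score of the four models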

Model	Precision	Recall	F1-Score
HybridSN	0.9790	0.9788	0.9786
HybridSN_BN	0.9897	0.9888	0.9888
HybridSN_Attention	0.9807	0.9806	0.9805
HybridSN_BN_Attention	0.9885	0.9884