The VGG network is the foundation of one of the most popular image-recognition techniques, and it is worth learning because it opens up many possibilities. To understand VGGNet, you need to understand convolutional neural networks (CNNs).
In this article we will focus only on the implementation side of VGGNet, so we will move through it quickly.
About the VGG Network
VGGNet is a convolutional neural network (CNN) that extracts features effectively by stacking multiple convolutional layers. A VGGNet can be shallow or deep.
In a shallow VGGNet, usually only two sets of convolutional layers are added (four convolutional layers in total), as we will see shortly. In a deep VGGNet, more than four convolutional layers can be added. Two commonly used deep VGGNets are VGG16, which uses 16 layers, and VGG19, which uses 19 layers. We can add batch normalization layers or leave them out; in this tutorial, I will use them.
You can learn more about the architecture at this link:
https://viso.ai/deep-learning/vgg-very-deep-convolutional-networks
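If you want to inspect the full VGG16 architecture for reference, Keras ships an implementation in keras.applications. An optional snippet (not used in the rest of this tutorial) that builds it without downloading pretrained weights and prints its structure:
from tensorflow.keras.applications import VGG16

# build VGG16 with randomly initialized weights (weights=None)
# and print its layer-by-layer structure for reference
reference_model = VGG16(weights=None, input_shape=(224, 224, 3), classes=1000)
reference_model.summary()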
Today we will work on a mini VGGNet. It is simpler and easier to run, yet still powerful enough for many use cases.
One important characteristic of the mini VGGNet is that it uses 3x3 filters exclusively, which helps it generalize well. Let's start building a mini VGGNet in Keras and TensorFlow.
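To see why stacking small 3x3 filters is attractive, note that two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but with fewer weights and an extra non-linearity in between. A minimal sketch comparing the parameter counts (the channel count of 64 here is just for illustration):
import tensorflow as tf
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Sequential

channels = 64  # illustrative channel count

# two stacked 3x3 convolutions: same 5x5 receptive field, fewer parameters
stacked = Sequential([
    tf.keras.Input(shape=(32, 32, channels)),
    Conv2D(channels, (3, 3), padding="same"),
    Conv2D(channels, (3, 3), padding="same"),
])

# a single 5x5 convolution covering the same receptive field
single = Sequential([
    tf.keras.Input(shape=(32, 32, channels)),
    Conv2D(channels, (5, 5), padding="same"),
])

print(stacked.count_params())  # 73,856
print(single.count_params())   # 102,464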
I used a Google Colaboratory notebook with the GPU enabled; otherwise, training would be very slow.
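To confirm that the runtime actually sees a GPU before you start training, you can run this quick optional check:
import tensorflow as tf

# lists the GPUs visible to TensorFlow; an empty list means
# training will fall back to the much slower CPU
print(tf.config.list_physical_devices("GPU"))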
Developing, Training, and Evaluating the Mini VGG Network
It's time to get to work. We will walk through a couple of experiments to show how the network performs.
These are the necessary imports:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
That's a lot of imports!
We will use the CIFAR-10 dataset, a public dataset that ships with Keras/TensorFlow.
I tried two different networks, just as an experiment. The first one is fairly popular; I call it popular because I found this architecture on Kaggle and in several other tutorials.
class MiniVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # first CONV => Activation => CONV => Activation => POOL layer set
        model.add(Conv2D(32, (3, 3), padding="same",
                         input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # second CONV => Activation => CONV => Activation => POOL layer set
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # dense (fully connected) layer
        model.add(Flatten())
        model.add(Dense(512))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        # softmax classifier
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
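Before training, it is worth instantiating the network once and printing its summary to verify the layer shapes and parameter count. A quick sanity check, using the same 32x32x3 CIFAR-10 input we train on below:
# build the network for 32x32 RGB inputs and 10 output classes,
# then print the layer-by-layer output shapes and parameter counts
sanity_model = MiniVGGNet.build(width=32, height=32, depth=3, classes=10)
sanity_model.summary()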
Let's load and prepare our CIFAR-10 dataset.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype("float") / 255.0
x_test = x_test.astype("float") / 255.0
The CIFAR-10 dataset has 10 labels:
labelNames = ["airplane", "automobile", "bird", "cat", "deer",
"dog", "frog", "horse", "ship", "truck"]
Binarize the labels with LabelBinarizer:
lb = LabelBinarizer()
y_train = lb.fit_transform(y_train)
y_test = lb.transform(y_test)
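LabelBinarizer turns each integer label into a one-hot vector, which is what the categorical_crossentropy loss below expects. A tiny standalone example of what the transform produces:
lb_demo = LabelBinarizer()
# three toy labels drawn from classes 0-2; each becomes a one-hot row
print(lb_demo.fit_transform([0, 2, 1]))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]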
Compile and train the model here. The evaluation metric is accuracy, and we will run 10 epochs.
optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01, decay=0.01/40, momentum=0.9,
nesterov=True)
model = MiniVGGNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss='categorical_crossentropy', optimizer = optimizer,
metrics=['accuracy'])
h = model.fit(x_train, y_train, validation_data=(x_test, y_test),
batch_size = 64, epochs=10, verbose=1)
Here are the results:
Epoch 1/10
782/782 [==============================] - 424s 539ms/step - loss: 1.6196 - accuracy: 0.4592 - val_loss: 1.4083 - val_accuracy: 0.5159
Epoch 2/10
782/782 [==============================] - 430s 550ms/step - loss: 1.1437 - accuracy: 0.6039 - val_loss: 1.0213 - val_accuracy: 0.6505
Epoch 3/10
782/782 [==============================] - 430s 550ms/step - loss: 0.9634 - accuracy: 0.6618 - val_loss: 0.8495 - val_accuracy: 0.7013
Epoch 4/10
782/782 [==============================] - 427s 546ms/step - loss: 0.8532 - accuracy: 0.6998 - val_loss: 0.7881 - val_accuracy: 0.7215
Epoch 5/10
782/782 [==============================] - 425s 543ms/step - loss: 0.7773 - accuracy: 0.7280 - val_loss: 0.8064 - val_accuracy: 0.7228
Epoch 6/10
782/782 [==============================] - 421s 538ms/step - loss: 0.7240 - accuracy: 0.7451 - val_loss: 0.6757 - val_accuracy: 0.7619
Epoch 7/10
782/782 [==============================] - 420s 537ms/step - loss: 0.6843 - accuracy: 0.7579 - val_loss: 0.6564 - val_accuracy: 0.7715
Epoch 8/10
782/782 [==============================] - 420s 537ms/step - loss: 0.6405 - accuracy: 0.7743 - val_loss: 0.6833 - val_accuracy: 0.7706
Epoch 9/10
782/782 [==============================] - 422s 540ms/step - loss: 0.6114 - accuracy: 0.7828 - val_loss: 0.6188 - val_accuracy: 0.7848
Epoch 10/10
782/782 [==============================] - 421s 538ms/step - loss: 0.5799 - accuracy: 0.7946 - val_loss: 0.6166 - val_accuracy: 0.7898
After 10 epochs, accuracy is 79.46% on the training data and 78.98% on the validation data.
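Because model.fit returned its history in h, and because we already imported matplotlib and classification_report, we can plot the learning curves and get per-class metrics. A minimal sketch using the objects defined above:
# plot training vs. validation loss and accuracy from the history object
epochs = np.arange(1, 11)
plt.figure(figsize=(8, 5))
plt.plot(epochs, h.history["loss"], label="train loss")
plt.plot(epochs, h.history["val_loss"], label="val loss")
plt.plot(epochs, h.history["accuracy"], label="train acc")
plt.plot(epochs, h.history["val_accuracy"], label="val acc")
plt.xlabel("epoch")
plt.legend()
plt.show()

# per-class precision and recall on the test set
predictions = model.predict(x_test, batch_size=64)
print(classification_report(y_test.argmax(axis=1),
                            predictions.argmax(axis=1),
                            target_names=labelNames))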
With this in mind, I wanted to change a few things in the network and see the result. Let's redefine the network above. This time I used 64 filters throughout the network, 256 neurons in the dense layer, and a 40% dropout rate in the final dropout layer.
Here is the new mini VGG network:
class miniVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # first Conv => Activation => Conv => Activation => Pool layer set
        model.add(Conv2D(64, (3, 3), padding="same",
                         input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        # second Conv => Activation => Conv => Activation => Pool layer set