VGG

Research background:

Small convolution kernels
Multi-scale
Greater network depth

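Conclusions from the single-scale evaluation:

LRN (local response normalization) does not improve network performance
For the same network architecture, multi-scale training improves accuracy
Deepening the network, up to a point, improves accuracy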

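Conclusions from the multi-scale evaluation:

Compared with single-scale prediction, combining predictions from multiple scales improves accuracy
Scale jittering (multi-scale training and multi-scale testing) benefits network performance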

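Receptive field

Definition: the size of the region in the input image that a pixel on a layer's output feature map maps back to, that is, the set of input pixels that can influence a given point on the feature map.

Original input: 7x7

conv1: 3x3, stride=1, valid padding

    output = (input - k + 2*padding)/stride + 1

feature map: (7-3)/1+1 = 5

conv1: the output is 5x5. Each point is computed from a 3x3 patch of the original input, so the receptive field is 3.

conv2: 3x3, stride=1, valid padding

feature map: (5-3)/1+1 = 3

conv2: the output is 3x3. Each point is computed from a 3x3 patch of conv1, and each conv1 point is computed from a 3x3 patch of the original input. With stride 1, neighbouring patches shift by one pixel, so along one dimension three adjacent conv1 points together span 3+2 = 5 input pixels. A conv2 point therefore depends on a 5x5 region of the original input, and the receptive field of conv2 is 5.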
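
The feature-map sizes in this walk-through can be reproduced with a few lines of Python. This is a minimal sketch of the output-size formula given above; the function name `conv_output_size` is an assumption made for illustration.

```python
def conv_output_size(input_size, kernel, stride=1, padding=0):
    """Spatial size after a convolution: (input - k + 2*padding) / stride + 1."""
    return (input_size - kernel + 2 * padding) // stride + 1

size = 7  # original 7x7 input
for name in ("conv1", "conv2"):
    size = conv_output_size(size, kernel=3)  # 3x3, stride 1, valid padding
    print(name, size)
# prints "conv1 5" then "conv2 3"
```

With `padding=1` the same 3x3 convolution keeps the spatial size unchanged, which is the setting VGG uses inside its blocks.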

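If conv1 instead used a 5x5 kernel with stride=1, the receptive field of conv1 would be 5.

The receptive field of one 5x5 kernel equals that of two stacked 3x3 kernels.

The receptive field of one 7x7 kernel equals that of three stacked 3x3 kernels.

Receptive field formula: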
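
The formula itself is missing from the notes at this point. One commonly used recurrence, assumed here rather than recovered from the source, is RF_l = RF_(l-1) + (k_l - 1) * j_(l-1) with j_l = j_(l-1) * s_l and RF_0 = j_0 = 1, where k_l and s_l are the kernel size and stride of layer l and j is the cumulative stride. The sketch below applies it to the examples above.

```python
def receptive_field(layers):
    """Receptive field of the last layer.

    layers: list of (kernel, stride) pairs, ordered from input to output.
    """
    rf, jump = 1, 1  # a single input pixel; cumulative stride of 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k-1) input-space steps
        jump *= stride             # later layers step over the input in larger jumps
    return rf

print(receptive_field([(3, 1)] * 2))  # two stacked 3x3 convs -> 5
print(receptive_field([(3, 1)] * 3))  # three stacked 3x3 convs -> 7
```

The results agree with the equivalences stated above: two 3x3 layers match a single 5x5 layer, and three match a single 7x7 layer.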

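Network structure:

Five convolutional blocks built from stacked small (3x3) kernels. The number of channels doubles from one block to the next (64, 128, 256, 512) until it reaches 512, each block ends with a max-pooling layer, and three fully connected layers follow the last block.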
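
As a concrete instance of this layout, the sketch below walks through the 16-layer configuration (configuration D in the VGG paper) and counts weight layers, the final feature-map size, and parameters. It assumes 3x3 convolutions with padding 1, 2x2 max-pooling with stride 2, a 224x224 RGB input and 1000 output classes. The names `VGG16_CFG`, `FC_SIZES` and `summarize` are illustrative.

```python
# Channel counts per conv layer; "M" marks a 2x2 max-pool with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]

def summarize(cfg, fc_sizes, input_size=224, in_channels=3):
    """Return (number of weight layers, final feature-map size, parameter count)."""
    size, channels, params, conv_layers = input_size, in_channels, 0, 0
    for item in cfg:
        if item == "M":
            size //= 2  # pooling halves the spatial size
        else:
            params += 3 * 3 * channels * item + item  # 3x3 weights plus biases
            channels = item
            conv_layers += 1
    features = channels * size * size  # flattened input to the first FC layer
    for out in fc_sizes:
        params += features * out + out
        features = out
    return conv_layers + len(fc_sizes), size, params

weight_layers, final_map, total_params = summarize(VGG16_CFG, FC_SIZES)
print(weight_layers, final_map, total_params)  # 16 7 138357544
```

Most of the parameters sit in the first fully connected layer (512*7*7*4096 weights), not in the convolutional blocks.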

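Why use 3x3 convolution kernels?

Stacking them makes the network deeper and adds more non-linearities
Fewer parameters: parameters per layer = (kernel * kernel * input channels) * number of kernels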
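
The parameter saving can be checked directly with the formula above. This sketch compares three stacked 3x3 layers against a single 7x7 layer, which have the same receptive field. The channel count C = 256 is an arbitrary choice for illustration and biases are ignored.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights in one conv layer: (kernel * kernel * in_channels) * out_channels."""
    return kernel * kernel * in_channels * out_channels

C = 256  # illustrative channel count, same for input and output
stacked_3x3 = 3 * conv_params(3, C, C)  # three 3x3 layers: 27 * C^2
single_7x7 = conv_params(7, C, C)       # one 7x7 layer:    49 * C^2
print(stacked_3x3, single_7x7)  # 1769472 3211264
```

The single 7x7 layer needs 49/27, roughly 1.8 times, as many weights, and it applies only one non-linearity instead of three.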

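What does a 1x1 convolution kernel do?

It adds non-linearity to the decision function without changing the receptive field
This works because a convolution is usually followed by a ReLU layer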
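
To make this concrete, a 1x1 convolution is a linear map across channels applied independently at every pixel, and the ReLU after it supplies the non-linearity. The NumPy sketch below illustrates that under assumed shapes (a 4x4 feature map with 8 channels); `conv1x1_relu` is a hypothetical helper name.

```python
import numpy as np

def conv1x1_relu(x, weight, bias):
    """1x1 convolution followed by ReLU.

    x: (H, W, C_in), weight: (C_in, C_out), bias: (C_out,)
    """
    y = np.einsum("hwc,cd->hwd", x, weight) + bias  # per-pixel channel mixing
    return np.maximum(y, 0.0)                       # ReLU non-linearity

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))
w = rng.standard_normal((8, 8))
b = np.zeros(8)
y = conv1x1_relu(x, w, b)
print(y.shape)  # (4, 4, 8): spatial size and receptive field are unchanged
```

Each output pixel depends only on the input pixel at the same location, which is why the receptive field does not grow.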

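Training data processing:

Isotropic rescaling (aspect-preserving: width and height are scaled by the same factor) is applied so that the shortest side of the training image reaches the target length

resize: anisotropic rescaling, which ignores whether objects become distorted

The image is first rescaled isotropically so that its shortest side is 256, then a 224x224 patch is randomly cropped from the rescaled image, and finally the patch undergoes random horizontal flipping and a random RGB color shift

Purpose: to enlarge the effective training set, which makes the network less prone to overfitting and improves accuracy and generalization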

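import numpy as np
from PIL import Image

S = 256          # target length of the shortest side
crop_size = 224  # side length of the square crop
image_path = ''  # path to the input image (left empty in the source)
or_image = Image.open(image_path)  # load the image
image_w, image_h = or_image.size   # original width and height
# Scale factor that brings the shortest side to S = 256 (e.g. 0.4)
scale = S / min(image_w, image_h)
# Width and height after aspect-preserving rescaling
new_w = int(image_w * scale)
new_h = int(image_h * scale)
print(new_w, new_h)  # e.g. 256 341
# Rescale with bicubic interpolation
resize_image = or_image.resize((new_w, new_h), Image.BICUBIC)
# Random top-left corner of the crop; the +1 makes the largest valid offset reachable
offset_width = np.random.randint(low=0, high=new_w - crop_size + 1, dtype='int32')
offset_height = np.random.randint(low=0, high=new_h - crop_size + 1, dtype='int32')
# Cut out the crop_size x crop_size patch at the sampled offset
left, top = int(offset_width), int(offset_height)
crop_image = resize_image.crop((left, top, left + crop_size, top + crop_size))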

Hyperparameter settings:

Batch size: batch size = 256

Weight decay: weight decay = 5*10^(-4)

Learning rate: learning rate = 0.01, decay factor 0.1

Momentum: momentum = 0.9 (the optimizer is SGD with momentum)

Iterations: 370K

Epochs: epochs = 74

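Convolution kernel initialization: Gaussian distribution with mean 0 and variance 0.01 (deeper networks are initialized with the weights of a trained shallower network)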

Bias initialization: 0

Fully connected layer initialization: Gaussian distribution (std=0.005), biases initialized to the constant 0.1
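
The settings above can be tied together in a short sketch. The first part derives the number of epochs from the iteration count and batch size, assuming the ILSVRC-2012 training set of 1,281,167 images (an assumption, since the notes do not state the dataset size). The second part is one common formulation of an SGD update with momentum and L2 weight decay, not necessarily the exact update used by the authors.

```python
import numpy as np

BATCH_SIZE = 256
ITERATIONS = 370_000
TRAIN_IMAGES = 1_281_167  # assumption: ILSVRC-2012 training set size

epochs = ITERATIONS * BATCH_SIZE / TRAIN_IMAGES
print(round(epochs))  # 74

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One SGD step with momentum; weight decay is added to the gradient."""
    v = momentum * v - lr * (grad + weight_decay * w)
    return w + v, v

# With a zero gradient, only weight decay pulls the weight towards zero.
w, v = np.array([1.0]), np.array([0.0])
w, v = sgd_momentum_step(w, v, grad=np.array([0.0]))
```

The decay factor of 0.1 would be applied by multiplying `lr` by 0.1 whenever validation accuracy stops improving.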

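Characteristics:

Small convolution kernels
Small pooling kernels: AlexNet uses 3x3 pooling, VGG uses 2x2 pooling
Greater depth
Fully connected layers are converted to convolutional layers at test time, so the network can accept inputs of arbitrary size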
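
The last point can be illustrated with NumPy: reshaping the weights of a fully connected layer into k x k kernels gives a convolution that reproduces the dense output on an input of the original size and yields a spatial map of class scores on a larger input. The sizes below (4 channels, k = 3, 5 outputs) are toy values chosen for illustration, and `fc_as_conv` is a hypothetical helper.

```python
import numpy as np

def fc_as_conv(feature_map, fc_weight, fc_bias, k):
    """Slide an FC layer, reshaped into k x k kernels, over a (C, H, W) feature map."""
    c, h, w = feature_map.shape
    kernels = fc_weight.reshape(-1, c, k, k)  # (outputs, C, k, k)
    out = np.empty((kernels.shape[0], h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = feature_map[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(kernels, patch, axes=3) + fc_bias
    return out

rng = np.random.default_rng(0)
fc_w = rng.standard_normal((5, 4 * 3 * 3))  # FC layer trained on 4x3x3 inputs
fc_b = rng.standard_normal(5)
small = rng.standard_normal((4, 3, 3))      # input of the original size
large = rng.standard_normal((4, 5, 5))      # larger input at test time

dense = fc_w @ small.reshape(-1) + fc_b     # ordinary FC output
conv_small = fc_as_conv(small, fc_w, fc_b, k=3)
score_map = fc_as_conv(large, fc_w, fc_b, k=3)
scores = score_map.mean(axis=(1, 2))        # spatially average the score map
print(conv_small.shape, score_map.shape, scores.shape)  # (5, 1, 1) (5, 3, 3) (5,)
```

Averaging the score map over its spatial positions gives a fixed-length score vector regardless of the input size.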

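Data augmentation code

# Written against the TensorFlow 1.x API (tf.to_float, tf.random_uniform, tf.gfile).
import tensorflow as tf

# Compute the height and width the image will have once its shortest side
# is rescaled to the target length (smallest_side), i.e. return the target size.
def _smallest_size_at_least(height, width, smallest_side):
    smallest_side = tf.convert_to_tensor(smallest_side, dtype=tf.int32)
    height = tf.to_float(height)
    width = tf.to_float(width)
    smallest_side = tf.to_float(smallest_side)
    # Divide by the shorter of the two sides so that it becomes smallest_side
    scale = tf.cond(tf.greater(height, width),
                    lambda: smallest_side / width,
                    lambda: smallest_side / height)
    new_height = tf.cast(tf.rint(height * scale), tf.int32)
    new_width = tf.cast(tf.rint(width * scale), tf.int32)
    return new_height, new_width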

# Rescale the image while preserving its aspect ratio,
# so that the shortest side equals smallest_side
def _aspect_preserving_resize(image, smallest_side):
    smallest_side = tf.convert_to_tensor(smallest_side, dtype=tf.int32)
    shape = tf.shape(image)
    height = shape[0]
    width = shape[1]
    new_height, new_width = _smallest_size_at_least(height, width, smallest_side)
    # Convert to float32 (uint8 inputs are scaled to [0, 1]) before resizing
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    resized_image = tf.image.resize_images(image, [new_height, new_width],
                                           method=tf.image.ResizeMethod.BICUBIC)
    return resized_image

# Randomly crop a crop_height x crop_width patch (224x224 in these notes)
def random_crop(image, crop_height, crop_width):
    original_shape = tf.shape(image)
    # Assert that the image has rank 3; no exception is raised if it does
    rank_assertion = tf.Assert(tf.equal(tf.rank(image), 3),
                               ['Rank of image must be equal to 3'])
    # cropped_shape = [224, 224, 3]
    with tf.control_dependencies([rank_assertion]):
        cropped_shape = tf.stack([crop_height, crop_width, original_shape[2]])
    # Assert that the image is at least as large as the crop; raise otherwise
    size_assertion = tf.Assert(
        tf.logical_and(tf.greater_equal(original_shape[0], crop_height),
                       tf.greater_equal(original_shape[1], crop_width)),
        ['Crop size greater than the image size.'])
    # Exclusive upper bound for the random offset (largest valid offset plus one)
    max_offset_height = tf.reshape(original_shape[0] - crop_height + 1, [])
    max_offset_width = tf.reshape(original_shape[1] - crop_width + 1, [])
    # Sample the random top-left corner of the crop
    offset_height = tf.random_uniform([], maxval=max_offset_height, dtype=tf.int32)
    offset_width = tf.random_uniform([], maxval=max_offset_width, dtype=tf.int32)
    # Offsets of the crop along height, width and channels
    offsets = tf.cast(tf.stack([offset_height, offset_width, 0]), tf.int32)
    # Crop the image
    with tf.control_dependencies([size_assertion]):
        image = tf.slice(image, offsets, cropped_shape)
    return tf.reshape(image, cropped_shape)

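# Run the data processing pipeline
# The shortest side is rescaled to a random length between 256 and 512;
# the constants below take their values from the comments in these notes
RESIZE_SIDE_MIN = 256
RESIZE_SIDE_MAX = 512
crop_height = 224
crop_width = 224
resize_side_min = RESIZE_SIDE_MIN
resize_side_max = RESIZE_SIDE_MAX
# Read the raw, still-encoded image bytes
image_raw_data_jpg = tf.gfile.GFile('./timg1.jpeg', 'rb').read()
# Decode the JPEG
img_data_jpg = tf.image.decode_jpeg(image_raw_data_jpg)
# Sample the length of the shortest side (maxval is exclusive, hence the +1)
resize_side = tf.random_uniform([], minval=resize_side_min, maxval=resize_side_max + 1, dtype=tf.int32)
# Rescale the image while preserving its aspect ratio
resize_img = _aspect_preserving_resize(img_data_jpg, resize_side)
# Randomly crop the rescaled image
crop_img = random_crop(resize_img, crop_height, crop_width)
# Random horizontal flip
image_data = tf.image.random_flip_left_right(crop_img)
# Random brightness adjustment (maximum delta 0.5)
image_data = tf.image.random_brightness(image_data, 0.5)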
