Python object detection: comparing the network structures of YOLOv1, YOLOv2, YOLOv3 and SSD
2023-05-26
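I have recently been studying YOLOv1, YOLOv2 and YOLOv3. They have a good deal in common with the SSD network, so I am summarizing them here to see how they differ.
Structure diagram and implementation code for each network
1. YOLOv1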

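As the figure shows, the network performs more than twenty convolutions and four max-pooling operations. The 3x3 convolutions extract features and the 1x1 convolutions compress them (reduce the number of channels). The image is finally reduced to a 7x7xfilter feature map, which is equivalent to dividing the whole image into a 7x7 grid in which each cell is responsible for detecting the objects whose centers fall inside its region.
At the end, fully connected layers produce an output of size 7x7x30. Here 7x7 is the grid, and of the 30 values per cell, the first 20 are the class predictions (PASCAL VOC has 20 classes) and the last 10 describe two predicted boxes and their confidences (5x2: x, y, w, h and a confidence for each box).
The network code is as follows: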
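To make the "each cell is responsible for its own region" idea concrete, here is a minimal sketch assuming a 448x448 input, an S = 7 grid and pixel coordinates for an object's center; the helper name `responsible_cell` is hypothetical. It finds the cell that contains the center and the center's offset inside that cell.

```python
def responsible_cell(cx, cy, img_w=448, img_h=448, S=7):
    """Return (row, col, x_in_cell, y_in_cell) for an object centered at pixel (cx, cy).

    Hypothetical helper: the cell containing the center is the one that predicts the object.
    """
    gx = cx * S / img_w              # center x in grid units
    gy = cy * S / img_h              # center y in grid units
    col = min(int(gx), S - 1)        # clamp so a center on the right/bottom edge stays in the grid
    row = min(int(gy), S - 1)
    # offset of the center inside its cell, in [0, 1]
    return row, col, gx - col, gy - row

# Each cell covers 448 / 7 = 64 pixels; a center at (100, 300) falls in row 4, column 1.
print(responsible_cell(100, 300))  # -> (4, 1, 0.5625, 0.6875)
```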
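A minimal NumPy sketch of splitting the 7x7x30 output, assuming the per-cell layout described above with the last 10 values ordered as two (x, y, w, h, confidence) groups. That ordering is an assumption: the flat 1470-value output of a real implementation may be grouped differently (for example all class probabilities first), so check the layout of the weights you load. The class-specific score of a box is its class probability multiplied by its confidence.

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes (PASCAL VOC)

def split_yolo1_output(out):
    """Split an S*S*(C + B*5) output, assuming per cell: C class probs, then B x (x, y, w, h, conf)."""
    out = out.reshape(S, S, C + B * 5)
    class_probs = out[..., :C]                    # (7, 7, 20)
    boxes = out[..., C:].reshape(S, S, B, 5)      # (7, 7, 2, 5)
    xywh = boxes[..., :4]                         # (7, 7, 2, 4)
    conf = boxes[..., 4]                          # (7, 7, 2)
    # class-specific score for every box: Pr(class | object) * box confidence
    scores = class_probs[:, :, None, :] * conf[..., None]  # (7, 7, 2, 20)
    return xywh, conf, scores

flat = np.random.rand(S * S * (C + B * 5))  # stands in for the 7*7*30 = 1470 outputs of fc_36
xywh, conf, scores = split_yolo1_output(flat)
print(xywh.shape, conf.shape, scores.shape)  # (7, 7, 2, 4) (7, 7, 2) (7, 7, 2, 20)
```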
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.variable_scope)

# Leaky ReLU, an improved version of ReLU
def leak_relu(self, x, alpha=0.1):
    return tf.maximum(alpha * x, x)

# Build the network
def _build_net(self):
    x = tf.placeholder(tf.float32, [None, 448, 448, 3])
    with tf.variable_scope('yolo'):
        # _conv_layer(self, x, num_filters, filter_size, stride, scope)
        with tf.variable_scope('conv_2'):
            # (448,448,3) -> (224,224,64)
            net = self._conv_layer(x, 64, 7, 2, 'conv_2')
        # (224,224,64) -> (112,112,64)
        net = self._maxpool_layer(net, 2, 2)
        with tf.variable_scope('conv_4'):
            # (112,112,64) -> (112,112,192)
            net = self._conv_layer(net, 192, 3, 1, 'conv_4')
        # (112,112,192) -> (56,56,192)
        net = self._maxpool_layer(net, 2, 2)
        with tf.variable_scope('conv_6'):
            # (56,56,128)
            net = self._conv_layer(net, 128, 1, 1, 'conv_6')
        with tf.variable_scope('conv_7'):
            # (56,56,256)
            net = self._conv_layer(net, 256, 3, 1, 'conv_7')
        with tf.variable_scope('conv_8'):
            # (56,56,256)
            net = self._conv_layer(net, 256, 1, 1, 'conv_8')
        with tf.variable_scope('conv_9'):
            # (56,56,512)
            net = self._conv_layer(net, 512, 3, 1, 'conv_9')
        # (56,56,512) -> (28,28,512)
        net = self._maxpool_layer(net, 2, 2)
        with tf.variable_scope('conv_11'):
            net = self._conv_layer(net, 256, 1, 1, 'conv_11')
        with tf.variable_scope('conv_12'):
            net = self._conv_layer(net, 512, 3, 1, 'conv_12')
        with tf.variable_scope('conv_13'):
            net = self._conv_layer(net, 256, 1, 1, 'conv_13')
        with tf.variable_scope('conv_14'):
            net = self._conv_layer(net, 512, 3, 1, 'conv_14')
        with tf.variable_scope('conv_15'):
            net = self._conv_layer(net, 256, 1, 1, 'conv_15')
        with tf.variable_scope('conv_16'):
            net = self._conv_layer(net, 512, 3, 1, 'conv_16')
        with tf.variable_scope('conv_17'):
            net = self._conv_layer(net, 256, 1, 1, 'conv_17')
        with tf.variable_scope('conv_18'):
            net = self._conv_layer(net, 512, 3, 1, 'conv_18')
        with tf.variable_scope('conv_19'):
            net = self._conv_layer(net, 512, 1, 1, 'conv_19')
        with tf.variable_scope('conv_20'):
            # (28,28,1024)
            net = self._conv_layer(net, 1024, 3, 1, 'conv_20')
        # (28,28,1024) -> (14,14,1024)
        net = self._maxpool_layer(net, 2, 2)
        with tf.variable_scope('conv_22'):
            net = self._conv_layer(net, 512, 1, 1, 'conv_22')
        with tf.variable_scope('conv_23'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_23')
        with tf.variable_scope('conv_24'):
            net = self._conv_layer(net, 512, 1, 1, 'conv_24')
        with tf.variable_scope('conv_25'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_25')
        with tf.variable_scope('conv_26'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_26')
        with tf.variable_scope('conv_28'):
            # stride-2 conv: (14,14,1024) -> (7,7,1024)
            net = self._conv_layer(net, 1024, 3, 2, 'conv_28')
        with tf.variable_scope('conv_29'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_29')
        with tf.variable_scope('conv_30'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_30')
        # (None,7,7,1024) -> (None,7*7*1024)
        net = self._flatten(net)
        with tf.variable_scope('fc_33'):
            # (None,7*7*1024) -> (None,512)
            net = self._fc_layer(net, 512, activation=self.leak_relu, scope='fc_33')
        with tf.variable_scope('fc_34'):
            # (None,512) -> (None,4096)
            net = self._fc_layer(net, 4096, activation=self.leak_relu, scope='fc_34')
        with tf.variable_scope('fc_36'):
            # (None,4096) -> (None,7*7*30), no activation on the output layer
            net = self._fc_layer(net, 7 * 7 * 30, scope='fc_36')
    # Return the output (7*7*30 values per image, read as 7x7x30) and the input placeholder x
    return net, x

# Build a convolutional layer
def _conv_layer(self, x, num_filters, filter_size, stride, scope):
    # Weights of the convolutional layer
    in_channels = x.get_shape().as_list()[-1]
    weight = tf.Variable(tf.truncated_normal([filter_size, filter_size, in_channels, num_filters], stddev=0.1), name='weights')
    # Bias of the convolutional layer
    bias = tf.Variable(tf.zeros([num_filters, ]), name='biases')
    # Pad by filter_size // 2 on each side, then convolve with VALID padding
    pad_size = filter_size // 2
    pad_mat = np.array([[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])
    x_pad = tf.pad(x, pad_mat)
    # Convolution
    conv = tf.nn.conv2d(x_pad, weight, strides=[1, stride, stride, 1], padding="VALID", name=scope)
    # Leaky ReLU activation
    output = self.leak_relu(tf.nn.bias_add(conv, bias))
    return output

def _fc_layer(self, x, num_out, activation=None, scope=None):
    # Fully connected layer
    num_in = x.get_shape().as_list()[-1]
    weight = tf.Variable(tf.truncated_normal([num_in, num_out], stddev=0.1), name='weights')
    bias = tf.Variable(tf.zeros([num_out, ]), name='biases')
    output = tf.nn.xw_plus_b(x, weight, bias, name=scope)
    if activation:
        output = activation(output)
    return output

def _maxpool_layer(self, x, pool_size, stride):
    # Max pooling
    output = tf.nn.max_pool(x, [1, pool_size, pool_size, 1], strides=[1, stride, stride, 1], padding="SAME")
    return output

def _flatten(self, x):
    """Flatten x."""
    # Transpose to channels-first (NCHW) order before flattening
    tran_x = tf.transpose(x, [0, 3, 1, 2])
    # np.prod replaces the deprecated np.product
    nums = np.prod(x.get_shape().as_list()[1:])
    return tf.reshape(tran_x, [-1, nums])

The prediction result is as follows:

As can be seen, the prediction result is rather poor.
2. YOLOv2
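`_conv_layer` pads explicitly by `filter_size // 2` and then convolves with `padding="VALID"`, so each convolution's output size is out = floor((n + 2p - k) / s) + 1, while the 2x2/stride-2 max pools with SAME padding give ceil(n / 2). The small pure-Python sketch below (the layer list is transcribed by hand from `_build_net`, keeping only the layers that change the spatial size) traces how 448 shrinks to 7.

```python
def conv_out(n, k, s):
    """Output size of a conv with explicit padding p = k // 2 followed by VALID padding."""
    p = k // 2
    return (n + 2 * p - k) // s + 1

def pool_out(n, s):
    """Output size of a max pool with SAME padding: ceil(n / s)."""
    return -(-n // s)

# (kind, kernel, stride) for the layers of _build_net that change the spatial size;
# every stride-1 3x3 or 1x1 conv in between keeps the size unchanged.
layers = [("conv", 7, 2),   # conv_2
          ("pool", 2, 2),   # after conv_2
          ("pool", 2, 2),   # after conv_4
          ("pool", 2, 2),   # after conv_9
          ("pool", 2, 2),   # after conv_20
          ("conv", 3, 2)]   # conv_28

n = 448
for kind, k, s in layers:
    n = conv_out(n, k, s) if kind == "conv" else pool_out(n, s)
    print(kind, k, s, "->", n)
print(n)  # 7
```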

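YOLOv2 uses a new classification network, Darknet-19, as its feature extractor. It relies mostly on 3x3 kernels and doubles the number of channels after every pooling step. Borrowing the idea of Network in Network, it places 1x1 kernels between the 3x3 kernels to compress the features. Batch normalization is used to stabilize training, speed up convergence and regularize the model.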
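Because batch normalization subtracts the per-channel mean, a constant bias added by the preceding convolution cancels out, which is why the YOLOv2 `conv2d` helper uses `use_bias=False` when BN follows (BN's own learned shift takes over the role of the bias). A small NumPy check of that claim, using a simplified training-mode BN with batch statistics and no learned scale or shift:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Simplified training-mode BN over an NHWC batch: normalize each channel with batch stats."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 13, 13, 8))   # stands in for a conv output (N, H, W, C)
bias = rng.normal(size=(8,))             # a per-channel bias the conv could have added

same = np.allclose(batch_norm(feat), batch_norm(feat + bias))
print(same)  # True: the mean subtraction removes the bias
```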
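It also keeps a shortcut to an earlier, higher-resolution feature map (the 26x26x512 `shortcut` in the code below), which the passthrough layer later reshapes and merges with the deeper 13x13 features.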
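The passthrough step (`tf.space_to_depth` with block size 2) moves each 2x2 spatial block into the channel dimension, turning 26x26x512 into 13x13x2048 so that it can be concatenated with the 13x13x1024 deep features, as described in the YOLOv2 paper. A NumPy sketch of the same rearrangement, assuming NHWC layout and TensorFlow's ordering in which the position inside the block forms the high-order part of the output channel index; the random arrays only stand in for real feature maps:

```python
import numpy as np

def space_to_depth(x, block=2):
    """NHWC space-to-depth: (N, H, W, C) -> (N, H/block, W/block, block*block*C)."""
    n, h, w, c = x.shape
    x = x.reshape(n, h // block, block, w // block, block, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)   # (N, H/b, W/b, dy, dx, C)
    return x.reshape(n, h // block, w // block, block * block * c)

shortcut = np.random.rand(1, 26, 26, 512).astype(np.float32)  # the 26x26x512 shortcut
deep = np.random.rand(1, 13, 13, 1024).astype(np.float32)     # 13x13x1024 deep features

passed = space_to_depth(shortcut)                   # (1, 13, 13, 2048)
merged = np.concatenate([passed, deep], axis=-1)    # (1, 13, 13, 3072)
print(passed.shape, merged.shape)  # (1, 13, 13, 2048) (1, 13, 13, 3072)
```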
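Besides the changes to the network structure, YOLOv2 adds anchor boxes (priors), which YOLOv1 lacked. The final output conv_dec has shape (13,13,425): 13x13 means the image is divided into a 13x13 grid for prediction, and 425 factors as 85x5. YOLOv2 is commonly trained on the COCO dataset, which has 80 classes, so 80 of the 85 values are class scores and the remaining 5 are x, y, w, h and the confidence. The factor of 5 means each cell predicts 5 boxes, one for each of the 5 anchor boxes.
The decoding code is as follows: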
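The 425 channels follow from num_anchors x (5 + num_classes). During training, each ground-truth box is commonly matched to the anchor whose shape overlaps it best, comparing width and height only with the centers aligned. A minimal sketch of both; the anchor sizes are hypothetical illustrative values in grid-cell units and the helper names are made up for this example:

```python
import numpy as np

def head_channels(num_anchors, num_classes):
    """Channels of the final 1x1 conv: per anchor, x, y, w, h, confidence plus class scores."""
    return num_anchors * (5 + num_classes)

# Hypothetical anchor (w, h) sizes in grid-cell units, for illustration only
anchors = np.array([[0.57, 0.68], [1.87, 2.06], [3.34, 5.47], [7.88, 3.53], [9.77, 9.17]])

def best_anchor(gt_w, gt_h, anchors):
    """Index of the anchor whose shape has the highest IoU with a ground-truth box (centers aligned)."""
    inter = np.minimum(anchors[:, 0], gt_w) * np.minimum(anchors[:, 1], gt_h)
    union = gt_w * gt_h + anchors[:, 0] * anchors[:, 1] - inter
    return int(np.argmax(inter / union))

print(head_channels(5, 80))            # 425, the last dimension of the 13x13x425 output
print(best_anchor(1.8, 2.0, anchors))  # 1
```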
def decode(self, net):
    # Anchor sizes (w, h) in grid-cell units; use a local tensor instead of overwriting self.anchor_size
    anchor_size = tf.constant(self.anchor_size, tf.float32)
    grid_h, grid_w = self.feature_map_size  # 13, 13
    # reshape net to [batch, 169, 5, 85]
    net = tf.reshape(net, [-1, grid_h * grid_w, self.num_anchors, self.num_class + 5])
    # Of the 85 values: 0,1 are the xy offsets, 2,3 the wh offsets, 4 the confidence, 5..84 the class probabilities
    # Offset of the box center relative to the top-left corner of its cell, squashed to (0, 1) by a sigmoid
    # [batch,169,5,2]
    xy_offset = tf.nn.sigmoid(net[:, :, :, 0:2])
    # Scale factors applied to the anchor width and height
    wh_offset = tf.exp(net[:, :, :, 2:4])
    obj_probs = tf.nn.sigmoid(net[:, :, :, 4])
    class_probs = tf.nn.softmax(net[:, :, :, 5:])
    # Grid-cell indices on the 13x13 feature map
    height_index = tf.range(grid_h, dtype=tf.float32)
    width_index = tf.range(grid_w, dtype=tf.float32)
    # tf.meshgrid(x, y) returns the x-coordinates first, so pass the width (column) range first
    x_cell, y_cell = tf.meshgrid(width_index, height_index)
    # reshape to [1, H*W, 1] to match [batch, H*W, num_anchors]
    x_cell = tf.reshape(x_cell, [1, -1, 1])
    y_cell = tf.reshape(y_cell, [1, -1, 1])
    # x_cell, y_cell: top-left corner of each cell (grid units); xy_offset: center offset within the cell
    bbox_x = (x_cell + xy_offset[:, :, :, 0]) / grid_w
    bbox_y = (y_cell + xy_offset[:, :, :, 1]) / grid_h
    bbox_w = (anchor_size[:, 0] * wh_offset[:, :, :, 0]) / grid_w
    bbox_h = (anchor_size[:, 1] * wh_offset[:, :, :, 1]) / grid_h
    # Convert center/size to corners (x1, y1, x2, y2), normalized to [0, 1]
    bboxes = tf.stack([bbox_x - bbox_w / 2, bbox_y - bbox_h / 2,
                       bbox_x + bbox_w / 2, bbox_y + bbox_h / 2], axis=3)
    return bboxes, obj_probs, class_probs
The network code is as follows:
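As a worked example of the decode formulas b_x = (c_x + sigmoid(t_x)) / W and b_w = p_w * exp(t_w) / W (c_x is the cell's column index, p_w the anchor width in grid units, W = 13), here is a NumPy sketch for a single cell and anchor. The raw outputs and the anchor size are made-up numbers and `decode_one` is a hypothetical helper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_one(t, cell_col, cell_row, anchor_wh, grid=13):
    """Decode raw (tx, ty, tw, th) for one anchor in one cell into normalized (x1, y1, x2, y2)."""
    tx, ty, tw, th = t
    bx = (cell_col + sigmoid(tx)) / grid       # center x relative to the image width
    by = (cell_row + sigmoid(ty)) / grid       # center y relative to the image height
    bw = anchor_wh[0] * np.exp(tw) / grid      # width relative to the image
    bh = anchor_wh[1] * np.exp(th) / grid      # height relative to the image
    return bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2

# Raw outputs of zero put the center in the middle of the cell and keep the anchor's size.
box = decode_one((0.0, 0.0, 0.0, 0.0), cell_col=6, cell_row=6, anchor_wh=(2.6, 2.6))
print(np.round(box, 3))  # [0.4 0.4 0.6 0.6]
```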
import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.placeholder)

# Assumed definition: leaky_relu is used as a default argument below but is not shown in this excerpt
def leaky_relu(x):
    return tf.nn.leaky_relu(x, alpha=0.1)

def conv2d(self, x, filters_num, filters_size, pad_size=0, stride=1, batch_normalize=True, activation=leaky_relu, use_bias=False, name='conv2d'):
    # Pad explicitly if requested
    if pad_size > 0:
        x = tf.pad(x, [[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])
    # Convolve after padding
    out = tf.layers.conv2d(x, filters=filters_num, kernel_size=filters_size, strides=stride, padding='VALID', activation=None, use_bias=use_bias, name=name)
    # BN sits between the convolution and the activation
    # (a conv followed by BN needs no bias, and the activation comes after BN)
    # Apply batch normalization if requested (training=False: inference mode, uses moving statistics)
    if batch_normalize:
        out = tf.layers.batch_normalization(out, axis=-1, momentum=0.9, training=False, name=name + '_bn')
    if activation:
        out = activation(out)
    return out

def maxpool(self, x, size=2, stride=2, name='maxpool'):
    return tf.layers.max_pooling2d(x, pool_size=size, strides=stride, name=name)

def passthrough(self, x, stride):
    # Smaller spatially, deeper in channels: (H, W, C) -> (H/stride, W/stride, C*stride*stride)
    return tf.space_to_depth(x, block_size=stride)

def darknet(self):
    x = tf.placeholder(dtype=tf.float32, shape=[None, 416, 416, 3])
    # 416,416,3 -> 416,416,32
    net = self.conv2d(x, filters_num=32, filters_size=3, pad_size=1, name='conv1')
    # 416,416,32 -> 208,208,32
    net = self.maxpool(net, size=2, stride=2, name='pool1')
    # 208,208,32 -> 208,208,64
    net = self.conv2d(net, 64, 3, 1, name='conv2')
    # 208,208,64 -> 104,104,64
    net = self.maxpool(net, 2, 2, name='pool2')
    # 104,104,64 -> 104,104,128
    net = self.conv2d(net, 128, 3, 1, name='conv3_1')
    net = self.conv2d(net, 64, 1, 0, name='conv3_2')
    net = self.conv2d(net, 128, 3, 1, name='conv3_3')
    # 104,104,128 -> 52,52,128
    net = self.maxpool(net, 2, 2, name='pool3')
    # 52,52,128 -> 52,52,256
    net = self.conv2d(net, 256, 3, 1, name='conv4_1')
    net = self.conv2d(net, 128, 1, 0, name='conv4_2')
    net = self.conv2d(net, 256, 3, 1, name='conv4_3')
    # 52,52,256 -> 26,26,256
    net = self.maxpool(net, 2, 2, name='pool4')
    # 26,26,256 -> 26,26,512
    net = self.conv2d(net, 512, 3, 1, name='conv5_1')
    net = self.conv2d(net, 256, 1, 0, name='conv5_2')
    net = self.conv2d(net, 512, 3, 1, name='conv5_3')
    net = self.conv2d(net, 256, 1, 0, name='conv5_4')
    net = self.conv2d(net, 512, 3, 1, name='conv5_5')
    # Keep this feature map as the shortcut for the later passthrough layer
    shortcut = net
    # 26,26,512 -> 13,13,512
    net = self.maxpool(net, 2, 2, name='pool5')
    # 13,13,512 -> 13,13,1024
    net = self.conv2d(net, 1024, 3, 1, name='conv6_1')
    net = self.conv2d(net, 512, 1, 0, name='conv6_2')
    net = self.conv2d(net, 1024, 3, 1, name='conv6_3')
    net = self.conv2d(net, 512, 1, 0, name='conv6_4')
    net = self.conv2d(net, 1024, 3, 1, name='conv6_5')
    '''
    When training the detection network, the last convolutional layer of the classification network is removed,
    and three 3x3 convolutional layers with 1024 filters each are added after it;
    the last of these three is followed by a 1x1 convolutional layer
    with (B * (5 + C)) filters. For the VOC dataset
