The Right Way to Understand TensorFlow, the God-Tier Deep Learning Framework

wangdawei, 8 years ago
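   <p>On November 9, 2015, Google released its artificial intelligence system TensorFlow and announced that it was open source. The move had a major impact on the deep learning field and drew a great deal of attention from deep learning developers. There are still plenty of skeptical voices about artificial intelligence, of course, but it is undeniable that AI remains the direction of future development.</p>
   <p>TensorFlow became the most-watched project on GitHub the day it landed there. Regarded as the best way to build deep learning models and as the leader among deep learning frameworks, it easily collected more than 10,000 stars in its release week, mainly because of Google's outstanding research record in artificial intelligence and its exceptional pool of engineering talent. Another factor in its popularity is AlphaGo, which beat a human at Go for the first time and whose upgraded version, Master, then went 60 games unbeaten; its reinforcement learning framework is also implemented on top of TensorFlow's high-level API.</p>
   <p>TensorFlow: Why This Framework?</p>
   <p>As Google's second-generation deep learning framework, TensorFlow, which expresses computation as a dataflow graph, has become one of the most popular frameworks in machine learning and deep learning. Since its release it has been steadily improved and extended, and TensorFlow 1.0 was officially released on February 15, 2017 at the first annual TensorFlow Developer Summit in Mountain View. Its headline feature is speed gained through model optimization, and the speedups are striking. More surprisingly, many supporters use the release of TensorFlow 1.0 to mark the first year of the AI era.</p>
   <p>According to Google Trends, deep learning currently ranks first among popular technologies.</p>
   <p>TensorFlow's main achievements so far include:</p>
   <ul>
     <li>TensorFlow is used in many Google products, including Gmail, Google Play Recommendation, Search, Translate, and Maps;</li>
     <li>In medicine, scientists have used TensorFlow to build models that help prevent diabetes-related blindness from retinal images (it was also mentioned that a Stanford PhD student used TensorFlow to predict skin cancer, and that work made the cover of Nature);</li>
     <li>Deep learning models built with TensorFlow in music and painting help people understand art better;</li>
     <li>TensorFlow, combined with high-tech equipment, is used to build automated marine life detection systems that help scientists understand the state of marine life;</li>
     <li>TensorFlow is pushing into mobile clients, with several apps using it on mobile devices for tasks such as translation and style transfer;</li>
     <li>On a mobile CPU (Qualcomm Snapdragon 820), TensorFlow achieves higher performance at lower power consumption;</li>
     <li>The TensorFlow ecosystem, combined with other open-source projects, makes it possible to stand up a high-performance production environment quickly;</li>
     <li>TensorBoard's embedding vector visualization;</li>
     <li>It helps PhD students and researchers get research projects started quickly.</li>
   </ul>
   <p>DistBelief, Google's first-generation distributed machine learning framework, no longer met Google's internal needs, so Google engineers redesigned it. The new design adds support for a range of compute devices (CPU, GPU, and TPU), runs well on mobile platforms such as Android, iOS, and Raspberry Pi, and supports several languages (because of the various high-level APIs, training is supported only in Python, while inference is supported in C++, Go, Java, and others). It also ships with excellent tools such as TensorBoard, which make deep learning researchers considerably more productive.</p>
   <p>TensorFlow's use in Google's internal projects has also grown rapidly. It is used in many Google products, such as Gmail, Google Play Recommendation, Search, Translate, and Maps, and roughly 100 or more projects and papers use TensorFlow.</p>
   <p>In the 14 months before the 1.0 release, TensorFlow also built up an impressive record: 475+ non-Google contributors, 14,000+ commits, more than 5,500 GitHub projects with TensorFlow in the title, 5,000+ answered questions on Stack Overflow, and 80+ issues filed per week on average. It is also used by some top academic research projects: Neural Machine Translation, Neural Architecture Search, and Show and Tell.</p>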
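<p>Ultimately, deep learning replaces hand-crafted features with unsupervised or semi-supervised feature learning and efficient hierarchical feature extraction algorithms. TensorFlow is not the only deep learning framework that researchers and developers use. There are many other strong frameworks in vision, speech, natural language processing, bioinformatics, and other fields, such as Torch, Caffe, Theano, and Deeplearning4j.</p>
<p>Below, the editor walks through some of the key techniques, algorithms, and ideas behind TensorFlow, the leader in deep learning.</p>
<p>GoogLeNet</p>
<p>GoogLeNet won ILSVRC 2014. Its name is a tribute to the classic LeNet-5, and it was developed mainly by a team at Google; see the paper Going Deeper with Convolutions. Related work includes LeNet-5, Gabor filters, and Network-in-Network. Network-in-Network improves on the traditional CNN and easily outperforms AlexNet with only a small number of parameters; the final Network-in-Network model is about 29 MB (see the Network-in-Network Caffe model). GoogLeNet borrows ideas from Network-in-Network, which is described in detail below.</p>
<p>1) Network-in-Network</p>
<p>On the left is the linear convolution layer of a conventional CNN. A linear convolution layer is generally good at extracting linearly separable features. When the features to be extracted are highly nonlinear, however, many more filters are needed to capture all the latent features. That creates a problem: too many filters means too many network parameters, and the network becomes too complex and too expensive to compute.</p>
<p>The paper makes two improvements. First, it improves the convolution layer with MLPconv, which performs a more complex computation than a traditional convolution layer on each local patch (the right side of the figure above), increasing each layer's ability to recognize complex features. As a rough analogy, each convolution layer in a traditional CNN is like a worker who can do only one kind of task, so a huge number of filters is needed to cover a given set of task types; each MLPconv layer is more capable and can handle many kinds of tasks, so only a small number of filters is needed. Second, it uses global average pooling to replace the final fully connected layers of a traditional CNN, which carry a very large number of parameters. Fully connected layers also hurt generalization, which is why AlexNet mentions using dropout to improve it.</p>
<p>Finally, the authors designed a network of four Network-in-Network layers plus a global average pooling layer for ImageNet classification.</p>
<pre>  <code class="language-javascript"># Network is the base class shipped with caffe-tensorflow (kaffe).
from kaffe.tensorflow import Network

class NiN(Network):
    def setup(self):
        (self.feed('data')
             .conv(11, 11, 96, 4, 4, padding='VALID', name='conv1')
             .conv(1, 1, 96, 1, 1, name='cccp1')
             .conv(1, 1, 96, 1, 1, name='cccp2')
             .max_pool(3, 3, 2, 2, name='pool1')
             .conv(5, 5, 256, 1, 1, name='conv2')
             .conv(1, 1, 256, 1, 1, name='cccp3')
             .conv(1, 1, 256, 1, 1, name='cccp4')
             .max_pool(3, 3, 2, 2, padding='VALID', name='pool2')
             .conv(3, 3, 384, 1, 1, name='conv3')
             .conv(1, 1, 384, 1, 1, name='cccp5')
             .conv(1, 1, 384, 1, 1, name='cccp6')
             .max_pool(3, 3, 2, 2, padding='VALID', name='pool3')
             .conv(3, 3, 1024, 1, 1, name='conv4-1024')
             .conv(1, 1, 1024, 1, 1, name='cccp7-1024')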
             .conv(1, 1, 1000, 1, 1, name='cccp8-1024')
             .avg_pool(6, 6, 1, 1, padding='VALID', name='pool4')
             .softmax(name='prob'))</code></pre>
<p>The basic network structure is shown above; the code comes from https://github.com/ethereon/caffe-tensorflow. Because of a recent job change, I have no machine to run it on and cannot draw the network diagram for now; I will add it later. It is worth noting that the intermediate cccp1 and cccp2 layers (cross channel pooling) are equivalent to convolution layers with a 1*1 kernel. The Caffe implementation of NIN is as follows:</p>
<pre>  <code class="language-javascript">name: "nin_imagenet"   layers {     top: "data"     top: "label"     name: "data"     type: DATA     data_param {       source: "/home/linmin/IMAGENET-LMDB/imagenet-train-lmdb"       backend: LMDB       batch_size: 64     }     transform_param {       crop_size: 224       mirror: true       mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"     }     include: { phase: TRAIN }   }   layers {     top: "data"     top: "label"     name: "data"     type: DATA     data_param {       source: "/home/linmin/IMAGENET-LMDB/imagenet-val-lmdb"       backend: LMDB       batch_size: 89     }     transform_param {       crop_size: 224       mirror: false       mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"     }     include: { phase: TEST }   }   layers {     bottom: "data"     top: "conv1"     name: "conv1"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 96       kernel_size: 11       stride: 4       weight_filler {         type: "gaussian"         mean: 0         std: 0.01       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "conv1"     top: "conv1"     name: "relu0"     type: RELU   }   layers {     bottom: "conv1"     top: "cccp1"     name: "cccp1"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 96       kernel_size: 1       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.05       }    
   bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "cccp1"     top: "cccp1"     name: "relu1"     type: RELU   }   layers {     bottom: "cccp1"     top: "cccp2"     name: "cccp2"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 96       kernel_size: 1       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.05       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "cccp2"     top: "cccp2"     name: "relu2"     type: RELU   }   layers {     bottom: "cccp2"     top: "pool0"     name: "pool0"     type: POOLING     pooling_param {       pool: MAX       kernel_size: 3       stride: 2     }   }   layers {     bottom: "pool0"     top: "conv2"     name: "conv2"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 256       pad: 2       kernel_size: 5       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.05       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "conv2"     top: "conv2"     name: "relu3"     type: RELU   }   layers {     bottom: "conv2"     top: "cccp3"     name: "cccp3"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 256       kernel_size: 1       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.05       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "cccp3"     top: "cccp3"     name: "relu5"     type: RELU   }   layers {     bottom: "cccp3"     top: "cccp4"     name: "cccp4"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     
convolution_param {       num_output: 256       kernel_size: 1       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.05       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "cccp4"     top: "cccp4"     name: "relu6"     type: RELU   }   layers {     bottom: "cccp4"     top: "pool2"     name: "pool2"     type: POOLING     pooling_param {       pool: MAX       kernel_size: 3       stride: 2     }   }   layers {     bottom: "pool2"     top: "conv3"     name: "conv3"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 384       pad: 1       kernel_size: 3       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.01       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "conv3"     top: "conv3"     name: "relu7"     type: RELU   }   layers {     bottom: "conv3"     top: "cccp5"     name: "cccp5"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 384       kernel_size: 1       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.05       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "cccp5"     top: "cccp5"     name: "relu8"     type: RELU   }   layers {     bottom: "cccp5"     top: "cccp6"     name: "cccp6"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 384       kernel_size: 1       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.05       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "cccp6"     top: "cccp6"     name: "relu9"     type: RELU 
  }   layers {     bottom: "cccp6"     top: "pool3"     name: "pool3"     type: POOLING     pooling_param {       pool: MAX       kernel_size: 3       stride: 2     }   }   layers {     bottom: "pool3"     top: "pool3"     name: "drop"     type: DROPOUT     dropout_param {       dropout_ratio: 0.5     }   }   layers {     bottom: "pool3"     top: "conv4"     name: "conv4-1024"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 1024       pad: 1       kernel_size: 3       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.05       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "conv4"     top: "conv4"     name: "relu10"     type: RELU   }   layers {     bottom: "conv4"     top: "cccp7"     name: "cccp7-1024"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 1024       kernel_size: 1       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.05       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "cccp7"     top: "cccp7"     name: "relu11"     type: RELU   }   layers {     bottom: "cccp7"     top: "cccp8"     name: "cccp8-1024"     type: CONVOLUTION     blobs_lr: 1     blobs_lr: 2     weight_decay: 1     weight_decay: 0     convolution_param {       num_output: 1000       kernel_size: 1       stride: 1       weight_filler {         type: "gaussian"         mean: 0         std: 0.01       }       bias_filler {         type: "constant"         value: 0       }     }   }   layers {     bottom: "cccp8"     top: "cccp8"     name: "relu12"     type: RELU   }   layers {     bottom: "cccp8"     top: "pool4"     name: "pool4"     type: POOLING     pooling_param {       pool: AVE       kernel_size: 6       stride: 1     }   
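}   layers {     name: "accuracy"     type: ACCURACY     bottom: "pool4"     bottom: "label"     top: "accuracy"     include: { phase: TEST }   }   layers {     bottom: "pool4"     bottom: "label"     name: "loss"     type: SOFTMAX_LOSS     include: { phase: TRAIN }   }   </code></pre>
<p>NIN can also be seen as a way of making the network deeper. By increasing depth (which strengthens the representational power of each NIN block) and replacing the original fully connected layers with an average pooling layer, it greatly reduces the number of filters needed and shrinks the model's parameter count. The experiments in the paper show performance on par with AlexNet with a final model size of only 29 MB.</p>
<p>With NIN understood, GoogLeNet no longer seems mysterious.</p>
<p>Pain points</p>
<p>The larger a CNN is, the more parameters the model has and the more compute it needs, and an overly complex model tends to overfit;</p>
<p>In a CNN, adding layers increases the compute resources required;</p>
<p>A sparse network is acceptable, but sparse data structures are usually very inefficient to compute with.</p>
<p>Inception module</p>
<p style="text-align:center"><img src="https://simg.open-open.com/show/e617b2fe0d66881362138664915c37dd.png"></p>
<p>The Inception module is motivated by the idea that convolution kernels of several different sizes can capture information from different clusters in the image. For computational convenience, the paper uses 1*1, 3*3, and 5*5 kernels and adds a 3*3 max pooling branch. This hides a serious computational problem: the number of output filters of each Inception module is the sum of the filters of all its branches, so after many layers the channel count, and with it the model's parameter count, becomes enormous, and the naive Inception module demands far more compute. As noted for the Network-in-Network model, a 1*1 convolution can reduce dimensionality effectively (expressing as much information as possible with fewer channels). The paper therefore proposes the "Inception module with dimension reduction", which cuts the number of filters as far as possible without losing representational power, and so reduces model complexity:</p>
<p style="text-align:center"><img src="https://simg.open-open.com/show/36d7ddfacea744c2743522cd0361f8db.png"></p>
<p>GoogLeNet overall architecture</p>
<p><img src="https://simg.open-open.com/show/1b5858b4ee0269fa8620812e8a111b7d.png"></p>
<p>The basic code for building GoogLeNet in TensorFlow:</p>
<pre>  <code class="language-javascript">from kaffe.tensorflow import Network

class GoogleNet(Network):
    def setup(self):
        (self.feed('data')
             .conv(7, 7, 64, 2, 2, name='conv1_7x7_s2')
             .max_pool(3, 3, 2, 2, name='pool1_3x3_s2')
             .lrn(2, 2e-05, 0.75, name='pool1_norm1')
             .conv(1, 1, 64, 1, 1, name='conv2_3x3_reduce')
             .conv(3, 3, 192, 1, 1, name='conv2_3x3')
             .lrn(2, 2e-05, 0.75, name='conv2_norm2')
             .max_pool(3, 3,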
                       2, 2, name='pool2_3x3_s2')
             .conv(1, 1, 64, 1, 1, name='inception_3a_1x1'))

        (self.feed('pool2_3x3_s2')
             .conv(1, 1, 96, 1, 1, name='inception_3a_3x3_reduce')
             .conv(3, 3, 128, 1, 1, name='inception_3a_3x3'))

        (self.feed('pool2_3x3_s2')
             .conv(1, 1, 16, 1, 1, name='inception_3a_5x5_reduce')
             .conv(5, 5, 32, 1, 1, name='inception_3a_5x5'))

        (self.feed('pool2_3x3_s2')
             .max_pool(3, 3, 1, 1, name='inception_3a_pool')
             .conv(1, 1, 32, 1, 1, name='inception_3a_pool_proj'))

        (self.feed('inception_3a_1x1',
                   'inception_3a_3x3',
                   'inception_3a_5x5',
                   'inception_3a_pool_proj')
             .concat(3, name='inception_3a_output')
             .conv(1, 1, 128, 1, 1, name='inception_3b_1x1'))

        (self.feed('inception_3a_output')
             .conv(1, 1, 128, 1, 1, name='inception_3b_3x3_reduce')
             .conv(3, 3, 192, 1, 1, name='inception_3b_3x3'))

        (self.feed('inception_3a_output')
             .conv(1, 1, 32, 1, 1, name='inception_3b_5x5_reduce')
             .conv(5, 5, 96, 1, 1, name='inception_3b_5x5'))

        (self.feed('inception_3a_output')
             .max_pool(3, 3, 1, 1, name='inception_3b_pool')
             .conv(1, 1, 64, 1, 1, name='inception_3b_pool_proj'))

        (self.feed('inception_3b_1x1',
                   'inception_3b_3x3',
                   'inception_3b_5x5',
                   'inception_3b_pool_proj')
             .concat(3, name='inception_3b_output')
             .max_pool(3, 3, 2, 2, name='pool3_3x3_s2')
             .conv(1, 1, 192, 1, 1, name='inception_4a_1x1'))

        (self.feed('pool3_3x3_s2')
             .conv(1, 1, 96, 1, 1, name='inception_4a_3x3_reduce')
             .conv(3, 3, 208, 1, 1, name='inception_4a_3x3'))

        (self.feed('pool3_3x3_s2')
             .conv(1, 1, 16, 1, 1, name='inception_4a_5x5_reduce')
             .conv(5, 5, 48, 1, 1, name='inception_4a_5x5'))

        (self.feed('pool3_3x3_s2')
             .max_pool(3, 3, 1, 1, name='inception_4a_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4a_pool_proj'))

        (self.feed('inception_4a_1x1',
                   'inception_4a_3x3',
                   'inception_4a_5x5',
                   'inception_4a_pool_proj')
             .concat(3, name='inception_4a_output')
             .conv(1, 1, 160, 1, 1, name='inception_4b_1x1'))

        (self.feed('inception_4a_output')
             .conv(1, 1, 112, 1, 1, name='inception_4b_3x3_reduce')
             .conv(3, 3, 224, 1, 1, name='inception_4b_3x3'))

        (self.feed('inception_4a_output')
             .conv(1, 1, 24, 1, 1, name='inception_4b_5x5_reduce')
             .conv(5, 5, 64, 1, 1, name='inception_4b_5x5'))

        (self.feed('inception_4a_output')
             .max_pool(3, 3, 1, 1, name='inception_4b_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4b_pool_proj'))

        (self.feed('inception_4b_1x1',
                   'inception_4b_3x3',
                   'inception_4b_5x5',
                   'inception_4b_pool_proj')
             .concat(3, name='inception_4b_output')
             .conv(1, 1, 128, 1, 1, name='inception_4c_1x1'))

        (self.feed('inception_4b_output')
             .conv(1, 1, 128, 1, 1, name='inception_4c_3x3_reduce')
             .conv(3, 3, 256, 1, 1, name='inception_4c_3x3'))

        (self.feed('inception_4b_output')
             .conv(1, 1, 24, 1, 1, name='inception_4c_5x5_reduce')
             .conv(5, 5, 64, 1, 1, name='inception_4c_5x5'))

        (self.feed('inception_4b_output')
             .max_pool(3, 3, 1, 1, name='inception_4c_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4c_pool_proj'))

        (self.feed('inception_4c_1x1',
                   'inception_4c_3x3',
                   'inception_4c_5x5',
                   'inception_4c_pool_proj')
             .concat(3, name='inception_4c_output')
             .conv(1, 1, 112, 1, 1, name='inception_4d_1x1'))

        (self.feed('inception_4c_output')
             .conv(1, 1, 144, 1, 1, name='inception_4d_3x3_reduce')
             .conv(3, 3, 288, 1, 1, name='inception_4d_3x3'))

        (self.feed('inception_4c_output')
             .conv(1, 1, 32, 1, 1, name='inception_4d_5x5_reduce')
             .conv(5, 5, 64, 1, 1, name='inception_4d_5x5'))

        (self.feed('inception_4c_output')
             .max_pool(3, 3, 1, 1, name='inception_4d_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4d_pool_proj'))

        (self.feed('inception_4d_1x1',
                   'inception_4d_3x3',
                   'inception_4d_5x5',
                   'inception_4d_pool_proj')
             .concat(3, name='inception_4d_output')
             .conv(1, 1, 256, 1, 1, name='inception_4e_1x1'))

        (self.feed('inception_4d_output')
             .conv(1, 1, 160, 1, 1, name='inception_4e_3x3_reduce')
             .conv(3, 3, 320, 1, 1, name='inception_4e_3x3'))

        (self.feed('inception_4d_output')
             .conv(1, 1, 32, 1, 1, name='inception_4e_5x5_reduce')
             .conv(5, 5, 128, 1, 1, name='inception_4e_5x5'))

        (self.feed('inception_4d_output')
             .max_pool(3, 3, 1, 1, name='inception_4e_pool')
             .conv(1, 1, 128, 1, 1, name='inception_4e_pool_proj'))

        (self.feed('inception_4e_1x1',
                   'inception_4e_3x3',
                   'inception_4e_5x5',
                   'inception_4e_pool_proj')
             .concat(3, name='inception_4e_output')
             .max_pool(3, 3, 2, 2, name='pool4_3x3_s2')
             .conv(1, 1, 256, 1, 1, name='inception_5a_1x1'))

        (self.feed('pool4_3x3_s2')
             .conv(1, 1, 160, 1, 1, name='inception_5a_3x3_reduce')
             .conv(3, 3, 320, 1, 1, name='inception_5a_3x3'))

        (self.feed('pool4_3x3_s2')
             .conv(1, 1, 32, 1, 1, name='inception_5a_5x5_reduce')
             .conv(5, 5, 128, 1, 1, name='inception_5a_5x5'))

        (self.feed('pool4_3x3_s2')
             .max_pool(3, 3, 1, 1, name='inception_5a_pool')
             .conv(1, 1, 128, 1, 1, name='inception_5a_pool_proj'))

        (self.feed('inception_5a_1x1',
                   'inception_5a_3x3',
                   'inception_5a_5x5',
                   'inception_5a_pool_proj')
             .concat(3, name='inception_5a_output')
             .conv(1, 1, 384, 1, 1, name='inception_5b_1x1'))

        (self.feed('inception_5a_output')
             .conv(1, 1, 192, 1, 1, name='inception_5b_3x3_reduce')
             .conv(3, 3, 384, 1, 1, name='inception_5b_3x3'))

        (self.feed('inception_5a_output')
             .conv(1, 1, 48, 1, 1, name='inception_5b_5x5_reduce')
             .conv(5, 5, 128, 1, 1, name='inception_5b_5x5'))

        (self.feed('inception_5a_output')
             .max_pool(3, 3, 1, 1, name='inception_5b_pool')
             .conv(1, 1, 128, 1, 1, name='inception_5b_pool_proj'))

        (self.feed('inception_5b_1x1',
                   'inception_5b_3x3',
                   'inception_5b_5x5',
                   'inception_5b_pool_proj')
             .concat(3, name='inception_5b_output')
             .avg_pool(7, 7, 1, 1, padding='VALID', name='pool5_7x7_s1')
             .fc(1000, relu=False, name='loss3_classifier')
             .softmax(name='prob'))</code></pre>
<p>The code is in https://github.com/ethereon/caffe-tensorflow, where the author has wrapped some basic operations. Once you understand the network structure, building GoogLeNet is easy. After I settle in at my new company, I will try writing the GoogLeNet network code on top of tflearn.</p>
<p>GoogLeNet on Tensorflow</p>
<p>For convenience I rewrote GoogLeNet with tflearn. The only differences from the Caffe model are some padding positions. Changing them is tedious because the branches of each Inception module must have matching shapes when they are concatenated, and I did not know how to carry over the pad values from the Caffe prototxt, so I set padding to same everywhere. The code is as follows:</p>
<pre>  <code class="language-javascript"># -*- coding: utf-8 -*-

""" GoogLeNet.
Applying 'GoogLeNet' to Oxford's 17 Category Flower Dataset classification task.
References:
    - Szegedy, Christian, et al.
    Going deeper with convolutions.
    - 17 Category Flower Dataset. Maria-Elena Nilsback and Andrew Zisserman.
Links:
    - [GoogLeNet Paper](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf)
    - [Flower Dataset (17)](http://www.robots.ox.ac.uk/~vgg/data/flowers/17/)
"""

from __future__ import division, print_function, absolute_import

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.merge_ops import merge
from tflearn.layers.estimator import regression

import tflearn.datasets.oxflower17 as oxflower17
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))


network = input_data(shape=[None, 227, 227, 3])
conv1_7_7 = conv_2d(network, 64, 7, strides=2, activation='relu', name='conv1_7_7_s2')
pool1_3_3 = max_pool_2d(conv1_7_7, 3, strides=2)
pool1_3_3 = local_response_normalization(pool1_3_3)
conv2_3_3_reduce = conv_2d(pool1_3_3, 64, 1, activation='relu', name='conv2_3_3_reduce')
conv2_3_3 = conv_2d(conv2_3_3_reduce, 192, 3, activation='relu', name='conv2_3_3')
conv2_3_3 = local_response_normalization(conv2_3_3)
pool2_3_3 = max_pool_2d(conv2_3_3, kernel_size=3, strides=2, name='pool2_3_3_s2')
inception_3a_1_1 = conv_2d(pool2_3_3, 64, 1, activation='relu', name='inception_3a_1_1')
inception_3a_3_3_reduce = conv_2d(pool2_3_3, 96, 1, activation='relu', name='inception_3a_3_3_reduce')
inception_3a_3_3 = conv_2d(inception_3a_3_3_reduce, 128, filter_size=3, activation='relu', name='inception_3a_3_3')
inception_3a_5_5_reduce = conv_2d(pool2_3_3, 16, filter_size=1, activation='relu', name='inception_3a_5_5_reduce')
inception_3a_5_5 = conv_2d(inception_3a_5_5_reduce, 32, filter_size=5, activation='relu', name='inception_3a_5_5')
inception_3a_pool = max_pool_2d(pool2_3_3, kernel_size=3, strides=1)
inception_3a_pool_1_1 = conv_2d(inception_3a_pool, 32, filter_size=1, activation='relu', name='inception_3a_pool_1_1')

# merge the inception_3a_* branches along the channel axis
inception_3a_output = merge([inception_3a_1_1, inception_3a_3_3, inception_3a_5_5, inception_3a_pool_1_1], mode='concat', axis=3)

inception_3b_1_1 = conv_2d(inception_3a_output, 128, filter_size=1, activation='relu', name='inception_3b_1_1')
inception_3b_3_3_reduce = conv_2d(inception_3a_output, 128, filter_size=1, activation='relu', name='inception_3b_3_3_reduce')
inception_3b_3_3 = conv_2d(inception_3b_3_3_reduce, 192, filter_size=3, activation='relu', name='inception_3b_3_3')
inception_3b_5_5_reduce = conv_2d(inception_3a_output, 32, filter_size=1, activation='relu', name='inception_3b_5_5_reduce')
# activation='relu' was missing here; added for consistency with every other Inception branch
inception_3b_5_5 = conv_2d(inception_3b_5_5_reduce, 96, filter_size=5, activation='relu', name='inception_3b_5_5')
inception_3b_pool = max_pool_2d(inception_3a_output, kernel_size=3, strides=1, name='inception_3b_pool')
inception_3b_pool_1_1 = conv_2d(inception_3b_pool, 64, filter_size=1, activation='relu', name='inception_3b_pool_1_1')

# merge the inception_3b_* branches along the channel axis
inception_3b_output = merge([inception_3b_1_1, inception_3b_3_3, inception_3b_5_5, inception_3b_pool_1_1], mode='concat', axis=3, name='inception_3b_output')

pool3_3_3 = max_pool_2d(inception_3b_output, kernel_size=3, strides=2, name='pool3_3_3')
inception_4a_1_1 = conv_2d(pool3_3_3, 192, filter_size=1, activation='relu', name='inception_4a_1_1')
inception_4a_3_3_reduce = conv_2d(pool3_3_3, 96, filter_size=1, activation='relu',
    name='inception_4a_3_3_reduce')
inception_4a_3_3 = conv_2d(inception_4a_3_3_reduce, 208, filter_size=3, activation='relu', name='inception_4a_3_3')
inception_4a_5_5_reduce = conv_2d(pool3_3_3, 16, filter_size=1, activation='relu', name='inception_4a_5_5_reduce')
inception_4a_5_5 = conv_2d(inception_4a_5_5_reduce, 48, filter_size=5, activation='relu', name='inception_4a_5_5')
inception_4a_pool = max_pool_2d(pool3_3_3, kernel_size=3, strides=1, name='inception_4a_pool')
inception_4a_pool_1_1 = conv_2d(inception_4a_pool, 64, filter_size=1, activation='relu', name='inception_4a_pool_1_1')

inception_4a_output = merge([inception_4a_1_1, inception_4a_3_3, inception_4a_5_5, inception_4a_pool_1_1], mode='concat', axis=3, name='inception_4a_output')


# the layer name below was 'inception_4a_1_1' in the original, duplicating the 4a layer
inception_4b_1_1 = conv_2d(inception_4a_output, 160, filter_size=1, activation='relu', name='inception_4b_1_1')
inception_4b_3_3_reduce = conv_2d(inception_4a_output, 112, filter_size=1, activation='relu', name='inception_4b_3_3_reduce')
inception_4b_3_3 = conv_2d(inception_4b_3_3_reduce, 224, filter_size=3, activation='relu', name='inception_4b_3_3')
inception_4b_5_5_reduce = conv_2d(inception_4a_output, 24, filter_size=1, activation='relu', name='inception_4b_5_5_reduce')
inception_4b_5_5 = conv_2d(inception_4b_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4b_5_5')

inception_4b_pool = max_pool_2d(inception_4a_output, kernel_size=3, strides=1, name='inception_4b_pool')
inception_4b_pool_1_1 = conv_2d(inception_4b_pool, 64, filter_size=1, activation='relu', name='inception_4b_pool_1_1')

inception_4b_output = merge([inception_4b_1_1, inception_4b_3_3, inception_4b_5_5, inception_4b_pool_1_1], mode='concat', axis=3, name='inception_4b_output')


inception_4c_1_1 = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu', name='inception_4c_1_1')
inception_4c_3_3_reduce = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu',
    name='inception_4c_3_3_reduce')
inception_4c_3_3 = conv_2d(inception_4c_3_3_reduce, 256, filter_size=3, activation='relu', name='inception_4c_3_3')
inception_4c_5_5_reduce = conv_2d(inception_4b_output, 24, filter_size=1, activation='relu', name='inception_4c_5_5_reduce')
inception_4c_5_5 = conv_2d(inception_4c_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4c_5_5')

inception_4c_pool = max_pool_2d(inception_4b_output, kernel_size=3, strides=1)
inception_4c_pool_1_1 = conv_2d(inception_4c_pool, 64, filter_size=1, activation='relu', name='inception_4c_pool_1_1')

inception_4c_output = merge([inception_4c_1_1, inception_4c_3_3, inception_4c_5_5, inception_4c_pool_1_1], mode='concat', axis=3, name='inception_4c_output')

inception_4d_1_1 = conv_2d(inception_4c_output, 112, filter_size=1, activation='relu', name='inception_4d_1_1')
inception_4d_3_3_reduce = conv_2d(inception_4c_output, 144, filter_size=1, activation='relu', name='inception_4d_3_3_reduce')
inception_4d_3_3 = conv_2d(inception_4d_3_3_reduce, 288, filter_size=3, activation='relu', name='inception_4d_3_3')
inception_4d_5_5_reduce = conv_2d(inception_4c_output, 32, filter_size=1, activation='relu', name='inception_4d_5_5_reduce')
inception_4d_5_5 = conv_2d(inception_4d_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4d_5_5')
inception_4d_pool = max_pool_2d(inception_4c_output, kernel_size=3, strides=1, name='inception_4d_pool')
inception_4d_pool_1_1 = conv_2d(inception_4d_pool, 64, filter_size=1, activation='relu', name='inception_4d_pool_1_1')

inception_4d_output = merge([inception_4d_1_1, inception_4d_3_3, inception_4d_5_5, inception_4d_pool_1_1], mode='concat', axis=3, name='inception_4d_output')

inception_4e_1_1 = conv_2d(inception_4d_output, 256, filter_size=1, activation='relu', name='inception_4e_1_1')
inception_4e_3_3_reduce = conv_2d(inception_4d_output, 160, filter_size=1, activation='relu',
    name='inception_4e_3_3_reduce')
inception_4e_3_3 = conv_2d(inception_4e_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_4e_3_3')
inception_4e_5_5_reduce = conv_2d(inception_4d_output, 32, filter_size=1, activation='relu', name='inception_4e_5_5_reduce')
inception_4e_5_5 = conv_2d(inception_4e_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_4e_5_5')
inception_4e_pool = max_pool_2d(inception_4d_output, kernel_size=3, strides=1, name='inception_4e_pool')
inception_4e_pool_1_1 = conv_2d(inception_4e_pool, 128, filter_size=1, activation='relu', name='inception_4e_pool_1_1')


inception_4e_output = merge([inception_4e_1_1, inception_4e_3_3, inception_4e_5_5, inception_4e_pool_1_1], axis=3, mode='concat')

# the layer name below was 'pool_3_3' in the original; renamed to match the variable
pool4_3_3 = max_pool_2d(inception_4e_output, kernel_size=3, strides=2, name='pool4_3_3')


inception_5a_1_1 = conv_2d(pool4_3_3, 256, filter_size=1, activation='relu', name='inception_5a_1_1')
inception_5a_3_3_reduce = conv_2d(pool4_3_3, 160, filter_size=1, activation='relu', name='inception_5a_3_3_reduce')
inception_5a_3_3 = conv_2d(inception_5a_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_5a_3_3')
inception_5a_5_5_reduce = conv_2d(pool4_3_3, 32, filter_size=1, activation='relu', name='inception_5a_5_5_reduce')
inception_5a_5_5 = conv_2d(inception_5a_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_5a_5_5')
inception_5a_pool = max_pool_2d(pool4_3_3, kernel_size=3, strides=1, name='inception_5a_pool')
inception_5a_pool_1_1 = conv_2d(inception_5a_pool, 128, filter_size=1, activation='relu', name='inception_5a_pool_1_1')

inception_5a_output = merge([inception_5a_1_1, inception_5a_3_3, inception_5a_5_5, inception_5a_pool_1_1], axis=3, mode='concat')


inception_5b_1_1 = conv_2d(inception_5a_output, 384, filter_size=1, activation='relu', name='inception_5b_1_1')
inception_5b_3_3_reduce = conv_2d(inception_5a_output, 192, filter_size=1,
    activation='relu', name='inception_5b_3_3_reduce')
inception_5b_3_3 = conv_2d(inception_5b_3_3_reduce, 384, filter_size=3, activation='relu', name='inception_5b_3_3')
inception_5b_5_5_reduce = conv_2d(inception_5a_output, 48, filter_size=1, activation='relu', name='inception_5b_5_5_reduce')
inception_5b_5_5 = conv_2d(inception_5b_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_5b_5_5')
inception_5b_pool = max_pool_2d(inception_5a_output, kernel_size=3, strides=1, name='inception_5b_pool')
inception_5b_pool_1_1 = conv_2d(inception_5b_pool, 128, filter_size=1, activation='relu', name='inception_5b_pool_1_1')
inception_5b_output = merge([inception_5b_1_1, inception_5b_3_3, inception_5b_5_5, inception_5b_pool_1_1], axis=3, mode='concat')

pool5_7_7 = avg_pool_2d(inception_5b_output, kernel_size=7, strides=1)
# Note: tflearn's dropout takes keep_prob, so 0.4 keeps 40% of the activations.
pool5_7_7 = dropout(pool5_7_7, 0.4)
# 17 output classes for the Oxford 17 Category Flower Dataset
loss = fully_connected(pool5_7_7, 17, activation='softmax')
network = regression(loss, optimizer='momentum',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)
model = tflearn.DNN(network, checkpoint_path='model_googlenet',
                    max_checkpoints=1, tensorboard_verbose=2)
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=64, snapshot_step=200,
          snapshot_epoch=False, run_id='googlenet_oxflowers17')</code></pre>
<p>If you are interested, take a look at the Caffe model prototxt for this part and help check whether anything is wrong. I have submitted the code to the official tflearn repository (add GoogLeNet(Inception) in Example); if you have TensorFlow, just install tflearn and see whether you can help check it. I have no GPU machine, so training is slow. The TensorBoard plot is below; it is less clear-cut than the earlier AlexNet one, mainly because I did not run as many epochs. While writing checkpoints I found the host had run out of disk space, which was embarrassing, so I rewrote the script to restore from a checkpoint and continue. The TensorBoard plot also seems to have a problem and looks different each time it is loaded, but the basic logs show that training is gradually converging, so I am posting the plot anyway.</p>
<p style="text-align:center"><img src="https://simg.open-open.com/show/49c47bf9fd64993598471fc912638f40.png"></p>
<p>The network structure is shown below. There is a bug here, possibly in TensorBoard: the GoogLeNet graph may be too large, roughly 1.3M, and could not be downloaded in Chrome; Firefox seemed to work:</p>
<p><img src="https://simg.open-open.com/show/1d514c5f30860d472d2a74c6a1938b8d.png"></p>
<p>Source: http://ai.51cto.com/art/201703/536061.htm</p>
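<p>As a rough check on the "Inception module with dimension reduction" idea described earlier, the sketch below counts convolution weights for the first Inception module (inception_3a) with and without the 1*1 reduce layers. The branch widths are taken from the listings above; the helper <code>conv_weights</code> and the "naive" variant (same branch widths, pooling branch passed through without a projection) are assumptions made for illustration only, and biases are ignored.</p>

```python
# Hypothetical helper for illustration; not part of TensorFlow, tflearn, or caffe-tensorflow.
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


# Channels entering inception_3a: conv2_3x3 has 192 filters and pooling keeps the count.
IN_CHANNELS = 192

# Naive module (an assumption for comparison): 1x1, 3x3 and 5x5 convolutions
# applied directly to the input, plus a max-pool branch that passes every
# input channel through unchanged.
naive_weights = (conv_weights(1, IN_CHANNELS, 64)
                 + conv_weights(3, IN_CHANNELS, 128)
                 + conv_weights(5, IN_CHANNELS, 32))
naive_out_channels = 64 + 128 + 32 + IN_CHANNELS

# Module with dimension reduction, using the inception_3a sizes from the listings:
# 1x1 "reduce" convolutions shrink the input before the 3x3 and 5x5
# convolutions, and a 1x1 projection follows the pooling branch.
reduced_weights = (conv_weights(1, IN_CHANNELS, 64)
                   + conv_weights(1, IN_CHANNELS, 96) + conv_weights(3, 96, 128)
                   + conv_weights(1, IN_CHANNELS, 16) + conv_weights(5, 16, 32)
                   + conv_weights(1, IN_CHANNELS, 32))
reduced_out_channels = 64 + 128 + 32 + 32

print("naive:  ", naive_weights, "weights,", naive_out_channels, "output channels")
print("reduced:", reduced_weights, "weights,", reduced_out_channels, "output channels")
```

<p>Under these assumptions the reduced module needs less than half the weights of the naive one (163328 versus 387072) and emits 256 channels instead of 416, so the saving compounds as modules are stacked.</p>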