Caffe on GPU Cluster

Posted on 2015-05-23

FOR ALL SOFTWARE PACKAGES,
YOU CAN SEARCH FOR THEM ONLINE AND DOWNLOAD THEM FROM THE CORRESPONDING WEBSITES.

Clone Caffe

git clone https://github.com/BVLC/caffe.git

Before starting this tutorial, first install Python on the CPU cluster.

Install python

cd python2.7.9
./configure --prefix=/str/users/tangxu/local/ --enable-shared
make
make install

Attention:

Change every /home/YOURNAME/local path to /str/users/tangxu/local/ if you are installing on the CPU cluster.
Change every /str/users/tangxu/local/ path to /home/YOURNAME/local if you are installing on the GPU server.

.bashrc file setting

vim .bashrc

export PATH=/str/users/tangxu/local/bin:$PATH
export LD_LIBRARY_PATH=/str/users/tangxu/local/lib:$LD_LIBRARY_PATH

export PKG_CONFIG_PATH=/str/users/tangxu/local/lib/pkgconfig:$PKG_CONFIG_PATH
export PATH=/usr/local/cuda-7.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=/str/users/tangxu/local/include:/usr/include:/usr/local/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/str/users/tangxu/local/include:/usr/include:/usr/local/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=/opt/intel/lib/intel64:/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
export PYTHONPATH=/str/users/tangxu/local/lib/python2.7/site-packages/
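The exports above append the old value even when it is empty, which leaves a trailing colon, and sourcing .bashrc again duplicates entries. A minimal sketch of a helper that avoids both is shown here; `prepend_path` and the `PREFIX` variable are hypothetical names introduced for illustration, not part of the original setup.

```shell
# Hypothetical helper (not in the original setup): prepend a directory to a
# colon-separated variable only if it is not already listed.
prepend_path() {
    var_name=$1
    dir=$2
    # Read the current value of the variable whose name is in $var_name
    eval "current=\${$var_name:-}"
    case ":$current:" in
        *":$dir:"*) ;;   # already present, do nothing
        *) eval "export $var_name=\"\$dir\${current:+:\$current}\"" ;;
    esac
}

PREFIX=${PREFIX:-$HOME/local}   # assumed install prefix; adjust per machine
prepend_path PATH "$PREFIX/bin"
prepend_path LD_LIBRARY_PATH "$PREFIX/lib"
prepend_path C_INCLUDE_PATH "$PREFIX/include"
prepend_path CPLUS_INCLUDE_PATH "$PREFIX/include"
```

Setting `PREFIX` once also covers the CPU cluster versus GPU server difference mentioned in the note above, since only that one line changes between machines.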

Install openblas

download openblas 
make FC=gfortran NO_AFFINITY=1 USE_OPENMP=1 USE_LAPACK=1
make PREFIX=/str/users/tangxu/local/ install

Install all dependencies

Install cmake

download cmake
./bootstrap --prefix=/str/users/tangxu/local
make
make install

Install Protobuf

download protobuf 
tar zxvf protobuf.tar.gz
cd protobuf
./configure --prefix=/str/users/tangxu/local
make
make install

Install snappy

download snappy
tar zxvf snappy.tar.gz
cd snappy
./configure --prefix=/str/users/tangxu/local
make
make install
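Protobuf and snappy, like glog and hdf5 further down, follow the same configure / make / make install pattern. The sketch below wraps that pattern in one function; `build_autotools`, `run`, `DRY_RUN`, `JOBS`, and `PREFIX` are hypothetical names introduced here, and the source directory is assumed to be already unpacked.

```shell
PREFIX=${PREFIX:-$HOME/local}   # assumed install prefix; adjust per machine

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the configure / make / make install pattern.
# The subshell keeps the caller's working directory unchanged.
build_autotools() {
    src_dir=$1
    (
        run cd "$src_dir" || exit 1
        run ./configure --prefix="$PREFIX" || exit 1
        run make -j"${JOBS:-4}" || exit 1
        run make install
    )
}
```

For example, `DRY_RUN=1 build_autotools snappy` prints the four commands without running them, which is a cheap way to check the prefix before a long build.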

Install leveldb

download leveldb
tar zxvf leveldb.tar.gz
cd leveldb
make
cp -av libleveldb.* /str/users/tangxu/local/lib/
cp -av include/leveldb /str/users/tangxu/local/include/

Install OpenCV

download opencv
tar zxvf opencv.tar.gz
cd opencv
mkdir release && cd release
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/str/users/tangxu/local -D CUDA_GENERATION=Kepler ..
make
make install

Install Boost

download boost
./bootstrap.sh --prefix=/str/users/tangxu/local
./b2 -j 32
./b2 install

Install google-glog [!!!NOT EASY: CONFLICTS WITH THE GFLAGS INSTALLED IN /usr/local BY ROOT]

download glog
tar zxvf glog.tar.gz
cd glog
./configure --prefix=/str/users/tangxu/local
make -j
make install

Install gflags

download gflags
cd gflags
mkdir build && cd build
CXXFLAGS="-fPIC" cmake -D CMAKE_INSTALL_PREFIX=/str/users/tangxu/local ..
make -j
make install

Install lmdb

download lmdb
cd mdb/libraries/liblmdb
make
make prefix=/str/users/tangxu/local install
# If install complains that the man directory was not found, it does not matter [creating local/man/man1 beforehand avoids the error]

Install hdf5

download hdf5
tar zxvf hdf5.tar.gz
cd hdf5
./configure --prefix=/str/users/tangxu/local
make
make check                # run test suite.
make install
make check-install        # verify installation.

Install cuDNN

download cuDNN
* copy the cuDNN library and header to ~/local/lib and ~/local/include
* or download the cuDNN library from the website and copy it
cp /path/to/cudnn/libcudnn*  /str/users/tangxu/local/lib
cp /path/to/cudnn/cudnn.h    /str/users/tangxu/local/include

Add the paths to ~/.bashrc

vim ~/.bashrc
============[.bashrc]
# for gpu
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# for OpenBLAS
export LD_LIBRARY_PATH=/data1/NLPRMNT/xxxxxxxxxx/local/OpenBLAS/lib:$LD_LIBRARY_PATH
# for ~/local
export LD_LIBRARY_PATH=/data1/NLPRMNT/xxxxxxxxxx/local/lib:$LD_LIBRARY_PATH
export PATH=/data1/NLPRMNT/xxxxxxxxxx/local/bin:$PATH
# for OpenMP
export OMP_NUM_THREADS=20
============
source ~/.bashrc

Compile caffe

*Edit the Makefile.config
cp Makefile.config.example Makefile.config
vim Makefile.config
============[Makefile.config]
USE_CUDNN := 1
CUDA_DIR := /usr/local/cuda
BLAS := open
BLAS_INCLUDE := /data1/NLPRMNT/xxxxxxxxxx/local/OpenBLAS/include
BLAS_LIB := /data1/NLPRMNT/xxxxxxxxxx/local/OpenBLAS/lib
INCLUDE_DIRS := /data1/NLPRMNT/xxxxxxxxxx/local/include
LIBRARY_DIRS := /data1/NLPRMNT/xxxxxxxxxx/local/lib

# comment out the python/matlab part
============

* make; the first two commands were run on the login node
make all -j8
make test -j8
make runtest -j8
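When `make all` fails at link time, a dependency missing from the install prefix is often the cause. The sketch below lists libraries that have no file under the prefix's lib directory; `missing_libs` and `PREFIX` are hypothetical names, and the library list is an assumption based on the dependencies installed in this post, not an authoritative list of what Caffe links against.

```shell
# Hypothetical pre-build check: print each library that has no
# lib<name>.* file (shared or static) under <prefix>/lib.
missing_libs() {
    prefix=$1
    shift
    for lib in "$@"; do
        found=no
        for f in "$prefix"/lib/lib"$lib".*; do
            # An unmatched glob stays literal, so test that the file exists
            [ -e "$f" ] && found=yes && break
        done
        [ "$found" = yes ] || echo "$lib"
    done
}

PREFIX=${PREFIX:-$HOME/local}   # assumed install prefix; adjust per machine
missing_libs "$PREFIX" protobuf snappy leveldb glog gflags lmdb hdf5 boost_system opencv_core
```

An empty output means every listed library was found; any name printed points to the install step to revisit.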

Deep Learning for Object Detection and Segmentation

By Xu Tang

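Posted on 2015-05-19

Slides download link

These slides are from a presentation that Chen Siqin and I gave at the paper reading group meeting at ShanghaiTech University. They cover an introduction to object detection and segmentation, datasets, related work, evaluation metrics, and novel models. The models part focuses on the classic deep learning model for object detection, "Rich feature hierarchies for accurate object detection and semantic segmentation", and the deep learning model for segmentation, "Feedforward semantic segmentation with zoom-out features".

A few screenshots from the slides:

Common datasets:

Common evaluation metrics:

Example model diagram:

The classic CNN model from Hinton's group (2012):

R-CNN model:

Segmentation model: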

RussianCube

By Xu Tang

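Posted on 2015-05-11

Tetris in OpenGL:

Running this program requires the OpenGL libraries; see the link for an installation tutorial.

Notes on the Tetris program:

Because the program rotates pieces by matrix multiplication, a few cases are buggy. If the program were changed to enumerate the 19 rotation cases directly, these bugs would disappear. I also left a lot of debugging output in the program, so it lags a little when running. With all console output statements removed, the program is almost bug-free.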
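As a sketch of rotation by matrix multiplication (the program's actual matrices and pivot choice are not shown in this post, so both are assumptions here), a 90° counterclockwise rotation of a cell $(x, y)$ about a pivot $(p_x, p_y)$ can be written as:

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} x - p_x \\ y - p_y \end{pmatrix}
+
\begin{pmatrix} p_x \\ p_y \end{pmatrix},
\qquad\text{i.e.}\qquad
x' = p_x - (y - p_y), \quad y' = p_y + (x - p_x).
```

The 19 cases are the distinct orientations of the seven tetrominoes: I, S, and Z have 2 each, O has 1, and T, J, and L have 4 each. One plausible source of rotation bugs is that the natural pivot of the I and O pieces lies between grid cells, so an integer pivot makes them drift. Enumerating the orientations avoids this.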

Improving it later would not be difficult, but my courses are demanding and I have many papers to read, so I do not plan to.

Tetris program link

VALSE2015

By Xu Tang

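Posted on 2015-05-10

From May 8 to 10 I attended VALSE2015 in Chengdu. VALSE is a major gathering of young computer vision researchers in China. Conference program / detailed conference handbook

May 8:

On the 8th, Wang Xiaogang and Wang Naiyan gave tutorials with practical guidance on deep learning and an overview of their recent work. Wang Xiaogang mainly applies deep learning to faces. CUHK is well known for its work in this area, and in particular achieved strong results in the ILSVRC competition. As for Wang Xiaogang's slides, I came across a PDF version compiled by another student, and I also put together a photographed version of the slides, shared through a Baidu Netdisk link. The slides are also posted on Wang Xiaogang's own homepage. Below is a brief summary of his talk, which covered the following: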
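1) Overfitting: good results on the training set but poor results on the test set.
2) Because face datasets used for recognition tend to be too well-behaved, he introduced the LFW dataset and described how his group improved the recognition rate on it.
3) Why DL succeeded: the release of the ImageNet dataset, the architecture proposed by Hinton's group in 2012, and the introduction of evaluation tasks. Deep learning suits large datasets and does not suit small ones. Person versus non-person problems in particular are hard for deep learning, because the network gets confused by non-person scenes.
4) A summary of the classic deep learning models: CNN, auto-encoder, deep belief net.
5) Why deep learning works. First, it learns good features, and in a CNN these features, after pooling, gather information about local regions from small groups of pixels. Depth also matters more than width.
6) A comparison of joint learning and separate learning, along with a brief account of the broad trend toward end-to-end learning. He also explained how joint learning is used in his own work.
7) Using domain knowledge in deep neural networks.
8) He spent considerable effort explaining what kind of features DeepID actually learns.
9) Deep learning is machine learning on big data, feature learning, end-to-end learning, and context learning. Deep learning representations are sparse, selective, and robust.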

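Dr. Wang Naiyan works mainly on tracking. The first half of his slides was very inspiring to me: he analyzed the multi-level structure of neural networks, which is very helpful for a deep learning newcomer like me. His work is a good reference when proposing a new model. His slides can be found at this link.
His talk can be roughly divided into the following points:
1) Several application scenarios and problems for pixel labeling, such as image segmentation, boundary detection, saliency detection, and 3D scene understanding.
2) The two most important trends in DL that Dr. Wang mentioned are end-to-end learning and multi-level fusion. For a newcomer like me, this kind of guidance is important. End-to-end means combining the several steps used to handle a class of problems and optimizing their parameters together (for object detection, for example, the parameters of feature extraction and of the classifier can be optimized jointly). See the following figure:

As for multi-level fusion, the following figures illustrate it:

3) Next, Dr. Wang Naiyan presented his work: object detection, Image Caption Generation, Surface Normal Estimation, and Visual Tracking, together with details of the models used.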

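May 9:

On the 9th, the work presented by Zhou Zhihua and Wang Liwei leaned toward machine learning theory. Since I work on deep learning, there was a lot I did not understand, so I will not list it here. Professor Zhou Zhihua usually posts a link to his slides after a conference; see his Weibo for details.
Professor Wang Liwei's talk was roughly as follows: it shows that the classic machine learning conclusion about the margin is not entirely correct, in that SVM performance is not determined by the margin alone, independent of the dimension of the feature space. Specifically, he proves a margin upper bound that depends on the feature space dimension. This bound is uniformly tighter than the classic dimension-independent margin bound, and when the feature space is infinite-dimensional the new bound is equivalent to the traditional dimension-independent one. This margin theory indicates that when kernel methods increase the feature space dimension in order to improve the margin, they pay some price in performance. Experimental results show that the theory offers guidance for choosing SVM kernel functions.
Wang Ruiping's work is on face recognition in video, which I plan to study soon. Professor Wang Ruiping is also a member of Professor Shan Shiguang's group. His talk was "Learning on Riemannian Manifold for Video-Based Face Recognition". His slides are available here.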

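More interesting were the afternoon talks by Dr. Sun Jian of Microsoft MSRA and Dr. Huang Chang of Baidu IDL (US). They described the recent deep learning work of these two industry giants.
Sun Jian's VALSE2015 slides first covered initialization algorithms, network designs, and parametric neurons in deep learning. The initialization part was about designing a good algorithm for obtaining a network's initial parameters, the network design part explained model architectures, and the last part discussed how PReLU changes performance. He then presented much of MSRA's recent deep learning work, including how-old.net and SPP-net for object detection. Slides: netdisk link.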
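For readers who have not met PReLU, its standard definition is given here as a reminder. It comes from general knowledge of the method, not from the talk slides, and the symbols $y_i$ and $a_i$ are the usual ones for the input and the learnable slope of channel $i$:

```latex
f(y_i) = \max(0, y_i) + a_i \min(0, y_i)
```

With $a_i = 0$ this reduces to ReLU, and with a small fixed $a_i$ it becomes leaky ReLU. In PReLU the slope $a_i$ is learned together with the other network parameters.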
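Dr. Huang Chang's slides covered the work of Baidu IDL (US) over the past two years, along with predictions about the future of deep learning and lessons learned. Baidu's work is mainly on end-to-end OCR, face recognition, and face detection. Slides: netdisk link.
Dr. Huang Chang's lessons learned on deep learning:
1) Data augmentation introduces low-dimensional knowledge about the input images.
2) Structured losses exploit high-dimensional rules on the system's output.
3) Sparse parameters and features, convolutions of varying sizes, multi-task joint learning, and low-rank regularization all help.
The future of deep learning, according to Dr. Huang Chang:
1) Large-scale weakly and partially labeled data.
2) Designing a holistic framework for independent tasks.
3) Early vision + high-level vision.
4) Hardware and sensors.
5) Sequential vs. concurrent.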

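During the VALSE2015 Chengdu poster session, I photographed every poster related to object detection, segmentation, image classification, distance metric learning, face recognition, and tracking. The posters are all papers from top conferences and journals such as ICCV/CVPR/TIP. Uploaded to: netdisk link.

VALSE2016 will be in Wuhan. The bid results: VALSE2017 in Xiamen, VALSE2018 in Dalian.

May 10:

VALSE2015: Ladies in VALSE

The activities on the 10th were rather dull, so I will not go into detail. Here are some pictures:

Finally, a photo of Professor Yan Shuicheng and me!!!

And to top it off, Professor Ma Yi's new book "Generalized Principal Component Analysis"!!! Everyone is welcome to buy Professor Ma's new book!!!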

《Zero-Shot Learning Through Cross-Modal Transfer》

By Xu Tang

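Posted on 2015-05-01

Reading notes on "Zero-Shot Learning Through Cross-Modal Transfer"

I recently wanted to learn about zero-shot learning, so I read a paper co-authored by Andrew Ng, "Zero-Shot Learning Through Cross-Modal Transfer". I have always worked on image-related problems, so as a beginner with text I found it best to first understand [word vectors], [how to learn word vectors from local and global context: Improving Word Representations via Global Context and Multiple Word Prototypes], and [how to extract image features]. With that background, the paper is not hard to follow.

Abstract:

The paper introduces a model that can recognize objects in images even when some object classes are absent from the training samples. The only prior knowledge about the unseen classes comes from unsupervised text corpora. Our model achieves state-of-the-art accuracy on classes that have training samples and also performs reasonably when unseen classes are used as test samples. First, we apply outlier detection in the semantic space (test samples are projected into this space by the projection matrix theta), and then use two separate recognition models. If a sample is detected as a seen class, a softmax classifier is used; if it is detected as an unseen class, an isometric Gaussian distribution is used for classification.

Introduction:

A zero-shot model can predict labels for both seen and unseen classes. For example, without ever having seen a picture of a cat, it can decide whether a test image is a cat or one of the known training classes, such as dog or horse. The model is based on two main ideas:
1. Images are mapped into the semantic space of words using parameters learned by a neural network model.
2. The model incorporates an outlier detection probability to decide whether a new image lies on the manifold of the seen classes. If the image is of a seen class, a standard classifier can be used. Otherwise, the image is assigned to an unseen class based on likelihood.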

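Word and Image Representations:

Words are represented as distributional feature vectors. We use the unsupervised model of Huang [15] to obtain 50-dimensional pre-trained word vectors, which serve as the initial word vectors.
For the method itself, see the paper "Improving Word Representations via Global Context and Multiple Word Prototypes".
The idea is simple: combine local and global context to learn better word vectors (for polysemous and homonymous words, a separate vector is trained for each sense). The objective to minimize is: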
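The formula image is missing from this page. As best I recall from the Huang et al. paper, the objective is a margin ranking loss; the notation below follows that paper and should be checked against it:

```latex
C_{s,d} = \sum_{w \in V} \max\bigl(0,\; 1 - g(s, d) + g(s^{w}, d)\bigr)
```

Here $s$ is a short word sequence, $d$ is the document containing it, $V$ is the vocabulary, $s^{w}$ is $s$ with its last word replaced by $w$, and $g$ is the scoring function, the sum of a local-context score and a global-context score. Minimizing $C_{s,d}$ pushes the true sequence to score at least 1 higher than any corrupted one.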

The whole procedure is shown in the figure below:

We use the method of Coates [6] to extract F-dimensional image features from the raw images.

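Projecting Images into Semantic Word Spaces:

We need to map images into the 50-dimensional word vector space. For training and testing, most of the classes in the CIFAR-10 dataset are used as available training data; these are the seen classes Y_s. A small number of classes are held out as zero-shot classes (classes that never appear in the training samples); these are the unseen classes Y_u.
This section is mainly about the training objective for theta, the matrix that maps seen-class images into the word vector space: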
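The formula image is missing here as well. Written with the single projection matrix theta used in these notes, the objective is a sketch along these lines (to be checked against the paper):

```latex
J(\theta) = \sum_{y \in Y_s} \; \sum_{x^{(i)} \in X_y} \bigl\| w_y - \theta\, x^{(i)} \bigr\|^{2}
```

Here $X_y$ is the set of training image feature vectors of seen class $y$ and $w_y$ is the word vector of that class name, so each image is pulled toward the word vector of its own class. As far as I recall, the published version of the paper replaces the linear map $\theta x$ with a two-layer network $\theta^{(2)} f(\theta^{(1)} x)$ with $f = \tanh$.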

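As for Figure 2, the authors use t-SNE [33] to map the 50-dimensional word vector space down to 2 dimensions for visualization. The seen classes clearly form tight clusters, while the unseen classes are scattered. We can use this to work out which images are cats and which are trucks.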

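Zero-Shot Learning Model:

This section is about how to build a classifier for the zero-shot classes.
First, we need to predict p(y|x), where y falls into one of two parts: the seen classes or the unseen classes.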
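The formula image is missing at this point. In simplified notation (the paper additionally conditions on the training data, the word vectors, and theta), the prediction marginalizes over a binary variable $V$ that says whether the image comes from a seen ($s$) or unseen ($u$) class:

```latex
p(y \mid x) = \sum_{V \in \{s, u\}} P(y \mid V, x)\; P(V \mid x)
```

$P(V \mid x)$ comes from the outlier detector applied to the projected point $\theta x$, and $P(y \mid V, x)$ is the class model used within the seen or the unseen classes.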

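The term with V = s is the probability model for the seen classes, and the term with V = u is the one for the unseen classes. For a seen class the classifier is softmax regression; for an unseen class an isometric Gaussian distribution is used.
Note that theta*x in the formula maps a test sample into the word vector space. From there we obtain the probabilities that the sample belongs to the unseen or to the seen classes, and assign it to whichever is higher. If it belongs to the unseen classes, its vector is compared with the nearby unseen-class word vectors to decide whether it is a cat or a truck.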

Schedule

By Xu Tang

Posted on 2015-04-15

Paper Reading Lists

Simultaneous Detection and Segmentation

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Code Reading

efficient graph based segmentation

Paper Reading Lists About Object Detection

By Xu Tang

Posted on 2015-04-10

April:

segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection

Visual Saliency Based on Multiscale Deep Features

Feedforward semantic segmentation with zoom-out features

self-taught object localization with deep networks

object detectors emerge in deep scene CNNs

Hello World

By Xu Tang

Posted on 2015-04-07

Welcome to Hexo! This is your very first post. Check documentation for more info. If you get any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub.

Quick Start

Create a new post

$ hexo new "My New Post"

More info: Writing

Run server

$ hexo server

More info: Server

Generate static files

$ hexo generate

More info: Generating

Deploy to remote sites

$ hexo deploy

More info: Deployment