今天在PC上简单跑了一个mxnet入门例子，记录下来。

mxnet简介

mxnet是一款深度学习框架，在几款常用的深度学习框架中比较靠前。它的优点是：

消耗的内存少
这一点，可以让mxnet运行在内存较小的设备上面，比如手机或者IOT设备。
易扩展
可以方便扩展到多个gpu和多个计算机。
灵活性
提供命令式和符号式两种类型的编程API。命令式API方便开发者学习，容易调试和修改超参数等。

通用的深度学习训练过程

入门demo

下面的例子，是使用mxnet训练一个两层的深度学习网络。网络结构如下。

遵循上面的步骤，我们来一步步训练一个模型，并使用模型进行预测。

步骤1：收集数据

下面的代码从某个函数中生成数据。

import pickle
import numpy as np
# define boundary
def cos_curve(x):
    return 0.25*np.sin(2*x*np.pi+0.5*np.pi) + 0.5
# samples store point, labels store label
np.random.seed(1234)
samples = []
labels = []
# sample number
sample_density = 50
for i in range(sample_density):
    x1, x2 = np.random.random(2)
    # computer boundary
    bound = cos_curve(x1)
    # ignore the sample near the boundary
    if bound - 0.1 < x2 <= bound + 0.1:
        continue
    else:
        samples.append((x1, x2))
        # set the label
        if x2 > bound:
            labels.append(1)
        else:
            labels.append(0)

# store the samples and labels
with open('data.pkl', 'wb') as f:
    pickle.dump((samples, labels), f)
# visualize
import matplotlib.pyplot as plt
for i, sample in enumerate(samples):
    plt.plot(sample[0], sample[1], 'o' if labels[i] else '^', 
            mec='r' if labels[i] else 'b',
            mfc='none',
            markerSize=10)
x1 = np.linspace(0, 1)
plt.plot(x1, cos_curve(x1), 'k--')
plt.show()

步骤2：建立模型

import numpy as np
import mxnet as mx
# define data input
data = mx.sym.Variable('data')
# data flow through a fully connected layer with two outputs
fc1 = mx.sym.FullyConnected(data=data, name='fc1', num_hidden=2)
# a sigmoid activation layer
sigmoid1 = mx.sym.Activation(data=fc1, name='sigmoid1', act_type='sigmoid')
# another fully connected layer with two outputs
fc2 = mx.sym.FullyConnected(data=sigmoid1, name='fc2', num_hidden=2)
# output by softmax
mlp = mx.sym.SoftmaxOutput(data=fc2, name='softmax')
# visualize
shape = {'data': (2,)}
mlp_dot = mx.viz.plot_network(symbol=mlp, shape=shape)
mlp_dot.render('simple_mlp.gv', view=True)

步骤3：训练模型参数

# train model
import pickle
import logging
# read data

with open('data.pkl', 'rb') as f:
    samples, labels = pickle.load(f)
# logging level
logging.getLogger().setLevel(logging.WARN)
# batch grandient descend
batch_size = len(labels)
samples = np.array(samples)
labels = np.array(labels)
# configure model param
train_iter = mx.io.NDArrayIter(samples, labels, batch_size)
model = mx.model.FeedForward(
    symbol=mlp,
    num_epoch=1000,
    learning_rate=0.1,
    momentum=0.99
)
# fit model
model.fit(X=train_iter)
# predict
print(model.predict(mx.nd.array([[0.5, 0.5]])))

步骤4：预测以及可视化训练结果

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# axes grid length
X = np.arange(0, 1.05, 0.05)
Y = np.arange(0, 1.05, 0.05)
X, Y = np.meshgrid(X, Y)
grids = mx.nd.array([[X[i][j], Y[i][j]] for i in range(X.shape[0]) for j in range(X.shape[1])])
# get result of predicted
grid_probs = model.predict(grids)[:, 1].reshape(X.shape)
# drawing
fig = plt.figure('sample surface')
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, grid_probs, alpha=0.15, color='k',rstride=2, cstride=2, lw=0.5)
# select samples
samples0 = samples[labels==0]
samples0_probs = model.predict(samples0)[:, 1]
samples1 = samples[labels==1]
samples1_probs = model.predict(samples1)[:, 1]
# plot sample figure
ax.scatter(samples0[:, 0], samples0[:, 1], samples0_probs, c='r', marker='o', s=50)
ax.scatter(samples1[:, 0], samples1[:, 1], samples1_probs, c='b', marker='^', s=50)
plt.show()

注意，上面的代码中，仍然用模型来预测训练样本的结果，这样的做法其实是不规范的。只不过，我们为了方便，所以这么做了。

参考

《深度学习与计算机视觉：算法原理、框架应用与代码实现》

mxnet入门demo