# How to Elegantly Predict Time Series with TensorFlow: A Detailed TFTS Library Tutorial

Published 2017-08-20 15:40:28

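## Preface

This tutorial on TFTS (TensorFlow Time Series, `tf.contrib.timeseries`) covers:

• Reading time series data, either from a NumPy array or from a CSV file
• Predicting a time series with an AR model
• Predicting a time series with an LSTM model (both univariate and multivariate)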

## Reading Time Series Data from a NumPy Array

```python
# coding: utf-8
from __future__ import print_function
import numpy as np
import matplotlib
matplotlib.use('agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import tensorflow as tf

# Synthetic series: a sine wave plus a linear trend plus uniform noise.
x = np.array(range(1000))
noise = np.random.uniform(-0.2, 0.2, 1000)
y = np.sin(np.pi * x / 100) + x / 200. + noise
plt.plot(x, y)
plt.savefig('timeseries_y.jpg')
```

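```python
data = {
    tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
    tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
}

# Reconstructed line (missing from the source): wrap the dict in a reader.
reader = tf.contrib.timeseries.NumpyReader(data)
```

```python
with tf.Session() as sess:
    # Reconstructed line (missing from the source): read the whole series.
    full_data = reader.read_full()
    # Queue runners must be started with tf.train.start_queue_runners
    # before the data can be read.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)  # reconstructed
    print(sess.run(full_data))
    coord.request_stop()
```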

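```python
# NOTE: the rest of this call is cut off in the source. RandomWindowInputFn
# takes the reader plus batch_size and window_size arguments.
train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
```

```python
with tf.Session() as sess:
    batch_data = train_input_fn.create_batch()
    coord = tf.train.Coordinator()
    # Reconstructed line (missing from the source): start the queue runners.
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    one_batch = sess.run(batch_data[0])
    coord.request_stop()

print('one_batch_data:', one_batch)
```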
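
To make the batching idea concrete, here is a minimal pure-Python sketch of what random-window batching does conceptually: it cuts fixed-length contiguous windows out of the series at random offsets and collects them into a batch. The helper name `sample_random_windows` and the sizes used are illustrative assumptions, not part of TFTS; the real `RandomWindowInputFn` works on TensorFlow tensors and queues rather than Python lists.

```python
import random


def sample_random_windows(times, values, batch_size, window_size, seed=None):
    """Return batch_size random contiguous windows of length window_size.

    Hypothetical helper for illustration only; it mimics the idea behind
    random-window batching, not the TFTS implementation.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if window_size > len(times):
        raise ValueError("window_size is larger than the series")
    rng = random.Random(seed)
    batch = {"times": [], "values": []}
    for _ in range(batch_size):
        # randint is inclusive on both ends, so the last valid start is allowed.
        start = rng.randint(0, len(times) - window_size)
        batch["times"].append(times[start:start + window_size])
        batch["values"].append(values[start:start + window_size])
    return batch


# Illustrative data and sizes (assumptions, not values from the tutorial).
times = list(range(1000))
values = [t / 200.0 for t in times]
batch = sample_random_windows(times, values, batch_size=4, window_size=16, seed=0)
print(len(batch["times"]), len(batch["times"][0]))
```

Each window keeps its time indices alongside its values, which mirrors the `times`/`values` feature pair used throughout this tutorial.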

## Reading Time Series Data from a CSV File

```
1,-0.6656603714
2,-0.1164380359
3,0.7398626488
4,0.7368633029
5,0.2289480898
6,2.257073255
7,3.023457405
8,2.481161007
9,3.773638612
10,5.059257738
11,3.553186083
```

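The first column of the CSV file is the time point, and the second column is the value observed at that time point. It can be read as follows: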

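```python
# coding: utf-8
from __future__ import print_function
import tensorflow as tf

csv_file_name = './data/period_trend.csv'
# Reconstructed line (the rest of this block is missing from the source):
# CSVReader is the TFTS reader for CSV input.
reader = tf.contrib.timeseries.CSVReader(csv_file_name)
```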

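## Predicting a Time Series with an AR Model

```python
# Same synthetic series as before: sine wave plus linear trend plus noise.
x = np.array(range(1000))
noise = np.random.uniform(-0.2, 0.2, 1000)
y = np.sin(np.pi * x / 100) + x / 200. + noise
plt.plot(x, y)
plt.savefig('timeseries_y.jpg')

data = {
    tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
    tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
}

# Reconstructed line (missing from the source): wrap the dict in a reader.
reader = tf.contrib.timeseries.NumpyReader(data)

# NOTE: the rest of this call is cut off in the source.
train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
```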

```python
ar = tf.contrib.timeseries.ARRegressor(
    periodicities=200, input_window_size=30, output_window_size=10,
    num_features=1,
    loss=tf.contrib.timeseries.ARModel.NORMAL_LIKELIHOOD_LOSS)
```

The `num_features` parameter is the dimensionality of the value observed at each time point. Here every step is a single scalar, so `num_features=1`.
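
As a rough illustration of how `input_window_size=30` and `output_window_size=10` relate, the sketch below slices a series into (input, target) pairs: each pair uses 30 consecutive values as input and the 10 values that follow as the prediction target. This is a simplified pure-Python picture that assumes the usual meaning of these two parameters; `make_ar_pairs` is a hypothetical helper, not a TFTS function, and the toy series is made up.

```python
def make_ar_pairs(series, input_window_size, output_window_size):
    """Slice a series into (inputs, targets) pairs of consecutive values.

    Hypothetical helper for illustration; not part of TFTS.
    """
    total = input_window_size + output_window_size
    pairs = []
    for start in range(len(series) - total + 1):
        inputs = series[start:start + input_window_size]
        targets = series[start + input_window_size:start + total]
        pairs.append((inputs, targets))
    return pairs


# Toy series (an assumption for illustration): the integers 0..99.
series = list(range(100))
pairs = make_ar_pairs(series, input_window_size=30, output_window_size=10)
first_inputs, first_targets = pairs[0]
print(len(pairs), len(first_inputs), len(first_targets))
```

Under this picture, every training window must span 40 consecutive steps, 30 to condition on and 10 to predict.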

``ar.train(input_fn=train_input_fn, steps=6000)``

In TFTS, evaluation means running the trained model over the original training set, which lets us see how well the model fits the data. The corresponding code is:

```python
evaluation_input_fn = tf.contrib.timeseries.WholeDatasetInputFn(reader)
evaluation = ar.evaluate(input_fn=evaluation_input_fn, steps=1)
```
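
One simple way to put a number on the fit that evaluation exposes is to compare the fitted means against the observed values. The sketch below computes a mean squared error over two plain Python sequences. `mean_squared_error` is a hypothetical helper, and the numbers are made up for illustration; they are not output from the model above.

```python
def mean_squared_error(observed, fitted):
    """Average squared difference between two equally long sequences."""
    if len(observed) != len(fitted):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)


# Made-up values for illustration only.
observed = [0.0, 1.0, 2.0, 3.0]
fitted = [0.5, 1.0, 1.5, 3.0]
mse = mean_squared_error(observed, fitted)
print(mse)
```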

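`evaluation['start_tuple']` is used in the prediction that follows. It holds the last 30 output values and their corresponding time points. Starting from there, we can predict the values beyond step 1000. The corresponding code is: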

```python
(predictions,) = tuple(ar.predict(
    input_fn=tf.contrib.timeseries.predict_continuation_input_fn(
        evaluation, steps=250)))
```
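
Continuing a series past its last observed step can be pictured as an autoregressive rollout: the model predicts the next value from the most recent window, the prediction is appended, and the window slides forward. The sketch below uses a deliberately trivial stand-in predictor (the mean of the window). `rollout` and `window_mean` are hypothetical names, and this is not a description of how `predict_continuation_input_fn` works internally.

```python
def window_mean(window):
    """Trivial stand-in predictor: the next value is the window's mean."""
    return sum(window) / len(window)


def rollout(history, predict_next, window_size, steps):
    """Feed each prediction back in as input for the following step.

    Hypothetical helper for illustration; not part of TFTS.
    """
    if len(history) < window_size:
        raise ValueError("history is shorter than window_size")
    window = list(history[-window_size:])
    predictions = []
    for _ in range(steps):
        next_value = predict_next(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward
    return predictions


# Made-up history for illustration only.
history = [1.0, 2.0, 3.0, 4.0]
future = rollout(history, window_mean, window_size=2, steps=3)
print(future)
```

Because each step consumes earlier predictions, errors can accumulate as the horizon grows, which is worth keeping in mind when reading long-range forecasts.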

```python
plt.figure(figsize=(15, 5))
plt.plot(data['times'].reshape(-1), data['values'].reshape(-1), label='origin')
plt.plot(evaluation['times'].reshape(-1), evaluation['mean'].reshape(-1), label='evaluation')
plt.plot(predictions['times'].reshape(-1), predictions['mean'].reshape(-1), label='prediction')
plt.xlabel('time_step')
plt.ylabel('values')
plt.legend(loc=4)
plt.savefig('predict_result.jpg')
```

## Predicting a Univariate Time Series with an LSTM

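```python
# Synthetic series: a sum of three periodic components plus uniform noise.
x = np.array(range(1000))
noise = np.random.uniform(-0.2, 0.2, 1000)
y = np.sin(np.pi * x / 50) + np.cos(np.pi * x / 50) + np.sin(np.pi * x / 25) + noise

data = {
    tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
    tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
}

# Reconstructed line (missing from the source): wrap the dict in a reader.
reader = tf.contrib.timeseries.NumpyReader(data)

# NOTE: the rest of this call is cut off in the source.
train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
```

```python
# ts_estimators and _LSTMModel are not defined in this excerpt. In the TFTS
# examples, ts_estimators is imported as below and _LSTMModel is a custom
# model class defined in the example script.
from tensorflow.contrib.timeseries.python.timeseries import estimators as ts_estimators

# NOTE: the rest of this call is cut off in the source.
estimator = ts_estimators.TimeSeriesRegressor(
    model=_LSTMModel(num_features=1, num_units=128),
```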

`num_features=1` means a univariate time series: the quantity observed at each time point is a single number. `num_units=128` selects an LSTM with a hidden layer of size 128.

```python
estimator.train(input_fn=train_input_fn, steps=2000)
evaluation = estimator.evaluate(input_fn=evaluation_input_fn, steps=1)
# Predict starting after the evaluation
(predictions,) = tuple(estimator.predict(
    input_fn=tf.contrib.timeseries.predict_continuation_input_fn(
        evaluation, steps=200)))
```

## Predicting a Multivariate Time Series with an LSTM

```
0,0.926906299771,1.99107237682,2.56546245685,3.07914768197,4.04839057867
1,0.108010001864,1.41645361423,2.1686839775,2.94963962176,4.1263503303
2,-0.800567600028,1.0172132907,1.96434754116,2.99885333086,4.04300485864
3,0.0607042871898,0.719540073421,1.9765012584,2.89265588817,4.0951014426
4,0.933712200629,0.28052120776,1.41018552514,2.69232603996,4.06481164223
5,-0.171730652974,0.260054421028,1.48770816369,2.62199129293,4.44572807842
6,-1.00180162933,0.333045158863,1.50006392277,2.88888309683,4.24755865606
7,0.0580061875336,0.688929398826,1.56543458772,2.99840358953,4.52726873347
```
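
Each row above holds a time index followed by five observed values. As a sanity check before handing such a file to a reader, the following pure-Python sketch parses rows of this shape into a list of times and a list of five-element value vectors. `parse_multivariate_rows` is a hypothetical helper, and the two sample rows are shortened, rounded stand-ins for the data above.

```python
import csv
import io


def parse_multivariate_rows(text, num_features):
    """Split 'time,v1,...,vN' rows into times and per-step value vectors.

    Hypothetical helper for illustration; not part of TFTS.
    """
    times, values = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != num_features + 1:
            raise ValueError(
                "expected %d columns, got %d" % (num_features + 1, len(row)))
        times.append(int(row[0]))
        values.append([float(cell) for cell in row[1:]])
    return times, values


# Rounded sample rows for illustration only.
sample = "0,0.93,1.99,2.57,3.08,4.05\n1,0.11,1.42,2.17,2.95,4.13\n"
times, values = parse_multivariate_rows(sample, num_features=5)
print(times, len(values[0]))
```

The column count check catches rows with a missing or extra field early, before the mismatch surfaces as a shape error during training.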

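```python
from os import path

csv_file_name = path.join("./data/multivariate_periods.csv")
# Reconstructed line (missing from the source): the arguments below are
# passed to the TFTS CSV reader.
reader = tf.contrib.timeseries.CSVReader(
    csv_file_name,
    column_names=((tf.contrib.timeseries.TrainEvalFeatures.TIMES,)
                  + (tf.contrib.timeseries.TrainEvalFeatures.VALUES,) * 5))
# NOTE: the rest of this call is cut off in the source.
train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
```

```python
# NOTE: the rest of this call is cut off in the source.
estimator = ts_estimators.TimeSeriesRegressor(
    model=_LSTMModel(num_features=5, num_units=128),
```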
