Using Elasticsearch as a Time Series Database

bing200767, 8 years ago

Source: http://blog.csdn.net//jiao_fuyou/article/details/49663687


This post is a loose translation of another article, "Elasticsearch as a Time Series Data Store", combined with my own understanding of it.

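  • First, _source is disabled in the index mapping below, so the original JSON document is not stored a second time.
  • Second, _all is disabled as well, and store is false for every field, so no field is stored individually.
  • With all of that turned off, where does the data live? In doc_values. During aggregations, doc_values are used to quickly look up a column's values for a batch of document IDs. On disk, doc_values are a compressed, column-oriented file, which makes them very efficient.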
curl -XPOST http://172.16.18.116:9200/test -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.query.default_field": "timestamp",
    "index.mapping.ignore_malformed": false,
    "index.mapping.coerce": false,
    "index.query.parse.allow_unmapped_fields": false
  },
  "mappings": {
    "test": {
      "_source": {"enabled": false},
      "_all": {"enabled": false},
      "properties": {
        "timestamp": {"type": "date", "index": "no", "store": false, "dynamic": "strict", "doc_values": true, "fielddata": {"format": "doc_values"}},
        "appid": {"type": "string", "index": "no", "store": false, "dynamic": "strict", "doc_values": true, "fielddata": {"format": "doc_values"}},
        "result": {"type": "string", "index": "no", "store": false, "dynamic": "strict", "doc_values": true, "fielddata": {"format": "doc_values"}},
        "cmdid": {"type": "string", "index": "no", "store": false, "dynamic": "strict", "doc_values": true, "fielddata": {"format": "doc_values"}},
        "optime": {"type": "integer", "index": "no", "store": false, "dynamic": "strict", "doc_values": true, "fielddata": {"format": "doc_values"}},
        "total_count": {"type": "integer", "index": "no", "store": false, "dynamic": "strict", "doc_values": true, "fielddata": {"format": "doc_values"}}
      }
    }
  }
}'
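
Every field in the mapping above repeats the same options, so the request body can also be generated instead of typed by hand. The following is only a minimal sketch of that idea: the names COMMON_OPTIONS, FIELD_TYPES and build_index_body are made up for illustration, and the script just prints the JSON rather than sending it to a cluster.

```python
import json

# Field name -> type, copied from the curl command above.
FIELD_TYPES = {
    "timestamp": "date",
    "appid": "string",
    "result": "string",
    "cmdid": "string",
    "optime": "integer",
    "total_count": "integer",
}


def common_options():
    """Options shared by every field (hypothetical helper, fresh dict per call)."""
    return {
        "index": "no",
        "store": False,
        "dynamic": "strict",
        "doc_values": True,
        "fielddata": {"format": "doc_values"},
    }


def build_index_body(field_types):
    """Return the index-creation body as a plain dict."""
    properties = {}
    for name, field_type in field_types.items():
        field = {"type": field_type}
        field.update(common_options())
        properties[name] = field
    return {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "index.query.default_field": "timestamp",
            "index.mapping.ignore_malformed": False,
            "index.mapping.coerce": False,
            "index.query.parse.allow_unmapped_fields": False,
        },
        "mappings": {
            "test": {
                "_source": {"enabled": False},
                "_all": {"enabled": False},
                "properties": properties,
            }
        },
    }


body = build_index_body(FIELD_TYPES)
print(json.dumps(body, indent=2))
```

The printed JSON could then be passed to curl with -d, exactly as in the command above.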

Add a document:

curl -XPOST http://172.16.18.116:9200/test/test/1 -d '
{
  "timestamp": 53534543,
  "appid": 1,
  "result": "test",
  "cmdid": "test",
  "optime": 53534543,
  "total_count": 100
}'

Query it:

curl -XGET http://172.16.18.116:9200/test/test/_search

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      { "_index": "test", "_type": "test", "_id": "1", "_score": 1 }
    ]
  }
}

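The document is found, but none of its original field contents are returned, because the fields are neither stored nor indexed. Since doc_values is true, however, the values have in fact been written to disk.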
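
To make the idea of "values kept by column, looked up by document ID" more concrete, here is a toy in-memory model. It is only an illustration under simplifying assumptions: real doc_values are compressed on-disk structures managed by Lucene, and the class ToyColumnStore and the second document below are invented for this sketch.

```python
class ToyColumnStore:
    """Toy model: one list per field, indexed by document ID."""

    def __init__(self):
        self.columns = {}   # field name -> list of values, position = doc ID
        self.doc_count = 0

    def add(self, doc):
        """Append one document and return its (sequential) doc ID."""
        doc_id = self.doc_count
        for field in set(self.columns) | set(doc):
            # A column created late is padded with None for earlier docs.
            column = self.columns.setdefault(field, [None] * doc_id)
            column.append(doc.get(field))
        self.doc_count += 1
        return doc_id

    def values(self, field, doc_ids):
        """Fetch one column's values for a batch of doc IDs."""
        column = self.columns[field]
        return [column[i] for i in doc_ids]


store = ToyColumnStore()
first = store.add({"timestamp": 53534543, "total_count": 100})
# Hypothetical second document, not part of the original example.
second = store.add({"timestamp": 53534543, "total_count": 50})

# An aggregation only needs the one column it works on.
total = sum(store.values("total_count", [first, second]))
print(total)
```

The point of the layout is that summing total_count never touches the other fields, which is why no separate copy of the whole document is needed for aggregations.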

Next, run an aggregation:

curl -XPOST http://172.16.18.116:9200/test/test/_search -d '
{
  "aggs": {
    "timestamp": {
      "terms": { "field": "timestamp" },
      "aggs": {
        "total_count": { "sum": { "field": "total_count" } }
      }
    }
  }
}'

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      { "_index": "test", "_type": "test", "_id": "1", "_score": 1 }
    ]
  },
  "aggregations": {
    "timestamp": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 53534543,
          "key_as_string": "1970-01-01T14:52:14.543Z",
          "doc_count": 1,
          "total_count": { "value": 100 }
        }
      ]
    }
  }
}

As the result shows, the aggregation is able to read the total_count value.
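
A client would usually pull the per-bucket sums out of that response programmatically. The sketch below does so on a trimmed copy of the aggregation result, and also checks that the bucket key, read as milliseconds since the epoch, matches key_as_string. The helper names bucket_sums and millis_to_iso are assumptions made for this example, not part of any Elasticsearch client.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the aggregation response shown above.
RESPONSE = """
{
  "aggregations": {
    "timestamp": {
      "buckets": [
        {
          "key": 53534543,
          "key_as_string": "1970-01-01T14:52:14.543Z",
          "doc_count": 1,
          "total_count": { "value": 100 }
        }
      ]
    }
  }
}
"""


def bucket_sums(response):
    """Map each timestamp bucket key to its summed total_count."""
    buckets = response["aggregations"]["timestamp"]["buckets"]
    return {b["key"]: b["total_count"]["value"] for b in buckets}


def millis_to_iso(ms):
    """Format epoch milliseconds the way key_as_string appears above."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    dt = epoch + timedelta(milliseconds=ms)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


sums = bucket_sums(json.loads(RESPONSE))
for key, value in sums.items():
    print(millis_to_iso(key), value)
```

This also explains the odd-looking date in the result: the test value 53534543 is interpreted as milliseconds, which lands on the afternoon of 1 January 1970.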