multiple 'children' aggregations interfere with each other #9958

Closed
perryn opened this issue Mar 3, 2015 · 9 comments

Comments

@perryn

perryn commented Mar 3, 2015

If you have two sibling aggregations that make use of the 'children' aggregation, then the second aggregation seems to count each document twice.

Here is a cut-down example.

I have a mapping with parent documents of type 'datastreams' and child documents of type 'readings'.

I ran this, which has two identical sibling aggregations:

curl -XGET 'https://blahblahblah/index/datastreams/_search?search_type=count&pretty' -d '{
  "aggs" : {
    "one" : {
      "children" : {
        "type" : "readings"
      },
      "aggs" : {
        "grand-total" : { "sum" : { "field" : "readings.dv" } }
      }
    },
    "two" : {
      "children" : {
        "type" : "readings"
      },
      "aggs" : {
        "grand-total" : { "sum" : { "field" : "readings.dv" } }
      }
    }
  }
}'

and got the response

{
  "took" : 23620,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1389,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "two" : {
      "doc_count" : 729353222,
      "grand-total" : {
        "value" : 2.389726905175203E10
      }
    },
    "one" : {
      "doc_count" : 364676611,
      "grand-total" : {
        "value" : 1.1948634525852589E10
      }
    }
  }
}

Although aggregations 'one' and 'two' are identical, Elasticsearch seems to have double-counted the documents for aggregation 'two': both its doc_count and its sum are twice those of 'one'.
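
The factor of two can be checked mechanically. The following is a minimal Python sketch added for illustration, not part of the original report; the helper name `sibling_ratio` is hypothetical, and the numbers are copied from the response above.

```python
import math

# Values copied from the response above.
aggregations = {
    "one": {"doc_count": 364676611, "grand-total": {"value": 1.1948634525852589e10}},
    "two": {"doc_count": 729353222, "grand-total": {"value": 2.389726905175203e10}},
}


def sibling_ratio(aggs, first, second, metric="grand-total"):
    """Return (doc_count ratio, metric ratio) of `second` relative to `first`."""
    a, b = aggs[first], aggs[second]
    count_ratio = b["doc_count"] / a["doc_count"]
    metric_ratio = b[metric]["value"] / a[metric]["value"]
    return count_ratio, metric_ratio


count_ratio, sum_ratio = sibling_ratio(aggregations, "one", "two")
print(count_ratio, sum_ratio)
```

For two identical sibling aggregations both ratios should be 1. Here the doc_count ratio is exactly 2 and the sum ratio is 2 up to floating-point rounding, which is consistent with every child document being collected twice for 'two'.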

@martijnvg
Member

Hi @perryn, on what version of ES are you experiencing this interference issue? Also, are you able to reproduce it on a smaller scale?

@perryn
Author

perryn commented Mar 3, 2015

Hi @martijnvg

The above was seen on 1.4.4 but I was also able to reproduce the issue on a smaller scale on 1.4.2.

I ran the same query as above and got the following result

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 600,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "one" : {
      "doc_count" : 12,
      "grand-total" : {
        "value" : 259.251
      }
    },
    "two" : {
      "doc_count" : 24,
      "grand-total" : {
        "value" : 518.502
      }
    }
  }
}

cheers
Perryn

@martijnvg
Member

@perryn I think that the issue you're reporting is fixed by #10263. Would be great if you can verify this!

@AlexKovalevich

Can reproduce in 1.5. That's actually how I came here, by trying to find a solution.

If I specify the children aggregation first and then put different sub-aggregations under it, they seem to work as expected. So the example below returns two identical histograms, as requested:
{
  "aggregations": {
    "rating_histogram_total_1": {
      "children": {
        "type": "myIndex"
      },
      "aggregations": {
        "rating_histogram_total_1": {
          "histogram": {
            "field": "myIndex.reviewsRatingAverage",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 0,
              "max": 5
            }
          }
        },
        "rating_histogram_total_2": {
          "histogram": {
            "field": "myIndex.reviewsRatingAverage",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 0,
              "max": 5
            }
          }
        }
      }
    }
  }
}

The following request, however, doesn't work. If I put the children aggregation under different filters and then use the same histogram in each, the counts multiply: in the results below, the second histogram returns 2x the expected values, the third 4x, and the fourth 8x. I didn't debug what would happen if the filters above "children" had different criteria.

Example: query:

{
  "aggregations": {
    "rating_histogram_total_1": {
      "children": {
        "type": "myIndex"
      },
      "aggregations": {
        "rating_histogram_total_1": {
          "histogram": {
            "field": "myIndex.reviewsRatingAverage",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 0,
              "max": 5
            }
          }
        }
      }
    },
    "rating_histogram_total_2": {
      "children": {
        "type": "myIndex"
      },
      "aggregations": {
        "rating_histogram_total_2": {
          "histogram": {
            "field": "myIndex.reviewsRatingAverage",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 0,
              "max": 5
            }
          }
        }
      }
    },
    "rating_histogram_total_3": {
      "children": {
        "type": "myIndex"
      },
      "aggregations": {
        "rating_histogram_total_3": {
          "histogram": {
            "field": "myIndex.reviewsRatingAverage",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 0,
              "max": 5
            }
          }
        }
      }
    },
    "rating_histogram_total_4": {
      "children": {
        "type": "myIndex"
      },
      "aggregations": {
        "rating_histogram_total_4": {
          "histogram": {
            "field": "myIndex.reviewsRatingAverage",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 0,
              "max": 5
            }
          }
        }
      }
    }
  }
}
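
For anyone trying to reproduce this with a different number of siblings, a request body of this shape can be generated rather than written by hand. This is a minimal Python sketch added for illustration; the function name `sibling_children_aggs` is hypothetical, and the type and field names are simply the ones used in the query above.

```python
import json


def sibling_children_aggs(n, child_type, field):
    """Build a request body with n identical sibling 'children' aggregations,
    each wrapping the same histogram sub-aggregation."""
    aggs = {}
    for i in range(1, n + 1):
        name = "rating_histogram_total_%d" % i
        aggs[name] = {
            "children": {"type": child_type},
            "aggregations": {
                name: {
                    "histogram": {
                        "field": field,
                        "interval": 1,
                        "min_doc_count": 0,
                        "extended_bounds": {"min": 0, "max": 5},
                    }
                }
            },
        }
    return {"aggregations": aggs}


body = sibling_children_aggs(4, "myIndex", "myIndex.reviewsRatingAverage")
print(json.dumps(body, indent=2))
```

With n = 4 this produces the same structure as the request above; the printed JSON can be sent as the search body.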

And the results are:

{
  "aggregations": {
    "rating_histogram_total_4": {
      "doc_count": 32,
      "rating_histogram_total_4": {
        "buckets": [
          { "key": 0, "doc_count": 24 },
          { "key": 1, "doc_count": 0 },
          { "key": 2, "doc_count": 0 },
          { "key": 3, "doc_count": 8 },
          { "key": 4, "doc_count": 0 },
          { "key": 5, "doc_count": 0 }
        ]
      }
    },
    "rating_histogram_total_1": {
      "doc_count": 4,
      "rating_histogram_total_1": {
        "buckets": [
          { "key": 0, "doc_count": 3 },
          { "key": 1, "doc_count": 0 },
          { "key": 2, "doc_count": 0 },
          { "key": 3, "doc_count": 1 },
          { "key": 4, "doc_count": 0 },
          { "key": 5, "doc_count": 0 }
        ]
      }
    },
    "rating_histogram_total_3": {
      "doc_count": 16,
      "rating_histogram_total_3": {
        "buckets": [
          { "key": 0, "doc_count": 12 },
          { "key": 1, "doc_count": 0 },
          { "key": 2, "doc_count": 0 },
          { "key": 3, "doc_count": 4 },
          { "key": 4, "doc_count": 0 },
          { "key": 5, "doc_count": 0 }
        ]
      }
    },
    "rating_histogram_total_2": {
      "doc_count": 8,
      "rating_histogram_total_2": {
        "buckets": [
          { "key": 0, "doc_count": 6 },
          { "key": 1, "doc_count": 0 },
          { "key": 2, "doc_count": 0 },
          { "key": 3, "doc_count": 2 },
          { "key": 4, "doc_count": 0 },
          { "key": 5, "doc_count": 0 }
        ]
      }
    }
  }
}
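
The growth pattern in these results can be read off directly. The short Python sketch below is added for illustration only; it uses the top-level doc_count values copied from the results above and divides each by the first sibling's count.

```python
# Top-level doc_count per sibling aggregation, copied from the results above.
doc_counts = {
    "rating_histogram_total_1": 4,
    "rating_histogram_total_2": 8,
    "rating_histogram_total_3": 16,
    "rating_histogram_total_4": 32,
}

# Order the siblings by name and express each count as a multiple of the first.
ordered = [doc_counts[name] for name in sorted(doc_counts)]
factors = [count // ordered[0] for count in ordered]
print(factors)
```

Each sibling reports twice the count of the one before it, and the individual buckets follow the same pattern (key 0: 3, 6, 12, 24; key 3: 1, 2, 4, 8).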

I hope it will help.
Thank you.

@martijnvg
Member

@AlexKovalevich Thanks for sharing. I think this was fixed via #10263 (it looks similar to the issue you describe), which will be included in 1.5.1. I can't be sure that it really fixes your issue, because there is no reproduction. If you like, you can build from the 1.5 branch and see whether the issue still occurs. This would help a lot.

@martijnvg
Member

@AlexKovalevich Now that 1.5.1 has been released, can you check whether the issue still occurs in your environment?

@fozzylyon

@martijnvg I was running into the same issue and I can verify that 1.5.1 fixed it for me.

@clintongormley

awesome. @fozzylyon thanks for letting us know. closing

@AlexKovalevich

Fixed for me too in 1.5.1 !
(Sorry for delay, couldn't test earlier)
