Endless mapping refresh #10318

Closed
EikeDehling opened this issue Mar 30, 2015 · 13 comments

@EikeDehling
Contributor

  • We're running ES 1.4.2
  • Can sometimes be solved by closing/re-opening the affected indices, but the issue usually returns pretty soon
  • This "clogs" up the pending tasks: some tasks get stuck and cluster actions (e.g. loading/removing warmers) now always time out
  • This consumes all network capacity on the ES nodes

https://gist.github.com/EikeDehling/a015a5137ac5d99dc850

[buzzcapture@hermes ~]$ curl http://artemis3:9200/_cat/pending_tasks
18149502 1s HIGH refresh-mapping [postings-5360000000][[posting]]
18149503 745ms HIGH refresh-mapping [postings-5500000000][[posting]]
18149509 736ms HIGH refresh-mapping [postings-5360000000][[posting]]
18149504 737ms HIGH refresh-mapping [postings-4180000000][[posting]]
18149510 735ms HIGH refresh-mapping [postings-5190000000][[posting]]
18149512 735ms HIGH refresh-mapping [postings-5430000000][[posting]]
18149506 736ms HIGH refresh-mapping [postings-5430000000][[posting]]
18149505 736ms HIGH refresh-mapping [postings-5500000000][[posting]]
18149511 735ms HIGH refresh-mapping [postings-5430000000][[posting]]
18149515 732ms HIGH refresh-mapping [postings-5430000000][[posting]]
18149519 290ms HIGH refresh-mapping [postings-5430000000][[posting]]
18149521 289ms HIGH refresh-mapping [postings-5190000000][[posting]]
18149513 734ms HIGH refresh-mapping [postings-5100000000][[posting]]
18149525 287ms HIGH refresh-mapping [postings-5100000000][[posting]]
18149507 736ms HIGH refresh-mapping [postings-5500000000][[posting]]
18149508 736ms HIGH refresh-mapping [postings-5500000000][[posting]]
18149514 733ms HIGH refresh-mapping [postings-5500000000][[posting]]
18149516 299ms HIGH refresh-mapping [postings-4180000000][[posting]]
18149517 298ms HIGH refresh-mapping [postings-5360000000][[posting]]
18149518 291ms HIGH refresh-mapping [postings-5430000000][[posting]]
12674966 1.7d NORMAL master ping (from: [Y8LnaPqjTv-4Vn4CWXWWlQ])
18149520 290ms HIGH refresh-mapping [postings-5430000000][[posting]]
12676681 1.7d NORMAL master ping (from: [Y8LnaPqjTv-4Vn4CWXWWlQ])
18149522 289ms HIGH refresh-mapping [postings-5190000000][[posting]]
18149523 288ms HIGH refresh-mapping [postings-5100000000][[posting]]
18149524 288ms HIGH refresh-mapping [postings-4180000000][[posting]]
12678378 1.7d NORMAL master ping (from: [Y8LnaPqjTv-4Vn4CWXWWlQ])
18149526 286ms HIGH refresh-mapping [postings-5430000000][[posting]]
18149527 286ms HIGH refresh-mapping [postings-5190000000][[posting]]
18149528 286ms HIGH refresh-mapping [postings-4180000000][[posting]]
18149529 286ms HIGH refresh-mapping [postings-5500000000][[posting]]
18149530 284ms HIGH refresh-mapping [postings-5500000000][[posting]]
18149531 284ms HIGH refresh-mapping [postings-5360000000][[posting]]
18149532 284ms HIGH refresh-mapping [postings-5100000000][[posting]]
18149533 284ms HIGH refresh-mapping [postings-5500000000][[posting]]
18149534 281ms HIGH refresh-mapping [postings-5360000000][[posting]]

@bleskes
Contributor

bleskes commented Mar 30, 2015

@EikeDehling can you enable debug logging for indices.cluster and grep the logs for the output of this line? I want to see what the difference is between the two sources. A gist would be great.

logger.debug("[{}] parsed mapping [{}], and got different sources\noriginal:\n{}\nparsed:\n{}", index, mappingType, mappingSource, mapperService.documentMapper(mappingType).mappingSource());

@EikeDehling
Contributor Author

@bleskes Thanks for the quick response!

Gist here: https://gist.github.com/EikeDehling/129aa3f8213ad8552f49

The difference in mapping appears to be in nested elements; apparently they are not ordered alphabetically? The difference in serialisation is under posting.properties.body.fields.text.fielddata: the entries there are ordered differently in the original and parsed versions.

@EikeDehling
Contributor Author

This gist is a bit easier to read:

https://gist.github.com/EikeDehling/fc1289cc443b7acdc3f4

The issue is under the key posting.properties.body.fields._text_.fielddata : Ordering is different for the original/parsed mapping.

@bleskes
Contributor

bleskes commented Mar 31, 2015

@EikeDehling thx for that. It's accurate. The problem lies in the way the field data settings are rendered:

https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/elasticsearch/index/mapper/core/AbstractFieldMapper.java#L756

builder.field("fielddata", (Map) fieldDataType.getSettings().getAsMap());

The order of the keys in that map is arbitrary (practically). It may be different between master and nodes causing this endless loop.
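To make the point above concrete, here is a minimal sketch of rendering a settings map with its keys sorted, so that two maps with the same contents always serialize identically regardless of iteration order. This is not the actual Elasticsearch patch: the class name `SortedSettingsSketch`, the `render` helper, and the hand-rolled JSON-like output are all assumptions made for illustration, and the keys `format` and `loading` are just example fielddata settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Elasticsearch.
class SortedSettingsSketch {

    // Renders the settings with keys in sorted order. Copying into a TreeMap
    // removes any dependence on the iteration order of the incoming map.
    // Note: values are not escaped; this is a sketch, not a JSON serializer.
    static String render(Map<String, String> settings) {
        StringBuilder out = new StringBuilder("{");
        for (Map.Entry<String, String> e : new TreeMap<String, String>(settings).entrySet()) {
            if (out.length() > 1) {
                out.append(',');
            }
            out.append('"').append(e.getKey()).append("\":\"")
               .append(e.getValue()).append('"');
        }
        return out.append('}').toString();
    }

    public static void main(String[] args) {
        // Same entries, inserted in a different order on "master" and "node".
        Map<String, String> onMaster = new LinkedHashMap<String, String>();
        onMaster.put("loading", "eager");
        onMaster.put("format", "paged_bytes");

        Map<String, String> onNode = new LinkedHashMap<String, String>();
        onNode.put("format", "paged_bytes");
        onNode.put("loading", "eager");

        // Both render to the same string because the keys are sorted first.
        System.out.println(render(onMaster).equals(render(onNode)));
    }
}
```

Sorting at render time keeps the in-memory settings untouched and only fixes the serialized form, which is the part that gets compared between master and nodes.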

To work around this, you can set indices.cluster.send_refresh_mapping to false (requires a node restart). This will disable the sending of mapping refresh instructions. You must remember to remove this setting before you upgrade to the next ES version, which will have a fix for this.

@bleskes
Contributor

bleskes commented Mar 31, 2015

@EikeDehling do you run on Java8 by any chance? (wondering to better understand how frequently this can happen)

@EikeDehling
Contributor Author

We're running Java7, 1.7.0_45

Thanks for the tip about settings.

I also found that line of code indeed, i'll try and make a patch/test.

@bleskes
Contributor

bleskes commented Apr 1, 2015

Cool. If you wait an hour or two, I’ll probably make a PR with a fix.


@EikeDehling
Contributor Author

This is my initial patch+unit test, happy to compare to what you're producing. Sorry, i'm not that handy with github/PR's yet.

https://gist.github.com/EikeDehling/2e34a78a54de646b71ca

Any chance there will also be a 1.4 release with a fix?

@wkoot

wkoot commented Apr 1, 2015

@bleskes You said that the indices.cluster.send_refresh_mapping requires node restart, what would the effect be if you only have a few (say half of) the nodes which have this setting set to false?

@bleskes
Contributor

bleskes commented Apr 1, 2015

Then the other half might still send refresh mapping to the master. You need it on all data nodes. But you can do a rolling restart, one by one.


@EikeDehling
Contributor Author

I am trying this fix in our staging environment, i'll let you know if that fixes the issue.

https://github.com/EikeDehling/elasticsearch/commit/cc79d71bbc4d55cb12a50df2acc67ca6ba4ac5dc

@bleskes
Contributor

bleskes commented Apr 1, 2015

@EikeDehling looks good, can you make a PR? See https://www.elastic.co/contributing-to-elasticsearch . Also, it would be great if you could simplify the test and add random keys (but we can iterate on the PR).

@EikeDehling
Contributor Author

I made a PR, and afterwards signed the contributor license, i hope that's ok.

I randomized the test and simplified a bit, happy to hear suggestions for improvements.

EikeDehling reopened this Apr 1, 2015
bleskes closed this as completed in 2fc2c82 Apr 2, 2015
bleskes added a commit to bleskes/elasticsearch that referenced this issue Apr 8, 2015
We recently ran into two issues where mappings weren't serialized in a consistent manner (elastic#10302 and elastic#10318). We rely on this consistency to do a byte-level check that the mappings we get from the master are identical to the ones we have locally. Mistakes here can cause endless refresh-mapping loops.

This commit adds an assert that verifies this upon every update from the master.
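As a rough illustration of why a byte-level check is sensitive to serialization order, the sketch below compares two logically equal maps by their serialized bytes. Everything here is hypothetical: `MappingByteCheckSketch`, `serialize`, and `sameSource` are made-up names, and `Map.toString()` stands in for the real mapping serializer purely to show the effect.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the assert added by the commit.
class MappingByteCheckSketch {

    // Naive stand-in serializer: writes entries in the map's iteration order.
    static byte[] serialize(Map<String, String> mapping) {
        return mapping.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Byte-level comparison of the two serialized forms.
    static boolean sameSource(Map<String, String> fromMaster, Map<String, String> local) {
        return Arrays.equals(serialize(fromMaster), serialize(local));
    }

    public static void main(String[] args) {
        Map<String, String> fromMaster = new LinkedHashMap<String, String>();
        fromMaster.put("loading", "eager");
        fromMaster.put("format", "paged_bytes");

        Map<String, String> local = new LinkedHashMap<String, String>();
        local.put("format", "paged_bytes");
        local.put("loading", "eager");

        // The maps are equal as maps, yet their serialized bytes differ.
        System.out.println("equal as maps:  " + fromMaster.equals(local));
        System.out.println("equal as bytes: " + sameSource(fromMaster, local));
    }
}
```

With an order-dependent serializer the byte comparison reports a difference even though nothing has changed, which is the condition that keeps triggering mapping refreshes.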
bleskes added a commit that referenced this issue May 29, 2015
We recently ran into two issues where mappings weren't serialized in a consistent manner (#10302 and #10318). We rely on this consistency to do a byte-level check that the mappings we get from the master are identical to the ones we have locally. Mistakes here can cause endless refresh-mapping loops.

This commit adds an assert that verifies this upon every update from the master.
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015