Fail shard when index service/mappings fails to instantiate #10283
Conversation
When the index service (which holds shards) fails to be created as a result of a shard being allocated on a node, we should fail the relevant shard; otherwise, it will remain stuck. The same goes when there is a failure to process updated mappings from the master. Note, both failures typically happen when the node is misconfigured (e.g. missing plugins, ...), since the index and mappings get created and processed on the master node before being published. closes elastic#10283
597b7bd to 3f5ffbe
logger.warn("[{}] failed to create index for shard {}", e, indexMetaData.index(), shard.shardId());
failedShards.put(shard.shardId(), new FailedShard(shard.version()));
shardStateAction.shardFailed(shard, indexMetaData.getUUID(),
"failed to create index to allocated shard " + shard.shardId() + ", failure " + ExceptionsHelper.detailedMessage(e),
I think "allocated" should be "allocate" here.
Left one comment about grammar, but LGTM otherwise
@dakrone thanks for the review, updated the log message, will wait for another review
} catch (Exception e) {
logger.warn("[{}] failed to create index", e, indexMetaData.index());
} catch (Throwable e) {
logger.warn("[{}] failed to create index for shard {}", e, indexMetaData.index(), shard.shardId());
does it make sense to put the "fail a shard" logic under a method? to make sure we don't forget to put it in failedShards etc.
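The extraction suggested in this comment could look roughly like the following self-contained sketch. The nested types (`ShardId`, `FailedShard`) and the helper name `failAndRemoveShard` are illustrative stand-ins, not the actual Elasticsearch API; the point is only that recording the shard in `failedShards` and notifying the master happen in one place:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of bundling the "fail a shard" bookkeeping into one helper so a
// caller cannot forget one of the two steps. All names are illustrative.
public class FailShardSketch {
    record ShardId(String index, int id) {}
    record FailedShard(long version) {}

    final Map<ShardId, FailedShard> failedShards = new HashMap<>();
    final StringBuilder reportedToMaster = new StringBuilder();

    // Hypothetical stand-in for shardStateAction.shardFailed(...)
    private void notifyMaster(ShardId shardId, String reason) {
        reportedToMaster.append(shardId).append(": ").append(reason).append('\n');
    }

    // The extracted helper: record the failure locally, then notify the master.
    public void failAndRemoveShard(ShardId shardId, long version, String reason, Throwable cause) {
        failedShards.put(shardId, new FailedShard(version));
        notifyMaster(shardId, reason + ", failure " + cause.getMessage());
    }

    public static void main(String[] args) {
        FailShardSketch service = new FailShardSketch();
        ShardId shardId = new ShardId("my_index", 0);
        service.failAndRemoveShard(shardId, 3L, "failed to create index", new IllegalStateException("missing plugin"));
        System.out.println(service.failedShards.containsKey(shardId)); // prints "true"
    }
}
```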
Change LGTM2 - left some comments regarding logging and error reporting
+1 on getting this into 1.5.1
@bleskes pushed another round, review?
LGTM again. nice clean up. |
…eerRecovery Fails due to elastic#10283
After processing mapping updates from the master, we compare the resulting binary representation of them to the one the cluster state holds. If they differ, we send a refresh mapping request to the master, asking it to reparse the mapping and serialize it again. This mechanism is used to update the mapping after a format change caused by a version upgrade. The very same process can also be triggered when an old master leaves the cluster, triggering a local cluster state update. If that update contains the old mapping format, the local node will again signal the need to refresh, but this time there is no master to accept the request. Instead of failing (which we now do because of #10283), we should just skip the notification and wait for the next elected master to publish a new mapping (triggering another refresh if needed). Closes #10311
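The decision described in this commit message can be reduced to a small function: compare the locally reserialized mapping bytes with what the cluster state carries, and only notify the master when one is actually known. This is an illustrative sketch with hypothetical names, not the actual Elasticsearch code:

```java
import java.util.Arrays;

// Sketch of the refresh-mapping decision after processing mapping updates
// from the master. Method and result names are illustrative.
public class MappingRefreshSketch {
    public static String decide(byte[] localMappingSource, byte[] clusterStateSource, boolean masterKnown) {
        if (Arrays.equals(localMappingSource, clusterStateSource)) {
            return "in-sync";            // representations match, nothing to do
        }
        if (!masterKnown) {
            return "skip";               // no master to accept the request; wait for the next election
        }
        return "send-refresh-mapping";   // ask the master to reparse and reserialize the mapping
    }

    public static void main(String[] args) {
        byte[] oldFormat = {1};
        byte[] newFormat = {2};
        System.out.println(decide(newFormat, newFormat, true));  // prints "in-sync"
        System.out.println(decide(newFormat, oldFormat, false)); // prints "skip" (the #10311 fix)
        System.out.println(decide(newFormat, oldFormat, true));  // prints "send-refresh-mapping"
    }
}
```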