
Fail shard when index service/mappings fails to instantiate #10283

Closed

Conversation


@kimchy kimchy commented Mar 26, 2015

When the index service (which holds shards) fails to be created as a result of a shard being allocated on a node, we should fail the relevant shard; otherwise it will remain stuck. The same goes for a failure to process updated mappings from the master.

Note, both failures typically happen when the node is misconfigured (e.g. missing plugins), since the index and its mappings are created and processed on the master node before being published.
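In outline, the fix wraps index-service creation in a catch-all that fails the shard explicitly. A minimal sketch, assuming hypothetical surrounding fields and a stand-in createIndex call (only the catch body mirrors the diff excerpt below; everything else is illustrative):

```java
// A minimal sketch, not the actual Elasticsearch code: create the index
// service for a newly allocated shard, and on any failure record it and
// notify the master instead of leaving the shard stuck.
try {
    // stand-in for the real index-service creation call
    indicesService.createIndex(indexMetaData);
} catch (Throwable e) {
    logger.warn("[{}] failed to create index for shard {}", e, indexMetaData.index(), shard.shardId());
    // remember the failure so subsequent cluster state updates don't retry blindly
    failedShards.put(shard.shardId(), new FailedShard(shard.version()));
    // report the failure so the master can reallocate the shard elsewhere
    shardStateAction.shardFailed(shard, indexMetaData.getUUID(),
            "failed to create index to allocate shard " + shard.shardId()
                    + ", failure " + ExceptionsHelper.detailedMessage(e));
}
```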

logger.warn("[{}] failed to create index for shard {}", e, indexMetaData.index(), shard.shardId());
failedShards.put(shard.shardId(), new FailedShard(shard.version()));
shardStateAction.shardFailed(shard, indexMetaData.getUUID(),
"failed to create index to allocated shard " + shard.shardId() + ", failure " + ExceptionsHelper.detailedMessage(e),
dakrone commented on the diff:
I think "allocated" should be "allocate" here.


dakrone commented Mar 26, 2015

Left one comment about grammar, but LGTM otherwise


kimchy commented Mar 26, 2015

@dakrone thanks for the review, updated the log message, will wait for another review

-        } catch (Exception e) {
-            logger.warn("[{}] failed to create index", e, indexMetaData.index());
+        } catch (Throwable e) {
+            logger.warn("[{}] failed to create index for shard {}", e, indexMetaData.index(), shard.shardId());
bleskes commented on the diff:
Does it make sense to put the "fail a shard" logic under a method, to make sure we don't forget to put it in failedShards etc.?
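A minimal sketch of what such a helper might look like (the name failAndRemoveShard and its signature are hypothetical; the body just combines the two steps visible in the diff):

```java
// Hypothetical helper: keep the local bookkeeping and the master
// notification in one place so neither step can be forgotten.
private void failAndRemoveShard(ShardRouting shard, String indexUUID, String reason, Throwable failure) {
    // record the failed shard (with its version) so it is not retried
    // on every subsequent cluster state update
    failedShards.put(shard.shardId(), new FailedShard(shard.version()));
    // tell the master, so the shard can be reallocated to another node
    shardStateAction.shardFailed(shard, indexUUID,
            reason + ", failure " + ExceptionsHelper.detailedMessage(failure));
}
```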


bleskes commented Mar 27, 2015

Change LGTM2 - left some comments regarding logging and error reporting


bleskes commented Mar 27, 2015

+1 on getting this into 1.5.1


kimchy commented Mar 27, 2015

@bleskes pushed another round, review?


bleskes commented Mar 27, 2015

LGTM again. nice clean up.

@kimchy kimchy removed the review label Mar 27, 2015
@kimchy kimchy closed this in 6181d8e Mar 27, 2015
kimchy added a commit that referenced this pull request Mar 27, 2015
When the index service (which holds shards) fails to be created as a result of a shard being allocated on a node, we should fail the relevant shard; otherwise it will remain stuck. The same goes for a failure to process updated mappings from the master.

Note, both failures typically happen when the node is misconfigured (e.g. missing plugins), since the index and its mappings are created and processed on the master node before being published.
closes #10283
kimchy added a commit that referenced this pull request Mar 27, 2015
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Mar 28, 2015
bleskes added a commit that referenced this pull request Mar 28, 2015
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Mar 30, 2015
bleskes added a commit that referenced this pull request Mar 30, 2015
After processing mapping updates from the master, we compare their resulting binary representation to the one in the cluster state. If they differ, we send a refresh-mapping request to the master, asking it to reparse the mappings and serialize them again. This mechanism is used to update the mappings after a format change caused by a version upgrade.

The very same process can also be triggered when an old master leaves the cluster, triggering a local cluster state update. If that update contains an old mapping format, the local node will again signal the need to refresh, but this time there is no master to accept the request. Instead of failing (which we now do because of #10283), we should just skip the notification and wait for the next elected master to publish new mappings (triggering another refresh if needed).

Closes #10311
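A toy model of the decision described in that commit message, with all names hypothetical; it only illustrates the control flow, not the real Elasticsearch API:

```java
import java.util.Arrays;

// Toy sketch of the refresh-mapping decision: re-serialize the mapping we
// received, compare it byte-for-byte with the cluster state's copy, and
// only notify the master if one is actually elected.
public class RefreshMappingSketch {

    static void maybeRequestRefresh(byte[] inClusterState, byte[] reserializedLocally, boolean masterElected) {
        if (Arrays.equals(inClusterState, reserializedLocally)) {
            return; // same binary format, nothing to refresh
        }
        if (!masterElected) {
            // No master to accept the request (e.g. the old master just
            // left): skip the notification and wait for the next elected
            // master to publish new mappings, which re-runs this check.
            return;
        }
        System.out.println("sending refresh-mapping request to the master");
    }
}
```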
@clintongormley clintongormley added the :Distributed/Recovery label Apr 9, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
Labels
>bug :Distributed/Recovery resiliency v1.5.1 v1.6.0 v2.0.0-beta1