Configurable discovery ttl and heartbeat timer #18204
@mavenugo can you please add some information as to what this PR is for, and why/where it's needed? Could you also add that information to the commit message, to give reviewers a bit of context while reviewing this. 😉
@thaJeztah added the required explanation to the PR description. PTAL.
@mavenugo thanks! Can you also use the same description as the commit message, before other maintainers start to spank me because I didn't check 😇
The Docker daemon uses a KV store as the host-discovery backend. The discovery module tracks the liveness of a node through a simple keepalive mechanism: every node performs a heartbeat by registering itself with the discovery module (via a KV-store Put operation). For every Put operation, the discovery module on all other nodes receives a Watch notification, which keeps the node alive. Any node that fails to register itself within the TTL is considered dead and removed from the discovery database.

The default timers (heartbeat = 20 seconds, ttl = 60 seconds) work fine for small clusters. For large clusters, however, these defaults are extremely aggressive: they cause high CPU usage, with most of the processing spent managing node discovery, which impacts normal daemon operation.

Hence we need a way to make the discovery ttl and heartbeat configurable. As the cluster size grows, the user can change these timers to make sure the daemon scales.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
@thaJeztah done :)
Design LGTM
LGTM
LGTM
Configurable discovery ttl and heartbeat timer
Ahh totally forgot about the small docs portion.. 😞 Sorry @thaJeztah
hehe, alright, I'll have a look. One quick find: looks like the new options also need to be added to the man page: https://github.com/docker/docker/blob/master/man/docker-daemon.8.md#cluster-store-options. The descriptions of the options are a bit hard to read, because each is one very long sentence. I'll see if I can come up with a suggestion in a bit (was looking at some other stuff :))