Configurable discovery ttl and heartbeat timer #18204

Merged: 1 commit, Nov 30, 2015

mavenugo
Contributor

The Docker daemon uses a KV store as its host-discovery backend.
The discovery module tracks the liveness of a node through a simple keepalive mechanism:
every node sends a heartbeat by periodically registering itself with the discovery
module (via a KV-store Put operation). For every Put operation, the discovery module
on all other nodes receives a Watch notification, which keeps the node alive.
Any node that fails to register itself within the TTL is considered dead and is
removed from the discovery database.

The default timers (heartbeat = 20 seconds, ttl = 60 seconds) work fine for small clusters.
For large clusters, however, these defaults are extremely aggressive: they cause high
CPU usage, with most of the processing spent managing node discovery, and that impacts
normal daemon operation.

Hence we need a way to make the discovery ttl and heartbeat configurable.
As the cluster size grows, the user can change these timers to make sure the daemon scales.

Signed-off-by: Madhu Venugopal <madhu@docker.com>

@thaJeztah
Member

@mavenugo can you please add some information as to what this PR is for, and why/where it's needed? Could you also add that information to the commit message, to give reviewers a bit of context while reviewing this. 😉

@mavenugo
Contributor Author

@thaJeztah added the required explanation to the PR description. PTAL.

@thaJeztah
Member

@mavenugo thanks! Can you also use the same description as the commit message, before other maintainers start to spank me because I didn't check 😇

@mavenugo
Contributor Author

@thaJeztah done :)

@icecrime
Contributor

Design LGTM

@icecrime
Contributor

LGTM

@abronan
Contributor

abronan commented Nov 30, 2015

LGTM

abronan added a commit that referenced this pull request Nov 30, 2015
Configurable discovery ttl and heartbeat timer
@abronan merged commit 0f0cf26 into moby:master on Nov 30, 2015
@abronan
Contributor

abronan commented Nov 30, 2015

Ahh totally forgot about the small docs portion.. 😞

Sorry @thaJeztah

@thaJeztah
Member

hehe, alright, I'll have a look.

One quick find; looks like the new options also need to be added to the man page; https://github.com/docker/docker/blob/master/man/docker-daemon.8.md#cluster-store-options

The descriptions of the options are a bit hard to read, because each one is a very long sentence. I'll see if I can come up with a suggestion in a bit (was looking at some other stuff :))
