Configurable discovery ttl and heartbeat timer #18204

mavenugo · 2015-11-24T20:01:41Z

The Docker daemon uses a KV store as the host-discovery backend.
The discovery module tracks the liveness of a node through a simple keepalive mechanism.
Every node sends a heartbeat by registering itself with the discovery module
(via a KV-store Put operation). For every Put operation, the discovery module on
every other node receives a Watch notification, which keeps the registering node alive.
Any node that fails to register itself within the TTL is considered dead and removed
from the discovery database.
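
To make the keepalive rule concrete, here is a minimal sketch of it in Go. It is not the code in this PR: the `registry` type, its methods, and the node names are hypothetical, and an in-memory map stands in for the KV store and its Watch notifications.

```go
package main

import (
	"fmt"
	"time"
)

// registry is a hypothetical in-memory stand-in for the discovery database.
// It records when each node last registered itself.
type registry struct {
	ttl      time.Duration
	lastSeen map[string]time.Time
}

func newRegistry(ttl time.Duration) *registry {
	return &registry{ttl: ttl, lastSeen: make(map[string]time.Time)}
}

// heartbeat models the KV-store Put: the node refreshes its own entry.
func (r *registry) heartbeat(node string, now time.Time) {
	r.lastSeen[node] = now
}

// expire removes every node whose last heartbeat is older than the TTL
// and returns the names of the removed nodes.
func (r *registry) expire(now time.Time) []string {
	var dead []string
	for node, seen := range r.lastSeen {
		if now.Sub(seen) > r.ttl {
			dead = append(dead, node)
			delete(r.lastSeen, node) // deleting during range is safe in Go
		}
	}
	return dead
}

// simulate runs two nodes against a 60s TTL: node-a sends a heartbeat every
// 20s, node-b registers once and then goes silent.
func simulate() []string {
	start := time.Unix(0, 0)
	r := newRegistry(60 * time.Second)
	r.heartbeat("node-a", start)
	r.heartbeat("node-b", start)
	for _, s := range []int{20, 40, 60} {
		r.heartbeat("node-a", start.Add(time.Duration(s)*time.Second))
	}
	// One second past the TTL, only the silent node is removed.
	return r.expire(start.Add(61 * time.Second))
}

func main() {
	fmt.Println(simulate())
}
```

With a heartbeat of 20 seconds and a TTL of 60 seconds, a node can miss two consecutive heartbeats before it is dropped.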

The default timers (heartbeat = 20 seconds, ttl = 60 seconds) work fine for small clusters.
For large clusters, however, these defaults are extremely aggressive: they cause high CPU
usage, with most of the processing spent managing node discovery, and that impacts
normal daemon operation.
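
A back-of-envelope estimate shows why the load grows so quickly. The sketch assumes one Put per node per heartbeat interval and exactly one Watch notification per Put on every other node; the function name and the cluster sizes are illustrative, not measurements.

```go
package main

import "fmt"

// watchNotificationsPerSecond estimates the cluster-wide rate of Watch
// notifications, assuming each node does one Put per heartbeat interval and
// each Put notifies every other node exactly once.
func watchNotificationsPerSecond(nodes int, heartbeatSeconds float64) float64 {
	return float64(nodes*(nodes-1)) / heartbeatSeconds
}

func main() {
	// Illustrative cluster sizes, compared at a 20s and a 60s heartbeat.
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("nodes=%d heartbeat=20s: %.1f notifications/s\n",
			n, watchNotificationsPerSecond(n, 20))
		fmt.Printf("nodes=%d heartbeat=60s: %.1f notifications/s\n",
			n, watchNotificationsPerSecond(n, 60))
	}
}
```

Under these assumptions the rate grows with the square of the cluster size and shrinks linearly as the heartbeat interval is raised.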

Hence we need a way to make the discovery TTL and heartbeat configurable.
As the cluster grows, the user can raise these timers so that the daemon scales.
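
As a rough illustration of what such configuration could look like, here is a sketch of parsing the two timers from a string map of cluster-store options. The key names `discovery.heartbeat` and `discovery.ttl` (values in seconds) match the `--cluster-store-opt` keys documented for the daemon. The function names and the validation rule (the TTL must be larger than the heartbeat) are assumptions for illustration, not the implementation in this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Defaults quoted in the PR description.
const (
	defaultHeartbeat = 20 * time.Second
	defaultTTL       = 60 * time.Second
)

// parseSeconds converts a positive whole number of seconds into a Duration.
func parseSeconds(key, value string) (time.Duration, error) {
	n, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("%s: %q is not a whole number of seconds", key, value)
	}
	if n <= 0 {
		return 0, fmt.Errorf("%s must be positive, got %d", key, n)
	}
	return time.Duration(n) * time.Second, nil
}

// parseDiscoveryOpts is a hypothetical helper. It starts from the defaults,
// overrides whichever timer is present in opts, and rejects a TTL that is
// not larger than the heartbeat (an assumed sanity check).
func parseDiscoveryOpts(opts map[string]string) (heartbeat, ttl time.Duration, err error) {
	heartbeat, ttl = defaultHeartbeat, defaultTTL
	if v, ok := opts["discovery.heartbeat"]; ok {
		if heartbeat, err = parseSeconds("discovery.heartbeat", v); err != nil {
			return 0, 0, err
		}
	}
	if v, ok := opts["discovery.ttl"]; ok {
		if ttl, err = parseSeconds("discovery.ttl", v); err != nil {
			return 0, 0, err
		}
	}
	if ttl <= heartbeat {
		return 0, 0, fmt.Errorf("discovery.ttl (%v) must be larger than discovery.heartbeat (%v)", ttl, heartbeat)
	}
	return heartbeat, ttl, nil
}

func main() {
	hb, ttl, err := parseDiscoveryOpts(map[string]string{
		"discovery.heartbeat": "40",
		"discovery.ttl":       "120",
	})
	fmt.Println(hb, ttl, err)
}
```

A larger cluster would then be started with both values raised together, so that a node can still miss a heartbeat or two before its TTL expires.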

Signed-off-by: Madhu Venugopal <madhu@docker.com>

thaJeztah · 2015-11-24T23:01:13Z

@mavenugo can you please add some information as to what this PR is for, and why/where it's needed? Could you also add that information to the commit message, to give reviewers a bit of context while reviewing this. 😉

mavenugo · 2015-11-25T02:18:04Z

@thaJeztah added the required explanation to the PR description. PTAL.

thaJeztah · 2015-11-25T10:33:37Z

@mavenugo thanks! Can you also use the same description as the commit message, before other maintainers start to spank me because I didn't check 😇

mavenugo · 2015-11-25T14:54:07Z

@thaJeztah done :)

icecrime · 2015-11-30T17:53:22Z

Design LGTM

icecrime · 2015-11-30T17:55:03Z

LGTM

abronan · 2015-11-30T18:07:43Z

LGTM

abronan · 2015-11-30T19:04:30Z

Ahh totally forgot about the small docs portion.. 😞

Sorry @thaJeztah

thaJeztah · 2015-11-30T19:41:12Z

hehe, alright, I'll have a look.

One quick find; looks like the new options also need to be added to the man page; https://github.com/docker/docker/blob/master/man/docker-daemon.8.md#cluster-store-options

The descriptions of the options are a bit hard to read, because each is a very long sentence. I'll see if I can come up with a suggestion in a bit (was looking at some other stuff :))

GordonTheTurtle added the status/0-triage label Nov 24, 2015

mavenugo added this to the 1.10 milestone Nov 24, 2015

mavenugo force-pushed the dhb branch from 04704e8 to 5ff623c Compare November 24, 2015 20:08

vdemeester added status/1-design-review and removed status/0-triage labels Nov 24, 2015

mavenugo force-pushed the dhb branch from 5ff623c to 2be3053 Compare November 25, 2015 03:28

mavenugo force-pushed the dhb branch from 2be3053 to 2efdb8c Compare November 25, 2015 14:53

icecrime added status/2-code-review and removed status/1-design-review labels Nov 30, 2015

abronan added status/4-merge and removed status/2-code-review labels Nov 30, 2015

abronan added a commit that referenced this pull request Nov 30, 2015

Merge pull request #18204 from mavenugo/dhb

0f0cf26

Configurable discovery ttl and heartbeat timer

abronan merged commit 0f0cf26 into moby:master Nov 30, 2015

thaJeztah added the impact/changelog label Jan 15, 2016

mavenugo deleted the dhb branch January 21, 2016 11:45

albers mentioned this pull request Jan 24, 2016

bash completion for new --cluster-store-opt values #19636

Merged

sdurrheimer mentioned this pull request Jan 25, 2016

Add zsh completion for new 'docker daemon --cluster-store-opt discove… #19656

Merged

chanwit mentioned this pull request Feb 4, 2016

Performance degradation on cluster with overlay networking (cluster-store related) docker-archive/classicswarm#1750

Closed

thaJeztah mentioned this pull request Mar 14, 2017

The variable heartbeat might be 0 #31755

Merged
