don't try to use default route MTU as container MTU #18108

Merged
merged 1 commit into from Dec 7, 2015

phemmer
Contributor

@phemmer phemmer commented Nov 20, 2015

Trying to use the default route's MTU as the container (bridge) MTU is a bad idea:

  • The box can have multiple default routes with different MTUs.
  • The default route and MTU can change after docker has started.
  • Traffic from the container might not even go over the default route (alternate routes, virtual networks, & inter-container communication).

Aside from the difficulty of determining which MTU to use, doing so is also unnecessary. The kernel performs path MTU discovery (PMTUD) to handle exactly this situation, so this PR lets the kernel do its job.
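
For readers unfamiliar with path MTU discovery, here is a rough toy model of the idea. It is not kernel code, and the function name and the list-of-hop-MTUs representation are made up for this sketch. The sender starts at its local interface MTU; whenever a hop cannot forward a packet of that size, it reports its own MTU back (the ICMP "need to frag" message) and the sender adopts it.

```python
def discover_path_mtu(local_mtu, hop_mtus):
    """Toy model of path MTU discovery (hypothetical helper, not a real API).

    local_mtu: MTU of the sending interface (e.g. the container's MTU).
    hop_mtus:  MTUs of the links along the path, in order.
    Returns (discovered path MTU, number of ICMP errors received).
    """
    path_mtu = local_mtu
    icmp_errors = 0
    while True:
        # The first hop that cannot carry a packet of the current size
        # answers with an ICMP error that carries its own MTU.
        blocking = next((m for m in hop_mtus if m < path_mtu), None)
        if blocking is None:
            # Every hop can carry the packet; discovery has converged.
            return path_mtu, icmp_errors
        icmp_errors += 1
        path_mtu = blocking


# Container MTU 1500 behind a host whose outgoing link has MTU 1460.
print(discover_path_mtu(1500, [1460]))
# A longer path with a progressively smaller MTU at each bottleneck.
print(discover_path_mtu(1500, [1500, 1460, 1400]))
```

The sender never needs to know the path in advance, which is why guessing an MTU from the default route at daemon start adds nothing in this model.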

It might even be a good idea to raise the default MTU to 9000. But this PR just fixes the bad behavior. We can improve things in another PR.

closes #7796

@LK4D4
Contributor

LK4D4 commented Nov 20, 2015

LGTM
ping @mavenugo @mrjana @aboch

@calavera
Contributor

LGTM, but I'll wait until someone on the networking team gives their 👍

@LK4D4
Contributor

LK4D4 commented Nov 23, 2015

ping @mavenugo @mrjana @aboch

@aboch
Contributor

aboch commented Nov 23, 2015

@phemmer

This logic was already removed in #13060

Then it was added back (slightly modified) in #13953 in order to fix #13952

Can you please check that your change is not going to reopen issue #13952?
(Also pinging @ibuildthecloud who opened 13952)

@phemmer
Contributor Author

phemmer commented Nov 25, 2015

Yes, this will reintroduce the behavior described in #13952. However, #13952 does not mention any actual problems caused by that behavior.

@mountkin
Contributor

@phemmer PTAL #13475

@aboch
Contributor

aboch commented Nov 25, 2015

If manually setting the --mtu daemon flag is an acceptable solution for #13952, then this PR should be merged.
It would also help with #18249

@phemmer
Contributor Author

phemmer commented Nov 25, 2015

I'm going to take a look at what mountkin mentioned.
My first thought is that something was misconfigured on the system, as PMTUD is supposed to prevent such issues. I'll see if I can get my hands on a GCE box and replicate the issue, though.

@phemmer
Contributor Author

phemmer commented Nov 26, 2015

Yeah, I'm not able to reproduce any issues on GCE. Everything behaves exactly as it's supposed to.

I launched a GCE box. Host MTU was 1460. Launched docker with --mtu 1500. Started tcpdump inside the container, and started a large stream to a remote host.
The tcpdump output shows PMTUD working exactly as it's supposed to:

02:24:39.387865 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [S], seq 3106098261, win 29200, options [mss 1460,sackOK,TS val 909093 ecr 0,nop,wscale 7], length 0
02:24:39.429699 IP 66.229.123.231.3000 > 172.17.0.2.48770: Flags [S.], seq 3695655258, ack 3106098262, win 5792, options [mss 1460,sackOK,TS val 3753644923 ecr 909093,nop,wscale 2], length 0
02:24:39.429725 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [.], ack 1, win 229, options [nop,nop,TS val 909103 ecr 3753644923], length 0
02:24:39.429854 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [.], seq 1:1449, ack 1, win 229, options [nop,nop,TS val 909103 ecr 3753644923], length 1448
02:24:39.429868 IP 172.17.0.1 > 172.17.0.2: ICMP 66.229.123.231 unreachable - need to frag (mtu 1460), length 556
02:24:39.429877 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [.], seq 1449:2897, ack 1, win 229, options [nop,nop,TS val 909103 ecr 3753644923], length 1448
02:24:39.429881 IP 172.17.0.1 > 172.17.0.2: ICMP 66.229.123.231 unreachable - need to frag (mtu 1460), length 556
02:24:39.429886 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [P.], seq 2897:4345, ack 1, win 229, options [nop,nop,TS val 909103 ecr 3753644923], length 1448
02:24:39.469948 IP 66.229.123.231.3000 > 172.17.0.2.48770: Flags [.], ack 1409, win 2152, options [nop,nop,TS val 3753644927 ecr 909103], length 0
02:24:39.469962 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [.], seq 5793:7201, ack 1, win 229, options [nop,nop,TS val 909113 ecr 3753644927], length 1408
02:24:39.469964 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [.], seq 7201:7241, ack 1, win 229, options [nop,nop,TS val 909113 ecr 3753644927], length 40
02:24:39.469965 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [.], seq 7241:8649, ack 1, win 229, options [nop,nop,TS val 909113 ecr 3753644927], length 1408
02:24:39.474975 IP 66.229.123.231.3000 > 172.17.0.2.48770: Flags [.], ack 1449, win 2152, options [nop,nop,TS val 3753644927 ecr 909103], length 0
02:24:39.474995 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [.], seq 8649:10057, ack 1, win 229, options [nop,nop,TS val 909115 ecr 3753644927], length 1408
02:24:39.474978 IP 66.229.123.231.3000 > 172.17.0.2.48770: Flags [.], ack 2857, win 2856, options [nop,nop,TS val 3753644927 ecr 909103], length 0
02:24:39.474999 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [P.], seq 10057:11465, ack 1, win 229, options [nop,nop,TS val 909115 ecr 3753644927], length 1408
02:24:39.475001 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [.], seq 11465:12873, ack 1, win 229, options [nop,nop,TS val 909115 ecr 3753644927], length 1408
02:24:39.475003 IP 172.17.0.2.48770 > 66.229.123.231.3000: Flags [.], seq 12873:14281, ack 1, win 229, options [nop,nop,TS val 909115 ecr 3753644927], length 1408
02:24:39.474982 IP 66.229.123.231.3000 > 172.17.0.2.48770: Flags [.], ack 5753, win 4264, options [nop,nop,TS val 3753644928 ecr 909103], length 0
02:24:39.509035 IP 66.229.123.231.3000 > 172.17.0.2.48770: Flags [.], ack 7201, win 4968, options [nop,nop,TS val 3753644931 ecr 909113], length 0

Notice the ICMP need to frag messages indicating an MTU of 1460. Notice that they're coming from the host itself (172.17.0.1), not from a hop outside the host, since the packet can't even leave the box as the default route's MTU is too small.
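
The segment sizes in the capture can be checked with a little arithmetic. The sketch below assumes IPv4 and TCP headers without extra options (20 bytes each) plus the 12-byte timestamp option block (nop, nop, TS) visible in the trace; the constant and function names are made up for illustration.

```python
IPV4_HEADER = 20           # bytes, IPv4 header without options
TCP_HEADER = 20            # bytes, TCP header without options
TCP_TIMESTAMP_OPTION = 12  # bytes: two 1-byte NOPs + 10-byte timestamp option


def mss(mtu):
    """Maximum segment size advertised for a given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def tcp_payload(mtu, option_bytes=TCP_TIMESTAMP_OPTION):
    """Payload bytes per segment once TCP options are subtracted."""
    return mss(mtu) - option_bytes


print(mss(1500))          # 1460, the mss value in the SYN
print(tcp_payload(1500))  # 1448, segment length before the ICMP errors
print(tcp_payload(1460))  # 1408, segment length after the ICMP errors
```

So the drop from 1448-byte to 1408-byte segments is exactly the 40-byte difference between the container MTU (1500) and the MTU reported in the ICMP messages (1460).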

@aboch
Contributor

aboch commented Nov 26, 2015

Thanks @phemmer for the test.
LGTM

@tiborvass
Contributor

thanks @phemmer ! Needs a rebase

@phemmer
Contributor Author

phemmer commented Dec 7, 2015

rebased

@tiborvass
Contributor

LGTM

@thaJeztah
Member

Thanks @phemmer! I searched the docs, and it looks like https://github.com/docker/docker/blob/43077f9b6406e3d5e401a361b4c9742c00be528b/docs/userguide/networking/default_network/custom-docker0.md mentions setting the default based on the host's interface, so that may need some changes.

I don't think other sections of the documentation mention the default value currently.

@thaJeztah thaJeztah added this to the 1.10 milestone Dec 7, 2015
Signed-off-by: Patrick Hemmer <patrick.hemmer@gmail.com>
@phemmer
Contributor Author

phemmer commented Dec 7, 2015

Documentation adjusted.

@thaJeztah
Member

Thanks @phemmer!

LGTM

Development

Successfully merging this pull request may close these issues.

Docker does not start when there is more than one default route
8 participants