New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docker 1.8 corrupts /etc/hosts files in containers #17190
Comments
Yeah, modification of this file is rather racy. I can corrupt the file very easily with the following:
|
The basic problem is that Docker is using https://github.com/docker/docker/blob/master/vendor/src/github.com/docker/libnetwork/etchosts/etchosts.go#L66 which uses truncate and write: Truncate and write is inherently non-atomic and racy. The only safe-ish thing to do here is to use file rename instead of truncate and write, this still isn't 100% atomic, but it is dramatically safer (eg. http://stackoverflow.com/questions/7054844/is-rename-atomic for details) Honestly, the only truly safe thing to do here is to use an actual DNS server like we do in Kubernetes, instead of mucking with |
rename should be atomic, BUT it doesn't work across a bind mount. On Mon, Oct 19, 2015 at 10:38 PM, Brendan Burns notifications@github.com
|
I agree that DNS would be cool. Messing with hosts also slow (in addition to races) if you have a lot of containers. |
All that aside - docker 1.8 seems to be totally broken in this regard. Can On Mon, Oct 19, 2015 at 10:43 PM, Alexander Morozov <
|
yes please, second the request for a flag so that we can disable /etc/hosts writing. |
I think it's more than that. |
You are not the first person to ask: moby/libnetwork#519 In weave we introduced a hack that prevents docker from updating |
@thockin The first order of business is to fix the racy behavior of writing to /etc/hosts, which btw has existed since Disabling /etc/hosts update is a UI discussion and has implications on the |
@mrjana 2 thoughts.
Even hardlinks don't work.
|
For 1) We are going to experiment with acquiring a write lease(using For 2) even with links it wasn't just during startup. everytime the child container restarted the parent container's hosts file gets live updated with the child container's new IP Anyway, I do agree we need to mitigate this race quickly. |
I did not know about SETLEASE - I learned something. I also tried it, and
Same behavior with hard links. My guess is that Linux refuses to promise a On Tue, Oct 20, 2015 at 8:50 AM, Jana Radhakrishnan <
|
http://lxr.free-electrons.com/source/fs/locks.c#L1576
On Tue, Oct 20, 2015 at 9:19 AM, Tim Hockin thockin@google.com wrote:
|
Options that I see:
Option 3 seems like the least risky, most likely to succeed plan. We can help produce the patch, if needed, though it will be faster if you guys do it (knowing where to look in the code is 75% of the problem). What do you guys think? |
@thockin I suspect 1 and 2 would break weave's hack that prevents Docker from updating Option 3, please. |
Option 3 is our preferred plan, too, and that's the patch we'll carry if we have to. I know 1.9.x is at RC, but this bug nullifies everything we have done to qual 1.8.x so we're begging for a 1.8.4 patch. |
I'm personally willing to develop the patch for 1.8.4, I've dug through the code a bit, and I'm pretty sure I know where to insert the right flag points. We're likely to develop this patch one way or the other, and we'll just carry a patched 1.8.4 in Kubernetes if we have to, but we would obviously very much rather not. |
@thockin Getting back to fixing the racy /etc/hosts update here's something that might be the best possible solution:
The point is to employ mandatory locking and confirmed with the following that it works across bind mounts:
Works across bind mounts(I acquire the lock in But there is a catch. In some rare cases when the write system call is well past the check for lock and we try to acquire the lock, the lock will still be sucessfull. So it's not 100% race free but very close. |
I just tried this. It's clever. PROBLEM: The host FS needs to be mounted Maybe you can mount all of /var/lib/docker/containers as a tmpfs with -o That tiny bit of race is still pretty messy, but obviously better. I still On Tue, Oct 20, 2015 at 4:45 PM, Jana Radhakrishnan <
|
I have to characterize the host FS mount because in my test I never remounted my rootfs to But we don't need |
At least on my system, you can not remount a directory with It's disappointing that this is still racy - if I already have the file I still think the only complete answer is to nsenter the mount namespace of More importantly: I think we're focusing on the race, but I am not On Tue, Oct 20, 2015 at 5:48 PM, Jana Radhakrishnan <
|
Please, please, please do not do this. At least not without also providing a way to disable the entire |
Yes, please provide the ability to disable |
+1 |
@rade why is that worse than what's happening today (or the proposed 90% fix to the race)? At least the atomic replacement will not corrupt the file. |
it is worse because it breaks the existing hacks for disabling |
@thockin I hope we all understand release-cycles and issue management :) My responses to @mfischer-zd is to the technical aspect of issue management and the fact that this issue was filed on 1.8.3 and is applicable to the default bridge network. And hence technically #17325 resolved this. Wont you agree to that ? With a healthy upstream/downstream relationship with Kubernetes and Docker, I hope this time around, you would be able to try out RCs early in the release cycle and not wait for x.x.3 release to give feedback. |
Ultimately I leave the decision to you, @mavenugo - my only concern is that /etc/hosts management is in a bit of a quagmire at the moment (though it looks like you're making good progress on fixing them) and there is a large number of issues to sift through to figure out just which issue affects a particular user. I don't mean to suggest that this is the appropriate issue to track those with, though. What I was trying to say is:
|
Until there's an open issue specifically about not corrupting reads of #17278 sounds like it fixes the biggest problem, though my casual reading As for zinging us on filing bugs late - I said it before, this bug took is
|
@thockin As I said before, the racy behavior existed since Also @mfischer-zd now that the default bridge network does not do automatic service discovery please retest to see if issue number 3 (about /etc/hosts not getting updated if a linked container is restarted) is still seen. If seen it would be helpful to file a separate issue for that because it is definitely not the same as racy behavior issue. |
A shared net namespace ( |
@thockin As I said before, the racy behavior existed since --link existed so let's not think that this racy behavior was introduced in 1.9. I agree it's not fixed in 1.9 but it's not fixed in 1.8, 1.7, 1.6 or 1.5 and so on as well. It just became more common in 1.8.3 because docker itself was contributing for the race and was reported by you via this issue and we fixed it. Since this issue was reported for that specific problem it is appropriate to close the issue. I am perfectly fine to track the racy behavior issue as a separate issue. But let's not mix a lot of things and confuse users.
All I am saying is there isn't a bug that I know of that tracks the
fact that in-place editing /etc/hosts is going to cause user problems.
And let's not be disingenuous - yes it existed forever, but with 1.9
you're encouraging people to create Network objects and as soon as
they do, Docker is back to "contributing for the race".
Truth be told, it doesn't matter for us - we have a workaround. I am
just advocating that we make sure this is an understood shortcoming of
networks in 1.9
|
Maybe I misunderstood then. I understood that the corruption was parallel Thanks, sorry for any FUD - honest misunderstanding. On Sun, Oct 25, 2015 at 10:07 PM, Jana Radhakrishnan <
|
I am not sure which race you are talking about. But if you are talking about Docker contributing to the race (oh I just corrected my grammatical mistake that I made when I typed by message in a hurry and I am so glad how you pointed out my mistake so subtly by repeating the same mistake in quotes. That was very kind of you) by even writing to /etc/hosts file then you don't have to create Network objects and you don't need 1.9. You just need to use links in any previous release it is supported. I don't know where I am being disingenuous here. If you are talking about Docker contributing to the race by having multiple threads of its own trying to write to the same file, then that is fixed whether or not you use 1.9's new Network objects. |
Yep, just single start running in parallel with a lot of stops. |
On Sun, Oct 25, 2015 at 10:24 PM, Jana Radhakrishnan
Got it, thanks. Sorry for the confusion. |
On Sun, Oct 25, 2015 at 10:23 PM, Jana Radhakrishnan
Sorry, wasn't meant to be a jab, I just thought it was more context.
One of the big features of 1.9 (as I understand the plan) is networks. Yes, "don't use networks" is a valid workaround, but it's not one I The disingenuity is in closing this bug without filing another issue |
Oh, we don't mean to hide it under the carpet. We will properly document it in the Caveats section of 1.9 release(this is all already being planned for and I am not making this up on the fly) and we will file a proper issue describing the problem in greater detail so everybody is aware. This bug was closed by automatic github workflow because we marked this issue as "Fixes" in #17325. Also it is not going to be an effective way to use this issue to track and educate users with the racy behavior issue. It may confuse the users than providing them clarity. That is why we need a separate issue. You can rest assured that we are not dropping the ball on this. |
Sounds fair to me. Thanks. On Sun, Oct 25, 2015 at 10:49 PM, Jana Radhakrishnan <
|
Hang on, am I to understand that you're removing a feature people are relying on (in my case, relying on it extensively) between rc2 and the official release? I can see that having this blindly enabled all the time is problematic and I'm definitely in favour of having opt-out, but it looks like there's a workaround upthread and I've been relying on this behaviour for two months now, since 1.8.0 came out. My impression from #16619 (comment) was that this would just be a flag to disable the behaviour (even if it's on the daemon level). Make no mistake, this is a breaking change and will cause significant misery for a number of users - the feature is an incredible convenience for a number of reasons, and in some cases it enables use-cases that would otherwise be impossible (for example, exposing an SSH agent inside docker build). |
@aidanhs I've been looking at the documentation and release notes for Docker and I can't seem to find any mention of the (now reverted) behavior in question being documented anywhere. Bugs and mistakes happen; relying on undocumented behavior is going to cause problems for you. As far as I can tell, the only documented way to get Docker to automatically allow containers to address each other is to specify the |
@mfischer-zd I'd see this as equivalent to a new behaviour being added to the VOLUME command in dockerfiles such that if you do That functionality would be fairly nasty, but if it had lasted 2.5 months, spanning 3 official releases and two release candidates, I'd understand if the people who did like that functionality at least wanted it available as an argument to There are 4 different ways the functionality could be made available (excluding reverting the revert) listed at #17325 (comment), but none are available right now. I'd understand if this was a minor convenience or clearly a hack, but saying 'tough luck' about the removal a feature which feels intentional and seems to align with "batteries included" is fairly uncharitable. |
related: #16619
@mavenugo - We have a problem.
We've been experiencing periodic end-to-end test failures where some containers can not resolve "localhost". Killing the container and letting it restart usually fixes the problem. Today we got it into this state and pinned down the following:
The container in question has a /etc/hosts file that is several hundred lines long. That /etc/hosts file does NOT have localhost or the container's own IP. If I add those 2 lines the tests pass.
Of course I thought of #16619 . So we looked at /etc/hosts of every container on this machine - 84 in total. Those files range in size from 175 lines (correct: 7 boilerplate + 2 per container) to 1700+ lines. Looking at the offending files, there are a lot of duplicate IPs. It looks like old containers have not been removed, but localhost, "self", and the ipv6 boilerplate has been removed.
Even looking at my local test machine (far less churn) I see the same. I have 4 containers with network namespace running, I should have 8+7=15 lines, right?
Diffing a 15 and 19 line file shows:
Is there any reason the hosts files would be different for different containers on the same node?
I know docker is live-editing the hosts file (which is absolutely unsafe, but should only be unsafe for the users in the container). Any clues on what's going on?
@ArtfulCoder
The text was updated successfully, but these errors were encountered: