Upgrade daemon without restarting containers #2658
Some notes about this. There are two ways to handle upgrades:
Then there are multiple areas to consider:
Each area deserves a whole discussion on its own. Let me know where the best place for that is :-)
If we were to "handle upgrades and crashes the same way" (as I would prefer), the focus should be on crash recovery rather than planned upgrades. Crash recovery was our biggest concern when we were thinking about introducing Docker. Logging was a very important topic for us, and this is definitely something we can improve on.
I'd like to tackle this, but need a bit more context. My guess is that we could do something like this with container launches:
This should allow containers to run standalone. If we do not blow away the iptables rules on docker -d shutdown, PIDs are gathered and tracked on startup, and the links v2 stuff is integrated to replace the userspace proxy, I think this will work. A rough sketch of the PID-gathering idea follows.
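For illustration only, here is a minimal sketch of "gather and track PIDs on startup". The pid-file path is hypothetical and not an actual Docker layout; it just shows the shape of the idea.

```sh
# Illustrative only: assumes the daemon writes each container's PID to a
# hypothetical /var/run/docker/containers/<id>/container.pid file.
for pidfile in /var/run/docker/containers/*/container.pid; do
    [ -e "$pidfile" ] || continue
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        echo "container process $pid survived the daemon restart; re-adopting it"
    else
        echo "container process $pid is gone; marking the container as stopped"
    fi
done
```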
Isn't this essentially #2733? They're related, at least.
Not exactly. This is replacing -Erik
Ah, cheers - thanks @erikh
+1. If Docker is killed (e.g., via SIGTERM) or dies and restarts for ~any reason, it should not disrupt running containers.
+1, this is tightly related to #6851
👍 for this; starting heavy services that need to check terabytes of data on startup is a pain with the current behavior. I've seen 30% of a Mesos cluster go down because Docker hit a nil pointer dereference caused by a very slow or hung layer download.
Now that 1.5 is out, can someone please hack on this so the world can update peacefully? :)
This is the one thing that is currently preventing me from moving everything over 👍
Any update on this? Restarting all containers on a huge machine because of a daemon upgrade is painful.
Hi all, I have put out a proposal for Docker's hot upgrade and SPoF issue. Please take a look. Thanks!
@thaJeztah Can you explain at a high level how this will work when it's in place?
@Dvorak You can already test the basics by installing an experimental version (https://experimental.docker.com) and running the daemon manually (we're investigating an issue when it's started through systemd; see #21933). Basically:
Then start a container and kill the daemon. After that, starting the daemon again should pick up the still-running container. So basically that functionality allows stopping the daemon, upgrading it, and starting it again after the upgrade.
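Roughly, the flow looks like the sketch below. It uses the `--live-restore` flag as it later shipped in Docker 1.12, so the exact invocation on the experimental build may differ; the image and container name are just examples.

```sh
# Start the daemon with live restore enabled (run manually, not via systemd,
# because of the issue referenced above).
dockerd --live-restore &

# Start a long-running container.
docker run -d --name web nginx

# Stop the daemon; with live restore the container keeps running.
kill $(pidof dockerd)

# ...upgrade the Docker packages here...

# Start the (new) daemon again; it picks up the still-running container.
dockerd --live-restore &
docker ps    # "web" should still be listed as running
```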
Just issued a docker-machine upgrade on my Mac without shutting down Docker first - that destroyed all of the containers, which only represented about twelve hours of work. Is that supposed to happen? To the OP's point - not very graceful behavior.
That's not really relevant to this issue. I would suggest reporting that here: https://github.com/docker/machine/issues
Sorry! Thought this was the closest hit… Appreciate the reply.
Added this to the 1.12 milestone and assigned it to myself. What we currently have left to do is:
/var/run/docker.sock mount in containers refuse connection after daemon hot upgrade #22789
Hi, I took the experimental release today to test this out. I found a few things that seemed worth flagging (note that my containers were initially started with docker-compose):
Hi, sorry to drag up an old issue, but did this make it into the 1.12.0 release, or was it pushed back? We recently upgraded, but restarting dockerd still stops running containers. If I understood correctly, killing dockerd shouldn't kill running containers anymore, right?
@aluminous yes, it was implemented in #23213, which is in 1.12.0, but it's not enabled by default. Documentation was added through #24970 and can be found here: https://docs.docker.com/engine/admin/live-restore/
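For reference, a minimal example of enabling it, based on the linked documentation: either pass the flag when starting the daemon, or set it in the daemon configuration file and reload.

```sh
# Option 1: pass the flag when starting the daemon.
dockerd --live-restore

# Option 2: set it in /etc/docker/daemon.json ...
#   {
#     "live-restore": true
#   }
# ... then reload the daemon configuration without restarting containers.
sudo kill -SIGHUP $(pidof dockerd)
```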
Apologies, I missed that section of the documentation. Thank you!
No worries, I noticed it wasn't referred to from this issue, so it was good to add some links here for reference anyway 😄
On a percentage of DC/OS agents (~5%) with Docker 1.11.2, the daemon will fail to start up with the following error:

> Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.17.0.1): Address already in use

This seems to be related to a Docker bug around network controller initialization, where the controller has allocated an IP pool and persisted some, but not all, of its state. See:

* moby/moby#22834
* moby/moby#23078

This fix simply removes the docker0 interface, if it exists, before starting the Docker daemon. It will need to be re-evaluated if we want to enable the 1.12+ containerd live-restore Docker options discussed in:

* https://docs.docker.com/engine/admin/live-restore/
* moby/moby#2658
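The described workaround boils down to something like the following hedged sketch (not the exact DC/OS patch):

```sh
# If a stale docker0 bridge is left over from a previous run, remove it
# before starting the daemon so the network controller can recreate it.
if ip link show docker0 > /dev/null 2>&1; then
    ip link set docker0 down
    ip link delete docker0
fi
systemctl start docker
```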
Docker needs a way to upgrade itself gracefully, with minimal interruption of service. Ideally all containers would continue to function with zero downtime and zero behavior change. This might not always be possible, for example if the upgrade introduces significant changes to the container's runtime environment itself. In that case, Docker should give the sysadmin maximum flexibility - in a perfect world the upgrade could be rolled out separately for each container.