Proposal: System-level containers #18724

Closed
cpuguy83 opened this issue Dec 16, 2015 · 20 comments
Labels
kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.

Comments

@cpuguy83
Member

Problem

A lot of people want to run system-level services, and even docker plugins in containers.
This means these services need to be started before any other container.

An example of this is RancherOS, which currently runs two daemons, one for system services and one for user services. This ensures startup order and protects "system" containers from accidental removal.

There are also cases like the swarm-agent: when the agent is run in a container, swarm hides it from the default ps output.

Proposed solution

In the past we've talked about having a plugin loading API, which would essentially be fancy containers... this may be a bit drastic at this point.
What I propose is adding a new bool flag --system (or HostConfig.System in the API) which does exactly 2 things:

  1. Makes sure the container will start up before non-system containers
  2. Hides the container from docker ps output, though it will still be visible in docker ps -a. Along with this there can also be a filter to show only system containers.

It is important to limit the scope to just these two things and nothing fancy or magical.
To start a system service you'd use exactly the same UX/API as for a normal container: if you want a volume, set a volume; if you want API access, mount the socket; if you need privileged, use --privileged.
A containerized volume plugin might use this like so:
docker run -d --system -v /var/myPlugin:/var/data:shared cpuguy83/myAwesomeVolumePlugin
Again, nothing fancy.
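
The two behaviors proposed above can be sketched in a few lines. This is only an illustration, not Docker code: the `Container` class, the `system` field (standing in for the proposed `HostConfig.System`), and the `startup_order` and `ps` helpers are all hypothetical names, and `show_all` models only the visibility aspect of `docker ps -a`.

```python
from dataclasses import dataclass


@dataclass
class Container:
    # Hypothetical model; `system` stands in for the proposed HostConfig.System.
    name: str
    system: bool = False


def startup_order(containers):
    """System containers first; sorted() is stable, so the rest keep their order."""
    return sorted(containers, key=lambda c: not c.system)


def ps(containers, show_all=False, only_system=False):
    """Mimic the proposed listing: hide system containers unless asked for."""
    if only_system:
        return [c for c in containers if c.system]
    if show_all:
        return list(containers)
    return [c for c in containers if not c.system]


containers = [
    Container("web"),
    Container("volume-plugin", system=True),
    Container("db"),
]

print([c.name for c in startup_order(containers)])
print([c.name for c in ps(containers)])
```

Note that this only orders the start attempts; it says nothing about when a system container is actually ready.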

_Note that the name of the flag is the best I can come up with; I'm not married to it. It also does not necessarily have to be a boolean, but I think a boolean helps to limit the scope._

@cpuguy83 cpuguy83 added the kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. label Dec 16, 2015
@cpuguy83
Member Author

ping @ibuildthecloud

@duglin
Contributor

duglin commented Dec 16, 2015

Does this need to apply to other things too - like volumes, networks, etc? I would think a "system" volume might want to be hidden from normal users in the same way.

It's too bad that we don't have the notion of users, as this is starting to head down the path of scoping the visibility of resources based on something, and something like a user's ACL would be the most natural thing to use here.

If all we want to solve is this one problem then I like this solution; it's nice and easy. But I think it might be worth exploring what people will naturally want to do next and making sure this one boolean flag is really enough; otherwise it might be deprecated pretty quickly.

@LK4D4
Contributor

LK4D4 commented Dec 16, 2015

Yes, a problem exists. But your flag is trying to solve two problems at once: ordering and accidental removal. Fixing them separately or, at least, having the flexibility to turn on one without the other would be a great improvement too.

@cpuguy83
Member Author

I could see modifying this to not deal with automatic filtering. As I was typing it out I was thinking it may be too much.

@abronan
Contributor

abronan commented Dec 17, 2015

I tend to agree with duglin on this one; I think a user ACL is the most natural way to deal with this. For example in Swarm, there would be an admin/maintenance account to deal with the Swarm containers and deploy more nodes/launch more agents and managers. This would ensure that other regular users cannot even list those containers with docker ps -a and/or delete them by any means.

This does not deal with the issue of starting containers in a specific order but with this basic construct then we can go on and have for example a system user for system containers that would have the priority at daemon startup.

The flag does solve the simple case of a set of containers to start before regular containers, but what if I want to have a dependency graph and also start system containers in a specific order? ACLs could solve that by giving priorities at daemon startup to users/system services (even though this solution is far from ideal from a usability perspective, it's just the first thing that popped into my mind 😄)
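
To illustrate the dependency-graph case raised above, here is a minimal sketch of deriving a start order from declared dependencies. It is not something Docker provides: the container names and the `depends_on` mapping are hypothetical, and the ordering uses Python's standard-library `graphlib` rather than any Docker API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency graph: each key lists the containers it depends on.
depends_on = {
    "network-plugin": [],
    "volume-plugin": ["network-plugin"],
    "db": ["volume-plugin"],
    "web": ["db", "network-plugin"],
}

# static_order() yields every node after all of its dependencies.
start_order = list(TopologicalSorter(depends_on).static_order())
print(start_order)
```

A single boolean flag can only express two tiers (system, then everything else), whereas a graph like this can express any acyclic ordering.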

@bboreham
Contributor

What should be the result if a system container cannot start at all, or repeatedly starts and fails?

(I have discussed this at moby/libnetwork#813)

If the answer is "Docker should continue to attempt starting the system container and not let you do anything else before it succeeds", then there needs to be some way of defeating this, since you need to use Docker to change the configuration.

@cpuguy83
Member Author

@bboreham In that case for a first step I'd just log the error as we do today and keep going.

@bboreham
Contributor

@cpuguy83 This is not what happens for network plugins: the operation is retried several times, often enough to time out the startup and leave you with nothing.

@cpuguy83
Member Author

@bboreham That is how plugins work, the above proposal doesn't change that.
Container startup != plugin communication, and something that is --system would not necessarily be a plugin.

@bboreham
Contributor

@cpuguy83 ok, thanks for clarifying.

@bboreham
Contributor

Makes sure the container will startup before non-system containers

Can I just clarify some more: your proposal is to change the order in which the fork/exec operations are done, or to do something extra to establish that the system container has actually started, perhaps even to know that it is "ready"?

@cpuguy83
Member Author

@bboreham neither... It just attempts to start them before normal containers.

@bboreham
Contributor

I think your "attempts to start" is what I meant by "fork/exec"; the important part is that it doesn't give you any guarantees about the order in which those processes will then be scheduled by the kernel, and particularly no guarantee that the system containers will be ready before the non-system ones need them.

Right?

@cpuguy83
Member Author

Docker provides the ordering.
In terms of "readiness", that's up to the services which consume them, not Docker.

@jainvipin

+1

  • docker rm behavior: accidental removal of these containers can be damaging. Say I want to remove all app containers, for which I am used to doing docker rm -f $(docker ps -aq): it might also remove all system/infra containers. Perhaps the long-term answer would be to have ACLs as suggested by @duglin. It would be best to not show these containers in docker ps output altogether unless explicitly asked for.
  • starting an infra container later: it is best if ordering behavior is decoupled from whether the container shows up in docker ps, etc., because I can think of starting a system/infra service afterwards (like how I can do it with systemd)

@cpuguy83
Member Author

cpuguy83 commented Jan 6, 2016

So, in summary -- what we really need is a way to specify that a container should start before any other containers.
What this also implies is that these "system" containers cannot be linked to non-"system" containers.

@duglin
Contributor

duglin commented Jan 6, 2016

One thing to keep in mind is that the entire notion of ordering is one that we should probably discourage people from even worrying about. At any time any component can go down, and users of that component need to be prepared to deal with it, whether it's pausing, retrying, or something else. So, to me, dealing with a dependent component going down/restarting isn't much different from dealing with it being started after me. Providing a way to specify ordering could just give people a false sense of security.
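
As an illustration of "pausing, retrying" on the consumer side, here is a minimal sketch of waiting for a dependency with a doubling delay between attempts. The `wait_until_ready` helper and the fake readiness probe are hypothetical; a real service would probe whatever it actually depends on (a socket, an HTTP endpoint, and so on).

```python
import time


def wait_until_ready(check, attempts=5, delay=0.5):
    """Call check() until it returns True, doubling the sleep between tries.

    Returns True if the dependency became ready, False if attempts ran out.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= 2
    return False


# Hypothetical dependency that only becomes ready on the third probe.
probes = {"count": 0}


def fake_dependency_ready():
    probes["count"] += 1
    return probes["count"] >= 3


print(wait_until_ready(fake_dependency_ready, delay=0.01))
```

The same loop covers both cases described above: a dependency that was started after the consumer, and one that went down and came back.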

@bboreham
Contributor

bboreham commented Jan 6, 2016

I tried to get this across earlier but I think I failed:

The order in which you start some processes gives no guarantee over the order in which they will subsequently get CPU time in order to achieve anything. That is up to the OS scheduler, and it may have other things to think about.

It seems to me unwise to add a feature which usually does what you want but sometimes doesn't.

@cpuguy83
Member Author

cpuguy83 commented Jan 6, 2016

I agree, and in cases where these are Docker plugins, plugin loading is already handled correctly, i.e. containers that rely on a plugin resource are not delayed anyway until the plugin is available.

I'm going to close this as I think we can handle this in other ways.

@cpuguy83 cpuguy83 closed this as completed Jan 6, 2016
@bboreham
Contributor

As mentioned at moby/libnetwork#882, there is a mirror-image problem at shutdown - you shouldn't shut down a container that is implementing some "system" feature before shutting down a container using that feature.
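
The mirror-image rule can be sketched as simply reversing the start order. This is a hypothetical illustration (the `shutdown_order` helper and container names are made up), not a description of how the Docker daemon stops containers.

```python
def shutdown_order(start_order):
    """Stop containers in the reverse of the order they were started."""
    return list(reversed(start_order))


# Hypothetical start order: the "system" plugin first, then its consumers.
started = ["network-plugin", "db", "web"]
print(shutdown_order(started))
```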

6 participants