An introduction to Clear Containers
Containers are hot. Everyone loves them. Developers love the ease of creating a "bundle" of something that users can consume; DevOps and information-technology departments love the ease of management and deployment. To a large degree, containers entered the spotlight when Docker changed the application-development industry on the server side in a way that resembles how the iPhone changed the client application landscape.
The word "container" is not just used for applications, though; it is also used to describe a technology that can run a piece of software in an isolated way. Such containers are about using control groups to manage resources and kernel namespaces to limit the visibility and reach of your container app. For the typical LWN reader, this is likely what one thinks about when encountering the word "container."
Many people who advocate for containers start by saying that virtual machines are expensive and slow to start, and that containers provide a more efficient alternative. The usual counterpoint is about how secure kernel containers really are against adversarial users with an arsenal of exploits in their pockets. Reasonable people can argue for hours on this topic, but the reality is that quite a few potential users of containers see this as a showstopper. There are many efforts underway to improve the security of containers and namespaces in both open-source projects and startup companies.
We (the Intel Clear Containers group) are taking a little bit of a different tack on the security of containers by going back to the basic question: how expensive is virtual-machine technology, really? Performance in this regard is primarily measured using two metrics: startup time and memory overhead. The first is about how quickly your data center can respond to an incoming request (say a user logs into your email system); the second is about how many containers you can pack on a single server.
We set out to build a system (which we call "Clear Containers") where one can use the isolation of virtual-machine technology along with the deployment benefits of containers. As part of this, we let go of the "machine" notion traditionally associated with virtual machines; we're not going to pretend to be a standard PC that is compatible with just about any OS on the planet.
To provide a preview of the results: we can launch such a secured container that uses virtualization technology in under 150 milliseconds, and the per-container memory overhead is roughly 18 to 20MB (this means you can run over 3500 of these on a server with 128GB of RAM). While this is not quite as fast as the fastest Docker startup using kernel namespaces, for many applications this is likely going to be good enough. And we aren't finished optimizing yet.
So how did we do this?
Hypervisor
With KVM as the hypervisor of choice, we looked at the QEMU layer. QEMU is great for running Windows or legacy Linux guests, but that flexibility comes at a hefty price. Not only does all of the emulation consume memory, it also requires some form of low-level firmware in the guest as well. All of this adds quite a bit to virtual-machine startup times (500 to 700 milliseconds is not unusual).
However, we have the kvmtool mini-hypervisor at our disposal (LWN has covered kvmtool in the past). With kvmtool, we no longer need a BIOS or UEFI; instead we can jump directly into the Linux kernel. Kvmtool is not cost-free, of course; starting kvmtool and creating the CPU contexts takes approximately 30 milliseconds. We have enhanced kvmtool to support execute-in-place on the kernel to avoid having to decompress the kernel image; we just mmap() the vmlinux file and jump into it, saving both memory and time.
Kernel
A Linux kernel boots pretty fast. On a real machine, most of the boot time in the kernel is spent initializing some piece of hardware. However, in a virtual machine, none of these hardware delays are there—it's all fake, after all—and, in practice, one uses only the virtio class of devices that are pretty much free to set up. We had to optimize away a few early-boot CPU initialization delays; but otherwise, booting a kernel in a virtual-machine context takes about 32 milliseconds, with a lot of room left for optimization.
We also had to fix several bugs in the kernel. Some fixes are upstream already and others will go upstream in the coming weeks.
User space
In 2008 we talked about the five-second boot at the Plumbers Conference and, since then, many things have changed—with systemd being at the top of the list. Systemd makes it trivial to create a user-space environment that boots quickly. I would love to write a long essay here about how we had to optimize user space, but the reality is—with some minor tweaks and just putting the OS together properly—user space boots pretty quickly (less than 75 milliseconds) already. (When recording bootcharts at a high sampling resolution, it's a little more, but that's all measurement overhead.)
Memory consumption
A key feature to help with memory consumption is DAX, which the 4.0 kernel now supports in the ext4 filesystem. If your storage is visible as regular memory to the host CPU, DAX enables the system to do execute-in-place of files stored there. In other words, when using DAX, you bypass the page cache and virtual-memory subsystem completely. For applications that use mmap(), this means a true zero-copy approach, and for code that uses the read() system call (or equivalent) you will have only one copy of the data. DAX was originally designed for fast flash-like storage that shows up as memory to the CPU; but in a virtual-machine environment, this type of storage is easy to emulate. All we need to do on the host is map the disk image file into the guest's physical memory, and use a small device driver in the guest kernel that exposes this memory region to the kernel as a DAX-ready block device.
What this DAX solution provides is a zero-copy, no-memory-cost solution for getting all operating-system code and data into the guest's user space. In addition, when the MAP_PRIVATE flag is used in the hypervisor, the storage becomes copy-on-write for free; writes in the guest to the filesystem are not persistent, so they will go away when the guest container terminates. This MAP_PRIVATE solution makes it trivial to share the same disk image between all the containers, and also means that even if one container is compromised and mucks with the operating-system image, these changes do not persist in future containers.
A second key feature to reduce memory cost is kernel same-page merging (KSM) on the host. KSM is a way to deduplicate memory within and between processes and KVM guests.
Finally, we optimized our core user space for minimal memory consumption. This mostly consists of calling the glibc malloc_trim() function at the end of the initialization of resident daemons, causing them to give back to the kernel any malloc() buffers that glibc held onto. Glibc by default implements a type of hysteresis where it holds on to some amount of freed memory as an optimization in case memory is needed again soon.
Next steps
We have this working as a proof of concept with rkt (implementing the appc spec that LWN wrote about recently). Once this work is a bit more mature, we will investigate adding support into Docker as well.
More information on how to get started and get code can be found at clearlinux.org, which we will update as we make progress with our integration and optimization efforts.
Index entries for this article: GuestArticles: van de Ven, Arjan
An introduction to Clear Containers
Posted May 18, 2015 16:26 UTC (Mon) by SEJeff (guest, #51588) [Link]
An introduction to Clear Containers
Posted May 19, 2015 14:46 UTC (Tue) by philipsbd (subscriber, #33789) [Link]
An introduction to Clear Containers
Posted May 18, 2015 16:29 UTC (Mon) by PaXTeam (guest, #24616) [Link]
An introduction to Clear Containers
Posted May 19, 2015 4:32 UTC (Tue) by krakensden (guest, #72039) [Link]
An introduction to Clear Containers
Posted May 19, 2015 19:41 UTC (Tue) by s0f4r (guest, #52284) [Link]
An introduction to Clear Containers
Posted May 21, 2015 0:57 UTC (Thu) by josh (subscriber, #17465) [Link]
Sadly it hasn't been merged, and Linus has effectively said it never will be. So it'll likely get split out into its own repository independent of the kernel source at some point.
An introduction to Clear Containers
Posted May 21, 2015 12:33 UTC (Thu) by justincormack (subscriber, #70439) [Link]
An introduction to Clear Containers
Posted Jun 2, 2016 9:52 UTC (Thu) by Sam_Smith (guest, #109091) [Link]
It's not nearly the same thing..
Posted May 18, 2015 18:29 UTC (Mon) by dw (guest, #12017) [Link]
Speed matters
Posted May 18, 2015 19:08 UTC (Mon) by david.a.wheeler (guest, #72896) [Link]
At the RSA 2015 conference, IIRC a Docker rep stated that a large number of containers (the majority?) ran for less than 0.5 seconds. I can't confirm those numbers, but for the sake of argument let's accept them. A startup time of 2 seconds, for a task that takes 0.5 seconds, is a pretty significant overhead... so even a 2-second startup is going to be unacceptable to many use cases. Happily, the "clear containers" work seems to be producing much better results than that.
This "clear containers" work could be really important, since this could make hardware virtualization useful in situations where today OS-level containerization is far more practical. I've tweaked my paper Cloud Security: Virtualization, Containers, and Related Issues to include this.
It's not nearly the same thing..
Posted May 18, 2015 22:54 UTC (Mon) by sjj (subscriber, #2020) [Link]
More traditional VMs will be fine for more traditional workloads. Getting rid of emulated legacy hardware benefits those too.
Interesting times...
An introduction to Clear Containers
Posted May 18, 2015 21:47 UTC (Mon) by flussence (subscriber, #85566) [Link]
Where does that huge 18MB of overhead per instance come from though? The init stuff on my setup only adds up to about 1.5MB...
An introduction to Clear Containers
Posted May 18, 2015 22:13 UTC (Mon) by Jonno (subscriber, #49613) [Link]
Presumably from the additional kernel instance, and the emulation of the necessary virtio devices.
Remember that, despite the name, this is still a virtual machine, not a traditional container. In my qemu-kvm setup the guest-side kernel uses ~32 MiB, and qemu uses ~80 MiB for device emulation, so in that context 18 MiB is actually really tiny...
An introduction to Clear Containers
Posted May 18, 2015 23:30 UTC (Mon) by arjan (subscriber, #36785) [Link]
It's not the greatest number, and we're working on reducing it further.... but it's also not completely horrible.
An introduction to Clear Containers
Posted May 19, 2015 4:45 UTC (Tue) by balbir_singh (subscriber, #34142) [Link]
An introduction to Clear Containers
Posted May 19, 2015 13:28 UTC (Tue) by arjan (subscriber, #36785) [Link]
I have heard microkernel folks say two things
1) A microkernel is faster/more efficient
and
2) A microkernel is more secure (smaller surface)
I think 1) is pretty much debunked at this point with this article, but it's hard to argue with 2)... less functionality also leaves less to attack.
An introduction to Clear Containers
Posted May 19, 2015 16:03 UTC (Tue) by edomaur (subscriber, #14520) [Link]
An introduction to Clear Containers
Posted May 20, 2015 14:26 UTC (Wed) by edomaur (subscriber, #14520) [Link]
However, after having looked a bit at Clear Containers, I would like to say that it's a bit like MirageOS in the resulting VM: something lightweight, with a minimalistic OS.
MirageOS and rump kernels
Posted May 26, 2015 21:10 UTC (Tue) by mato (guest, #964) [Link]
I would also like to point out our work (disclaimer: I'm one of the core developers) on rump kernels[2] and the rumprun unikernel stack[3] which allows you to run existing, unmodified, POSIX applications as unikernels on KVM, Xen and bare metal.
I like to think of our (Mirage and rump kernels) approach as doing away with the traditional operating system altogether; it's the ultimate in minimalism. Only include the functionality required to get your application to run and nothing else.
This has several interesting advantages:
- We've all seen the various bugs found in the industry standard TLS stack. The Mirage folks have developed green-field type-safe implementations of the entire TCP, HTTP and TLS stack in OCaml. They've put up a bounty in the form of the BTC Piñata[4]. If you can break their stack, you get to keep the bitcoin.
- Containers (and Clear Containers) still include an entire operating system, accessible to the application running on it, and thus potentially exploitable. Compare that to running your application on rumprun, which has no concept of exec(). If there's no shell to exec() then there's nothing to break into.
- A combination of Mirage and rumprun paves the way to the best of both worlds. Run a Mirage frontend serving HTTP and TLS, and talk to a rumprun unikernel running (for example) your legacy PHP application.
[1] https://mirage.io/
[2] http://rumpkernel.org/
[3] http://repo.rumpkernel.org/rumprun
[4] http://ownme.ipredator.se/
An introduction to Clear Containers
Posted May 18, 2015 22:23 UTC (Mon) by olof (subscriber, #11729) [Link]
The presentation info:
http://lccona14.sched.org/event/f7d6705976087895610d86640...
Slides:
http://events.linuxfoundation.org/sites/events/files/slid...
An introduction to Clear Containers
Posted May 18, 2015 23:31 UTC (Mon) by arjan (subscriber, #36785) [Link]
Note that kvmtool also supports plan9fs, and we use that for data access (as opposed to OS code). But using DAX saves a ton of memory due to its zero-copy nature, which plan9fs just can't match.
An introduction to Clear Containers
Posted May 19, 2015 0:31 UTC (Tue) by luto (subscriber, #39314) [Link]
Virtfs
Posted Aug 28, 2015 10:43 UTC (Fri) by rektide (guest, #71530) [Link]
[1] https://www.linuxplumbersconf.org/2010/ocw/system/present...
Virtfs
Posted Aug 28, 2015 13:25 UTC (Fri) by Jonno (subscriber, #49613) [Link]
9pfs over virtio is zero-copy in the networking sense, not in the memory-management sense.
E.g. data goes directly from the page cache to the virtio bus, and then directly from the virtio bus to the page cache on the other side, without having to copy everything to and from some intermediary protocol package. There are still separate data copies in the host and guest page caches, and obviously all changes to one have to be synced to the other...
Virtfs
Posted Sep 1, 2015 8:02 UTC (Tue) by nix (subscriber, #2304) [Link]
An introduction to Clear Containers
Posted May 19, 2015 4:19 UTC (Tue) by krakensden (guest, #72039) [Link]
[1]: mostly, it can't find a root filesystem to make it happy, but I also haven't worked very hard at this
An introduction to Clear Containers
Posted May 19, 2015 13:29 UTC (Tue) by arjan (subscriber, #36785) [Link]
An introduction to Clear Containers
Posted May 21, 2015 11:57 UTC (Thu) by pbonzini (subscriber, #60935) [Link]
FWIW, QEMU is just bloated in the default configuration. With a custom firmware (here) and by disabling options at build time (./configure --disable-foo), I can boot Linux in 80 ms. The memory usage is still not great (44 MB), but then I have done absolutely zero effort to cut it down.
An introduction to Clear Containers
Posted May 21, 2015 12:51 UTC (Thu) by arjan (subscriber, #36785) [Link]
An introduction to Clear Containers
Posted May 21, 2015 13:18 UTC (Thu) by pbonzini (subscriber, #60935) [Link]
I got it with "time", modifying the firmware to exit QEMU 10 or so instructions before the jump to vmlinuz.
And it can do migration too, of course. :) What is the expected lifetime of these containers?
KSM?
Posted May 19, 2015 13:37 UTC (Tue) by fuhchee (guest, #40059) [Link]
KSM?
Posted May 19, 2015 14:20 UTC (Tue) by arjan (subscriber, #36785) [Link]
KSM isn't great in terms of performance (it's better to never copy than to copy and then later share), but the impact of it is mitigated hugely by the optimizations we've done... it has a lot less to do and it does that reasonably well.
An introduction to Clear Containers
Posted May 19, 2015 14:17 UTC (Tue) by cread (guest, #81529) [Link]
An introduction to Clear Containers
Posted May 19, 2015 14:31 UTC (Tue) by thiago (guest, #85680) [Link]
Works fine here:
depth=2 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Certification Authority
verify return:1
depth=1 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Domain Validation Secure Server CA
verify return:1
depth=0 OU = Domain Control Validated, OU = Issued through Intel Corporation E-PKI Manager, OU = COMODO SSL, CN = clearlinux.org
verify return:1
An introduction to Clear Containers
Posted May 19, 2015 16:38 UTC (Tue) by Lionel_Debroux (subscriber, #30014) [Link]
A stronger TLS cipher could be offered as first choice by the server, but at least, it's a PFS one, and the server speaks TLSv1.2.
An introduction to Clear Containers
Posted May 19, 2015 20:14 UTC (Tue) by thiago (guest, #85680) [Link]
An introduction to Clear Containers
Posted May 19, 2015 18:34 UTC (Tue) by ware (subscriber, #83607) [Link]
It's difficult to address the concern without details.
An introduction to Clear Containers
Posted May 19, 2015 20:03 UTC (Tue) by tpepper (guest, #31353) [Link]
https://www.ssllabs.com/ssltest/analyze.html?d=download.c...
There are two certification paths on the certs. Some browsers seem to be unhappy with one of the paths, so knowing more details on what was observed as out of the ordinary and how would be useful.
An introduction to Clear Containers
Posted May 19, 2015 23:14 UTC (Tue) by cesarb (subscriber, #6266) [Link]
Actually, there's only one certification path:
download.clearlinux.org
COMODO RSA Domain Validation Secure Server CA
COMODO RSA Certification Authority
AddTrust External CA Root
Recent browsers have "COMODO RSA Certification Authority" in their certificate store, so the path stops there. To be compatible with older browsers, however, the COMODO root is also signed by the older AddTrust root. So older browsers see the full path.
In that report, "COMODO RSA Certification Authority" shows as an "Extra download". That means it's not being sent by the server; the browser has to download it separately. But some older browsers (like old Firefox versions) do not know how to do that, and it's precisely these browsers that have a greater chance of needing it.
If that's what's causing the issue, the solution is simple: instead of using just the "COMODO RSA Domain Validation Secure Server CA" as the intermediate certificate, use it together with the "COMODO RSA Certification Authority" intermediate certificate. The server administrator should have received a copy of both; it should be a simple matter of concatenating both (in the correct order; you're supposed to know what the correct order is) into a single intermediate certificate file.
An introduction to Clear Containers
Posted May 19, 2015 20:31 UTC (Tue) by philipsbd (subscriber, #33789) [Link]
This is really exciting for me as one of the developers of rkt. We built rkt with this concept of "stages", where the rkt stage1 here is swapped out from the default, which uses "Linux containers", to instead execute lkvm. In this case the Clear Containers team was able to swap out the stage1 with some fairly minimal code changes to rkt, which are going upstream. Cool stuff!
An introduction to Clear Containers
Posted Sep 24, 2015 5:10 UTC (Thu) by philipsbd (subscriber, #33789) [Link]
An introduction to Clear Containers
Posted May 20, 2015 11:06 UTC (Wed) by pbonzini (subscriber, #60935) [Link]
What virtual hardware are you using for DAX support? Is there a spec somewhere?
An introduction to Clear Containers
Posted May 21, 2015 16:52 UTC (Thu) by NightMonkey (subscriber, #23051) [Link]
I think I understand why containers are so attractive to developers and to software publishers. Like virtual machines, they push aside the worries and labor headaches of having to configure a monolithic software stack for multiple applications, with competing, and sometimes conflicting, shared-library requirements. But the security aspect... With a virtual machine, as with a physical one, you can update the software to apply security updates and other bug fixes easily. But how easy is that if you have dozens or more containers to track software versions on, and apply updates to?
Or are we just at the hand-waving stage with containers, closing our eyes, holding our nose, and just diving in? Or perhaps I'm thinking about it wrong, and the security "worries" on traditional kernel+libc userspace setups were more paranoia than reality? There just feels like there's a disconnect here, or a distraction from the same worries, which haven't gone away with the advent of containers. Something like "Hey, security and bugfixing is a pain, let's just make yet ANOTHER abstraction layer so no one can see how many holes we leave open."
Am I being nuts? Or has some magic happened to make these SysAdmin-ly worries go away?
Thanks.
An introduction to Clear Containers
Posted May 21, 2015 17:59 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]
You do not 'apply updates' to containers. You recompile their templates with fixed versions of packages and then restart the affected container instances.
Docker strongly encourages statelessness, so it all feels natural.
> Am I being nuts? Or has some magic happened to make these SysAdmin-ly worries go away?
Yes and yes.
An introduction to Clear Containers
Posted May 21, 2015 19:27 UTC (Thu) by dlang (guest, #313) [Link]
In theory this is no different from the way VMs should be handled: you don't update them, you create new ones with the updated software.
In practice....
As the quote goes: "In theory, theory and practice are the same; in practice, they are not."
An introduction to Clear Containers
Posted May 21, 2015 20:32 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]
Docker containers were used completely differently from the start. For example, for a long time it had not been possible to run a shell inside an already running container.
An introduction to Clear Containers
Posted May 22, 2015 2:53 UTC (Fri) by lyda (guest, #7429) [Link]
An introduction to Clear Containers
Posted May 22, 2015 16:38 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]
An introduction to Clear Containers
Posted May 22, 2015 7:25 UTC (Fri) by niner (subscriber, #26151) [Link]
So we don't update containers, we re-create them with updated templates. But how _are_ these templates updated? Where do the security updates to the templates come from? How does an admin know that a template needs updating?
An introduction to Clear Containers
Posted May 22, 2015 16:36 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]
Using the "docker build" command ( https://docs.docker.com/reference/builder/ ) or its equivalent.
> Where do the security updates to the templates come from?
The usual repositories and software installation channels.
> How does an admin know that a template needs updating?
Using the usual channels. For example, just like with real machines, an admin might periodically try to do 'apt-get update; apt-get upgrade' with only the security-updates repository turned on, on a test container.
An introduction to Clear Containers
Posted May 24, 2015 3:16 UTC (Sun) by misc (guest, #73730) [Link]
And of course, provided the containers do not require a schema change or any kind of upgrade to the DB or any data store (storage that you also likely need to handle, potentially with containers too, if possible, in a shared-cluster way, which opens all kinds of fun problems). These are problems that can be solved, but it's not as easy as people seem to imply.
(There are a few other issues to solve, like logging of containers, proper isolation, and the inherent dependencies on the host kernel, which make practice != theory.) Secret distribution is also an interesting one: how do you give your wordpress containers access to the mysql DB somewhere in a clean way? (Again, doable and not an insanely hard issue, but it requires a bit more than just vanilla docker and a well-defined workflow.)
An introduction to Clear Containers
Posted May 27, 2015 1:57 UTC (Wed) by ras (subscriber, #33059) [Link]
Sounds like the promised land. But it doesn't quite match reality. As a point of comparison, the "old way" of doing this was to install something like unattended-upgrades and let the system handle it itself. It's completely automated, with next to no downtime.
To do the same job in a container, you say, you rebuild it. But rebuilding is a far heavier operation - so much so that they provide tools to avoid it by persisting the result as a .tar.gz. It can be done offline, but then how do you know when to do it? If you don't know, you are up for rebuilding and restarting every container at least once a day.
These kernel virtualisation containers were born at Google. At Google I suspect none of this mattered because the software in the container was produced by them, and distributed in a container format. The rest of us run mostly software maintained by upstream distros, distributed as packages that have to be individually installed and configured. Yes, Docker provides a bridge between the two worlds - producing container images from a distro's packages. But it's a damned primitive bridge. Doing a debootstrap followed by a zillion apt-get installs every time you apply a security update just doesn't cut it.
We need the next step - something that marries the roles of distro and container. I suspect the next big move will have to come from the distros. It would allow (say) a Debian host to build a Debian container from Debian packages in a second or so, or alternatively allow the Debian host to transparently maintain (e.g., apply security patches to) all Debian containers under its control.
An introduction to Clear Containers
Posted May 27, 2015 2:07 UTC (Wed) by dlang (guest, #313) [Link]
Building from scratch instead of upgrading to create the new gold copy is a good idea because it means that you can't have 'quick hacks' work their way into your system that you don't find for years (until the next major upgrade, when you do have to recreate them from scratch), but it is significantly more expensive to recreate the entire image from scratch than to just upgrade one piece of software.
I take the middle ground: I create a base image that has most of the system in it and just add the application-specific parts when creating the per-function images.
If you only have one or two instances of each type of thing, and you are creating them completely from scratch, then it really does waste quite a bit of time (both CPU time and wall-clock time).
An introduction to Clear Containers
Posted May 31, 2015 13:54 UTC (Sun) by kleptog (subscriber, #1183) [Link]
You're going to rebuild the entire image every time you make a release that needs to be deployed. You deploy all the latest OS updates at the same time so in practice it's no extra work.
Besides, we have buildbots that do nothing else than build Docker images on every commit. It takes a few minutes per image sure, but the result can be thrown into the test environment to see if it works and if it does you can use the same image to deploy to production.
I would love it if it were possible to create VMs as easily. I'm hoping someone will make a Docker-to-VM converter. Livebuild is good, but relatively slow.
An introduction to Clear Containers
Posted May 31, 2015 15:55 UTC (Sun) by raven667 (subscriber, #5198) [Link]
An introduction to Clear Containers
Posted May 31, 2015 22:57 UTC (Sun) by ras (subscriber, #33059) [Link]
Yes, when you are deploying software you developed, that makes perfect sense, and I'm guessing that's how it worked in the company that pioneered this technology - Google.
To me, a lone wolf, who must deploy a variety of stuff I didn't develop, it makes far less sense. I inherited a WordPress instance, for example, and it's not the only one - I run many of these packages. If I tried to keep track of all the security vulnerabilities in them and all their dependencies and updated them manually, I'd have no time for anything else. The only thing that makes sense time-wise for me is to rely on my distro to keep it patched. Which it does, and I'm guessing it does so more reliably than you updating your package at irregular intervals.
I suspect it's the little guys like me who are continually popping up and asking "what good does this newfangled containerisation thing do for me". The answer is not much. In the short term the only real positive it brings is security. The mental model you need to reason about the isolation imposed by containers is far simpler than the alternatives.
The other observation I have is that the way containerisation is done now is at odds with how the distros work. Distros like Debian are large collections of little guys, each working on their own packages mostly in isolation. This is necessarily the case because we (I'm a DD) only have so many hours in a day. Thus if it were not possible to divide the large workload into thousands of tiny bite-sized chunks, Debian wouldn't exist. Deploying the Debian "container" - i.e. a new release - is a huge amount of work, which is why you see so few of them. Releasing a new one every time a new version of a package comes along (which is effectively what you are doing) is completely out of scope.
So containers and distros are like oil and water. They don't mix very well in most situations - yours being a notable exception. If they are going to mix, something has to change. I can't see it being the containers - at the packaging level there isn't much to them. So it has to be the distros. The first approach that springs to mind is that the distro hosting the containers automagically keeps them patched. That requires both the host and container to be running the same distro - but I suspect that usually is the case. If that happened it would remove the major impediment to containerising everything for small guys like me.
An introduction to Clear Containers
Posted Jun 1, 2015 0:18 UTC (Mon) by dlang (guest, #313) [Link]
At Google they don't build "containers" and deploy them. They think in terms of "Jobs" that need X instances with Y cores and Z RAM. The fact that the implementation of this is in containers is not something that the normal Google developer or Site Reliability Engineer (the closest they have to sysadmins) ever thinks about. It's really far closer to the mainframe job-submission mentality than to the 'traditional' server (even VM) model.
An introduction to Clear Containers
Posted Jun 1, 2015 1:02 UTC (Mon) by dlang (guest, #313) [Link]
Actually I see exactly the opposite. I think that the current mentality of people building containers - where they install large chunks of a distro and run what's close to a full machine's worth of software in each container - is what's going to change.
Containers need to contain only the software actually required to run the software, and that is FAR less than just about anyone is putting in a container.
A decade+ ago I was working to simplify management of software using chroot sandboxes, setting them up so that they only contained files that were actually used by the software in question. (not just the packages listed as dependencies). The result is a much smaller footprint than any of the container definitions I've seen so far. Minimizing the container contents like this does wonders for your security requirements (you don't need to patch things that aren't there)
But containers need to evolve away from "install full packages and much of the OS" toward something that is much more tailored for the job in question. Figuring out how to build such systems cleanly will help figure out how to build updated versions, but there is still going to be the question of how you update anything that contains enough state that you can't just replace it.
The idea of doing a CoW image as the base of many containers is trying to sidestep this bloat by spreading its cost across many running containers (even if different ones use different subsets of the image), but it doesn't at all address the upgrade question. Saying that you layer filesystems so that you can replace lower levels in the stack only works until you need to change something higher up to work with a newer version of a lower layer.
An introduction to Clear Containers
Posted Jun 1, 2015 1:40 UTC (Mon) by ras (subscriber, #33059) [Link]
True. But it creates a different problem. Whereas before you had one installation to manage, now you have many. So while it is true each individual container contains fewer packages, for Debian every container will contain all the Debian essential packages. Or to put it another way, containerisation doesn't cause the total number of packages to drop. If you needed apache2, varnish, ntp and whatever else in the old setup, you will still need them in the containerised setup - albeit not installed in every container.
The net result is that while the total number of packages used doesn't change, the number of deployments of them you have to manage (read: configure and ensure they are security patched) increases - in fact it is multiplied by the number of containers you use, in the worst case. On the up side, I imagine the configuration of each container is much simpler, but on the down side you now have extra configuration to do - setting up virtual NICs, allocating them IPs, mounting filesystems inside the container, broadcasting the IPs so they can talk to each other. My guess is that on balance the work involved in configuration isn't much different either way.
But this explosion in deployments is a big deal if the sysadmin has to update and patch all of the containers, which is the case now. If the distro looked after it, the workload reduces to what it was and it doesn't matter so much. And you get the security benefits for free.
In the long term this will be solved, and what I suspect is the real benefit containers have will make itself felt. Containers bring the principle of "private by default" modularisation to system building. The number of "running on the same system" assumptions will drop as a consequence, interdependencies will drop (despite the dbus mob's valiant efforts to make everything talk to everything else), and things like huge apache2 config files managing hundreds of sites will be a thing of the past. But that's a long way away.
An introduction to Clear Containers
Posted Jun 1, 2015 2:29 UTC (Mon) by dlang (guest, #313) [Link]
You are correct about how containers are being built right now.
I am saying that this needs to change.
The vast majority of files in those "Debian essential" packages (and actually quite a few of the full packages) are not going to be needed inside the container.
If you create a container, run it for a while (ideally exercising every feature in the software you installed), and then look at what files have an atime newer than when you started up the container, you would find that the vast majority of the files on the system were never accessed.
There is a lot more software needed for a 'minimal' system that runs completely self-contained than is needed to run a piece of software inside a container, which doesn't need to do lots of the other things you need to do on a full system (boot software, daemons, etc.). If the software you are running is statically linked, you may not need anything beyond the one binary (in a 'best case' simplified example). Even a lot of the stuff that's typically done inside the container today could actually be done externally (job control, monitoring, and logging are pretty obvious wins); the question is at what point the value of splitting things out of the container is outweighed by the value of having everything bundled together inside the container.
Most of the container contents being created today are full distro installs (or pretty close to that), almost the exact same things that would be in a VM image or an image running on bare metal.
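The atime audit described above can be sketched in shell. The demo below uses an invented scratch tree and sets atimes explicitly with `touch -a`, because on common relatime mounts ordinary reads only update atimes in limited cases; a real audit needs an atime-capable mount and would run against / inside the container.

```shell
# Demonstration of the atime-audit idea on a scratch tree (illustrative
# file names); in a real container you would audit / inside the instance.
ROOT=$(mktemp -d)
touch "$ROOT/used.bin" "$ROOT/unused.bin"

# Backdate both files' atimes, as if the image were built a while ago.
touch -a -t 202001010000 "$ROOT/used.bin" "$ROOT/unused.bin"

# Drop a timestamp marker at "container start"...
touch "$ROOT/.start-marker"
sleep 1

# ...then exercise the workload (here: simulate one file being accessed).
touch -a "$ROOT/used.bin"

# Files accessed since the marker are the ones the container actually
# needs; everything else is a candidate for removal from the image.
find "$ROOT" -type f -anewer "$ROOT/.start-marker"
```

Diffing that list against the full file list of the image gives the removable set.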
An introduction to Clear Containers
Posted Jun 1, 2015 7:39 UTC (Mon) by kleptog (subscriber, #1183) [Link]
There is the point made further up about how containers are less useful for deploying individual applications that you don't manage yourself, like a single WordPress install. In our case we build two or three images but then deploy them a few hundred times with slightly different configurations. That changes the balance significantly and is vastly easier to manage than a few hundred VMs.
An introduction to Clear Containers
Posted Jun 1, 2015 10:08 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]
We tried that here as an experiment. It turns out that unless your application is an almost statically linked, pure-C app, you can't really remove that much. You still likely need glibc and all of its crap, libstdc++, OpenSSL, libz and so on.
About the only significant redundant piece is python-minimal, which is needed for apt. Well, and apt itself, of course.
In the end, we simply decided to use the official base images, since several megabytes of dead disk space per container (with no RAM overhead unless apt/python are actually used) is not worth maintaining our own images over.
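One way to see what a given binary actually drags in is to list its shared-library dependencies with ldd; every library it resolves has to be shipped in the container. Here /bin/sh is used purely as an example binary:

```shell
# List the shared libraries a dynamically linked binary pulls in.
# Each line names something the container image must provide.
ldd /bin/sh

# To turn that into a copyable file list, keep only the resolved paths
# (the "=> /path/to/lib.so" entries):
ldd /bin/sh | awk '$3 ~ /^\// { print $3 }'
```

Walking that list recursively (each library has its own dependencies) is what makes even a "single binary" container end up needing glibc, libz, and friends.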
An introduction to Clear Containers
Posted Jun 1, 2015 17:16 UTC (Mon) by dlang (guest, #313) [Link]
An introduction to Clear Containers
Posted Jun 3, 2015 0:09 UTC (Wed) by ras (subscriber, #33059) [Link]
The 140MB [0] that debootstrap installs maintains the Debian distribution that lives inside of the container. The way things are done now, it's a necessary part of the container. Dockerfiles generally start with "take one minimal Debian installation; apt-get install these packages ...". That can't happen without that 140MB. If you get your containers to install their own security patches, that 140MB is going to be needed for the life of the container. Even if you don't, Debian's policy of not having explicit dependencies on "required" packages means it's very difficult to figure out what you can remove without writing your own software to follow the reverse dependencies (which I have done).
Part of the reason I say the distros have to change is that I agree this stuff shouldn't be in the container. If the distros become container aware, the host can use its copy of dpkg and so on to build and later maintain containers. If that happens, you get the added benefit of security patches being applied automagically by the host, as happens now in the non-container world, rather than having to do this manual rebuilding rubbish.
This is where my statement above, that the next step in the move to containers is for the distros to change, comes from. At the moment what we have is half-baked.
[0] I only recently realised that a Debian minimal install is 140MB. That's huge - and that's after I've pruned the caches debootstrap creates. Damn Small Linux, for example, crams an entire distribution (kernel, GUI environment, two(!) browsers, a plethora of editors) into 120MB.
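The reverse-dependency walk mentioned above can be sketched against a toy, invented status file. The real /var/lib/dpkg/status uses the same Package:/Depends: stanza shape (with many more fields and richer Depends syntax), so this is only an illustration of the idea:

```shell
# Toy sketch of finding reverse dependencies in a dpkg-style status file.
# The sample below is invented; real Depends lines carry versions,
# alternatives (|), and more, which a serious tool must parse properly.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Package: libssl
Depends:

Package: nginx
Depends: libssl, zlib

Package: zlib
Depends:
EOF

# Which packages depend (directly) on libssl?
awk -v target=libssl '
    /^Package:/ { pkg = $2 }
    /^Depends:/ { if (index($0, target)) print pkg }
' "$sample"
```

Iterating this until the set stops growing gives the transitive reverse closure - the pieces you cannot remove without breaking something that declares a dependency.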
An introduction to Clear Containers
Posted Jun 3, 2015 0:34 UTC (Wed) by dlang (guest, #313) [Link]
I think we differ mostly in that as far as you are concerned, fixing this is "the distros becoming container aware" while for me fixing it is "the container system becoming update/distro aware". The difference being which side is responsible for making the changes.
An introduction to Clear Containers
Posted Jun 3, 2015 2:56 UTC (Wed) by ras (subscriber, #33059) [Link]
That gets hard, because your container system now has to know a lot about the packaging system the distro uses. For Debian this means it would have to run dpkg itself, which is possible because dpkg does take a --root parameter. But that means the container system would have to handle dependency resolution. All of which is possible, of course, and if we were only talking about Debian, probably even easy for some definition of easy. [0] But we are talking about tracking every packaging system out there - including things like PyPI.
They are not going to do that. Their success so far has been built on avoiding doing it. Instead the user writes a script, and the script uses some magic to build an image. The container system's role starts in earnest after the image is built - it can deploy images across server farms, start them, stop them, and even provide tools like etcd so they can configure themselves. It all works because the icky details of how to build and extend an image are all held inside the image itself. In that 140MB. That's why it's never going away without something changing.
If you are going to get rid of that 140MB, there is one place I am pretty sure it isn't going to migrate to - and that is into the container software, e.g. Docker. Debian providing tools that manipulate packages inside of a container, and the user running those tools from the existing Docker script, sounds like a much saner route to me. Of course this means the Docker script would only work when run on a Debian host. Which is how we get to containers being tied to a particular distribution - while the container software (e.g. Docker) remains distribution agnostic. In principle the built containers could be distribution agnostic, but since Debian built them, it's not difficult for the Debian host to figure out which containers are affected by a security patch and notify the container software to do something about it. And thus you get to the containers being distribution specific too.
So we get back to my original point. All the changes that must happen to make this work are in Debian, or whatever distro is being used. The container software just continues to do what it does now. Thus my conclusion that the next step in the evolution of containerisation must come from the distros - not the container software.
[0] The recent discussion on the debian development lists over how poorly aptitude does dependency resolution compared to apt provides a hint. "Easy" here means it could be done by someone - but even software written by Debian itself has trouble getting it right.
An introduction to Clear Containers
Posted May 21, 2015 19:14 UTC (Thu) by cesarb (subscriber, #6266) [Link]
It's a different way of thinking. The old way is the "big monolithic server", where each server is hand-installed, hand-maintained, and hand-updated, with an uptime in the decades range.
The new way of thinking is "every server is discardable". You don't update a server, you discard it and spin up a fresh one with all relevant updates already applied. Having a load spike because your server was mentioned on some popular site? Spin up a few more servers. After the storm passes, simply discard the excess servers. This is all made possible by lightweight virtual machines, or containers.
And you might have thousands of servers, but they are all clones. The number of different server types to manage is significantly smaller.
An introduction to Clear Containers
Posted May 25, 2015 9:30 UTC (Mon) by dgm (subscriber, #49227) [Link]
This is still how *some* stuff is going to be handled in the foreseeable future. The difference is less the old/new-thinking dichotomy than that now there's a new option, where previously you could only do things the old way. The old ways still offer advantages in some scenarios. One example is that server that has been working in the corner for years, just chugging along by itself, without the need for constant attention.
Other services don't lend themselves to being containerized - disk, for instance, but also routing or specialized-hardware access.
All in all, containers seem like a great tool for flexibility, and surely they will replace "monolithic servers" where it makes sense. But not everywhere.
An introduction to Clear Containers
Posted May 22, 2015 3:14 UTC (Fri) by lyda (guest, #7429) [Link]
So yes, your sysadminly worries have been addressed.
The current Docker wrapper around the container primitives is OK. It's not great, but it's a start (and definitely better than lxc - yeech). It's still a bit thick as containers go, but it's far thinner than a VM.
There's less to manage. In the docker world you specify the container with a Dockerfile. Want to update a container? Rebuild it from the docker file and then restart it.
That's for a single container. Once you have more, you can use a CI system to launch new versions to test and then deploy. Eventually you can move to a system like Kubernetes or Mesos to manage the containers.
You end up with far, far less to manage. And all of it provides APIs and tools that make it far more scriptable, so you can automate away loads of tasks. And yes, that does require sysadmins who can write code. But then I never quite understood why we started having sysadmins who couldn't code in the first place.
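The "specify, rebuild, restart" workflow described above might look something like this minimal sketch (base image, package, and names chosen only for illustration, not taken from the comment):

```dockerfile
# Dockerfile - the container is specified declaratively; rebuilding it
# picks up current packages, and restarting deploys the result.
FROM debian:stable-slim
RUN apt-get update \
    && apt-get install -y --no-install-recommends nginx \
    && rm -rf /var/lib/apt/lists/*
CMD ["nginx", "-g", "daemon off;"]
```

Updating then means rebuilding (`docker build -t mysite .`), stopping and removing the old container, and running a fresh one from the new image - rather than patching a running system in place.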
An introduction to Clear Containers
Posted May 22, 2015 4:22 UTC (Fri) by ghane (guest, #1805) [Link]
For much the same reason, I suppose, that we have sysadmins who cannot build a PC from scratch.
It is a "layers" thing, or "abstraction", or some such. Each team handles its own layer in the stack.
An introduction to Clear Containers
Posted May 22, 2015 22:42 UTC (Fri) by motk (subscriber, #51120) [Link]
> You end up with far, far less to manage. And all of it provides APIs and tools that make it far more scriptable, so you can automate away loads of tasks. And yes, that does require sysadmins who can write code. But then I never quite understood why we started having sysadmins who couldn't code in the first place.
Oh, I can code, but I could no longer by any means call myself a developer. The right tool for the right job, and if I need systems development programming done to glue stacks together it's time for a real developer.
An introduction to Clear Containers
Posted Sep 4, 2015 16:09 UTC (Fri) by bmullan (guest, #88168) [Link]
With the LXC 1.x release last year, LXC supports both unprivileged and privileged containers; pre-built container templates for CentOS, Debian, Oracle, Ubuntu, and other distributions (so I can have, say, an Ubuntu host and any other Linux in an LXC container); CRIU; "nested" containers; and security with AppArmor, SELinux, and seccomp.
With the introduction of LXD ("lex-dee") to manage LXC containers locally or remotely, LXC gained a RESTful API.
There's now an LXD/LXC plugin for OpenStack (nclxd), so OpenStack can spin up local/remote LXC containers as "VMs" instead of KVM or other VMs.
You can already use Canonical's Juju today to spin up a complete OpenStack on your laptop, all running in LXC.
LXC is also dead simple to use from the CLI perspective.
Just thought I'd highlight that not all innovation is limited to Docker.
Stephane Graber is one of the core LXC developers, and last year he wrote a great ten-part series introducing all the new LXC features:
https://www.stgraber.org/2013/12/20/lxc-1-0-blog-post-ser...
An introduction to Clear Containers
Posted Sep 29, 2015 19:03 UTC (Tue) by einstein (guest, #2052) [Link]
Docker has a completely different focus than LXC or OpenVZ. Docker seems, to me, primarily a way to launch a single application, so it's basically a little wrapper around an executable. In stark contrast, a typical use case for e.g. OpenVZ is to run a full-blown, multi-user, multi-function server, and LXC has the same sort of capabilities. Each approach has its supporters and legitimate use cases.
An introduction to Clear Containers
Posted May 30, 2015 20:23 UTC (Sat) by toyotabedzrock (guest, #88005) [Link]
An introduction to Clear Containers
Posted May 30, 2015 22:01 UTC (Sat) by dlang (guest, #313) [Link]
Used properly, containers can be something very different.
One way of looking at containers is that they give datacenter management capabilities similar to what mainframes had, in that users can just submit 'jobs' to be run, and the different jobs can be scheduled to run as best benefits the datacenter. The different 'jobs' can be shuffled from machine to machine as needed for load, failures, maintenance, etc., and the 'job owner' isn't going to care, as long as the job is running somewhere.
Yes, VMs can do the same thing, but at a significant cost in overhead (CPU, memory-allocation inefficiency, etc.).
An introduction to Clear Containers
Posted May 31, 2015 14:24 UTC (Sun) by dmarti (subscriber, #11625) [Link]
Containers can let you have *parallel stacks of clean packages*. First write your RPM specfile (or use your package manager of choice) to make a clean, repeatable install of known software. Then wrap a simple Dockerfile (or config for whatever container flavor is hot at deploy time) around that.
Sometimes you see containers used for parallel stacks of "curl | sh", which is a monster time-suck ( http://blog.neutrondrive.com/posts/235158-docker-toxic-he... ), but they don't have to be that way.
Packages for clean, repeatable installs. Wrapped in containers for when you need multiple trees of dependencies on the same box.
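The two-layer pattern described above might be sketched like this, with an invented package and image (the specfile does the real install work; the Dockerfile stays a thin wrapper):

```dockerfile
# Thin Dockerfile wrapped around a clean, repeatable package install.
# All the real build logic lives in the RPM specfile that produced
# myapp-1.0-1.x86_64.rpm (a hypothetical package), not in this file.
FROM centos:7
COPY myapp-1.0-1.x86_64.rpm /tmp/
RUN yum install -y /tmp/myapp-1.0-1.x86_64.rpm && yum clean all
CMD ["/usr/bin/myapp"]
```

Because yum resolves the local RPM's dependencies from the configured repositories, the container layer stays small and the install stays repeatable and auditable through the packaging system.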
An introduction to Clear Containers
Posted Dec 23, 2015 2:23 UTC (Wed) by gdamjan (subscriber, #33634) [Link]
1) Is there some checklist for building a kernel without the legacy stuff, and with the necessary stuff for kvmtool/lkvm?
For example, is:
# CONFIG_PCI is not set
CONFIG_NET_9P_VIRTIO=y
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_CONSOLE=y
OK? Is that enough? Is more needed, or less?
2) Also, what does the userspace need to do to initialize the network and the 9pfs shared directory?
An introduction to Clear Containers
Posted Feb 1, 2016 13:09 UTC (Mon) by PradeepJagadeesh (guest, #106732) [Link]
I am new to Clear Containers, and I am experimenting with the memory footprint of the VMs. The article says the memory footprint per container is 18-20MB, but even when I use the demo images that are part of this article I can't get those numbers. I always get > 60MB per image, and even if I launch 100 instances it will not be less than 50MB. Am I missing something here?
When you say overhead, is that hypervisor + guest?
Also, please let me know which kernel you used to arrive at this number (18MB) and the CLI options you used for running the container.
Thanks in advance.
Regards,
Pradeep