Dockerfiles should have a way to perform multiple build actions in one commit #2439

Closed
bwilkins opened this issue Oct 29, 2013 · 17 comments
Labels: area/builder, kind/feature

Comments

@bwilkins

I'm using multiple docker files to build up an environment (with the aim of a set of environments). Unfortunately I blow through AUFS's 42 layer limitation with this. I would like to be able to collapse a set of actions into a single commit.

I envision this being done as perhaps BEGIN and COMMIT commands (akin to SQL's transaction commands). I may look at doing this myself if I can find the time, but if this doesn't fit into the ideal of the app then I don't want to put too much effort into it.

My work-around in the meantime is to have a bash script that builds each Dockerfile then exports to tar and reimports into the same image tag, before beginning the next build.
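The export and re-import loop described above could be sketched roughly as follows. This is not the script from the comment, which is not shown in the thread: the function names, the `flat.tar` file name, and the placeholder `true` command are assumptions made for illustration. The sketch relies on the `docker build`, `docker create`, `docker export --output`, `docker import`, and `docker rm` subcommands of the current CLI, and the command runner is injectable so the sequence can be inspected without Docker installed.

```python
import subprocess


def flatten_commands(tag, container_id, tarball="flat.tar"):
    """Return the docker commands that collapse image `tag` into one layer.

    Assumption: `container_id` is a container created from `tag`.
    Export/import keeps only the filesystem; metadata such as ENV, CMD
    and EXPOSE is lost and has to be declared again afterwards.
    """
    return [
        ["docker", "export", "--output", tarball, container_id],
        ["docker", "import", tarball, tag],
        ["docker", "rm", container_id],
    ]


def build_and_flatten(context_dir, tag, run=subprocess.run):
    """Build one Dockerfile, then squash the result back into `tag`."""
    run(["docker", "build", "-t", tag, context_dir], check=True)
    # `true` is a placeholder command; the container is never started.
    created = run(["docker", "create", tag, "true"],
                  check=True, capture_output=True, text=True)
    container_id = created.stdout.strip()
    for command in flatten_commands(tag, container_id):
        run(command, check=True)
    return container_id
```

Calling `build_and_flatten` once per Dockerfile, in order, mirrors the build, export, and re-import cycle before each following build.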

@jpetazzo
Contributor

Did you see #1799 (Multiline Dockerfile syntax) and (slightly related) #332 (merge multiple layers) ?

@bwilkins
Author

I hadn't seen #1799, but I've seen that option and do not find it overly appealing; I don't like run-on commands (using &&), and the only reason I see to have line continuations is for a single command invocation with a lot of arguments (such as apt-get install with a massive list of packages). I might seem to be coming off something of a shell-script prude here, but I like things to be "clean".

I have seen #332, or something like it, as I recall having seen shykes' message about exporting to tarball and re-importing, which is exactly what I'm doing currently.

Unfortunately, by exporting and reimporting at every step, the base image becomes increasingly massive, to the point that I've got a 1.5GB base image for the final set of changes to build from, and that now seems to be having a checksum mismatch issue when pushing to a local private registry.

So I have a couple of options. I can rearrange the commands and liberally apply && and \ line continuations in an effort to decrease the number of layers, which just seems hacky and error-prone to me. Or I can look at improving the system itself (assuming others see it as an improvement?).

I have posted to the mailing list here: https://groups.google.com/forum/?fromgroups#!topic/docker-dev/CbmK1KUS8Sk

@jpetazzo
Contributor

Ah, sorry, I forgot to mention: the problem should go away with Docker 0.7 since AUFS will be replaced by another backend which doesn't have that limitation! That's why you don't see tremendous efforts to work around it.

The other options (explicit COMMIT, squashing image histories...) are still nice in the long run; but we have a short-to-mid-run option coming fast that will alleviate the issue :-)

@binaryphile

I'll chime in here to say that in my opinion, AUFS's limit is not the problem. Storing multiple layers which have no intrinsic value is the problem.

Let's say I run one command that uses up a tremendous number of file descriptors for temporary files, perhaps something like apt-get or make. If a layer is captured after the completion of that command, there is never an opportunity to remove those file descriptors from the layer. Doing so on the next command simply hides them rather than actually removing them. This leads to unnecessary consumption of disk space in that now-preserved layer, along with the performance penalty of having to union that filesystem index when searching for a file.
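A toy model may make the point about hidden files concrete. It is only a sketch under stated assumptions: layers are plain dictionaries mapping a path to a size, `None` stands in for a deletion marker (a "whiteout"), and the paths and byte counts are made up. It does not reproduce how AUFS or any real storage driver lays out data.

```python
def apply_layers(layers):
    """Return the files visible in the final union view.

    Each layer maps a path to its size in bytes, or to None to record a
    deletion marker (a "whiteout") for a path from a lower layer.
    """
    visible = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                visible.pop(path, None)  # hidden from view, not reclaimed
            else:
                visible[path] = size
    return visible


def stored_bytes(layers):
    """Total bytes kept on disk: every layer keeps everything it wrote."""
    return sum(size for layer in layers
               for size in layer.values() if size is not None)


# One commit per step: the temporary file survives in the first layer.
per_step = [
    {"/tmp/build.o": 500, "/usr/bin/app": 100},
    {"/tmp/build.o": None},
]

# Same steps inside one commit: the temporary file is never stored.
single = [{"/usr/bin/app": 100}]

print(sum(apply_layers(per_step).values()), stored_bytes(per_step))  # 100 600
print(sum(apply_layers(single).values()), stored_bytes(single))      # 100 100
```

Both variants expose the same 100 bytes to the container, but the per-step history still stores 600 bytes.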

Simply increasing the available layers without the ability to expunge useless ones is just going to make the penalties more prevalent. There should be a good way to a) indicate to the dockerfile that you don't want intermediary layers and b) squash useless layers after the fact without blowing your inheritance from other images. Continuation lines in dockerfiles aren't a good solution either, as they just result in syntactic eccentricities like this: https://gist.github.com/SamSaffron/7208665.

My $.02.

@shykes
Contributor

shykes commented Dec 12, 2013

I agree there is a final fix (don't require a commit for each build step), and there are intermediary fixes (for example 0.7.2 will raise the layers limit to 127).

I'm tentatively scheduling this for 0.8.

@borromeotlhs

I believe that, perhaps, extra commands in the docker vocabulary would be apt? Perhaps even allowing the use of git-style add/commit/merge/rebase/stash/cherry-pick/etc commands?

The more I see docker, the more I think of version control for app environments. But, being that I love git for version control, I then find myself yearning for git-style level of control with git-porcelain/git-plumbing extensibility.

This makes me wonder if I can set up a git repository with multiple branches, and on each branch, attempt different docker pull image commands? Would that work?

@timthelion
Contributor

I am very fond of this idea. I've written out a detailed proposal below.

Abstract: Add a LAYER keyword with smart indentation to Dockerfile which allows one to perform multiple docker commands within one layer:

Example Dockerfile for vim

FROM ubuntu
LAYER install-vim
 RUN apt-get update
 RUN apt-get install -yq vim
 RUN apt-get clean
ENV LANG=en_US.UTF-8

The last command apt-get clean would actually save space, because everything is happening within one layer.

Motivation: There are two clear reasons why a person would want to combine multiple steps into one layer:

Current work around: docker run and docker commit. Exporting and re-importing tarballs.

Proposal proper:

A new LAYER keyword should be added. It will take one argument, a layer name. Layer names are no-ops, but they are used to prettify build output. The LAYER keyword is special in that it is followed by an indented code block. All commands in that block are run normally then squashed into a single layer after they have finished.
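As a rough illustration of the grouping rule just described, here is a minimal parser sketch. LAYER is proposed syntax only and no Docker builder accepts it; `parse_layers` and its error messages are hypothetical names, and the sketch assumes a single level of indentation with no nesting and no line continuations.

```python
def parse_layers(text):
    """Group Dockerfile lines into layers.

    Returns a list of (name, instructions) pairs. Instructions outside a
    LAYER block get the name None and one layer each.
    """
    layers = []
    block = None  # instructions of the LAYER block being collected
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        indented = raw[0] in " \t"
        line = raw.strip()
        if indented and block is not None:
            block.append(line)
            continue
        if indented:
            raise ValueError("indented line outside a LAYER block: " + line)
        block = None  # an unindented line closes any open block
        parts = line.split()
        if parts[0] == "LAYER":
            if len(parts) != 2:
                raise ValueError("LAYER takes exactly one name: " + line)
            block = []
            layers.append((parts[1], block))
        else:
            layers.append((None, [line]))
    return layers


example = """\
FROM ubuntu
LAYER install-vim
 RUN apt-get update
 RUN apt-get install -yq vim
 RUN apt-get clean
ENV LANG=en_US.UTF-8
"""
for name, instructions in parse_layers(example):
    print(name, instructions)
```

Each returned pair corresponds to one commit, so the three indented RUN lines end up in a single layer named install-vim.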

You can use this keyword to save space:

FROM debian
LAYER install-web-server
 ADD http://example.com/web-server.tar.gz /web-server-source
 RUN cd /web-server-source ; make install
 RUN rm -rf /web-server-source
ADD ./conf-file /etc/web-server/conf-file

Or to use some private file that you don't want in the resulting image:

FROM debian
RUN apt-get update
RUN apt-get install web-server
ADD ./access-list /etc/web-server/access-list
LAYER sign-access-list
 ADD ./my-gpg-key-private-key.gpg /root/.gpg/gpg-private.gpg
 RUN gpg --output /etc/web-server/access-list.sig --sign /etc/web-server/access-list
 RUN rm -r /root/.gpg/

This is used to sign an access list with your private gpg key without leaving your gpg key in the resulting image.

Implementation sketch: Many of the things we need to implement this are here: #4232. I haven't looked at the parsing code.

@borromeotlhs

With a 'layer' that is opaque, and a method to move commands into and out of the layer, there would be a compelling workflow that would allow going from an image to a Dockerfile and back. The biggest hurdle I've seen from even using docker is that, even if I could hide private info, once I decide on an image or a Dockerfile format, I'm kind of stuck :(

With layer and import/export, there could be some greater reuse and deployment.

This doesn't touch upon docker load time settings APIs that should exist, but it's a start :)

@cyphar
Contributor

cyphar commented Aug 2, 2014

@timthelion I agree with this with only one exception: we should use { and } to denote the start and end of blocks. This IMO makes the syntax more robust (Python's parsing code for indentation is hardly pretty and we'd have to implement something similar if we want a solid parser for Dockerfiles).
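For comparison, a brace-delimited variant can be handled by a short recursive-descent parser. The sketch below is an assumption-laden illustration, not the parser Docker uses: it assumes a block opens with a line ending in `{` and closes with a line containing only `}`, and `parse_block` and `parse` are hypothetical names. A RUN line whose shell command happens to end in `{` would be misread, so a real grammar would need quoting rules.

```python
def parse_block(lines, pos=0, depth=0):
    """Parse instructions until the matching "}" (or end of input).

    Returns (nodes, next_pos). A node is either an instruction string or
    a (header, children) pair for a brace-delimited block.
    """
    nodes = []
    while pos < len(lines):
        line = lines[pos].strip()
        pos += 1
        if not line:
            continue
        if line == "}":
            if depth == 0:
                raise ValueError("unmatched }")
            return nodes, pos
        if line.endswith("{"):
            header = line[:-1].strip()
            # Recurse so that nested blocks need no extra bookkeeping.
            children, pos = parse_block(lines, pos, depth + 1)
            nodes.append((header, children))
        else:
            nodes.append(line)
    if depth > 0:
        raise ValueError("missing } at end of input")
    return nodes, pos


def parse(text):
    """Parse a whole file into a tree of instructions and blocks."""
    nodes, _ = parse_block(text.splitlines())
    return nodes
```

Indentation is ignored entirely here, which is the robustness argument: the block structure comes only from the two delimiter tokens.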

@cyphar
Contributor

cyphar commented Aug 2, 2014

@timthelion Also, why the no-op argument? If we want to have an argument, make docker tag the resultant layer with the given argument. If that is not intended functionality, then don't include the argument at all. There is no Dockerfile directive that has no-op arguments (AFAIK), why add one now? Users expect that arguments to a directive actually affect the directive. Comments are used for no-ops.

@bwilkins
Author

bwilkins commented Aug 2, 2014

@cyphar I'd suggest that BEGIN and END would be similarly reasonable tokens for denoting the beginning and end of a block. I only suggest this over {} because the rest of the Dockerfile syntax thus far has been word-based.

@cyphar
Contributor

cyphar commented Aug 2, 2014

@bwilkins I say {} for two reasons:

  • @shykes' proposal for IN (Proposal: Nested builds #7115) uses { and }.
  • Lots of languages and projects use { and } to represent blocks. If we don't want to use the pseudo-standard for blocks we need to have a good reason for it.

@bwilkins
Author

bwilkins commented Aug 2, 2014

@cyphar in that case, { and } makes complete sense.

Coming from ruby myself, I tend to synonymise {...} with do...end, and begin...end similarly.

The benefits of {} are the terseness and familiarity for users coming from other languages.

@emilymaier
Contributor

What's the status of this? I would love this feature and could try writing it if it's wanted but not being implemented.

@cyphar
Contributor

cyphar commented Nov 24, 2014

@erikh can comment on if his new Dockerfile parser can deal with recursively defined grammars (assuming we want to allow for nested {}). As for the actual implementation, I'm not sure if anyone has said they wanted to work on it.

As an aside, I'd think that the best way of going about this is by calling each instruction "atomic", and that blocks allow users to run multiple instructions such that the whole thing is "atomic".

However, I think that changes to the builder would be quite drastic, because while it may seem that the only difference for the instructions in blocks is that they don't commit the changes until the block is exited -- there is a problem. Docker is oriented around images and running multiple instructions on a single container would prove to be an annoying thing to implement.
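The "atomic instruction versus atomic block" idea can be modelled in a few lines. This is a toy simulation, not a description of the Docker builder: the filesystem is a dictionary of path to size, a commit is just the diff between two snapshots, and every name and number below is invented for illustration.

```python
def diff(before, after):
    """Files a commit must store: added or changed paths, plus whiteouts."""
    changed = {p: s for p, s in after.items() if before.get(p) != s}
    removed = {p: None for p in before if p not in after}
    return {**changed, **removed}


def build(steps):
    """Run build steps and return the list of committed layers.

    A step is either a function that mutates the filesystem dict (one
    atomic instruction) or a list of such functions (one atomic block).
    """
    fs, layers = {}, []
    for step in steps:
        instructions = step if isinstance(step, list) else [step]
        before = dict(fs)
        for instruction in instructions:
            instruction(fs)  # same working filesystem throughout the block
        layers.append(diff(before, fs))  # single commit on block exit
    return layers


def add_source(fs):
    fs["/web-server-source"] = 500


def make_install(fs):
    fs["/usr/bin/web-server"] = 100


def remove_source(fs):
    del fs["/web-server-source"]


print(build([add_source, make_install, remove_source]))
print(build([[add_source, make_install, remove_source]]))
```

Run per instruction, the source tree is committed in the first layer and only masked in the third; run as one block, it never reaches a committed layer. The model sidesteps the hard part raised above, which is keeping one container alive across several instructions.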

@jessfraz added the Proposal and kind/feature labels on Feb 26, 2015
@ghost

ghost commented Jun 2, 2015

I think this is really needed for blocks like
ADD web-server.tar.gz /web-server-source/
RUN cd /web-server-source ; make install
RUN rm -rf /web-server-source

Because web-server.tar.gz can be a big file, and it will be stored in one of the layers.

@jessfraz
Contributor

Hello!
We are no longer accepting patches to the Dockerfile syntax as you can read about here: https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax

Mainly:

Allowing the Builder to be implemented as a separate utility consuming the Engine's API will open the door for many possibilities, such as offering alternate syntaxes or DSL for existing languages without cluttering the Engine's codebase

Then from there, patches/features like this can be re-thought. Hope you can understand.

10 participants