New feature request: Selectively disable caching for specific RUN commands in Dockerfile #1996

mohanraj-r · 2013-09-24T20:18:45Z

branching off the discussion from #1384 :

I understand that -no-cache disables caching for the entire Dockerfile. But it would be useful to be able to disable the cache for a specific RUN command, for example when updating repos or downloading a remote file. From my understanding, right now a cached RUN apt-get update wouldn't actually update the repo? That would make the results differ from those on a VM?

If disabling caching for specific commands in the Dockerfile were made possible, would the subsequent commands in the file then not use the cache? Or would they do something a bit more intelligent, e.g. use the cache if the previous command produced the same results (fs layer) as in a previous run?

tianon · 2013-09-24T20:51:38Z

I think the way to combat this is to take the point in the Dockerfile up to which you do want caching, tag the result as an image, and use that image in your future Dockerfile's FROM. That Dockerfile can then be built with -no-cache without consequence, since the base image would not be rebuilt.

mohanraj-r · 2013-10-03T03:43:59Z

But wouldn't this make it hard to interleave cached and non-cached commands?

For example, let's say I want to update my repo and wget files from a server, and perform a bunch of steps in between, e.g. install software from the repo (which could have been updated) or perform operations on the downloaded file (which could have changed on the server).

What would be ideal is a way to tell docker, in the Dockerfile, to run specific commands without the cache every time and only reuse the previous image if there is no change (e.g. no update in the repo).

Wouldn't this be useful to have?

joelreymont · 2013-10-18T19:29:42Z

What about CACHE ON and CACHE OFF in the Dockerfile? Each instruction would affect subsequent commands.

konklone · 2013-10-29T02:55:01Z

Yeah, I'm using git clone commands in my Dockerfile, and if I want it to re-clone with updates, I need to, like, add a comment at the end of the line to trigger a rebuild from that line. I shouldn't need to create a whole new base container for this step.

githart · 2013-11-06T06:20:26Z

Can a container ID be passed to 'docker build' as a "do not cache past this ID" instruction? Similar to the way in which 'docker build' will cache all steps up to a changed line in a Dockerfile?

shykes · 2014-01-06T19:23:49Z

I agree we need more powerful and fine-grained control over the build cache. Currently I'm not sure exactly how to expose this to the user.

I think this will become easier with the upcoming API extensions, specifically naming and introspection.

timruffles · 2014-02-06T14:08:19Z

Would be a great feature. Currently I'm using silly things like RUN a=a some-command, then RUN a=b some-command, to break the cache.

rogernolan · 2014-02-07T10:22:16Z

Getting better control over the cache would make using docker from CI a lot happier.

crosbymichael · 2014-02-07T17:17:01Z

@shykes

What about changing --no-cache from a bool to a string and having it take a regex for where in the Dockerfile we want to bust the cache?

docker build --no-cache "apt-get install" .

shykes · 2014-02-07T17:29:40Z

I agree and suggested this exact feature on IRC.

Except I think that, to preserve backward compatibility, we should create a new flag (say "--uncache") so we can keep --no-cache as a (deprecated) bool flag that resolves to "--uncache .*"

crosbymichael · 2014-02-07T17:31:29Z

What does everyone else think about this? Anyone up for implementing the feature?

timruffles · 2014-02-08T12:11:50Z

I'm up for having a stab at implementing this today if nobody else has started?

timruffles · 2014-02-09T13:50:01Z

I've started work on it - wanted to validate the approach looks good.

- The noCache field of buildfile becomes a *regexp.Regexp.
  - A nil value there means what utilizeCache = true used to.
- Passing a string to docker build --no-cache now sends a validated regex string to the server.
- Just calling --no-cache results in a default of .*
- The regex is then used in a new method, buildfile.utilizeCache(cmd []string) bool, to check which commands ignore the cache.

One thing: as far as I can see, the flag/mflag package doesn't support string flags without a value, so I'll need to do some extra fiddling to support both --no-cache and --no-cache some-regex
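The matching logic described in the list above can be sketched roughly as follows. This is a Python illustration only, not the Go code from the PR: the function names are Python-ized stand-ins, and joining the command words with spaces before matching is an assumption about how the regex would be applied.

```python
import re
from typing import Optional, Sequence


def compile_no_cache(value: Optional[str], flag_present: bool) -> Optional[re.Pattern]:
    """Return the compiled pattern, or None when caching stays fully enabled.

    A bare --no-cache (flag present, no value) defaults to ".*".
    """
    if not flag_present:
        return None
    return re.compile(value if value else ".*")


def utilize_cache(no_cache: Optional[re.Pattern], cmd: Sequence[str]) -> bool:
    """Return True if the cache may be used for this instruction."""
    if no_cache is None:
        # No pattern at all: behaves like the old utilizeCache = true.
        return True
    # Assumption: the pattern is searched against the space-joined command.
    return no_cache.search(" ".join(cmd)) is None
```

With a pattern of "apt-get install", a step such as RUN apt-get install -y curl would skip the cache, while an unrelated RUN git clone step would still be eligible for it.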

tianon · 2014-02-25T03:50:06Z

I really think this ought to be a separate new flag. The behavior and syntax of --no-cache is already well defined and used in many, many places by many different people. I'd vote for --break-cache or something similar, and have --no-cache do exactly what it does today (since that's very useful behavior that many people rely on and still want).

Anyways, IANTM (I am not the maintainer) so these are just my personal thoughts. :)

timruffles · 2014-02-25T08:46:28Z

@tianon --no-cache is currently bool, so this simply extends the existing behaviour.

docker build --no-cache - same behaviour as before: ignores cache
docker build --no-cache someRegex - ignores the cache for any RUN or ADD commands that match someRegex

tianon · 2014-02-25T15:15:27Z

Right, that's all fine. The problem is that --no-cache is a bool, so the existing behavior is actually:

--no-cache=true - explicitly disable cache
--no-cache=false - explicitly enable cache
--no-cache - shorthand for --no-cache=true

I also think we'd be doing ourselves a disservice by making "true" and "false" special case regex strings to solve this, since that will create potentially surprising behavior for our users in the future. ("When I use --no-cache with a regex of either 'true' or 'false', it doesn't work like it's supposed to!")

timruffles · 2014-03-01T12:02:30Z

@tianon yes you're right. Had a quick look and people are using =true/false.

Happy to modify the PR to add a new flag as you suggest. What do the maintainers think (@crosbymichael, @shykes)? This would also mean I could remove the code added to mflag to allow string/bool flags.

crazyscience · 2014-03-13T21:52:40Z

+1 for @wagerlabs approach

marcuslinke · 2014-04-11T20:27:06Z

@crosbymichael, @timruffles Wouldn't it be better if the author of the Dockerfile decided which build steps should be cached and which should not? The person who creates the Dockerfile is not necessarily the one who builds the image. Moving the decision to the docker build command demands detailed knowledge from a person who just wants to use a specific Dockerfile.

Consider a corporate environment where someone just wants to rebuild an existing image hierarchy to update some dependencies. The existing Dockerfile tree may have been created years ago by someone else.

hunterloftis · 2014-04-13T00:42:58Z

+1 for @wagerlabs approach

cressie176 · 2014-04-14T05:22:50Z

+1 for @wagerlabs approach although it would be even nicer if there was a way to cache bust on a time interval too, e.g.

CACHE [interval | OFF]
RUN apt-get update
CACHE ON

I appreciate this might fly against the idea of containers being deterministic, however it's exactly the sort of thing you want to do in a continuous deployment scenario where your pipeline has good automated testing.

As a workaround I'm currently generating cache busters in the script I use to run docker build, and adding them in the Dockerfile to force a cache bust:

FROM ubuntu:13.10
ADD ./files/cachebusters/per-day /root/cachebuster
...
ADD ./files/cachebusters/per-build /root/cachebuster
RUN git clone git@github.com:cressie176/my-project.git /root/my-project

tfoote · 2014-04-19T01:00:55Z

I'm looking to use containers for continuous integration and the ability to set timeouts on specific elements in the cache would be really valuable. Without this I cannot deploy. Forcing a full rebuild every time is much too slow.

My current plan to work around this is to dynamically inject commands such as RUN echo 2014-04-17-00:15:00, with the generated timestamp rounded down to the last 15 minutes, so that cache elements are invalidated whenever the rounded value jumps, i.e. every 15 minutes. This works for me because I have a script generating the Dockerfile every time, but it won't work without that script.
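A minimal sketch of that rounding step, assuming the generating script is written in Python. The function name cache_buster_line is hypothetical; the output format follows the RUN echo example above.

```python
from datetime import datetime, timedelta


def cache_buster_line(now: datetime, interval_minutes: int = 15) -> str:
    """Build a RUN line whose text only changes once per interval."""
    interval = timedelta(minutes=interval_minutes)
    # Time elapsed since midnight on the same day.
    since_midnight = timedelta(
        hours=now.hour,
        minutes=now.minute,
        seconds=now.second,
        microseconds=now.microsecond,
    )
    # Floor 'now' to the start of its interval, counting from midnight.
    floored = now - (since_midnight % interval)
    return "RUN echo " + floored.strftime("%Y-%m-%d-%H:%M:%S")
```

Because the instruction text stays identical within one interval, the builder can keep reusing the cached layer until the rounded timestamp changes. Intervals that do not divide 24 hours evenly would give a shorter last bucket each day.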

amarnus · 2014-05-02T15:43:06Z

+1 for the feature.

hiroprotagonist · 2014-05-07T12:13:42Z

I also want to vote for this feature. The cache is annoying when building parts of a container from git repositories which update only on the master branch.
👍

amarnus · 2014-05-07T12:15:45Z

@hiroprotagonist Having a git pull in your ENTRYPOINT might help?

hiroprotagonist · 2014-05-08T17:07:19Z

@amarnus I've solved it in a way similar to the idea @tfoote had. I am running the build from a Jenkins job, and instead of running the docker build command directly, the job starts a build script which generates the Dockerfile from a template and adds the line 'RUN echo currentsMillies' above the git commands. Thanks to sed and pipes this was a matter of minutes. Anyway, I still favor this feature as part of the Dockerfile itself.

olavurmortensen · 2021-05-25T09:13:34Z

I agree that this feature would be very helpful.

At the moment, I use the solution suggested above, using the ARG command to increment a build number, as shown below.

FROM something
ARG build=1
RUN some-non-deterministic-command

This works fine. But the problem with this solution is that it requires you to remember to increment the build variable. It is certain that one will, at some point, forget this, and spend two days figuring out what is going wrong.
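One way to avoid the manual increment is to let a wrapper script supply the value. The sketch below is a hypothetical Python helper (build_command is not part of Docker) that assembles a docker build command line overriding the build argument with the current Unix time. It relies on the documented --build-arg flag; whether a changed value invalidates the following RUN steps may depend on the builder version, so treat it as an assumption to verify.

```python
import time
from typing import List, Optional


def build_command(context: str = ".", build_id: Optional[str] = None) -> List[str]:
    """Assemble a docker build command line that overrides ARG build."""
    if build_id is None:
        # Default to seconds since the epoch, so each invocation differs.
        build_id = str(int(time.time()))
    return ["docker", "build", "--build-arg", f"build={build_id}", context]
```

The resulting list could be passed to subprocess.run(build_command(), check=True) from a build script.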

thomas10-10 · 2021-06-13T16:13:42Z

Why was this issue closed?

caio-vinicius · 2021-10-16T15:06:46Z

> I agree that this feature would be very helpful.
>
> At the moment, I use the solution suggested above, using the ARG command to increment a build number, as shown below.
>
> FROM something
> ARG build=1
> RUN some-non-deterministic-command
>
> This works fine. But the problem with this solution is that it requires you to remember to increment the build variable. It is certain that one will, at some point, forget this, and spend two days figuring out what is going wrong.

You can use the $RANDOM env variable.

caio-vinicius · 2021-10-16T15:08:37Z

Would love this feature.

zkscpqm · 2021-12-23T15:18:21Z

For anyone that has the luxury to automate their builds, this is what I like to do:

I put a placeholder in the Dockerfile template like: https://github.com/zkscpqm/Car-Zix/blob/master/Dockerfile_template#L9 which dictates where my cache ends.

I then spawn a unique Dockerfile each time I do a build and I replace the placeholder with some hash:
https://github.com/zkscpqm/Car-Zix/blob/master/run_tests_docker.py#L38-L44

Hope this helps))
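The placeholder approach described above can be sketched as follows. This is an illustrative Python snippet written for this thread, not the code from the linked repository: the PLACEHOLDER marker text and the render_dockerfile name are hypothetical.

```python
import uuid

# Hypothetical marker line placed in the Dockerfile template where the cache should end.
PLACEHOLDER = "# CACHE_BUST_PLACEHOLDER"


def render_dockerfile(template: str, token: str = "") -> str:
    """Replace the placeholder with a RUN line carrying a unique token."""
    # Use a random hex string when no explicit token is given.
    token = token or uuid.uuid4().hex
    if PLACEHOLDER not in template:
        raise ValueError("template has no cache-bust placeholder")
    return template.replace(PLACEHOLDER, f"RUN echo cache-bust-{token}")
```

Every instruction above the placeholder keeps its cached layer, while the generated RUN line and everything after it is rebuilt whenever the token changes.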

koplenov · 2022-06-09T15:04:45Z

Nine years on, have there been any changes?

In 2013, the CACHE ON and CACHE OFF commands were proposed.
Each instruction would affect subsequent commands.

How is it now?

HariSekhon · 2022-06-10T15:08:20Z

Solutions I came up with in the interim:

To invalidate the cache at a specific step every time:

ADD http://date.jsontest.com /etc/builddate

or

ADD http://worldclockapi.com/api/json/utc/now /etc/builddate

For GitHub repos, only invalidate the cache at this step if the repo, in this case HariSekhon/DevOps-Python-tools, has had new commits since the last build:

ADD https://api.github.com/repos/HariSekhon/DevOps-Python-tools/git/refs/heads/master /.git-hashref

I use these sorts of tricks a lot in my large Dockerfiles repo containing lots of different apps and builds, including packaging my GitHub repos tools, scripts and dependencies:

https://github.com/HariSekhon/Dockerfiles

These and other tricks are most succinctly shown in my master Dockerfile template in my Templates repo which has templates for lots of the most popular DevOps technologies like Make, Jenkins, GitHub Actions, Docker, Kubernetes etc...:

https://github.com/HariSekhon/Templates/blob/master/Dockerfile

MaxTranced · 2022-06-13T10:37:44Z

> To invalidate the cache at a specific step every time:

Neat!

Does the worldclockapi.com service have a documentation page? (I could not find it.) Or do you know of any way to invalidate the cache once every Monday? I'm guessing an API endpoint that returns the current week number would achieve that...

Thank you so much!

HariSekhon · 2022-06-13T12:30:31Z

@MaxTranced I've used date.jsontest.com more than worldclockapi.com (which is giving me a 503 error right now), but there are some others at the top of a Google search that should work:

https://timeapi.io/swagger/index.html

http://worldtimeapi.org/pages/examples

The latter seems like it can do week of the year as week_number according to its schema documentation:

http://worldtimeapi.org/pages/schema

This one has an API to return just the week:

https://timezoneapi.io

eg.

https://timezoneapi.io/api/ip/?token=<YOUR_TOKEN>&only=datetime(week)

but it unfortunately also returns the execution time which would bust the cache on every request because the millisecond timing would be different. You might want to contact them and see if there is an option to not do that and point them to this thread as the Dockerfile use case.

Another solution is to wrap your docker build in a Makefile or CI/CD step which does a date '+%W' > week_of_year.txt before the docker build step and then have your Dockerfile COPY week_of_year.txt /etc/ or similar to break the cache once a week.
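For a wrapper written in Python rather than a Makefile, the same idea can be sketched like this. The helper names are hypothetical; the "%W" format code matches date '+%W' (week of the year, with Monday as the first day of the week).

```python
from datetime import date
from pathlib import Path


def week_marker(today: date) -> str:
    """Week of the year, Monday as the first day of the week (same as date '+%W')."""
    return today.strftime("%W")


def write_week_file(path: Path, today: date) -> bool:
    """Write the marker file; return True only if its content changed."""
    new = week_marker(today) + "\n"
    old = path.read_text() if path.exists() else None
    if old == new:
        # Same week as last time: leave the file untouched.
        return False
    path.write_text(new)
    return True
```

Calling write_week_file(Path("week_of_year.txt"), date.today()) before docker build leaves the file's content unchanged within a week, so a COPY of that file should only miss the cache when the week number rolls over.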

MaxTranced · 2022-06-15T09:51:33Z

> @MaxTranced I've used date.jsontest.com more than worldclockapi.com (which is giving me a 503 error right now), but there are some others at the top of a Google search that should work:
> [...]
> Another solution is to wrap your docker build in a Makefile or CI/CD step which does a date '+%W' > week_of_year.txt before the docker build step and then have your Dockerfile COPY week_of_year.txt /etc/ or similar to break the cache once a week.

Thank you so much for the suggestions! I did search for a while but did not find the timezoneapi.io service. I will put the advice to good use!

douglasg14b · 2022-11-11T01:39:52Z

Using the ARG with a random value doesn't seem to prevent COPY commands from caching what they are copying....

How can I prevent caching of certain COPY commands in my Dockerfile?

ThomasParistech · 2022-11-15T16:30:43Z

I tried using the same trick as
ADD https://api.github.com/repos/HariSekhon/DevOps-Python-tools/git/refs/heads/master /.git-hashref

but got the following error when building the image for the second time.
failed to load cache key: invalid not-modified ETag: "5366c7b8ba2a8e3a77f127e5cf2839fcf610582492997674ea17ab659df1cce3"

Any clue ?

HariSekhon · 2022-11-15T20:00:00Z

> Using the ARG with a random value doesn't seem to prevent COPY commands from caching what they are copying....
>
> How can I prevent caching of certain COPY commands in my Dockerfile?

@douglasg14b

Is the COPY command definitely below the ARG command in the Dockerfile in that case? If so then perhaps that's an optimization that Docker has made more recently... on which version of Docker do you see that behaviour?

HariSekhon · 2022-11-15T20:02:44Z

@ThomasParistech

Which version of Docker is that happening for you?

I've definitely used that before... perhaps the behaviour has been changed to reference a cache key load but I'm unsure how that could be interpreted that way given this is the current output of the sample URL I gave above:

$ curl https://api.github.com/repos/HariSekhon/DevOps-Python-tools/git/refs/heads/master
{
  "ref": "refs/heads/master",
  "node_id": "MDM6UmVmNDUwNDkwMjY6cmVmcy9oZWFkcy9tYXN0ZXI=",
  "url": "https://api.github.com/repos/HariSekhon/DevOps-Python-tools/git/refs/heads/master",
  "object": {
    "sha": "dc4b1ce2b2fbee3797b66501ba3918a900a79769",
    "type": "commit",
    "url": "https://api.github.com/repos/HariSekhon/DevOps-Python-tools/git/commits/dc4b1ce2b2fbee3797b66501ba3918a900a79769"
  }
}

Are you querying a different URL that is returning only a hashref that Docker is interpreting differently or are you targeting a /git/refs/heads/master github URL which is returning JSON as shown above?

ThomasParistech · 2022-11-16T08:01:33Z

@HariSekhon
I'm using the following config:
Docker version 20.10.12, build 20.10.12-0ubuntu2~20.04.1
I get the same kind of JSON as you when I run the command.

I'm also using BuildKit, # syntax=docker/dockerfile:1.3

This is a private repo, so I pass my GitHub personal access token as well, but I don't think this explains the difference

ImanolSantiago · 2023-03-01T22:58:32Z

> Nine years on, have there been any changes?
>
> In 2013, the CACHE ON and CACHE OFF commands were proposed. Each instruction would affect subsequent commands.
>
> How is it now?

greetings from the future
Bad news... we still don't have a practical solution

I'm surprised they can't or don't want to implement it

* Skip installing opam dependencies when using separate bench_repo
* Ensure cached bench_repo is used only when repo has no changes

Simply running git clone as a layer meant that the cached repository is always used, even when the bench_repo has been changed. To work around this, we use the GitHub refs API to see if the repo has changed, to decide whether to use the cached bench_repo or make a new clone of it. The trick here is from this GitHub [issue comment](moby/moby#1996 (comment)).

Also, this commit adds support for specifying a specific branch of the bench_repo to use for running the benchmarks. The branch can be specified using the `/tree/<branch-name>` suffix in the bench_repo URL.

Aiosa · 2023-04-29T08:07:23Z

> if a new commit is pushed to that repository, the step won't be executed again, because the RUN itself didn't change. Docker has no way to determine "what" is executed in a RUN instruction, other than what's in the Dockerfile. Generally, recommendation for that is to make the Dockerfile deterministic, e.g. using a build-arg;

This is all nice in theory, but then you hit real-world use cases, for example in Kubernetes, where you want to be able to run the same image as both a job and a service. Then nothing like this works well, since you are forced to keep a bunch of variables and arguments up to date in various configuration files (e.g., YAML). If you have multiple repositories and change stuff frequently (development on cloud with containers with 100GB+ RAM), you realize that theory and practice are two different things. And the only thing you wanted was to have an up-to-date git repository clone.

HariSekhon · 2023-04-29T14:15:53Z

@Aiosa I agree that with git repo clones you want to get an up-to-date clone... did you see my solution for that above? I thought it was quite novel:

#1996 (comment)

Edwardius · 2023-08-30T04:30:10Z

Sad that there's no current solution :(. I love using dev containers, but I have a couple of build commands in the Dockerfile that I wish not to be cached whenever I change my code. It's annoying to find that Docker has cached a past build.

If only there were a way to specify to Docker only to try caching up to a certain build stage.

tonistiigi · 2023-10-15T18:24:24Z

Setting no-cache to specific commands can be done with a --no-cache-filter option. You can put the commands you want to control under a named stage and then specify all the stages with the flag.

FROM alpine AS base
RUN cmd-that-keeps-cache

FROM base AS pkgs
RUN install-cmd-every-time

build --no-cache-filter=pkgs

joelreymont mentioned this issue Nov 13, 2013

Proper parser for Dockerfile #2266

Closed

unclejack mentioned this issue Feb 11, 2014

Allow specifying --no-cache to RUN command #3673

Closed

timruffles mentioned this issue Feb 24, 2014

Feature: --no-cache takes a regexp to ignore matching lines in Dockerfile #4322

Closed

cressie176 mentioned this issue May 2, 2014

Allow files and paths to be ignored when uploading context #2224

Closed

brandonmpetty mentioned this issue Aug 29, 2021

Feature: CACHE OFF support, take II #42799

Open

TheTeXnician pushed a commit to islandoftex/texlive that referenced this issue May 19, 2022

Add ARG CACHEBUST in Dockerfile to invalidate cache in latest images

1446c57

See moby/moby#1996 (comment). Closes #20.

NikhilReddy-2 mentioned this issue Jul 4, 2022

Added docker integration with holon-niova 00pauln00/niova-core#224

Merged

liuzeming-yuxi mentioned this issue Mar 27, 2023

[BUG]: The version of ColossalAI in docker image is error hpcaitech/ColossalAI#3264

Closed

punchagan mentioned this issue Apr 19, 2023

Couple of improvements to using a separate bench_repo ocurrent/current-bench#431

Merged

thaJeztah mentioned this issue Oct 1, 2023

Proposal: Allow setting cache expiration time for build steps moby/buildkit#4294

Open

innobead mentioned this issue Jan 9, 2024

[CI] Use CACHEBUST to rebuild dapper build image when using dapper for dependency management longhorn/longhorn#7594

Open
