Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for referring to images by digest #11109

Merged
merged 1 commit into from Mar 17, 2015

Conversation

ncdc
Copy link
Contributor

@ncdc ncdc commented Mar 2, 2015

Overview

When I create a container, I may specify an image such as mysql:latest. When the image is pulled, latest is resolved to a particular image at that point in time. If I later want to add more containers (e.g. possible read slaves in the MySQL case), all the new containers must use the exact same image as my first container. Using a tag isn't sufficient as the tag can be updated to point at a different image. We need a way to refer to images using immutable identifiers.

With the v2 image format, image manifests have content-addressable, immutable digests. The v2 registry supports retrieving manifests by digest; this pull request adds the corresponding support to the Docker Core.

Digest reference format

We'll need to provide a means to reference an image by its digest. I'd like to propose the following format:

namespace/repository@digest

Supported commands

We'll need to make sure the following commands continue to work as they currently do, as well as with an optional digest:

  • docker build
  • docker pull
  • docker create
  • docker run
  • docker rmi

This list should eventually be comprehensive; anywhere you can refer to an image, you should be able to do so by digest. For the time being, the commands listed above should be sufficient for the use case listed in the overview.

docker images

If you pull an image from a v2 registry, the registry provides the image manifest's digest as a response header. If you pull by tag, you'll have both the tag and the digest. If you pull only by digest, you won't have the tag information. 1 question to resolve is how to display images, tags, digests, and v1 image IDs in the docker images command.

It is possible (likely?) that a variety of code exists to scrape the output of docker images. If we change the format, e.g. by adding a new column, or overloading an existing column to sometimes show a v1 image ID vs a digest, we will probably break whatever code is out there doing the scraping. We need to determine how critical it is to retain the existing column and data formats in the docker images command.

I have the following ideas regarding this output:

  • if an image was pulled by tag, and the registry it's pulled from includes the image's digest, display repository, tag, digest, and image id
  • if an image was pulled by digest, display repository, for tag, digest, and image id

docker images questions

  • should we add a new DIGEST column?
  • if yes, should we always display it, or only display it if you indicate that you want to see digests (e.g. via a filter)
    • there can be confusion if you pull an image by digest and then run docker images - you won't see the image you just pulled
  • should we display the digest instead of the v1 image ID (or some other value)?

Questions

What about v1 registry support?
The v1 registry won't support this feature.

If I create an image locally via docker tag or docker commit, can I refer to it by name@digest?
As proposed in distribution/distribution#46, the registry is responsible for determining an image's digest and assigning it to the image. For an image that has not yet been pushed to a v2 registry, it may not be possible to refer to it by name@digest. This is unlikely to be a significant issue, as the use case for name@digest is consistent deployments using images pulled from registries. Or, if the community thinks this should be supported, we can revisit what component(s) are responsible for calculating digests.

Additional information

See #10740 for more backstory.

Todo:

  • Test cases
  • Update documentation

@GordonTheTurtle
Copy link

Can you please sign your commits following these rules:

https://github.com/docker/docker/blob/master/CONTRIBUTING.md#sign-your-work

The easiest way to do this is to amend the last commit:

$ git clone -b "pull-by-digest" git@github.com:ncdc/docker.git somewhere
$ cd somewhere
$ git rebase -i HEAD~3
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

This will update the existing PR, so you do not need to open a new one.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 2, 2015

Requires distribution/distribution#211 in the v2 registry

Test cases are still being written, but I wanted to submit this now given the March 18th code freeze so the maintainers can begin to review it and provide feedback.

@icecrime @jfrazelle @crosbymichael @stevvooe @vbatts

@dmcgowan
Copy link
Member

dmcgowan commented Mar 3, 2015

Can you list some example cli use cases. I see NAME[:TAG|@DIGEST] but don't see where @ symbol would be used. Since this PR is still in the design-review state, would be helpful to enumerate what visible changes are.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

Sure thing! Basically any place you potentially want to refer to an image using a tag, you can do it using a digest:

docker push localhost:5000/andy/busybox@sha256:2aac5e7514fbc77125bd315abe9e7b0257db05fe498af01a58e239ebaccf82a8
docker pull localhost:5000/andy/busybox@sha256:2aac5e7514fbc77125bd315abe9e7b0257db05fe498af01a58e239ebaccf82a8
docker run localhost:5000/andy/busybox@sha256:2aac5e7514fbc77125bd315abe9e7b0257db05fe498af01a58e239ebaccf82a8
docker create localhost:5000/andy/busybox@sha256:2aac5e7514fbc77125bd315abe9e7b0257db05fe498af01a58e239ebaccf82a8
docker rmi localhost:5000/andy/busybox@sha256:2aac5e7514fbc77125bd315abe9e7b0257db05fe498af01a58e239ebaccf82a8

and in a Dockerfile:

FROM localhost:5000/andy/busybox@sha256:2aac5e7514fbc77125bd315abe9e7b0257db05fe498af01a58e239ebaccf82a8

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

I also think we probably need to update the responses from the daemon to the client to include the digest (so things like go-dockerclient and other libraries can use it). This would be for push and pull initially, and later on possibly for build and tag as well (if those commands end up generating manifests at some point).

And we may want to display the digest to stdout as well.

@dmcgowan
Copy link
Member

dmcgowan commented Mar 3, 2015

Can you define a few of these other changes as well...

What happens when deleting all tags?
...I observed the image is gone, but digest still remains in /var/lib/docker/repositories-(driver)

What happens when pulling an image which does not exist locally?
...I got a dangling image, which may be expected, however the UI is does not show the REPOSITORY
<none> <none> e45a5af57b00 8 weeks ago 910 B
These images would then also get cleaned up with docker rmi $(docker images -q --filter=dangling=true)

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

What happens when deleting all tags?
...I observed the image is gone, but digest still remains in /var/lib/docker/repositories-

This is one I wanted input on. This is what's coded now, but I'm not sure what the appropriate desired behavior should be. I could use some help figuring out what makes the most sense here.

What happens when pull an image which does not exist locally?
...I got a dangling image, which may be expected, however the UI is does not show the REPOSITORY
e45a5af57b00 8 weeks ago 910 B
These images would then also get cleaned up with docker rmi $(docker images -q --filter=dangling=true)

I don't like this behavior, and am hoping we can come to a decision on what to do in this situation. We could

  • not display "dangling" images that were pulled by digest but don't have any corresponding tag pointing to them
  • display these images, perhaps like this
REPOSITORY TAG DIGEST IMAGE ID CREATED VIRUTAL SIZE
busybox latest sha256:6b019df8c73bb42e606225ef935760b9c428521eba4ad2519ef3ff4cdb3dbd69 4986bf8c1536 8 weeks ago 2.43 MB
busybox sha256:9bcaeba3d43193c3a2b61f5dc70e14d76e194f4766506b57610a10619919b192 4d2ed1de264f 8 weeks ago 2.43 MB

That makes the output quite wide, however, and we don't yet support shortened references to digests (TBD in distribution/distribution#194).

  • consider adding a new flag to docker images or a new command to display digests

Do any of these sound good, or does anyone have other ideas?

@dmcgowan
Copy link
Member

dmcgowan commented Mar 3, 2015

Currently the digests are only computed on the registry side, so then most the Digest fields would be blank unless all images were push or pulled with a v2 registry. For the second case I think I would prefer to just see the Repository and <none>. I also like the idea of having an optional argument to display digests.

My only worry with the first case is that it could lead to garbage building up in that file. I think removing the Digest link when the image is removed is the right thing to do. But then that relates to the second issue, what does it mean for an image to exist locally without a tag (why does rmi last tag delete an image and not put it in the same state as pulling without a tag?).

Will need consensus from @icecrime @jfrazelle @crosbymichael

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

In order to be able to write integration tests for this, we'll need distribution/distribution#211 merged, then we'll need to bump the commit in the Dockerfile for the distribution repo so we can spin up a v2 registry with support for pull by digest.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

I'm copying this comment from #10740 here as this is a better place for the discussion:

@stevvooe wrote in #10740 (comment) about having a parser that can return an "image object reference". Right now, parsers.ParseRepositoryTag takes a string, parses it, and returns a repository string and a tag string. In my prototype for this feature, I modified this method to return either the digest or the tag in the 2nd returned string. Doing it this way means that the remaining changes to support referring to images either by tag or by digest relatively minimal; however, it does muddy the waters a bit, since ParseRepositoryTag is now returning something that is either a digest or a tag. I've thought about a few possibilities for making this cleaner:

  1. Rename ParseRepositoryTag to ParseRepositoryReference, so it's clearer that it's not always a tag that comes back
  2. Do the rename from above, but also modify the signature to return (repository, tag, digest), where only 1 of tag and digest is ever set at a time
  3. Return a type, perhaps called ImageReference, that looks like this:
type ImageReference struct {
  Repository string // or possibly registry.RepositoryInfo instead of string
  Tag string
  Digest string
}

This ImageReference option would be a more invasive change, as anywhere ParseRepositoryTag is called will have to be modified to work with a struct instead of 2 strings.

@stevvooe indicated his desire to see ImageReference in #10740 (comment). Does anyone else want to weigh in?

@@ -12,6 +12,8 @@ import (
"github.com/docker/docker/utils"
)

const DockerDigestHeader = "Docker-Digest"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we prefer this as Docker-Content-Digest?

cc @ncdc

@stevvooe
Copy link
Contributor

stevvooe commented Mar 3, 2015

@ncdc This is really up to the use case. In the digest package, where the string is mostly used as a unit, but is sometimes pulled apart, I've used the string + methods approach. If the primary use case is to use the exploded components, the struct definition is appropriate. I cannot tell from this PR which is better.

@shaded-enmity
Copy link

+1 For the ImageReference, my original rationale for a class like this was that the logic that handles repositories and tags (+now digests) is largely duplicated throughout Docker codebase - just see how many times in the code appears repoName + ":" + tag or in how many places a default tag is appended to the string, which might be easily handled as ir.String() in one place.

As for the output format, I think that changing the default output of docker images is dangerous, people may already be parsing it and the digest addition makes it hardly readable (sure depends on actual width of your terminal).
What do you think about adding a switch, say, docker images --provenance or docker images --identifiers that'd produce non-truncated output by default:

Image ID Tag(s) Digest Repository
4986bf8c15363d1c5d15512d5266f8777bfba4974ac56e3270e7760f6f0a8125 latest, newest sha256:9bcaeba3d43193c3a2b61f5dc70e14d76e194f4766506b57610a10619919b192 busybox

The output is: Image ID, All local tags for the given ID, Digest, RepoName

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

I agree that changing the output of docker images could potentially break a lot of scraping scripts. I'm in favor of a new flag as well.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

I've done a bit of work attempting to replace ParseRepositoryTag with ParseImageReference. Given that 1.6 code freeze is 15 days away, I think it's too risky a change to try to include in this PR. There are places, such as graph/tags.go#Delete() where the image reference is actually a v1 image ID, and there is logic in Delete that keys off of that scenario to support delete-by-v1-image-ID. And that's just 1 example.

While it would be great to have, creating ImageReference as part of this PR increases complexity, QA scope, and risk. I would like to get the OK to defer that until after 1.6. Thoughts?

@stevvooe @icecrime @jfrazelle @crosbymichael @vbatts

@vbatts
Copy link
Contributor

vbatts commented Mar 3, 2015

@ncdc while I think the ImageReference would be more clear and would help in abstracting the v1 and v2 references, it's probably best to hold off on that while the manifest discussion continues. There will likely be reusable structures that come from that work, and then we can later re-use them.

For now lets get this ready for the 1.6 window.

@dmcgowan
Copy link
Member

dmcgowan commented Mar 3, 2015

Is push by digest intended to be part of this PR?

Along the same lines may also be tag by digest. Currently to push an image I pulled by digest, I must tag by image id then push using that tag.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

Push by digest is supposed to be part of this PR. If it's not working, I might have overlooked testing that specific feature. I don't think it's critical to our workflow (OpenShift), but I think it makes sense to include it.

Are you saying that docker tag foo/bar@<digest> <new name> isn't working? That wasn't something that I tested, so there's probably a code path that I missed.

@dmcgowan
Copy link
Member

dmcgowan commented Mar 3, 2015

I don't think push by digest should be part of this PR and rather excluded until a later version which supports schema version 2 of the manifest. The current version of the manifest requires a tag to be set. Since the digest represents a manifest which contained a tag, that tag is what should be used for the push. However pull by digest currently does not use the manifest tag (perhaps it should) and it does not store the manifest. Since the manifest is not stored, pushing by digest will always recreate a new manifest and tars to push, causing a new digest to be created for the image id. @jlhawn is also working on a new implementation of storage of layers and manifest which would enable deterministic pushing of layers and manifests. It is probably best to wait until a new storage backend to support this.

The tag command should be supported if the digests are going to stick around for docker run

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

I'm happy to omit push by digest for now.

FYI, pushing the same manifest content multiple times should result in the same digest, as the unsigned portion of the (recreated) manifest should remain the same if nothing else has changed.... right? That's what I'm seeing locally, at least.

The tag command should be supported if the digests are going to stick around for docker run

I'm not against adding support for tagging a digest to a new tag, but I'm not sure how docker run plays into this?

@dmcgowan
Copy link
Member

dmcgowan commented Mar 3, 2015

We are not guaranteeing in this version of the manifest that the unsigned portion will not change. It shouldn't but there have been issues with using tarsum.

I just meant that if the digest is intended to be used for commands after pull, such as docker run, the image should be taggable.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

I just meant that if the digest is intended to be used for commands after pull, such as docker run, the image should be taggable.

We can certainly do that, but, at least for OpenShift, we plan on passing name@digest to docker run.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 3, 2015

@dmcgowan I'm able to docker tag <name>@<digest> <new name> - are you not?

@SvenDowideit
Copy link
Contributor

In the PR summary, you document what this feature is useful for - but this information is not reflected in the documentation changes you've made.

Can you perhaps add the use case example and explanation to the docs too?

@ncdc
Copy link
Contributor Author

ncdc commented Mar 16, 2015

I can do that. Which doc do you think that should go in?
On Mon, Mar 16, 2015 at 2:45 AM Sven Dowideit notifications@github.com
wrote:

In the PR summary, you document what this feature is useful for - but this
information is not reflected in the documentation changes you've made.

Can you perhaps add the use case example and explanation to the docs too?


Reply to this email directly or view it on GitHub
#11109 (comment).

@ncdc
Copy link
Contributor Author

ncdc commented Mar 16, 2015

Actually, could we do that use case doc in a follow-up PR?

@dmcgowan
Copy link
Member

@SvenDowideit I agree with @ncdc since this is not a feature users must use or get familiar with right away, it might be something many users may want to use later on as we add more "by digest" functionality.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 16, 2015

Rebased

@icecrime
Copy link
Contributor

Ping @SvenDowideit: are we good?

@SvenDowideit
Copy link
Contributor

@icecrime @ncdc at this point, your reader is going to spend more time trying to think up reasons why this digest thing exists, and how and when they'd use it - so IMO, if you add a small statement, they will pass over it much faster than if you leave it as is.

But other than distracting the reader, LGTM.

@ncdc
Copy link
Contributor Author

ncdc commented Mar 17, 2015

@icecrime is anything else needed before this can be merged?

Add ability to refer to an image by repository name and digest using the
format repository@digest. Works for pull, push, run, build, and rmi.

Signed-off-by: Andy Goldstein <agoldste@redhat.com>
@vbatts
Copy link
Contributor

vbatts commented Mar 17, 2015

looks to me like this is ready to merge

@icecrime
Copy link
Contributor

Cheers

icecrime pushed a commit that referenced this pull request Mar 17, 2015
Add support for referring to images by digest
@icecrime icecrime merged commit 0c3c645 into moby:master Mar 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet