
Improved push and pull with upload manager and download manager #18353

Merged
merged 2 commits into moby:master from transfer-manager on Dec 10, 2015

Conversation

aaronlehmann
Contributor

This commit adds a transfer manager which deduplicates and schedules
transfers, and also an upload manager and download manager that build on
top of the transfer manager to provide high-level interfaces for uploads
and downloads. The push and pull code is modified to use these building
blocks.

Some benefits of the changes:

  • Simplification of push/pull code
  • Pushes can upload layers concurrently
  • Failed downloads and uploads are retried after backoff delays
  • Cancellation is supported, but individual transfers will only be
    cancelled if all pushes or pulls using them are cancelled.
  • The distribution code is decoupled from Docker Engine packages and API
    conventions (i.e. streamformatter), which will make it easier to split
    out.

This commit also includes unit tests for the new distribution/xfer
package. The tests cover 87.8% of the statements in the package.
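For readers skimming this later, here is a minimal, self-contained sketch of the deduplication-and-scheduling idea described above. It is an illustration only, not the actual distribution/xfer code; the type and function names are assumptions. Concurrent requests for the same key share one in-flight transfer, and a semaphore bounds how many transfers run at once.

```go
package xfersketch

import "sync"

// transferManager deduplicates transfers by key and limits concurrency.
type transferManager struct {
	mu       sync.Mutex
	inflight map[string]*transfer // key -> shared in-flight transfer
	slots    chan struct{}        // semaphore bounding concurrent transfers
}

// transfer is the state shared by all callers waiting on the same key.
type transfer struct {
	done chan struct{} // closed when the transfer finishes
	err  error         // result, safe to read after done is closed
}

func newTransferManager(limit int) *transferManager {
	return &transferManager{
		inflight: make(map[string]*transfer),
		slots:    make(chan struct{}, limit),
	}
}

// Transfer runs fn for key at most once among concurrent callers; later
// callers for the same key block until the first caller's result is ready.
func (tm *transferManager) Transfer(key string, fn func() error) error {
	tm.mu.Lock()
	if t, ok := tm.inflight[key]; ok {
		tm.mu.Unlock()
		<-t.done
		return t.err
	}
	t := &transfer{done: make(chan struct{})}
	tm.inflight[key] = t
	tm.mu.Unlock()

	tm.slots <- struct{}{} // acquire a concurrency slot
	t.err = fn()
	<-tm.slots // release the slot

	tm.mu.Lock()
	delete(tm.inflight, key)
	tm.mu.Unlock()
	close(t.done) // wake every waiter; t.err is now safe to read
	return t.err
}
```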

fixes #18498

@LK4D4
Contributor

LK4D4 commented Dec 2, 2015

It doesn't really change the design. I think we can move to code review.

@jessfraz
Contributor

jessfraz commented Dec 2, 2015

agreed

@@ -731,7 +735,8 @@ func NewDaemon(config *Config, registryService *registry.Service) (daemon *Daemo
return nil, err
}

distributionPool := distribution.NewPool()
d.downloadManager = xfer.NewLayerDownloadManager(d.layerStore, 5)
Contributor


Maybe 5 should be a const. Also why 5?

Contributor Author


Yes, this should probably be defined in a constant.

The number 5 is arbitrary and just seemed like a sane number of simultaneous transfers. It could be made tunable if people think it's worth the effort.
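For what it's worth, a minimal sketch of the suggestion (the identifier name here is hypothetical, not necessarily what ends up in the daemon):

```go
package daemonsketch

// maxConcurrentDownloads names the otherwise-arbitrary limit discussed
// above, rather than passing a bare literal to the download manager.
const maxConcurrentDownloads = 5

// In NewDaemon this would then read roughly:
//   d.downloadManager = xfer.NewLayerDownloadManager(d.layerStore, maxConcurrentDownloads)
```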

@aaronlehmann aaronlehmann force-pushed the transfer-manager branch 2 times, most recently from 08d2bb8 to 0d0c39e Compare December 2, 2015 18:08
@aaronlehmann
Contributor Author

I did some benchmarks comparing this PR with the existing code on master. The numbers were almost exactly the same. I pulled golang:latest repeatedly from the hub, and from a local registry.

Using the hub, the average pull time with this patch was 27.55 seconds, versus 27.74 with master.

Using a local registry, the average pull time with this patch was 20.21 seconds, versus 20.01 seconds with master.

In both cases, the runtime is dominated by registering the layers, which can't be parallelized. The patched version could be somewhat faster in some cases, because it prioritizes downloading the bottom layers so registration can start earlier. My internet connection is fast enough that this doesn't show up much in my benchmark. With a slower connection, I think it could make a big difference.

This patch wraps downloads and registrations with an io.Pipe to support cancellation (see cancelReadCloser). Fortunately, this doesn't seem to have a significant performance impact.
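To illustrate the approach, here is a rough reconstruction of the io.Pipe idea, not the cancelReadCloser that actually ships in distribution/xfer; the names are assumptions:

```go
package main

import (
	"context"
	"io"
	"strings"
)

// cancellableReader copies from in into a pipe and closes the pipe with an
// error when ctx is cancelled, so any blocked downstream Read unblocks.
func cancellableReader(ctx context.Context, in io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		_, err := io.Copy(pw, in)
		pw.CloseWithError(err) // propagate EOF or the copy error
	}()
	go func() {
		<-ctx.Done()
		pw.CloseWithError(ctx.Err()) // no-op if the copy already finished
	}()
	return pr
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	r := cancellableReader(ctx, strings.NewReader("layer data"))
	buf := make([]byte, 5)
	r.Read(buf) // reads "layer"
	cancel()    // later reads fail with context.Canceled
	r.Close()
}
```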

@aaronlehmann
Contributor Author

After some experimentation, I've found that changing the download concurrency limit from 5 to 3 helps with slower links. I've made this change.

I used Network Link Conditioner to simulate a 40 Mbps network connection. Pulling wordpress:latest with the latest version of the PR takes 47.06s, vs 52.16s on master. Pulling golang:latest takes 62.42s, vs 66.19s on master.

The differences here are caused by fewer simultaneous downloads allowing the base layers to complete first. Then they can be unpacked and registered while other layers are still downloading.

logrus.Errorf("Failed to remove temp file: %s", tmpFile.Name())
}

return nil, 0, err
Member


Will this cause a retry?

Contributor Author


Yes. The thinking is that if we receive something that doesn't match the digest, it could have been corrupted in transit, for example by a misbehaving proxy, or a broken server in a load balancing rotation. Retrying has a chance of getting the expected content.

There is also an argument to be made that if we don't get the right data the first time, we aren't likely to get it after a retry either.

I'm not sure what makes most sense in this case.
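For context, the retry behavior under discussion has roughly the following shape. This is a hypothetical sketch with made-up names and backoff values, not the distribution/xfer implementation: verify the downloaded bytes against the expected digest, treat a mismatch as an error, and let the caller retry after an increasing delay.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

var errDigestMismatch = errors.New("content does not match expected digest")

// fetchVerified runs fetch and checks the result against an expected
// sha256 hex digest, surfacing corruption as errDigestMismatch.
func fetchVerified(expected string, fetch func() ([]byte, error)) ([]byte, error) {
	data, err := fetch()
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != expected {
		return nil, errDigestMismatch
	}
	return data, nil
}

// withRetries retries op with exponential backoff, on the theory that a
// mismatch may be transient (e.g. a misbehaving proxy in the path).
func withRetries(attempts int, op func() ([]byte, error)) ([]byte, error) {
	backoff := time.Second
	var lastErr error
	for i := 0; i < attempts; i++ {
		data, err := op()
		if err == nil {
			return data, nil
		}
		lastErr = err
		fmt.Printf("attempt %d failed: %v; retrying in %v\n", i+1, err, backoff)
		time.Sleep(backoff)
		backoff *= 2
	}
	return nil, lastErr
}

func main() {
	payload := []byte("layer contents")
	sum := sha256.Sum256(payload)
	data, err := withRetries(3, func() ([]byte, error) {
		return fetchVerified(hex.EncodeToString(sum[:]), func() ([]byte, error) {
			return payload, nil
		})
	})
	fmt.Println(string(data), err)
}
```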

@aaronlehmann aaronlehmann force-pushed the transfer-manager branch 2 times, most recently from 4d4d073 to 3d4f50e Compare December 3, 2015 02:58
@aaronlehmann
Contributor Author

I've added another commit that cleans up the progress monitoring.

It unifies the channel-based ProgressReader under distribution/xfer with the traditional version in pkg/progressreader, creating pkg/progress, which replaces both. It adds a method to StreamFormatter that creates a channel to pass to NewProgressReader for automatic stream formatting of progress items. I've updated existing progressreader users to use this, and also replaced writeDistributionProgress. Finally, there are new utility methods in pkg/progress to output simple or printf-formatted messages or updates to a progress channel.

If people feel like this is the right direction, I'll squash the commits before merge.

@aaronlehmann
Contributor Author

Updated the new commit after some discussion with @tonistiigi. Now the common ProgressReader uses an interface to send updates instead of writing to a channel directly. This keeps it synchronous for the use cases that output directly to StreamFormatter. PTAL.
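A rough sketch of the interface-based reporting described above (names like progressOutput only loosely mirror what pkg/progress ended up calling things, so treat them as assumptions):

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// progressUpdate is one unit of progress for a named item.
type progressUpdate struct {
	ID      string
	Current int64
	Total   int64
}

// progressOutput receives updates synchronously, so a caller that writes
// straight to a stream formatter stays synchronous too.
type progressOutput interface {
	WriteProgress(progressUpdate)
}

type printOutput struct{}

func (printOutput) WriteProgress(p progressUpdate) {
	fmt.Printf("%s: %d/%d bytes\n", p.ID, p.Current, p.Total)
}

// progressReader wraps an io.Reader and reports bytes read to out.
type progressReader struct {
	in      io.Reader
	out     progressOutput
	id      string
	total   int64
	current int64
}

func (r *progressReader) Read(p []byte) (int, error) {
	n, err := r.in.Read(p)
	r.current += int64(n)
	r.out.WriteProgress(progressUpdate{ID: r.id, Current: r.current, Total: r.total})
	return n, err
}

func main() {
	data := strings.NewReader("example layer contents")
	pr := &progressReader{in: data, out: printOutput{}, id: "layer1", total: int64(data.Len())}
	io.Copy(io.Discard, pr)
}
```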

@LK4D4
Contributor

LK4D4 commented Dec 4, 2015

I think the previous implementation of progress was cleaner. It was so nice to have progress without the crappy streamformatter.

@thaJeztah
Member

thanks! docs LGTM

ping @moxiegirl @SvenDowideit for review

@moxiegirl
Contributor

I hear this is expedited work. I'll LGTM this and create a carry PR.

@LK4D4
Contributor

LK4D4 commented Dec 10, 2015

@moxiegirl thanks

LK4D4 added a commit that referenced this pull request Dec 10, 2015
Improved push and pull with upload manager and download manager
@LK4D4 LK4D4 merged commit ac453a3 into moby:master Dec 10, 2015
@thaJeztah thaJeztah added this to the 1.10 milestone Dec 16, 2015
@aaronlehmann aaronlehmann deleted the transfer-manager branch January 12, 2016 18:29
aaronlehmann added a commit to aaronlehmann/docker that referenced this pull request Mar 1, 2016
Concurrent uploads which share layers worked correctly as of moby#18353,
but unfortunately moby#18785 caused a regression. This PR removed the logic
that shares digests between different push sessions. This overlooked the
case where one session was waiting for another session to upload a
layer.

This commit adds back the ability to propagate this digest information,
using the distribution.Descriptor type because this is what is received
from stats and uploads, and also what is ultimately needed for building
the manifest.

Surprisingly, there was no test covering this case. This commit adds
one. It fails without the fix.

See recent comments on moby#9132.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@dts

dts commented Mar 2, 2016

This feature has completely broken docker for my use case - I can no longer upload from my home computer, as large layers compete for limited bandwidth and time each other out. Is there a way to disable this behavior at the command line?

@stevvooe
Contributor

stevvooe commented Mar 2, 2016

@dts Without more information and a proper analysis, we can't say that this PR is the root cause of your particular issue. Typically, reporting issues on a merged PR is an ineffective venue for solving your problem.

I'd recommend submitting another issue or reaching out to the mailing list for help. Submit the issue and reference this PR as a possible cause. However, please make sure to separate the description of the problem from the perceived cause, as the actual cause may turn out to be something else.

@dts

dts commented Mar 4, 2016

@stevvooe, thanks for your patience and guidance - I've added an issue here: #20936.

@stevvooe
Contributor

stevvooe commented Mar 4, 2016

@dts No problem. Make sure to fill in that issue with as many details as possible. At least, it should include the docker version and the exact output when the error is encountered. From there, I'd recommend including logs from daemon in debug mode, as that will provide clues as to how many simultaneous streams are starting and their actual scope.

@dts

dts commented Mar 4, 2016

@stevvooe - It's more of a feature request for me: I'd like to reduce the concurrency of sending images. Does having my docker version and output text help with that?

@stevvooe
Contributor

stevvooe commented Mar 4, 2016

@dts It's preferable to understand the root cause of the issue rather than adding another feature that may or may not address your problem. This PR is not really a feature, and a fix to its behavior wouldn't be one either.

@dts

dts commented Mar 4, 2016

@stevvooe makes sense - context is added to the fresh new issue, thanks again for your help!

tiborvass pushed a commit to tiborvass/docker that referenced this pull request Mar 7, 2016
Concurrent uploads which share layers worked correctly as of moby#18353,
but unfortunately moby#18785 caused a regression. This PR removed the logic
that shares digests between different push sessions. This overlooked the
case where one session was waiting for another session to upload a
layer.

This commit adds back the ability to propagate this digest information,
using the distribution.Descriptor type because this is what is received
from stats and uploads, and also what is ultimately needed for building
the manifest.

Surprisingly, there was no test covering this case. This commit adds
one. It fails without the fix.

See recent comments on moby#9132.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
(cherry picked from commit 5c99eeb)
Development

Successfully merging this pull request may close these issues.

docker push is not parallel