
Improved push and pull with upload manager and download manager #18353

Merged
merged 2 commits into moby:master from transfer-manager on Dec 10, 2015

Conversation

aaronlehmann
Contributor

This commit adds a transfer manager which deduplicates and schedules
transfers, and also an upload manager and download manager that build on
top of the transfer manager to provide high-level interfaces for uploads
and downloads. The push and pull code is modified to use these building
blocks.

Some benefits of the changes:

  • Simplification of push/pull code
  • Pushes can upload layers concurrently
  • Failed downloads and uploads are retried after backoff delays
  • Cancellation is supported, but individual transfers will only be
    cancelled if all pushes or pulls using them are cancelled.
  • The distribution code is decoupled from Docker Engine packages and API
    conventions (i.e. streamformatter), which will make it easier to split
    out.

This commit also includes unit tests for the new distribution/xfer
package. The tests cover 87.8% of the statements in the package.
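For readers skimming this later, here is a minimal, self-contained sketch of the deduplication-and-scheduling idea described above. It is an illustration only, not the actual distribution/xfer code; the type and function names are assumptions. Concurrent requests for the same key share one in-flight transfer, and a semaphore bounds how many transfers run at once.

```go
package xfersketch

import "sync"

// transferManager deduplicates transfers by key and limits concurrency.
type transferManager struct {
	mu       sync.Mutex
	inflight map[string]*transfer // key -> shared in-flight transfer
	slots    chan struct{}        // semaphore bounding concurrent transfers
}

// transfer is the state shared by all callers waiting on the same key.
type transfer struct {
	done chan struct{} // closed when the transfer finishes
	err  error         // result, safe to read after done is closed
}

func newTransferManager(limit int) *transferManager {
	return &transferManager{
		inflight: make(map[string]*transfer),
		slots:    make(chan struct{}, limit),
	}
}

// Transfer runs fn for key at most once among concurrent callers; later
// callers for the same key block until the first caller's result is ready.
func (tm *transferManager) Transfer(key string, fn func() error) error {
	tm.mu.Lock()
	if t, ok := tm.inflight[key]; ok {
		tm.mu.Unlock()
		<-t.done
		return t.err
	}
	t := &transfer{done: make(chan struct{})}
	tm.inflight[key] = t
	tm.mu.Unlock()

	tm.slots <- struct{}{} // acquire a concurrency slot
	t.err = fn()
	<-tm.slots // release the slot

	tm.mu.Lock()
	delete(tm.inflight, key)
	tm.mu.Unlock()
	close(t.done) // wake every waiter; t.err is now safe to read
	return t.err
}
```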

fixes #18498

@LK4D4
Contributor

LK4D4 commented Dec 2, 2015

It doesn't really change the design. I think we can move to code review.

@jessfraz
Contributor

jessfraz commented Dec 2, 2015

agreed

@@ -731,7 +735,8 @@ func NewDaemon(config *Config, registryService *registry.Service) (daemon *Daemo
return nil, err
}

distributionPool := distribution.NewPool()
d.downloadManager = xfer.NewLayerDownloadManager(d.layerStore, 5)
Contributor


Maybe 5 should be a const. Also why 5?

Contributor Author


Yes, this should probably be defined in a constant.

The number 5 is arbitrary and just seemed like a sane number of simultaneous transfers. It could be made tunable if people think it's worth the effort.
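For what it's worth, a minimal sketch of the suggestion (the identifier name here is hypothetical, not necessarily what ends up in the daemon):

```go
package daemonsketch

// maxConcurrentDownloads names the otherwise-arbitrary limit discussed
// above, rather than passing a bare literal to the download manager.
const maxConcurrentDownloads = 5

// In NewDaemon this would then read roughly:
//   d.downloadManager = xfer.NewLayerDownloadManager(d.layerStore, maxConcurrentDownloads)
```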

@aaronlehmann aaronlehmann force-pushed the transfer-manager branch 2 times, most recently from 08d2bb8 to 0d0c39e Compare December 2, 2015 18:08
@aaronlehmann
Contributor Author

I did some benchmarks comparing this PR with the existing code on master. The numbers were almost exactly the same. I pulled golang:latest repeatedly from the hub, and from a local registry.

Using the hub, the average pull time with this patch was 27.55 seconds, versus 27.74 with master.

Using a local registry, the average pull time with this patch was 20.21 seconds, versus 20.01 seconds with master.

In both cases, the runtime is dominated by registering the layers, which can't be parallelized. The patched version could be somewhat faster in some cases, because it prioritizes downloading the bottom layers so registration can start earlier. My internet connection is fast enough that this doesn't show up much in my benchmark. With a slower connection, I think it could make a big difference.

This patch wraps downloads and registrations with an io.Pipe to support cancellation (see cancelReadCloser). Fortunately, this doesn't seem to have a significant performance impact.
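To illustrate the approach, here is a rough reconstruction of the io.Pipe idea, not the cancelReadCloser that actually ships in distribution/xfer; the names are assumptions:

```go
package main

import (
	"context"
	"io"
	"strings"
)

// cancellableReader copies from in into a pipe and closes the pipe with an
// error when ctx is cancelled, so any blocked downstream Read unblocks.
func cancellableReader(ctx context.Context, in io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		_, err := io.Copy(pw, in)
		pw.CloseWithError(err) // propagate EOF or the copy error
	}()
	go func() {
		<-ctx.Done()
		pw.CloseWithError(ctx.Err()) // no-op if the copy already finished
	}()
	return pr
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	r := cancellableReader(ctx, strings.NewReader("layer data"))
	buf := make([]byte, 5)
	r.Read(buf) // reads "layer"
	cancel()    // later reads fail with context.Canceled
	r.Close()
}
```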

@aaronlehmann
Contributor Author

After some experimentation, I've found that changing the download concurrency limit from 5 to 3 helps with slower links. I've made this change.

I used Network Link Conditioner to simulate a 40 Mbps network connection. Pulling wordpress:latest with the latest version of the PR takes 47.06s, vs 52.16s on master. Pulling golang:latest takes 62.42s, vs 66.19s on master.

The differences here are caused by fewer simultaneous downloads allowing the base layers to complete first. Then they can be unpacked and registered while other layers are still downloading.

logrus.Errorf("Failed to remove temp file: %s", tmpFile.Name())
}

return nil, 0, err
Member


Will this cause a retry?

Contributor Author


Yes. The thinking is that if we receive something that doesn't match the digest, it could have been corrupted in transit, for example by a misbehaving proxy, or a broken server in a load balancing rotation. Retrying has a chance of getting the expected content.

There is also an argument to be made that if we don't get the right data the first time, we aren't likely to get it after a retry either.

I'm not sure what makes most sense in this case.
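For context, the retry behavior under discussion has roughly the following shape. This is a hypothetical sketch with made-up names and backoff values, not the distribution/xfer implementation: verify the downloaded bytes against the expected digest, treat a mismatch as an error, and let the caller retry after an increasing delay.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

var errDigestMismatch = errors.New("content does not match expected digest")

// fetchVerified runs fetch and checks the result against an expected
// sha256 hex digest, surfacing corruption as errDigestMismatch.
func fetchVerified(expected string, fetch func() ([]byte, error)) ([]byte, error) {
	data, err := fetch()
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != expected {
		return nil, errDigestMismatch
	}
	return data, nil
}

// withRetries retries op with exponential backoff, on the theory that a
// mismatch may be transient (e.g. a misbehaving proxy in the path).
func withRetries(attempts int, op func() ([]byte, error)) ([]byte, error) {
	backoff := time.Second
	var lastErr error
	for i := 0; i < attempts; i++ {
		data, err := op()
		if err == nil {
			return data, nil
		}
		lastErr = err
		fmt.Printf("attempt %d failed: %v; retrying in %v\n", i+1, err, backoff)
		time.Sleep(backoff)
		backoff *= 2
	}
	return nil, lastErr
}

func main() {
	payload := []byte("layer contents")
	sum := sha256.Sum256(payload)
	data, err := withRetries(3, func() ([]byte, error) {
		return fetchVerified(hex.EncodeToString(sum[:]), func() ([]byte, error) {
			return payload, nil
		})
	})
	fmt.Println(string(data), err)
}
```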

@aaronlehmann aaronlehmann force-pushed the transfer-manager branch 2 times, most recently from 4d4d073 to 3d4f50e Compare December 3, 2015 02:58
@aaronlehmann
Contributor Author

I've added another commit that cleans up the progress monitoring.

It unifies the channel-based ProgressReader under distribution/xfer with the traditional version in pkg/progressreader, creating pkg/progress, which replaces both. It adds a method to StreamFormatter that creates a channel to pass to NewProgressReader for automatic stream formatting of progress items. I've updated existing progressreader users to use this, and also replaced writeDistributionProgress. Finally, there are new utility methods in pkg/progress to output simple or printf-formatted messages or updates to a progress channel.

If people feel like this is the right direction, I'll squash the commits before merge.

@aaronlehmann
Contributor Author

Updated the new commit after some discussion with @tonistiigi. Now the common ProgressReader uses an interface to send updates instead of writing to a channel directly. This keeps it synchronous for the use cases that output directly to StreamFormatter. PTAL.
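A rough sketch of the interface-based reporting described above (names like progressOutput only loosely mirror what pkg/progress ended up calling things, so treat them as assumptions):

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// progressUpdate is one unit of progress for a named item.
type progressUpdate struct {
	ID      string
	Current int64
	Total   int64
}

// progressOutput receives updates synchronously, so a caller that writes
// straight to a stream formatter stays synchronous too.
type progressOutput interface {
	WriteProgress(progressUpdate)
}

type printOutput struct{}

func (printOutput) WriteProgress(p progressUpdate) {
	fmt.Printf("%s: %d/%d bytes\n", p.ID, p.Current, p.Total)
}

// progressReader wraps an io.Reader and reports bytes read to out.
type progressReader struct {
	in      io.Reader
	out     progressOutput
	id      string
	total   int64
	current int64
}

func (r *progressReader) Read(p []byte) (int, error) {
	n, err := r.in.Read(p)
	r.current += int64(n)
	r.out.WriteProgress(progressUpdate{ID: r.id, Current: r.current, Total: r.total})
	return n, err
}

func main() {
	data := strings.NewReader("example layer contents")
	pr := &progressReader{in: data, out: printOutput{}, id: "layer1", total: int64(data.Len())}
	io.Copy(io.Discard, pr)
}
```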

@LK4D4
Contributor

LK4D4 commented Dec 4, 2015

I think the previous implementation of progress was cleaner. It was so nice to have progress without the crappy streamformatter.

@thaJeztah
Member

thanks! docs LGTM

ping @moxiegirl @SvenDowideit for review

@moxiegirl
Contributor

I hear this is expedited work. I'll LGTM this and create a carry PR.

@LK4D4
Contributor

LK4D4 commented Dec 10, 2015

@moxiegirl thanks

LK4D4 added a commit that referenced this pull request Dec 10, 2015
Improved push and pull with upload manager and download manager
@LK4D4 LK4D4 merged commit ac453a3 into moby:master Dec 10, 2015
@thaJeztah thaJeztah added this to the 1.10 milestone Dec 16, 2015
@aaronlehmann aaronlehmann deleted the transfer-manager branch January 12, 2016 18:29
aaronlehmann added a commit to aaronlehmann/docker that referenced this pull request Mar 1, 2016
Concurrent uploads which share layers worked correctly as of moby#18353,
but unfortunately moby#18785 caused a regression. This PR removed the logic
that shares digests between different push sessions. This overlooked the
case where one session was waiting for another session to upload a
layer.

This commit adds back the ability to propagate this digest information,
using the distribution.Descriptor type because this is what is received
from stats and uploads, and also what is ultimately needed for building
the manifest.

Surprisingly, there was no test covering this case. This commit adds
one. It fails without the fix.

See recent comments on moby#9132.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@dts

dts commented Mar 2, 2016

This feature has completely broken docker for my use case - I can no longer upload from my home computer, as large layers compete for limited bandwidth and time each other out. Is there a way to disable this behavior at the command line?

@stevvooe
Contributor

stevvooe commented Mar 2, 2016

@dts Without more information and a proper analysis, we can't say that this PR is the root cause of your particular issue. Typically, reporting issues on a merged PR is an ineffective venue for solving your problem.

I'd recommend submitting another issue or reaching out to the mailing list for help. Submit the issue and reference this PR as a possible cause. However, please make sure to separate the description of the problem from the perceived cause, as the actual cause may turn out to be something else.

@dts

dts commented Mar 4, 2016

@stevvooe, thanks for your patience and guidance - I've added an issue here: #20936.

@stevvooe
Contributor

stevvooe commented Mar 4, 2016

@dts No problem. Make sure to fill in that issue with as many details as possible. At least, it should include the docker version and the exact output when the error is encountered. From there, I'd recommend including logs from daemon in debug mode, as that will provide clues as to how many simultaneous streams are starting and their actual scope.

@dts

dts commented Mar 4, 2016

@stevvooe - It's more of a feature request for me: I'd like to reduce the concurrency of sending images. Does having my docker version and output text help with that?

@stevvooe
Contributor

stevvooe commented Mar 4, 2016

@dts It's preferable to understand the root cause of the issue rather than adding another feature that may or may not address your problem. This PR is not really a feature, and a fix to its behavior wouldn't be one either.

@dts

dts commented Mar 4, 2016

@stevvooe makes sense - context is added to the fresh new issue, thanks again for your help!

tiborvass pushed a commit to tiborvass/docker that referenced this pull request Mar 7, 2016
Concurrent uploads which share layers worked correctly as of moby#18353,
but unfortunately moby#18785 caused a regression. This PR removed the logic
that shares digests between different push sessions. This overlooked the
case where one session was waiting for another session to upload a
layer.

This commit adds back the ability to propagate this digest information,
using the distribution.Descriptor type because this is what is received
from stats and uploads, and also what is ultimately needed for building
the manifest.

Surprisingly, there was no test covering this case. This commit adds
one. It fails without the fix.

See recent comments on moby#9132.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
(cherry picked from commit 5c99eeb)
Development

Successfully merging this pull request may close these issues.

docker push is not parallel