This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

stage0: added ability to resume interrupted downloads #1444

Merged
merged 1 commit into rkt:master on Nov 11, 2015

Conversation


@cgonyeo cgonyeo commented Sep 18, 2015

Downloads for a given url are now saved to a file named after a sha512
hash of the url. If the file already exists for a requested download, a
check will be made to see if the server supports HTTP range requests. If
such requests are supported then only the missing part of the file will
be downloaded, otherwise the entire file will be downloaded as before.


cgonyeo commented Sep 18, 2015

I don't think quay.io supports Range requests, so to see this in action you'll need to pull an aci from somewhere that does.

I'm currently hosting one via nginx (you'll need --insecure-skip-verify) at haruko.gonyeo.com/cshrtp-members


cgonyeo commented Sep 18, 2015

This is to (partially) address #1280.


steveej commented Sep 19, 2015

What if the file at the URL changes between download start and resume?


cgonyeo commented Sep 19, 2015

Not completely sure until I'm back at my desk on Monday, but I imagine it'd either fail verification, or, if verification has been turned off, it would (I hope) no longer be a valid tar file, and so rkt would fail to run it.

If that's the case, perhaps I should add logic to tell when one of these failures happens from an image that was downloaded in multiple parts, and then retry the download?


vcaputo commented Sep 21, 2015

Can't we treat the file being resumed from as a cached partial copy of the remote resource and use the last modified or etag header to invalidate the cached copy?


cgonyeo commented Sep 21, 2015

I imagine so; I'll go implement that. Those headers might not be supported, though, right? Do you think the behavior in that case should be to just redownload the whole thing?

@jonboulle jonboulle added this to the v0.9.0 milestone Sep 21, 2015

vcaputo commented Sep 22, 2015

It seems reasonable to me that we'd emit an explanation and re-download the whole thing if correctly (in)validating the cache isn't possible. Should probably get consensus from the rest of the rkt team on that decision.

@jonboulle
Contributor

SGTM


iaguis commented Sep 23, 2015

SGTM

@jonboulle
Contributor

@dgonyeo where are we on this?


cgonyeo commented Oct 1, 2015

I've been distracted with acbuild, I'll give this some attention today.

On Sep 30, 2015, at 22:40, Jonathan Boulle notifications@github.com wrote:

@dgonyeo where are we on this?



@jonboulle
Contributor

@dgonyeo is this ready?


cgonyeo commented Oct 6, 2015

Nope, but I made a little more progress today. I needed to come up with a way to store a download's etag before it finishes downloading, in case of rkt being killed during the download. I should have it done tomorrow.


cgonyeo commented Oct 6, 2015

And by "tomorrow" I meant "in a half hour". It should be ready for another pass.

@@ -226,6 +226,21 @@ func (s Store) TmpFile() (*os.File, error) {
return ioutil.TempFile(dir, "")
}

// TmpNamedFile returns an *os.File with the specified name local to the same
// filesystem as the Store, or any error encountered. If the file already
// exists it will return the existing file.
Contributor

Please mention it'll be in append mode

@jonboulle jonboulle added this to the v0.10.0 milestone Oct 6, 2015
@jonboulle jonboulle removed this from the v0.10.0 milestone Oct 20, 2015

iaguis commented Oct 29, 2015

Do you think you'll have time to add some tests?


cgonyeo commented Oct 29, 2015

Whoops, completely forgot about this. Sorry about that. I'll write some tests.

@cgonyeo cgonyeo force-pushed the resumable-downloads branch 3 times, most recently from 8cce34c to fc9a2c4 on November 4, 2015

cgonyeo commented Nov 4, 2015

I added a test. It starts up a little server hosting an image, has rkt fetch the image, kills rkt when the download reaches 50%, has rkt fetch it again, and ensures that rkt downloads only part of the image the second time.

Do you think any other tests are warranted, or is this enough?


alban commented Nov 9, 2015

@dgonyeo I think testing an interrupted download on a server which does not support ranges or etags would be good.

@cgonyeo cgonyeo force-pushed the resumable-downloads branch 2 times, most recently from df3a226 to 2e20ba2 on November 9, 2015

cgonyeo commented Nov 9, 2015

I added two more tests: one for a server that doesn't support last-modified or etags, and one that sets the last-modified header to one minute in the future.

rangeHeader := rangeHeaders[0][6:] // The first (and only) range header, with len("bytes=") characters chopped off the front
tokens := strings.Split(rangeHeader, "-")
if len(tokens) != 2 {
t.Fatalf("number of '-'s in range header was not 1, it was %d", len(tokens))
Member

It should be: len(tokens) - 1

or just: t.Fatalf("couldn't parse rangeHeader: %q", rangeHeader)

@cgonyeo cgonyeo force-pushed the resumable-downloads branch 2 times, most recently from 0d23ea9 to b24cd35 on November 10, 2015
@@ -636,6 +713,8 @@ func (f *fetcher) downloadHTTP(url, label string, out writeSyncer, etag string)
return nil, fmt.Errorf("bad HTTP status code: %d", res.StatusCode)
Member

If the error is http.StatusRequestedRangeNotSatisfiable (416), then there is a bug in the cache management (in rkt or in the server). Should we delete the partially downloaded file and the etag file in order to recover from that?

Member Author

SGTM, but I'm curious if this is an edge case that will ever be hit, since at this point we already verified that range requests were supported via the HEAD request.

Could this happen if the image size changed since the first part was cached, resulting in the requested starting point being larger than the size of the new image?

Member Author

Added the logic for this. If the HTTP code is http.StatusRequestedRangeNotSatisfiable, the out writeSyncer is a file, and the file has a non-zero size, then we seek to the beginning of the file, truncate it to 0 bytes, and retry the download. Otherwise we fall through to the error case.

// rkt tries to use the cache, the hash won't check out.
server2 := httptest.NewServer(testSimpleServerHandler(t, imagePath))
defer server2.Close()
cmd = fmt.Sprintf("%s --no-store --insecure-skip-verify fetch %s", ctx.Cmd(), server2.URL)
Member

server.URL might not be the same as server2.URL, so rkt might not even try to use the cache.


cgonyeo commented Nov 10, 2015

Weird, TestFetch failed on semaphoreci, but it passes locally for me. I also didn't modify the test...


iaguis commented Nov 10, 2015

The error is:

error getting image json: HTTP code: 500, URL: https://registry-1.docker.io/v1/images/d1592a710ac323612bd786fa8ac20727c58d8a67847e5a65177c594f43919498/json

That's an internal server error. We get them sometimes, I'll restart the test.


alban commented Nov 11, 2015

LGTM

alban added a commit that referenced this pull request Nov 11, 2015
stage0: added ability to resume interrupted downloads
@alban alban merged commit 07e6ee4 into rkt:master Nov 11, 2015
6 participants