Alternative for Large Files #73
This is exactly what I thought of. We don't need to change much to support this. Some clarification to point out the difference from small files:
- MetaFile (small files)
- LargeMetaFile (large files)
Clients that have access to this file and are currently online are contacted with a request for a single chunk of the large file. The request contains the following:
The requested client checks its filesystem for the existence of the file. If the file exists, the client reads from the given offset for the given length and returns that file-part. It sends a response to the requesting client containing the following attributes:
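The request/response pair described above might be sketched as the following pair of message types. This is only an illustration; the class and field names (`ChunkRequest`, `ChunkResponse`, `fileKey`) are assumptions, not the actual H2H message classes.

```java
// Hypothetical message types for fetching one chunk of a large file.
// Names and fields are illustrative; the actual H2H messages may differ.
class ChunkRequest {
    final String fileKey; // identifies the large file (e.g. key of its meta file)
    final long offset;    // byte offset of the requested file-part
    final int length;     // number of bytes to read

    ChunkRequest(String fileKey, long offset, int length) {
        this.fileKey = fileKey;
        this.offset = offset;
        this.length = length;
    }
}

class ChunkResponse {
    final String fileKey;
    final long offset;
    final byte[] data;    // the file-part read from disk, or null if the file was not found

    ChunkResponse(String fileKey, long offset, byte[] data) {
        this.fileKey = fileKey;
        this.offset = offset;
        this.data = data;
    }
}
```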
The requester can verify the correctness of the file-part using the hash in the meta file. What I don't know yet is whether we need versioning for large files. This could decrease sync performance dramatically: not only do clients need to be online at the same time, they also need to have the same version. An improvement to the solution above is that the client stores the file-parts at a temporary location. It can then go offline and continue fetching the rest of the file-parts at another moment in time without re-fetching all file-parts again. We need to ensure that every operation works at the file-part level and that it is never necessary to hold the full file in memory (which would cause trouble when handling large files). BTW: we already considered this when syncing small files.
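Verifying a received file-part against the hash from the meta file could look like the sketch below. Note that SHA-256 per chunk is an assumption here; the actual hash algorithm and meta-file layout are defined by H2H.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChunkVerifier {

    // Computes the hash of a single file-part. SHA-256 per chunk is an
    // assumption for this sketch; H2H may use a different algorithm.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Returns true if the received file-part matches the expected hash
    // taken from the (large) meta file.
    static boolean verifyChunk(byte[] chunkData, byte[] expectedHash) {
        return Arrays.equals(sha256(chunkData), expectedHash);
    }
}
```

A corrupted or tampered file-part simply fails verification and can be re-requested from the same or another online client.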
I vote for not having it.
But will it be possible to download chunks out of order? May I ask why we are not using an already established protocol like torrent to manage chunks and downloads?
Thanks, @0vermind, for the comment. I think, too, that large files should not be versioned in the first iteration. We already implemented a torrent-like protocol, which downloads file parts and assembles them as soon as all chunks are available. We implemented it ourselves, without relying on an external library. Later on, we could consider using an optimized protocol that also supports, for example, compression or erasure coding (see #56). To your question of whether it will be possible to download chunks out of order: yes, this is possible. It even goes one step further, allowing parallel download of the chunks (same as in torrent).
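The out-of-order, parallel download described above could be sketched like this: chunks are collected in any order and concatenated once all of them are present. This is a minimal illustration under assumed names (`ChunkAssembler`, `onChunkReceived`), not the actual H2H implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Chunks may arrive in any order (parallel downloads, as in torrent);
// assembly runs only once all of them are present.
class ChunkAssembler {
    private final Map<Long, byte[]> chunks = new ConcurrentHashMap<>();
    private final int totalChunks;

    ChunkAssembler(int totalChunks) {
        this.totalChunks = totalChunks;
    }

    // Called whenever one chunk finishes downloading, in any order.
    void onChunkReceived(long index, byte[] data) {
        chunks.put(index, data);
    }

    boolean isComplete() {
        return chunks.size() == totalChunks;
    }

    // Concatenates the chunks in index order once all are available.
    byte[] assemble() {
        if (!isComplete())
            throw new IllegalStateException("chunks still missing");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long i = 0; i < totalChunks; i++) {
            byte[] part = chunks.get(i);
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }
}
```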
A basic implementation for the support of large files has been done. Every download of a file is now handled by the DownloadManager, to which tasks can be submitted. A task may download a file from the DHT or directly from another user (in the case of a large file). When the user logs out, active downloads are serialized and stored. When they log in again, the downloads can be continued without re-downloading all parts. This has not been heavily tested and needs further improvements (the synchronization step during the login does not yet know that the download is already running and may initialize another download).
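The serialize-and-resume behaviour described above could be captured by a persistable task state like the following sketch. The class and field names are assumptions for illustration, not the actual DownloadManager internals.

```java
import java.io.Serializable;
import java.util.BitSet;

// Persistable state of a large-file download, so that a logout/login
// cycle can resume without re-fetching chunks that are already on disk.
// Names are illustrative, not the actual H2H DownloadManager internals.
class DownloadTaskState implements Serializable {
    final String fileKey;
    final int totalChunks;
    final BitSet completed = new BitSet(); // which chunk indices are done

    DownloadTaskState(String fileKey, int totalChunks) {
        this.fileKey = fileKey;
        this.totalChunks = totalChunks;
    }

    void markCompleted(int chunkIndex) {
        completed.set(chunkIndex);
    }

    // Next chunk index still missing, or -1 if the download is finished.
    int nextMissingChunk() {
        int next = completed.nextClearBit(0);
        return next < totalChunks ? next : -1;
    }
}
```

On logout such a state object would be serialized to disk; on login the DownloadManager could deserialize it and resume from `nextMissingChunk()` instead of starting over.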
Thomas proposed a further improvement which might become very popular in our growing fan base. H2H currently has (according to the file configuration) no limitation on file size. However, large files can flood the network and consume too many resources. An approach could be to provide the functionality we already implemented in the predecessor project Box2Box, similar to the BTSync approach: H2H would store only the meta data of large files. Syncing would be possible only when a node holding a copy of the large file is online.