New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental regeneration #3116
Incremental regeneration #3116
Conversation
Amazing work. 🙇 |
/cc @mattr- for his 👀 as well. |
|
||
# Private: The metadata file storing dependency tree and build history | ||
# | ||
# Returns an Array with the metdata file as the only item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo for 'metadata' 😃
Yes, for Liquid 2.x at least. No idea about Liquid 3.x. Are we using Liquid 2.x or are we upgrading to Liquid 3.x soon? |
@@ -17,7 +17,7 @@ class Configuration < Hash | |||
# Handling Reading | |||
'safe' => false, | |||
'include' => ['.htaccess'], | |||
'exclude' => [], | |||
'exclude' => ['.jekyll-metadata'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is Jekyll internal, I'm wondering if the metadata file is something that we should always exclude, no matter what. In my mind, this implies that we would move it out of configuration.
Upgrading to Liquid 3.x for sure. |
@mattr- Thanks for the comments! I'll fix them up as suggested. |
fce7d4c
to
ac03af3
Compare
Finished fixing unit tests (quite a few were failing because of the no-render, no-write policy for unchanged files introduced by this), and implemented the suggestions. Now I just need to tackle the cucumber tests! I also rebased onto master to maintain better merge compatibility. |
You are a boss. Great work! |
Tossing all sorts of confetti here. Will work to get this GH-Pages compatible when it lands in a Jekyll version. |
All Cucumber tests pass on my machine, let's see what Travis thinks (might take a while since the queue seems to be quite long right now). I also added a @gjtorikian Would it be helpful if I added a configuration option to specify the location of the metadata file? |
Allegory to |
@parkr Yeah, pretty much. I was going to use it in the Cucumber tests but I fixed it the right way in the end 😛 |
GitHub Pages won't need this at all – most builds take < 4 seconds to complete, and the build environment is ephemeral like Travis. Maybe for Pages you can add a That reminds me – what are we calling this? Cache? Incremental regeneration? Partial rebuild? I like option 1 the most, but I'm open to less-tech-savvy names. @jglovier / @gjtorikian, you're both very smart with the people. What might you call this feature? |
For non-techies, I would probably call it "lazy regeneration" or "lazy rebuild" (similar to "lazy loading", since unchanged files aren't unnecessarily regenerated). |
If you know that it exists you can handle any name for it, I think, though I'd say that the word cache is a thing I'm looking to avoid with Jekyll (if you've ever had to do layers of caching to get Wordpress to run decently), so the connotation may be bad for some. I vote for some variant of "partial build" since I think that speaks directly to what it's doing. But really here to say a big Thanks! guys. Game changer for Jekyll and a great addition! |
@@ -26,6 +29,7 @@ def initialize(site, base, name) | |||
@site = site | |||
@base = base | |||
@name = name | |||
@path = Jekyll.sanitized_path(site.source, File.join(base, name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use site.in_source_dir(base, name)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's much simpler. I'll change the rest as well.
@parkr Having it in the Git repo would allow for faster build times for cloners, but on the other hand it's not really something we want completely exposed... |
On the logic part, the entire method was rather confusing and could probably be organized in a way that combines it into a single simple if. I would short-circuit some of that (but I'd have to reread the entire class to get a firm grasp on it to see if it could be organized in a better way.) On the metadata part, if that is the case then please do marshal it to a single file instead of a folder of files because meta data is small, but the file amount will be high, files are expensive and it will be cheaper in the long run to store it in a single file than thousands of smaller files. The cost of data storage and IO only benefits from large files (from my experience.) That said, I must also imply that if it cannot keep track of itself between runs (and it's not a persistent process) that in and of itself adds a performance penalty that doesn't necessarily add a benefit to using this no? Unless I am completely misunderstanding but from what I'm to believe you would need to wrap around as a server with inotify/fswatch and track these file changes efficiently and proc off into a run. I don't fully understand the entirety of this code though as I've only gone over parts of it so I'll have a full look at it today or tomorrow and get a full grasp of it. |
@envygeeks I'll see what I can do about the method. The metadata is currently stored in a single YAML file, |
@envygeeks Does something like this for # Checks if a path should be regenerated
#
# Returns a boolean.
def regenerate?(path, skip_add = false)
return true if disabled?
# Check for path in cache
if cache.has_key? path
return cache[path]
end
# Check path that exists in metadata
data = metadata[path]
if data
data["deps"].each do |dependency|
if regenerate?(dependency)
cache[dependency] = cache[path] = true
return false
end
end
if data["mtime"].eql? File.mtime(path)
cache[path] = false
return false
else
add(path) unless skip_add
return true
end
end
# Path does not exist in metadata, add it
add(path) unless skip_add
return true
end |
Man, that's a very complicated method... |
That is a pretty big method with a lot of context switching (or branches whatever you prefer to call it) but there is also a slight question that I feel the need to ask because your contexts muddy the matter, why are you short-circuiting |
Yeah, it's quite a large function but I'm not really sure how (if it even should) to split it up. Maybe the middle @envygeeks Thanks for catching that, it's a typo in my comment (originally it was |
Hooray! |
…quid discovered when testing incremental regen in jekyll/jekyll#3116 -- means changes get picked up in 1s instead of 10s, using the next version of jekyll (unfortunately not yet gemmed)
works well for us in brooklyn (https://github.com/apache/incubator-brooklyn/tree/master/docs ... modulo some liquid-induced changes apache/incubator-brooklyn#450 ). specifics in case it's useful for people:
the last is very useful as we have menus built from a generator and incremental doesn't pick up all the dependencies there. but that's understandable, i like fast incremental as the default and great to have this! |
…quid discovered when testing incremental regen in jekyll/jekyll#3116 -- means changes get picked up in 1s instead of 10s, using the next version of jekyll (unfortunately not yet gemmed)
…quid discovered when testing incremental regen in jekyll/jekyll#3116 -- means changes get picked up in 1s instead of 10s, using the next version of jekyll (unfortunately not yet gemmed)
I've been testing it out. Couple of liquid 3.0 fixes, but no biggie. But I did run into an issue when trying to serve it up, any ideas? Here's the stack trace:
|
…quid discovered when testing incremental regen in jekyll/jekyll#3116 -- means changes get picked up in 1s instead of 10s, using the next version of jekyll (unfortunately not yet gemmed)
@crearc Please file an issue for that, I'd be happy to fix that today unless @alfredxing wants to. |
Two folders take turns being public_html. This allows us not to copy most of the files, while still being safe to cancel the script at any time. Users will never end up with a partially deployed site. This feature won't be very useful until Jekyll stable has incremental site regeneration, which is coming soon. jekyll/jekyll#3116
This is really great work, we are seeing huge improvements on our site. Thanks so much @alfredxing and everybody else who was involved. |
Almost complete implementation of incremental regeneration (#380 and #3060) between as well as within a process. Currently handles file content changes and dependency changes (layouts and includes only). This implementation takes a few things from @mattr-'s implementation in #1761.
Here is what happens on each build:
Metadata
class is initialized and reads in metadata from.jekyll-metadata
(metadata is not read if the--clean
flag is set).reset
,read
, andgenerate
as it did before.regenerate: true
in its front-matter.cleanup
is performed if the--clean
flag is set..jekyll-metadata
, are written inwrite
.Remarks:
site.x
are not resolved. (This is why I included theregenerate
front-matter override.) Any ideas on how we can make this work?Todo: