Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental regeneration #3116

Merged
merged 17 commits into from Dec 22, 2014
Merged

Incremental regeneration #3116

merged 17 commits into from Dec 22, 2014

Conversation

alfredxing
Copy link
Member

Almost complete implementation of incremental regeneration (#380 and #3060) between as well as within a process. Currently handles file content changes and dependency changes (layouts and includes only). This implementation takes a few things from @mattr-'s implementation in #1761.

Here is what happens on each build:

  1. The Metadata class is initialized and reads in metadata from .jekyll-metadata (metadata is not read if the --clean flag is set).
  2. Jekyll performs reset, read, and generate as it did before.
  3. In the render step, all documents, pages, and posts are checked for modifications to itself or its dependencies. A document/page/post is also marked as modified if it has regenerate: true in its front-matter.
  4. cleanup is performed if the --clean flag is set.
  5. Modified files, as well as the new .jekyll-metadata, are written in write.

Remarks:

  • Although initial (no metadata present) build times went up a tiny bit, builds with metadata are super quick (here are the results from the jekyllrb.com site)
  • Dependencies in the form of referencing posts/pages/collections/data in Liquid with site.x are not resolved. (This is why I included the regenerate front-matter override.) Any ideas on how we can make this work?
  • The metadata file is currently YAML (loaded with SafeYAML). Is this optimal, or would another serialization method work better?

Todo:

  • Implement behaviour
  • Fix failing tests
  • Add tests (cucumber & unit)
  • Add more unit tests!

@parkr
Copy link
Member

parkr commented Nov 17, 2014

Amazing work. 🙇

@parkr
Copy link
Member

parkr commented Nov 17, 2014

/cc @mattr- for his 👀 as well.


# Private: The metadata file storing dependency tree and build history
#
# Returns an Array with the metdata file as the only item
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo for 'metadata' 😃

@mattr-
Copy link
Member

mattr- commented Nov 17, 2014

Dependencies in the form of referencing posts/pages/collections/data in Liquid with site.x are not resolved. (This is why I included the regenerate front-matter override.) Any ideas on how we can make this work?

Yes, for Liquid 2.x at least. No idea about Liquid 3.x. Are we using Liquid 2.x or are we upgrading to Liquid 3.x soon?

@@ -17,7 +17,7 @@ class Configuration < Hash
# Handling Reading
'safe' => false,
'include' => ['.htaccess'],
'exclude' => [],
'exclude' => ['.jekyll-metadata'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is Jekyll internal, I'm wondering if the metadata file is something that we should always exclude, no matter what. In my mind, this implies that we would move it out of configuration.

@parkr
Copy link
Member

parkr commented Nov 17, 2014

Are we using Liquid 2.x or are we upgrading to Liquid 3.x soon?

Upgrading to Liquid 3.x for sure.

@alfredxing
Copy link
Member Author

@mattr- Thanks for the comments! I'll fix them up as suggested.

@alfredxing
Copy link
Member Author

Finished fixing unit tests (quite a few were failing because of the no-render, no-write policy for unchanged files introduced by this), and implemented the suggestions. Now I just need to tackle the cucumber tests!

I also rebased onto master to maintain better merge compatibility.

@parkr
Copy link
Member

parkr commented Nov 22, 2014

You are a boss. Great work!

@gjtorikian
Copy link
Member

Tossing all sorts of confetti here. Will work to get this GH-Pages compatible when it lands in a Jekyll version.

@alfredxing
Copy link
Member Author

All Cucumber tests pass on my machine, let's see what Travis thinks (might take a while since the queue seems to be quite long right now).

I also added a jekyll clean command for easy cleaning of output and metadata (kind of like --full-rebuild but without the build part).


@gjtorikian Would it be helpful if I added a configuration option to specify the location of the metadata file?

@parkr
Copy link
Member

parkr commented Nov 23, 2014

I also added a jekyll clean command for easy cleaning of output and metadata

Allegory to make clean? 👍

@alfredxing
Copy link
Member Author

@parkr Yeah, pretty much. I was going to use it in the Cucumber tests but I fixed it the right way in the end 😛

@parkr
Copy link
Member

parkr commented Nov 23, 2014

@gjtorikian Would it be helpful if I added a configuration option to specify the location of the metadata file?

GitHub Pages won't need this at all – most builds take < 4 seconds to complete, and the build environment is ephemeral like Travis. Maybe for Pages you can add a --no-cache flag or something?

That reminds me – what are we calling this? Cache? Incremental regeneration? Partial rebuild? I like option 1 the most, but I'm open to less-tech-savvy names. @jglovier / @gjtorikian, you're both very smart with the people. What might you call this feature?

@alfredxing
Copy link
Member Author

For non-techies, I would probably call it "lazy regeneration" or "lazy rebuild" (similar to "lazy loading", since unchanged files aren't unnecessarily regenerated).

@budparr
Copy link

budparr commented Nov 23, 2014

If you know that it exists you can handle any name for it, I think, though I'd say that the word cache is a thing I'm looking to avoid with Jekyll (if you've ever had to do layers of caching to get Wordpress to run decently), so the connotation may be bad for some. I vote for some variant of "partial build" since I think that speaks directly to what it's doing.

But really here to say a big Thanks! guys. Game changer for Jekyll and a great addition!

@alfredxing alfredxing closed this Nov 23, 2014
@alfredxing alfredxing reopened this Nov 23, 2014
@@ -26,6 +29,7 @@ def initialize(site, base, name)
@site = site
@base = base
@name = name
@path = Jekyll.sanitized_path(site.source, File.join(base, name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use site.in_source_dir(base, name).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's much simpler. I'll change the rest as well.

@alfredxing
Copy link
Member Author

@envygeeks

  • I think so; there's no other way to keep the metadata persistently (between runs).
  • Which lines/parts did you have trouble with?

@parkr Having it in the Git repo would allow for faster build times for cloners, but on the other hand it's not really something we want completely exposed...

@envygeeks
Copy link
Contributor

@alfredxing

On the logic part, the entire method was rather confusing and could probably be organized in a way that combines it into a single simple if. I would short-circuit some of that (but I'd have to reread the entire class to get a firm grasp on it to see if it could be organized in a better way.)

On the metadata part, if that is the case then please do marshal it to a single file instead of a folder of files because meta data is small, but the file amount will be high, files are expensive and it will be cheaper in the long run to store it in a single file than thousands of smaller files. The cost of data storage and IO only benefits from large files (from my experience.)

That said, I must also imply that if it cannot keep track of itself between runs (and it's not a persistent process) that in and of itself adds a performance penalty that doesn't necessarily add a benefit to using this no? Unless I am completely misunderstanding but from what I'm to believe you would need to wrap around as a server with inotify/fswatch and track these file changes efficiently and proc off into a run. I don't fully understand the entirety of this code though as I've only gone over parts of it so I'll have a full look at it today or tomorrow and get a full grasp of it.

@alfredxing
Copy link
Member Author

@envygeeks I'll see what I can do about the method.

The metadata is currently stored in a single YAML file, .jekyll-metadata. It does indeed add a slight performance penalty from the increased overhead, but this penalty is quite insignificant relative to the time otherwise needed to render Liquid pages/posts/documents.

@alfredxing
Copy link
Member Author

@envygeeks Does something like this for regenerate? look better?

# Checks if a path should be regenerated
#
# Returns a boolean.
def regenerate?(path, skip_add = false)
  return true if disabled?

  # Check for path in cache
  if cache.has_key? path
    return cache[path]
  end

  # Check path that exists in metadata
  data = metadata[path]
  if data
    data["deps"].each do |dependency|
      if regenerate?(dependency)
        cache[dependency] = cache[path] = true
        return false
      end
    end
    if data["mtime"].eql? File.mtime(path)
      cache[path] = false
      return false
    else
      add(path) unless skip_add
      return true
    end
  end

  # Path does not exist in metadata, add it
  add(path) unless skip_add
  return true
end

@parkr
Copy link
Member

parkr commented Dec 12, 2014

Man, that's a very complicated method...

@envygeeks
Copy link
Contributor

That is a pretty big method with a lot of context switching (or branches whatever you prefer to call it) but there is also a slight question that I feel the need to ask because your contexts muddy the matter, why are you short-circuiting false if a dependency needs to regenerate?

@alfredxing
Copy link
Member Author

Yeah, it's quite a large function but I'm not really sure how (if it even should) to split it up. Maybe the middle data portion could be split off into a separate method?
The rewrite made it even longer since I took out assignment returns and a lot of the unclear condensed lines.

@envygeeks Thanks for catching that, it's a typo in my comment (originally it was return cache[dependency] = cache[path] = true.

parkr added a commit that referenced this pull request Dec 22, 2014
@parkr parkr merged commit 7227ad4 into jekyll:master Dec 22, 2014
parkr added a commit that referenced this pull request Dec 22, 2014
@gjtorikian
Copy link
Member

Hooray!

@parkr parkr mentioned this pull request Jan 10, 2015
ahgittin added a commit to ahgittin/incubator-brooklyn that referenced this pull request Jan 15, 2015
…quid

discovered when testing incremental regen in jekyll/jekyll#3116 --
means changes get picked up in 1s instead of 10s, using the next version of jekyll
(unfortunately not yet gemmed)
@ahgittin
Copy link

works well for us in brooklyn (https://github.com/apache/incubator-brooklyn/tree/master/docs ... modulo some liquid-induced changes apache/incubator-brooklyn#450 ).

specifics in case it's useful for people:

  • it only regens one page when we make an isolated change
  • regen time with several plugins and 100 pages is down from 10s to 1s (slightly surprised it's not down to 100ms though! 😛)
  • responds correctly to _layout and _includes changes by building dependent pages
  • serve -f triggers a full rebuild correctly

the last is very useful as we have menus built from a generator and incremental doesn't pick up all the dependencies there. but that's understandable, i like fast incremental as the default and -f is there when i need it.

great to have this!

ahgittin added a commit to ahgittin/incubator-brooklyn that referenced this pull request Jan 15, 2015
…quid

discovered when testing incremental regen in jekyll/jekyll#3116 --
means changes get picked up in 1s instead of 10s, using the next version of jekyll
(unfortunately not yet gemmed)
ahgittin added a commit to ahgittin/incubator-brooklyn that referenced this pull request Jan 15, 2015
…quid

discovered when testing incremental regen in jekyll/jekyll#3116 --
means changes get picked up in 1s instead of 10s, using the next version of jekyll
(unfortunately not yet gemmed)
@ericmillsio
Copy link

I've been testing it out. Couple of liquid 3.0 fixes, but no biggie. But I did run into an issue when trying to serve it up, any ideas? Here's the stack trace:

Configuration file: /home/eric/workspace/website/_config.yml
            Source: src
       Destination: /home/eric/workspace/website/_site
 Incremental build: enabled
      Generating... 
/home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:322:in `block in write': undefined method `regenerate?' for #<Sprockets::StaticAsset:0x007f662353c5d8> (NoMethodError)
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:481:in `block (2 levels) in each_site_file'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:480:in `each'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:480:in `block in each_site_file'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:479:in `each'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:479:in `each_site_file'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:321:in `write'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-assets-0.9.2/lib/jekyll/assets_plugin/patches/site_patch.rb:61:in `__wrap_write'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/site.rb:57:in `process'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/command.rb:28:in `process_site'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/commands/build.rb:58:in `build'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/commands/build.rb:34:in `process'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/lib/jekyll/commands/serve.rb:26:in `block (2 levels) in init_with_program'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `call'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `block in execute'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `each'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/command.rb:220:in `execute'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary/program.rb:42:in `go'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/mercenary-0.3.5/lib/mercenary.rb:19:in `program'
    from /home/eric/.rbenv/versions/2.0.0-p481/lib/ruby/gems/2.0.0/gems/jekyll-2.5.3/bin/jekyll:17:in `<top (required)>'
    from /home/eric/.rbenv/versions/2.0.0-p481/bin/jekyll:23:in `load'
    from /home/eric/.rbenv/versions/2.0.0-p481/bin/jekyll:23:in `<main>'

ahgittin added a commit to ahgittin/incubator-brooklyn that referenced this pull request Jan 15, 2015
…quid

discovered when testing incremental regen in jekyll/jekyll#3116 --
means changes get picked up in 1s instead of 10s, using the next version of jekyll
(unfortunately not yet gemmed)
@envygeeks
Copy link
Contributor

@crearc Please file an issue for that, I'd be happy to fix that today unless @alfredxing wants to.

kevin1 added a commit to columbiafsae/website that referenced this pull request Jan 17, 2015
Two folders take turns being public_html. This allows us not to copy
most of the files, while still being safe to cancel the script at
any time. Users will never end up with a partially deployed site.

This feature won't be very useful until Jekyll stable has incremental
site regeneration, which is coming soon.

jekyll/jekyll#3116
@parkr parkr mentioned this pull request Jan 18, 2015
7 tasks
@fw42
Copy link
Contributor

fw42 commented May 11, 2015

This is really great work, we are seeing huge improvements on our site. Thanks so much @alfredxing and everybody else who was involved.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet