Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

always include file extension on destination files #3490

Merged
merged 1 commit into from Mar 3, 2015
Merged

always include file extension on destination files #3490

merged 1 commit into from Mar 3, 2015

Conversation

willnorris
Copy link
Contributor

This is an initial take at always including file extensions on destination files written to disk. This is a continuation of the discussion that's happened on #2294 and #3463, and coincides with the WEBrick changes in #3452.

This should only have any impact when the site or post permalink setting is configured without either a trailing slash or the file extension for that file type. In that case, the file written to disk will contain the correct extension for that type, while the URL will remain as specified in the permalink setting.

Notably, this does break the test that was added in #2925 to support files ending in ".htm" rather than ".html" (cc @pathawks to discuss), because this is using the output_ext value to determine the expected extension for the destination file. I believe this is the correct behavior. I think that any attempt to support alternate extensions (like ".htm") should make sure one way or another that output_ext returns the desired alternate extension. Not doing so seems likely to cause more problems down the road. Either way, we certainly need to resolve this breaking test before this can be safely merged.

@@ -164,6 +164,8 @@ def destination(base_directory)
dest = site.in_dest_dir(base_directory)
path = site.in_dest_dir(dest, URL.unescape_path(url))
path = File.join(path, "index.html") if url =~ /\/$/
output_ext = Jekyll::Renderer.new(site, self).output_ext
path += output_ext if path !~ /#{output_ext}$/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if path.end_with?(output_ext)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@parkr
Copy link
Member

parkr commented Feb 21, 2015

So dope. Thanks for the PR!

@parkr
Copy link
Member

parkr commented Feb 21, 2015

What about checking for the existence of a file extension before adding this one?

>> File.extname("path/to/file.htm")
=> ".htm"
>> File.extname("path/to/file.htm").empty?
=> false
>> File.extname("path/to/file")
=> ""
>> File.extname("path/to/file").empty?
=> true

If it's empty, then add the output extension. Otherwise, pass.

Also, let's use << instead of +=.

permalink: /:title
---

Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we output {{ page.url }} here to ensure the Liquid URL is being set properly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@willnorris
Copy link
Contributor Author

What about checking for the existence of a file extension before adding this one?

So I did think about that. It will certainly work, but there's two things I didn't like about it:

  1. it prevents having a period in your post title if you have an extensionless permalink structure. Perhaps a theoretical argument, as I'm not sure how often people actually use periods in post titles (I know I never do)
  2. it doesn't address the issue of always being able to trust output_ext to accurately identify the extension of the destination file. Based on how it's being used today, I don't actually see any immediate issues, but it just feels wrong to me.

That said, I'm happy to add it in if you'd like.

Also, let's use << instead of +=.

Done.

@willnorris
Copy link
Contributor Author

okay, thinking more about it, I agree that respecting existing extensions in permalink is probably the way to go for right now at least. This next commit should be safe to merge I think.

@parkr
Copy link
Member

parkr commented Feb 22, 2015

okay, thinking more about it, I agree that respecting existing extensions in permalink is probably the way to go for right now at least. This next commit should be safe to merge I think.

You make good points... people should be able to use dots in their files, and we don't want to limit file extensions... Hm...

@parkr
Copy link
Member

parkr commented Feb 22, 2015

Would you mind adding a test for Document and Page as well? Just until they're all one class...

@willnorris
Copy link
Contributor Author

okay, thinking more about it, I agree that respecting existing extensions in permalink is probably the way to go for right now at least. This next commit should be safe to merge I think.

You make good points... people should be able to use dots in their files, and we don't want to limit file extensions... Hm...

okay, yeah... I now see that there are test cases that specifically focus on files with dots in their name, so this is definitely something we'll want to support. So now I'm back to feeling like we will need to back out the test case from #2925 and find another way to support that (if at all). As @pathawks mentioned on that bug, he could setup redirects for the legacy URLs, which may be the better option here.

Right now, I'm fairly certain that Pages still end up with URLs contain the ".html" extension. That never actually impacted me, so I sort of forgot about it. I'll look into what it will take to fix that, and add tests for that and Document (as well as additional tests for dotted filenames).

@parkr
Copy link
Member

parkr commented Feb 25, 2015

I think we should be able to support htm extensions, even if just to maintain backwards compatability with the hundreds of thousands (guessing, maybe more or less) of GitHub Pages sites that are presently out there... Perhaps we do check against a list of possible file extensions? That may slow down this method considerably...

One idea is to release this in a beta, have @pathawks take a look, and come up with a fix for that later.

Right now, I'm fairly certain that Pages still end up with URLs contain the ".html" extension. That never actually impacted me, so I sort of forgot about it. I'll look into what it will take to fix that, and add tests for that and Document (as well as additional tests for dotted filenames).

Yes, I believe they are presently just as flexible as Posts, so it would be good to have them under test. Thank you!!

@pathawks
Copy link
Member

I'd be happy to test 👍

You want I should test this pull request?

@parkr
Copy link
Member

parkr commented Feb 25, 2015

@pathawks Yes please.

@willnorris
Copy link
Contributor Author

Yes, I believe they are presently just as flexible as Posts, so it would be good to have them under test. Thank you!!

Actually pages are the worst in terms of permalink flexibility (from _config.yaml that is). If you're not using :pretty, then pages always use a template of /:path/:basename:output_ext. This is a large part of #2691, which I'm trying to fix here as well (haven't uploaded that commit yet.. that's what prompted #3507)

@parkr
Copy link
Member

parkr commented Feb 25, 2015

Ok then this PR looks good to me.

@pathawks, let me know what you figure out. It's hacky, but we could also check for .htm...

@@ -280,6 +280,7 @@ def destination(dest)
# The url needs to be unescaped in order to preserve the correct filename
path = site.in_dest_dir(dest, URL.unescape_path(url))
path = File.join(path, "index.html") if self.url =~ /\/$/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also this should totally be .end_with?("/")... not sure if that's for this PR but it's an optimization we should take advantage of. i'll write a benchmark to ensure this

@willnorris
Copy link
Contributor Author

@parkr: as previously noted, we do have tests that ensure that files with dots in them work, which this method of simply looking for a file extension doesn't work with. As a result, this next commit switches back to enforcing output_ext exactly. This also necessitates removing @pathawks' feature test for .htm permalinks. Supporting non-standard file extensions like this will likely need to be done in a plugin, which I'll take a look at separately, just to show how it would be done for those that are interested.

This next commit also includes tests for posts and documents.

@parkr
Copy link
Member

parkr commented Mar 2, 2015

What about allowing configuration of HTML files' file extension? is that crazy?

@willnorris
Copy link
Contributor Author

What about allowing configuration of HTML files' file extension? is that crazy?

personally I think so, yes. :)

just make it a setting

Also keep in mind that I'm fairly certain @pathawks was only interested in having ".htm" extension for legacy, imported-in content. I don't think there is any real demand for having .htm versus .html long-term, we just need a solution that is good enough to handle Pat's imported content (which I personally would still argue that just redirecting to the new URLs would be the much simpler approach)

@parkr
Copy link
Member

parkr commented Mar 2, 2015

Also keep in mind that I'm fairly certain @pathawks was only interested in having ".htm" extension for legacy, imported-in content. I don't think there is any real demand for having .htm versus .html long-term

👍

@parkr
Copy link
Member

parkr commented Mar 2, 2015

also /cc @benbalter for that great image above. ben's always talking about our 🎄 of options.

@pathawks
Copy link
Member

pathawks commented Mar 2, 2015

I'm fairly certain @pathawks was only interested in having ".htm" extension for legacy, imported-in content.

Yup.

just redirecting to the new URLs would be the much simpler approach

Yup.

@willnorris
Copy link
Contributor Author

@pathawks: in my brief testing, this plugin seems to do the trick of preserving the ".htm" extension when a post's custom permalink contains that extension:

module Jekyll
  module Convertible
    def output_ext_with_htm
      ext = output_ext_without_htm
      ext = ".htm" if self.permalink.to_s.end_with?(".htm")
      ext
    end

    alias_method :output_ext_without_htm, :output_ext
    alias_method :output_ext, :output_ext_with_htm
  end
end

Or of course, you could also just redirect from within the web server itself.

Assuming that is okay and we're fine removing this behavior from jekyll core, then this should be ready for final review. @parkr could you take another look (there's a few additional tests now as well)

This ensures that destination files for HTML posts, pages and
collections always include the proper file extension (as defined by
output_ext) regardless of permalink structure.  This allows for URLs
that contain no extension or trailing slash to still result in proper
destination files with .html extensions.

Because this change relies so heavily on output_ext accurately
identifying the extension of the destination file, this change also
removes the feature test that tested support for permalinks with a .htm
extension.  In order to support alternate file extensions, a future
patch or plugin will need to modify the output_ext value, at which point
everything else should work as expected.
@parkr
Copy link
Member

parkr commented Mar 3, 2015

I'm fine with removing this feature. We'll need to include it on our upgrading docs page.

@parkr
Copy link
Member

parkr commented Mar 3, 2015

This looks great to me! :shipit:

Thanks, Will!

@willnorris
Copy link
Contributor Author

I don't think it necessarily needs mentioning in the upgrade docs, since this never actually worked until #3014, which has only shipped with 3.0.0.beta1.

(and just to make sure, "ship-it squirrel" == "go ahead and merge"? Or should I wait for you or someone else to merge?)

@pathawks
Copy link
Member

pathawks commented Mar 3, 2015

👍

@parkr
Copy link
Member

parkr commented Mar 3, 2015

(and just to make sure, "ship-it squirrel" == "go ahead and merge"? Or should I wait for you or someone else to merge?)

Yep, :shipit: means I'm cool merging it. Thanks for checking :)

@alfredxing
Copy link
Member

Looks good! Though this only works for Markdown and converted files, right?

So if I had a a-page.html:

---
layout: default
permalink: /a-page
---
<strong>This is a page</strong>

It would output as /a-page, not /a-page.html?

@willnorris
Copy link
Contributor Author

It works just as well for non-converted files. If there is no converter registered for the file extension, then output_ext returns the extension of the source file. So in your example, the page would have a url of /a-page, but the destination file would be /a-page.html.

willnorris added a commit that referenced this pull request Mar 3, 2015
always include file extension on destination files
@willnorris willnorris merged commit 5d1a24e into jekyll:master Mar 3, 2015
willnorris added a commit that referenced this pull request Mar 3, 2015
@willnorris willnorris deleted the ext branch March 3, 2015 18:55
@alfredxing
Copy link
Member

@willnorris Right, forgot about that. Thanks!

@snickers0129
Copy link

I like the idea of configuring the html extension for .html files as well... I'm hosting on Amazon S3, which supports serving files that are extension-less and doesn't really offer configuration

@willnorris
Copy link
Contributor Author

@snickers0129 you could do a variation of #3490 (comment) to customize the extension for HTML output files.

@willnorris
Copy link
Contributor Author

err, you might actually need to patch the Markdown converter, but it would still be the output_ext method.

@envygeeks
Copy link
Contributor

Why not just do folder based navigation? The AWS docs states that's supported?

@snickers0129
Copy link

@willnorris Thanks! I saw that... I just thought I'd cast my vote!

@envygeeks You mean putting folders with index.html in them? That causes 302 redirects from url/page to url/page/

@envygeeks
Copy link
Contributor

Oh fair enough, I didn't notice it would do that crazy mess, but I guess I shouldn't be surprised it is a CDN.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants