Optimus Sitemap Generator

Optimus Sitemap Generator (OSG) is a universal XML sitemap generator. It works by crawling your website, and it avoids excessive bandwidth use by scanning only the contents of pages that have changed since the sitemap was last generated.
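
The incremental idea described above can be illustrated with a small Python sketch. It is not OSG's actual implementation: how OSG detects changed pages is not documented here, so the sketch assumes the existing sitemap records a lastmod value per URL and that a modification time for each page is available from somewhere (for instance an HTTP Last-Modified header). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace used by the sitemaps.org 0.9 schema.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 date or datetime; treat naive values as UTC."""
    parsed = datetime.fromisoformat(value.strip())
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def load_lastmod(sitemap_xml: str) -> dict:
    """Map each <loc> in an existing sitemap to its <lastmod> timestamp."""
    known = {}
    for url in ET.fromstring(sitemap_xml).iter(NS + "url"):
        loc = url.findtext(NS + "loc")
        lastmod = url.findtext(NS + "lastmod")
        if loc and lastmod:
            known[loc.strip()] = parse_timestamp(lastmod)
    return known


def needs_rescan(url: str, modified: datetime, known: dict) -> bool:
    """Rescan pages that are new, or changed after the recorded lastmod."""
    previous = known.get(url)
    return previous is None or modified > previous


# Hypothetical sitemap from a previous run.
EXISTING_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><lastmod>2012-05-08T12:00:00+00:00</lastmod></url>
  <url><loc>http://example.com/about</loc><lastmod>2012-05-01</lastmod></url>
</urlset>"""

known = load_lastmod(EXISTING_SITEMAP)
```

A crawler built this way would call needs_rescan for each URL and download the page body only when it returns True, which is where the bandwidth saving would come from.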

Download

Platform        Version           Package               Size
Linux (32-bit)  1.0 – 2012-05-08  osg-1.0-i386.tar.gz   1.3 MB
Linux (64-bit)  1.0 – 2012-05-08  osg-1.0-amd64.tar.gz  1.2 MB
Git repository  Development       https://github.com/patrickmn/osg

Usage Examples

Command                                    Explanation
./osg sitemap.xml example.com              Generate sitemap.xml by crawling the pages on example.com
./osg sitemap.xml.gz example.com/science/  Generate sitemap.xml.gz by crawling the pages linked to on example.com/science/
./osg -c 10 sitemap.xml a.com b.com        Generate sitemap.xml by crawling the pages linked to on a.com and b.com, crawling up to 10 pages at once
./osg -v sitemap.xml example.com           Generate sitemap.xml by crawling the pages linked to on example.com, showing additional information about the crawling process

Run ./osg without any parameters to get an overview and explanation of all available options.
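
If you drive OSG from a script, a small wrapper can assemble the command line. The Python sketch below uses only the -c and -v options shown in the examples; the function names, the default ./osg path, and the assumption that both options can be combined in one call are illustrative rather than taken from OSG's documentation.

```python
import subprocess


def build_osg_command(output, sites, concurrency=None, verbose=False, binary="./osg"):
    """Build an argument list such as ./osg -c 10 sitemap.xml a.com b.com."""
    if not sites:
        raise ValueError("at least one site is required")
    command = [binary]
    if concurrency is not None:
        command += ["-c", str(concurrency)]  # crawl up to this many pages at once
    if verbose:
        command.append("-v")  # show additional crawl information
    command.append(output)  # e.g. sitemap.xml or sitemap.xml.gz
    command.extend(sites)
    return command


def run_osg(output, sites, **options):
    """Run OSG and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_osg_command(output, sites, **options), check=True)
```

For example, build_osg_command("sitemap.xml", ["a.com", "b.com"], concurrency=10) produces the argument list for the third command in the table above.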

Features

  • Crawls pages only once, even if they’re linked to from many pages
  • Reads existing sitemap and avoids scanning pages that haven’t been updated
  • Crawls an arbitrary number of URLs simultaneously
  • Uses HTTP keep-alive and session reuse (when applicable)
  • Warns if any pages on your site don’t load, return a 404 Not Found status, etc.
  • Customizable User-Agent (--ua flag)
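
The first feature, crawling each page only once, is commonly implemented with a set of already-seen URLs. The Python sketch below shows that general technique on an in-memory link graph; it is an illustration under that assumption, not OSG's source code, and crawl_once, SITE, and get_links are hypothetical names.

```python
from collections import deque


def crawl_once(start, get_links):
    """Visit every page reachable from start exactly once, breadth-first."""
    seen = {start}           # pages already queued or visited
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in get_links(page):
            if link not in seen:  # skip pages linked from many places
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical site: every page links back to pages that were already found.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/"],
    "/b": ["/a", "/c"],
    "/c": ["/"],
}

fetch_counts = {}


def get_links(page):
    """Stand-in for downloading a page and extracting its links."""
    fetch_counts[page] = fetch_counts.get(page, 0) + 1
    return SITE[page]


visited = crawl_once("/", get_links)
```

Because a URL is added to seen as soon as it is queued, each page is fetched a single time no matter how many other pages link to it.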

FAQ

Q: How do I install and use OSG?
A: On Windows, download and extract the zip file above, then run osg.exe either from a command prompt or through a shortcut: right-click osg.exe, create a shortcut, change the shortcut’s parameters (in Properties) to e.g. “D:.exe” -c 10 sitemap.xml example.com, and run it.

On Linux, copy the link for your architecture above and run curl -s <link> | tar xvz to download and extract the package. Then cd osg and run e.g. ./osg -c 10 sitemap.xml example.com.

Q: Do I need to have WordPress, W3 Total Cache, WP Super Cache, memcached, … to use OSG?
A: No. OSG is completely indifferent to what powers your website. As long as there are links on the site, OSG can generate a sitemap for it.
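
To illustrate why only links matter, here is a minimal Python sketch of link discovery using the standard library's HTML parser. It shows the general approach a crawler can take, not how OSG itself parses pages; the class and function names are hypothetical, and limiting results to the same host is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute, same-host links from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        same_host = urlparse(absolute).netloc == urlparse(self.base_url).netloc
        if same_host and absolute not in self.links:
            self.links.append(absolute)


def same_site_links(base_url, html):
    """Return the unique same-host links found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


PAGE = (
    '<a href="/about">About</a> '
    '<a href="news.html#top">News</a> '
    '<a href="http://other.org/">Elsewhere</a> '
    '<a href="/about">About again</a>'
)

links = same_site_links("http://example.com/science/", PAGE)
```

The sketch never looks at what generated the HTML, which is the sense in which a link-based crawler is independent of the software behind the site.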

Q: How do I generate an XML sitemap regularly?
A: The easiest way is to set up a cron job. On most Linux distributions you can do this by adding an entry with crontab -e. For example, the entry */5 * * * * /home/patrick/osg /home/patrick/sitemap.xml example.com runs OSG every five minutes. For more information, see Ubuntu’s Cron Howto.

(Note that cron runs commands with a very minimal environment and PATH, so you might need to use full paths to your commands.)

Support

If you have any problems with OSG, or have a question that isn’t answered in the FAQ, please search through or send a message to the Optimus Sitemap Generator group ([email protected]).

Changelog

Here are the highlights of each release, from the first to the current:

Version Changes
1.0 – 2012-05-08
  • First release

License

Optimus Sitemap Generator (OSG) is released under the MIT license.