|
|
Subscribe / Log in / New account

Development

Optimizing Linker Load Times

July 25, 2006

This article was contributed by John Richard Moser

Everyone wants their computer to start faster. Distribution maintainers are thus always looking for things to decrease boot time and program load time, using mechanisms such as readahead, boot reordering, even parallel init to more efficiently distribute CPU and disk I/O and get the user to the login screen faster.

One such tool used to decrease boot time and program load time is prelink [PDF], developed by Jakub Jelínek at Red Hat and discussed in an earlier LWN article. Prelinking is not the only way to improve start-up times, however; there are also a number of existing and developing optimizations the GNU linker can apply to give substantial gains. Besides existing GNU linker hash table optimizations, Michael Meeks has done substantial works on three separate optimizations in the GNU linker in a quest to make OpenOffice.org load quickly [PDF].

1.0 The -Wl,-O1 Linker Options

In typical operation, symbols are stored in hash tables in ELF binaries; these hash tables are kept small, and symbols that fall into the same bucket—hash to the same value—are compared by a simple string comparison. Unfortunately, symbols in the same bucket with the same prefix will need a long string comparison, which can be slow; if there are lots of symbols in the same bucket, then the dynamic linker has to string compare with each of them until it finds a match (Fig. 1.0-1).

Buckets containing multiple symbols Fig. 1.0-1: A hash table of symbols in a library. The bucket 0x00b1 contains three symbols; the red characters will have to be tested against the symbol name before the symbol 'foodragon' is resolved.

It was recommended in a paper [PDF] by Ulrich Drepper to utilize a GNU linker optimization that focuses more on producing short hash chains than a small hash table size. Although the hash table may grow by a few kilobytes, the shortened length of hash chains means that symbol look-ups do not have to perform as many string comparisons. Further, the chances of symbols sharing long prefixes landing in the same bucket are greatly reduced, making walking the chains faster (Fig. 1.0-2).

Optimized hash table Fig. 1.0-2: A hash table of symbols in a library. This table is optimized, so symbols hash to buckets containing shorter hash chains; this reduces the number of string comparisons done. The table is a bit bigger, though.

This linker optimization can be activated by passing -Wl,-O1 to gcc at link time. This flag can also be passed during any compilation; however, optimizing hash tables can be slow, and there are no benefits to doing this when creating object files that are just going to be linked into a main executable or shared object.

Currently this linker optimization is used by GNOME in its official builds. It has also been discussed on the Gentoo forums; and is used by the Ubuntu distribution, to name a few. An old but good technique.

2.0 -Bdirect Linking

Michael Meeks also proposed an optimization to GNU binutils and glibc which functions similar to direct binding in Solaris. By passing -Bdirect at link time, the build process can cause many symbols to be directly linked, allowing the dynamic linker to severely decrease the search space during lookup.

Libraries have unresolved symbols when they use functions or global variables from other libraries; these symbols are resolved during dynamic linking. The dynamic linker locates every unresolved symbol each library needs by searching the symbol table of each loaded library in order from first loaded to last loaded. This can become quite time consuming, especially in cases such as GNOME or OpenOffice.org where 50-150 libraries are loaded and up to 200,000 symbols must be resolved.

Normal symbol binding Fig. 2.0-1: Normal binding is done by checking through all libraries until the symbol is found.

Direct binding shortens the path the linker has to take to resolve a symbol by drawing it straight from whatever ELF object needs the symbol to the ELF object containing the symbol. A section is added to the ELF header which associates each symbol with an entry in the DT_NEEDED table—the list of libraries needed by the ELF object. The linker uses this information to go straight to the library containing the symbol and search only there, instead of searching through each library sequentially.

Direct symbol binding Fig. 2.0-2: Direct binding is done by checking the library that the symbol table says the binding is located in. This gives a much shorter path for any symbol lookup; three are shown above.

Meeks' -Bdirect linking method operates slightly differently from Solaris direct binding. With the Solaris linker, direct binding binds all symbols directly to the relevant libraries. With Michael Meeks' implementation of -Bdirect linking, vague symbols are detected and linked as normal. This is particularly important with C++, as vague C++ symbols are fairly common. It is not possible to guarantee that a vague symbol is linked to a specific library, so direct binding of vague symbols is not technically feasible.

-Bdirect binding can, to a degree, work with shared objects that are loaded with dlopen() rather than run-time linking; this is reported to greatly aid with OpenOffice.org and KDE start-up times. This is interesting because the performance improvements per library are comparable to but not quite as substantial as prelink, which does not affect libraries loaded in this manner.

The current implementation unfortunately has a few rough edges. Some areas could be better optimized; and there is an issue with certain libraries and programs breaking due to direct binding. Sometimes multiple libraries supply the same symbol, and the first loaded is the one used; this is called interposing, and is the reason why Meeks' -Bdirect linking patch does not optimize vague symbols. When direct binding is used, the symbol is chosen by the GNU linker at link time rather than by the dynamic linker at run time. This can be both a good and a bad thing, depending on the situation.

In most cases direct binding is a good thing, as it guarantees that libraries dependent on other libraries will always find the proper symbols, even if multiple libraries are loaded with the same symbols. Unfortunately, sometimes this very functionality also changes the way existing programs link, which just happen to work fine based on using the symbol from the first library supplying it. This is usually disastrous, as the symbol now used is typically a variable or function meant for a different purpose; the program no longer does what the programmer intended, and likely ends up simply crashing.

Not everybody believes that the use of interposing is the right thing to do. Still, getting things like this working would really require finding a work-around that does not break compatibility with existing code. Michael Meeks suggested during e-mail discussion a link-map extension to handle direct binding per symbol:

I would instead say: writing code that explicitely relies on interposing is not sensible. However - to make this work all that is needed is to implement an extension to link-maps (as done in the Solaris linker) to allow certain symbols to be marked as interposers (and hence not direct linked). That's most useful for (eg.) in glibc's use of pthreads - done by interposing.

This would allow direct binding to be used with libraries that currently fail with it, by only doing a long symbol search for vague symbols. Further, Meeks' finterpose tool can be used to find interposers between libraries; this not only shows where -Bdirect binding may fail, but also has exposed a few bugs in software such as GTK+ and gstreamer. The unfortunately large number of interposers presents a slight roadblock, but should not stop Meeks' -Bdirect linking from eventually becoming fully functional.

This optimization has an uncertain future, however. Relevant glibc code has so far been rejected by Ulrich Drepper. Despite this, -Bdirect linking has already found uses in Gentoo (with a portage overlay), Pardus Linux, and Solaris/OpenSolaris. The patch has also been committed to OpenSUSE, although it is uncertain if it will be used to build upcoming OpenSUSE releases.

3.0 dynsort

Another optimization posted by Michael Meeks adds the dynsort keyword to the GNU linker. By passing -zdynsort at link time, the .dynsym and .dynstr sections as well as relocations are all sorted by ELF hash and by the position where they land in the hash bucket.

When symbols are looked up in an ELF object, a hash table has to be searched. Hash collisions are effectively stored as a linked list, which then has to be walked to find the appropriate entry. With dynsort, the symbols that have to be examined while walking a bucket are all adjacent to each other. This reduces the number of L1 and L2 cache misses, allowing the CPU to utilize its facilities much more efficiently during dynamic linking.

The dynsort optimization has gotten a good bit of attention since it was unveiled. Meeks is working on improving it in various ways, such as moving data for undefined symbols away from the rest of the data. It was this patch that got Meeks offered a branch in binutils cvs, although as of this writing one hasn't been set up. It can be used in Gentoo with an overlay, but may soon be available everywhere via the --hash-style option.

4.0 Precomputed Hash Values

A few weeks after the dynsort patch, Meeks posted another optimization the binutils mailing list. This one adds pre-computed ELF hash values to the elf header when the -hashvals switch is given. Normally the dynamic linker has to compute a hash value of a symbol and then use that to find the index in the symbol tables of the other libraries. The -hashvals optimization pre-computes these, removing a series of cache misses and mathematical computations from the process.

The visible effect of the -hashvals switch is a large speedup in symbol look-up; Meeks measured dlopen() on libsvx as requiring 40% less time, and in some places a 51% reduction was realized. This is due not only to avoiding the calculation of symbol hash values; but also to much more efficient L2 cache utilization. Only a single 32-bit value is accessed for each symbol, rather than a long string; and locality is increased, since only the .hashvals section is accessed instead of both .dynsym and .dynstr, which reduces L2 cache misses.

This patch has a very favorable future; it almost immediately gained support for inclusion in upstream binutils. Several months later, a rewrite done by Ulrich Drepper and Jakub Jelínek was posted, implementing sorting and precomputed hashvals via --hash-style. And of course, Gentoo users have had access to it for a while with various portage overlays.

Conclusion

There are a lot of optimizations possible to reduce dynamic linking time, creating substantial gains in application load time. Future versions of the GNU linker and glibc may allow distributions to boot and load programs much faster, getting users to the desktop and on their applications in substantially less time.

Comments (17 posted)

System Applications

Audio Projects

Rivendell v0.9.69 is available

Version 0.9.69 of Rivendell, a radio station automation system, is out. "This release, featuring the debut of full voicetracking and log customization capabilities, marks a significant milestone for the project, with all modules now feature-complete."

Full Story (comments: none)

Database Software

pgAdmin III v1.4.3 released

Version 1.4.3 of pgAdmin III, a GUI PostgreSQL administration tool, has been announced. "v1.4.3 is primarily a bug fix release".

Comments (none posted)

DNS Software

dnspython 1.4.0 announced

Version 1.4.0 of dnspython has been announced. "dnspython is a DNS toolkit for Python. It supports almost all record types. It can be used for queries, zone transfers, and dynamic updates. It supports TSIG authenticated messages and EDNS0. dnspython provides both high and low level access to DNS. The high level classes perform queries for data of a given name, type, and class, and return an answer set. The low level classes allow direct manipulation of DNS zones, messages, names, and records."

Comments (none posted)

smbind 0.4.4 released (SourceForge)

Version 0.4.4 of smbind, a PHP-based tool for managing DNS zones for BIND via the web, is available. "A number of fixes have been included in this release for both smbind and smbind-slave. A problem with user delegation of zones was fixed (thanks to an anonymous SF user report), along with a minor fix for zone deletion, which wasn't getting committed to the config file as expected. Deleting a user was supposed to transfer ownership from that user over to the admin user, but was broken because of the expected admin user's id in the database. A few documentation updates and grammatical changes have also been made."

Comments (none posted)

Interoperability

Samba 3.0.23a Available for Download

Version 3.0.23a of Samba has been announced. "This is the latest stable release of Samba. This is the version that production Samba servers should be running for all current bug-fixes."

Full Story (comments: none)

LDAP Software

LAT 1.1.5 is available

Version 1.1.5 of LAT, the LDAP Administration Tool, is available. "This release is the 6th of the 1.1.x development cycle which will eventually become v1.2. If you need a stable release stick with the 1.0 branch."

Full Story (comments: none)

Printing

Common UNIX Printing System 1.2.2 announced

Version 1.2.2 of CUPS has been announced. "CUPS 1.2.2 fixes several build, platform, notification, and printing bugs."

Comments (none posted)

Security

FTimes 3.7.0 Released (SourceForge)

FTimes version 3.7.0 has been announced. "Version 3.7.0 is a minor release of FTimes, a system baselining and evidence collection tool. The primary purpose of ftimes is to gather and/or develop topographical information and attributes about specified directories and files in a manner conducive to intrusion and forensic analysis."

Comments (none posted)

Sussen 0.26 announced

Version 0.26 of Sussen, a vulnerability and configuration file scanner, is out with new features, code cleanup and bug fixes.

Full Story (comments: none)

Web Site Development

ASF Announces Apache Geronimo Version 1.1

The Apache Software Foundation has announced the release of Apache Geronimo Version 1.1, an open-source J2EE application server. "Along with many new features, Apache Geronimo Version 1.1 introduces several structural changes designed to improve scalability, portability and overall organization. An easy-to-use configuration and management console provides access to the new innovative plug-in architecture, allowing advanced control over the rich modularity of the Apache Geronimo server as well as simplifying day-to-day operational management tasks."

Comments (none posted)

OpenReports 2.0 Milestone 1 released (SourceForge)

Version 2.0 Milestone 1 of OpenReports has been announced. "OpenReports, the leading open source web reporting solution, is pleased to announce the availability of OpenReports 2.0 Milestone 1. OpenReports 2.0 features new export formats, ChartReports, scheduling improvements, support for JasperReports 1.2.5 and Hibernate 3.1.3, and many other bug fixes and enhancements."

Comments (none posted)

Plone Foundation Announces Plone 2.5

Version 2.5 of the Plone Content Management System has been announced. "With the addition of powerful caching technologies, Plone 2.5 enables websites to run 10 to 40 times faster than in previous versions. Plone 2.5 focuses on streamlining code, strengthening stability, and increasing flexibility. The release incorporates the latest generation of the underlying Zope application server, setting the groundwork for Plone Foundation’s anticipated 3.0 release, available early 2007. Plone 3.0 will substantially increase ease-of-use and efficiency through user interface improvements."

Comments (none posted)

Web Services

WSMT v1.3 Released (SourceForge)

Version 1.3 of the Web Service Modeling Toolkit (WSMT), a collection of tools for Semantic Web Services intended for use with the Web Service Modeling Ontology, is available. "The main aim of this release has been to improve the functionality of the WSML Text Editor and Reasoner Views with respect to syntax completion. In the previous release only keywords where recommended and this keyword recommendation was not sensitive to the current location in the document. This release sees the addition of full context sensitive syntax completion."

Comments (none posted)

Desktop Applications

Audio Applications

Announcing Jokosher 0.1 (GnomeDesktop)

The initial version 0.1 release of Jokosher, a multi-track audio studio application for GNOME, has been announced. "The Jokosher team are proud to announce our very first 0.1 release of Jokosher, a simple, usability focused Open Source multi-track studio. Since the original design and conception of the project in February, a team of developers, documentation writers, artists, testers and packagers have worked together to create a compelling first release."

Comments (none posted)

Desktop Environments

GNOME 2.15.90 released

GNOME 2.15.90 (otherwise known as the first GNOME 2.16 beta) is out. Click below for download details and pointers to information on what is changing in 2.16.

Full Story (comments: none)

GNOME Software Announcements

The following new GNOME software has been announced this week: You can find more new GNOME software releases at gnomefiles.org.

Comments (none posted)

KDE Software Announcements

The following new KDE software has been announced this week: You can find more new KDE software releases at kde-apps.org.

Comments (none posted)

KDE Commit-Digest (KDE.News)

KDE.News has announced the July 23, 2006 edition of the KDE Commit-Digest. Here is the content summary: "KDevelop gets new configuration framework functionality. The start of a Satellite tracks feature in KStars. Support for PDF data extraction, and speed optimisations in Strigi. New features in KPhotoAlbum (KPhotoAlbum is the new name for KimDaBa). Perspective grid support in Krita, with the implementation of a Bezier tool becoming feature-complete. More work on unit conversions in KRecipes. Porting of KRDC to KDE 4."

Comments (none posted)

Electronics

Covered 0.4.6 released

Version 0.4.6 of Covered, a Verilog code coverage utility, has been announced. "This release contains several bug fixes".

Comments (none posted)

Games

New male media in Ember

The WorldForge game project has announced the creation of a new human male simulation. "I’m trying to get a new version of Ember out, hopefully this week. In the meantime I want to show some screenshots of the new male mesh that Jayr has been working on. By using the model definition system in Ember, it’s possible to define different “parts” of the model, such as “/torso/cloth/green” or “/torso/cloth/red”. These parts use the same submesh (in the example the torso submesh) but with different materials."

Comments (1 posted)

Graphics

OpenSceneGraph 1.1 released

Version 1.1 of OpenSceneGraph, an open-source 3D scene graph project based on OpenGL, is available with a long list of new capabilities.

Full Story (comments: none)

Mail Clients

UI Improvements in Evolution 2.8

Srinivasa Ragavan's GNOME blog covers changes to the Evolution mail client. "Four months, in GNOME 2.16 cycle, We have added a lot of UI improvements to Evolution to make it look much better. Not just features and lot of bug fixes too!!! I have blogged them in parts. Im summarising all of them."

Comments (none posted)

Music Applications

New Colombo hydrogen drumkit

Marcos Guglielmetti has announced a new drum kit sample set for the hydrogen drum machine. "Colombo drums are handcraft drums made in La Plata, Argentina by a man called Colombo."

Full Story (comments: none)

Office Suites

OpenOffice and ISO distribution with Metalink

OpenOffice.org and other software is now available via Metalink using aria2. "OpenOffice.org is using a new Web/P2P hybrid called Metalink to distribute its free office suite. Other open source software and Linux ISOs are available at Metalink Packages Resources."

Full Story (comments: none)

Languages and Tools

Python

Crunchy Frog 0.6 released

Version 0.6 of Crunchy Frog has been announced. "Crunchy Frog is an application that transforms an html-based Python tutorial into an interactive session within a browser window."

Comments (none posted)

Tcl/Tk

Dr. Dobb's Tcl-URL!

The July 25, 2006 edition of Dr. Dobb's Tcl-URL! is online with new Tcl/Tk articles and resources.

Full Story (comments: none)

Page editor: Forrest Cook
Next page: Linux in the news>>


Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds