
KSM tries again


By Jonathan Corbet
April 28, 2009
Back in November, LWN examined the KSM (kernel shared memory) patch. KSM was written to address a problem encountered on systems running virtualized guests: such systems can have large numbers of pages holding identical data, but the kernel has no way to let guests share those pages. The KSM code scans through memory to find pairs of pages holding identical data; when such pairs are found, they are merged into a single page mapped into both locations. The pages are marked copy-on-write, so the kernel will automatically separate them again should one process modify its data.

There were some concerns about the intended purpose of this patch, but it was soon made clear that KSM can help the system to save significant amounts of memory. But KSM was not merged as a result of two other problems. One of them, discussed mostly behind closed doors, appears to be concerns about the use of SHA1 hashes to compare pages. If an attacker could create hash collisions, he might just be able to inject his own data (or code) into processes belonging to other users and/or virtual machines. The other problem had to do with a different kind of attacker: VMWare holds a patent to an algorithm which looks quite similar to the method used by the early KSM patches. There is evidence that this patent could be overturned on prior art, but that is still a battle that nobody wants to fight.

KSM disappeared from view for a while after those issues came to light, but, more recently, new versions of the KSM patches have been posted for review. A quick look at the code makes it clear that both of these concerns have been addressed - and, in fact, that the KSM developers were able to kill both birds with the same stone. It's all a matter of doing away with the hashing of pages.

Patent 6,789,156 is not exactly light reading; it has a full 74 claims. Most of the independent claims have one thing in common, though: they include the calculation of a hash value to find identical pages in the system. If the KSM code were to avoid hashing pages, those claims of the patent would clearly not read against it. And, as described above, the use of hashing also created some security worries. So it must have seemed clear to the KSM developers (and any lawyers they may have talked to) that the hash needed to go.

The current KSM patches have replaced the hash table with two separate red-black trees. Pages tracked by KSM are initially stored in the "unstable tree"; the term "unstable" means that KSM considers their contents to be volatile. Placement in the tree is determined by a simple memcmp() of the page's contents; essentially, the page is treated as containing a huge number and sorted accordingly. The unstable tree is suitable for finding pages with duplicate contents; a relatively quick traversal of the tree will turn up the only candidates.
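
As a rough illustration, here is a minimal user-space sketch of that ordering; it is not KSM's code (the kernel uses a real red-black tree and operates on struct page), and all of the names in it are made up for this example. It simply shows how memcmp() lets a search tree treat a page's contents as one huge sort key:

    /* Toy "unstable tree": pages ordered by memcmp() of their contents.
     * A plain binary search tree stands in for the kernel's red-black tree. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct node {
        unsigned char *page;              /* the page's contents */
        struct node *left, *right;
    };

    /* Search for a page with identical contents; if none is found, insert
     * this page as a new leaf.  Returns an existing match or NULL. */
    static unsigned char *tree_search_insert(struct node **root, unsigned char *page)
    {
        while (*root) {
            int cmp = memcmp(page, (*root)->page, PAGE_SIZE);
            if (cmp == 0)
                return (*root)->page;     /* merge candidate found */
            root = cmp < 0 ? &(*root)->left : &(*root)->right;
        }
        *root = calloc(1, sizeof(struct node));
        (*root)->page = page;
        return NULL;
    }

    int main(void)
    {
        struct node *unstable_tree = NULL;
        unsigned char *a = calloc(1, PAGE_SIZE);
        unsigned char *b = calloc(1, PAGE_SIZE);   /* same contents as a */
        unsigned char *c = calloc(1, PAGE_SIZE);

        c[100] = 0xff;                             /* differs from a and b */
        printf("a: %s\n", tree_search_insert(&unstable_tree, a) ? "dup" : "new");
        printf("b: %s\n", tree_search_insert(&unstable_tree, b) ? "dup" : "new");
        printf("c: %s\n", tree_search_insert(&unstable_tree, c) ? "dup" : "new");
        return 0;
    }

A page with previously-seen contents (b above) comes back as a duplicate after walking at most the depth of the tree; everything else simply takes its place in the ordering.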

It's worth noting that KSM does not place every page it scans in the unstable tree. If the contents of a page change over the course of one memory scanning cycle, the page will not really be a good candidate for sharing anyway. So pages which are seen to change are not represented in the unstable tree. The unstable tree is also dumped and rebuilt from the beginning after each scan cycle. That deals with the problem of pages which, as a result of modifications, find themselves in the wrong location in the tree. The nature of red-black trees means that search and insertion operations are almost the same thing, so there is little real cost to rebuilding the unstable tree from the beginning every time.

The other pages which are not found in the unstable tree are those which have actually been merged with duplicates. Since shared pages are marked read-only, KSM knows that their contents cannot change. Those pages are put into a separate "stable tree." The stable tree is also a red-black tree, but, since pages cannot become misplaced there, it need not be rebuilt regularly. Once a page goes into the stable tree, it stays there until all users have either modified or unmapped it.
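
Putting the two trees together, the per-page logic during a scan looks roughly like the toy below. Again, this is only a sketch built on the same kind of simplified tree as above, with invented names; it is not the actual KSM code path:

    /* Toy two-tree scan step: consult the stable tree first, then the
     * unstable tree.  All names are made up for illustration. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct node { unsigned char *page; struct node *l, *r; };

    /* Find a page with identical contents; optionally insert it if absent. */
    static unsigned char *lookup(struct node **root, unsigned char *page, int insert)
    {
        while (*root) {
            int cmp = memcmp(page, (*root)->page, PAGE_SIZE);
            if (cmp == 0)
                return (*root)->page;
            root = cmp < 0 ? &(*root)->l : &(*root)->r;
        }
        if (insert) {
            *root = calloc(1, sizeof(**root));
            (*root)->page = page;
        }
        return NULL;
    }

    static void scan_page(struct node **stable, struct node **unstable,
                          unsigned char *page)
    {
        if (lookup(stable, page, 0)) {      /* contents already shared */
            printf("merge into existing stable page\n");
            return;
        }
        /* (Pages seen to change between cycles would be skipped here.) */
        if (lookup(unstable, page, 1)) {    /* second identical copy this cycle */
            printf("merge pair, promote to stable tree\n");
            lookup(stable, page, 1);        /* promotion, much simplified */
            return;
        }
        printf("new candidate, left in unstable tree\n");
    }

    int main(void)
    {
        struct node *stable = NULL, *unstable = NULL;
        unsigned char *a = calloc(1, PAGE_SIZE);
        unsigned char *b = calloc(1, PAGE_SIZE);
        unsigned char *c = calloc(1, PAGE_SIZE);   /* all three identical */

        scan_page(&stable, &unstable, a);   /* new candidate */
        scan_page(&stable, &unstable, b);   /* pairs with a, promoted */
        scan_page(&stable, &unstable, c);   /* hits the now-stable page */
        return 0;
    }

At the end of each cycle the real KSM throws the unstable tree away entirely; in the toy above that would just mean freeing its nodes and starting from an empty tree.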

The resulting system clearly works. Dropping the hash may impose a cost in the form of slightly higher CPU and memory use; there have been no benchmarks posted which would show the difference. But there is no cost on systems which do not use KSM at all, and, in any case, avoiding the potential problems associated with using hash tables to identify pages with identical contents will be worth the cost. At this point, comments on the KSM code are mostly concerned with relatively small details. It could well be that this code will be ready for inclusion into the 2.6.31 kernel.

(Postscript: above, your editor noted that "most" of the independent claims in the VMWare patent required the use of a hashing mechanism. There are, in fact, a few claims which lack that requirement, but they replace it with one or two others. Some claims cover the use of copy-on-write pages, but they all explicitly say that this technique is used on pages with a "relatively high probability of impending modification." But there is little point in sharing such pages at all; KSM, by leaving them out of the unstable tree, ignores those pages entirely. The remaining claims describe partitioning memory into "classes," which is not done in KSM.)

Index entries for this article
Kernel: Memory management/Kernel samepage merging
Kernel: Virtualization



KSM tries again

Posted Apr 30, 2009 4:54 UTC (Thu) by ncm (guest, #165) [Link]

I'll bet you can get almost equally good results much more cheaply by tracking pages that map to common disk blocks. I.e., instead of actually scanning the pages, you track where their contents came from. You can get a little better by similarly tracking writable pages that are read() into, marking them read-only until they're touched, if ever.

This gets you a long way toward zero-copy I/O (what the BSDs call UVM), although that's said to play poorly with multiple cores.

Close but no cigar

Posted Apr 30, 2009 6:25 UTC (Thu) by khim (subscriber, #9252) [Link]

Of course all such pages come from a common block somewhere on Debian's build server. But there is no easy way to track this relationship at all! If you are thinking about shared libraries and such - that is a different kettle of fish, and THESE pages are shared already. KSM is designed to work with KVM and Xen, where identical pages come from different virtual filesystems and thus from different physical places. Maybe in the future btrfs will change this, but KSM works here and now...

Memory deduplication by common disk source

Posted May 3, 2009 2:47 UTC (Sun) by giraffedata (guest, #1954) [Link]

Your point about KSM working in the here and now is good, but as a question of long term strategy, ncm's is equally good. Maybe KSM should be thought of as a stop-gap.

In the systems where the memory duplication is a big problem, the common disk source of the memory pages is a lot closer than Debian's build server. If you have a hundred virtual machines, you probably did the Debian install once and copied the disk image a hundred times locally.

If that copy is a naive physical copy, then it's still hard for the hypervisor to know that two memory pages have the same ultimate source. But if you apply the same deduplication strategy to the duplicated disk (and there's plenty of work going on to make that a reality), then you have 100 virtual disk devices all sharing the same physical disk blocks and the hypervisor knows it. So it can easily back 100 virtual memory pages with a single real memory page.

Memory deduplication by common disk source

Posted Nov 22, 2009 21:26 UTC (Sun) by kabloom (guest, #59417) [Link]

There's another situation where this is useful: interpreted languages. Suppose you run a Rails app on your server. To take advantage of concurrency you fork the Rails app several times. All of the Rails code (parsed already, and byte-compiled if you're on Ruby 1.9) is on copy-on-write pages. (Ruby Enterprise Edition is intended to keep the garbage collector from writing to those pages.) Now suppose you run several apps, but they don't come about by forking. The Rails code (parsed or byte-compiled) can't be traced back to a single disk block (because it's been parsed), but it probably has the same page layout despite being loaded in different interpreters. Here, KSM should help.

KSM tries again

Posted Apr 30, 2009 9:08 UTC (Thu) by simlo (guest, #10866) [Link]

Doesn't this in some ways replace/complement dynamic linking? It seems to be based on the same idea, but moves the whole job into the kernel.

Of course, if two applications statically link to a library, there could be many reasons why the resulting pages within the executables are not identical :-(

KSM tries again

Posted Apr 30, 2009 9:45 UTC (Thu) by Darkmere (subscriber, #53695) [Link]

Not really. You're thinking of this from a single-machine perspective.

Think of it as this:
host machine running KSM:
VM 1:
VM 2:
VM 3:

In all these machines there are bound to be a lot of similar pages of memory allocated. For most (heh?) uses where this would be an improvement, you run the same distribution+patchlevel on all VMs with some minor differences. In these cases things like prelink and memory randomization are in effect, making the binaries different on disk; however, once you load them into RAM, the working state of, for example, /sbin/init is bound to have a LOT of pages in common between the four systems (host + 3*VM).

And that is where you can get some neat memory savings.

See? Patents do help foster innovation!

Posted May 1, 2009 14:02 UTC (Fri) by dion (guest, #2764) [Link]

Without software patents the KSM feature would be stuck doing the same stupid thing as vmware is doing :)

It makes me quite happy to see that there was a way around the patent problem this time, although it would be helpful to be able to disable patent-encumbered features or enable workarounds only when the software happens to be running in a non-free country.

Maybe all of the patent nonsense could be done away with simply by making it the responsibility of the patent owner to spot and warn of infringements, and by making it impossible to collect any damages until a reasonable period has passed without action.

See? Patents do help foster innovation!

Posted May 3, 2009 7:38 UTC (Sun) by dlang (guest, #313) [Link]

that isn't fair to the 'small inventor' battling the 'large company' (the image near and dear to the patent defender's heart), because the little guy may not be able to find out that the company is infringing on the patent, due to trade secrets.

that being said, I think that a very good case could be made for an explicit exception for the case of open-source software, where any author of open-source software can post a copy of their code to a site (or a list of sites), and companies then have X amount of time to protest the use of any patent in code posted there. give the author of the open-source code immunity for any use prior to any notice being delivered and you will have authors willing to post there.

I'm sure that companies would quickly spring up offering the service of investigating this code looking for patent violations (I'm also sure that most of them would not begin to be able to do the job, but that's their, and their clients', problem to deal with ;-)

See? Patents do help foster innovation!

Posted May 3, 2009 11:32 UTC (Sun) by dion (guest, #2764) [Link]

Well, personally I have never seen one instance of patents having a positive influence on anything, but I've heard plenty of stories from various industries where they were a huge burden, so I'd rather do away with patents completely.

However, "immunity until notified" for published works would be a very fair deal for everyone, especially the society that allows patents, because publishing is exactly what patents are supposed to encourage.

See? Patents do help foster innovation!

Posted May 3, 2009 12:41 UTC (Sun) by dlang (guest, #313) [Link]

playing devil's advocate here.

can you give an example of any country that did not have patent laws producing anywhere near the number of new things that are produced by the US?

unfortunately there is no example to go by to counter this.

all that can be done is to look at history and note the number of things that were done by one person or group, and then lost over time to be rediscovered by another group many years later. the patent system is supposed to prevent this by getting these people to disclose 'enough details so that a person ordinarily skilled in the field can duplicate the invention'

In my opinion, the big problem is that the bar for getting patents is just too low. it used to be that you had to provide a working model of the invention, to be inspected to see if it matched up with the patent. that's not done anymore, and many patents are for things that just don't work. a patent is also only supposed to be granted for things that are 'not obvious to a person skilled in the field'; that is not being done (especially in the computer field), and this leads to far too many patents.

then you get into the problem that

See? Patents do help foster innovation!

Posted May 3, 2009 13:27 UTC (Sun) by dion (guest, #2764) [Link]

Yes, in software tons of things are being invented in places where there are no software patents.

If there is any great advantage to having software patents then the US and Japan should be leading the world in software innovation.

See? Patents do help foster innovation!

Posted May 3, 2009 14:26 UTC (Sun) by dlang (guest, #313) [Link]

the grandparent comment was advocating eliminating all patents, not just the ones on software.

I don't think that there is anyone who would claim that software patents are currently working correctly.

See? Patents do help foster innovation!

Posted May 3, 2009 14:33 UTC (Sun) by lemmings (guest, #53618) [Link]

> Without software patents the KSM feature would be stuck doing the same stupid thing as vmware is doing:)

No, what vmware is doing is not stupid. The security "concern" is not a concern: one simply does a memcmp on pages which have a matching hash.

The result of the KSM hash avoidance _will_ be a slower, less scalable identical-page scanning mechanism. The stable/unstable tree optimisation is orthogonal to the use of hashes.

See? Patents do help foster innovation!

Posted May 3, 2009 15:00 UTC (Sun) by dion (guest, #2764) [Link]

Yeah, I knew that, my point was that the patent system forces extra work to happen in order to work around patents rather than help people to avoid re-inventing things.

See? Patents do help foster innovation!

Posted May 8, 2009 9:34 UTC (Fri) by muwlgr (guest, #35359) [Link]

To find identical pages by comparing their hashes, you first have to calculate those hashes, reading the full contents of each page into CPU registers (and maybe even evicting all useful CPU cache). Your memory traffic is 100% of your used pages, every time.

By contrast, when you compare pages by their contents right from the start, in most cases you can find that pages differ quite quickly, by reading only their first few bytes. You have to read two whole pages only to find out that they are identical. And this is done without the mathematical overhead of hash calculation, remember. So your memory traffic is 100% only for those pages that happen to be identical. For pages that differ pairwise, you usually have to read only a small part of them to find that out. With random page contents, you stop comparing on the first byte in 255/256 of cases, and by the fourth byte (one 32-bit word) in (2^32-1)/2^32 of cases. In the Linux case there may be differing pages with longer common prefixes, but I think the prefix will be shorter than half a page often enough. So for differing memory pages you most probably don't get 100% memory traffic anyway. I can clearly see why the simplified approach of KSM could have its benefits.
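
A quick back-of-the-envelope test (purely illustrative toy code, nobody's production code) bears this out: comparing pairs of random pages byte by byte touches only about one byte per pair on average:

    /* Measure how many bytes a byte-wise compare of random pages examines. */
    #include <stdio.h>
    #include <stdlib.h>

    #define PAGE_SIZE 4096
    #define PAIRS     10000

    static size_t bytes_examined(const unsigned char *a, const unsigned char *b)
    {
        size_t i;

        for (i = 0; i < PAGE_SIZE; i++)
            if (a[i] != b[i])
                return i + 1;         /* stopped at the first difference */
        return PAGE_SIZE;             /* identical pages: full read */
    }

    int main(void)
    {
        static unsigned char a[PAGE_SIZE], b[PAGE_SIZE];
        unsigned long long total = 0;
        int i, j;

        srand(1);
        for (i = 0; i < PAIRS; i++) {
            for (j = 0; j < PAGE_SIZE; j++) {
                a[j] = rand() & 0xff;
                b[j] = rand() & 0xff;
            }
            total += bytes_examined(a, b);
        }
        /* For random data the expected value is just over one byte per pair;
         * real pages share longer prefixes, but rarely anything near 4096. */
        printf("average bytes examined per pair: %.3f\n", (double)total / PAIRS);
        return 0;
    }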

See? Patents do help foster innovation!

Posted May 8, 2009 10:50 UTC (Fri) by efexis (guest, #26355) [Link]

If you want to just compare two pages then yes, that may be true. It starts to break down, though, when you want to compare a page with, say, ten thousand other pages. Comparing a hash with ten thousand other hashes is going to be quicker than comparing a page with ten thousand other pages (your argument about only needing to scan the first x bytes of a page before likely finding a difference holds up also when comparing hashes). If that speed increase from comparing hashes outweighs the time spent hashing the pages to begin with, then you are losing speed by not hashing. Of course it's not this simple; optimised sorting/indexing algorithms mean you don't have to compare every page with every other page to rule out matches (just as you wouldn't have to compare every pair of hashes). For example, what are the effects on the CPU memory cache of reading from all these pages many times, as opposed to reading much smaller hashes?

I think in this case testing and observation are going to be important; it's near impossible to speculate - given the dynamic nature of potentially millions of memory pages spread across anywhere from similar to disparate virtual systems - what the comparative results of the two methods will be.

See? Patents do help foster innovation!

Posted May 8, 2009 14:19 UTC (Fri) by nix (subscriber, #2304) [Link]

What I'd do there is hash the first 32/64/128 bytes or thereabouts of a page (one cacheline) and then use that partial hash to eliminate the majority of pages from comparison. (But this obvious solution, which took less than five seconds to think of, may be patented. Bleah.)
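
A sketch of that idea (my own toy code, not KSM's, with all names invented): hash only one cacheline's worth of the page, and fall back to memcmp() only when the cheap hashes agree:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096
    #define PREFIX    64               /* roughly one cacheline */

    /* 32-bit FNV-1a over the page's first PREFIX bytes. */
    static uint32_t prefix_hash(const unsigned char *page)
    {
        uint32_t h = 2166136261u;
        int i;

        for (i = 0; i < PREFIX; i++) {
            h ^= page[i];
            h *= 16777619u;
        }
        return h;
    }

    /* Pages can only be identical if their prefix hashes match; the full
     * comparison confirms it and rules out the rare prefix collision. */
    static int pages_identical(const unsigned char *a, const unsigned char *b)
    {
        if (prefix_hash(a) != prefix_hash(b))
            return 0;
        return memcmp(a, b, PAGE_SIZE) == 0;
    }

    int main(void)
    {
        static unsigned char x[PAGE_SIZE], y[PAGE_SIZE];

        y[PAGE_SIZE - 1] = 1;          /* same prefix, different tail */
        printf("x vs x: %d\n", pages_identical(x, x));
        printf("x vs y: %d\n", pages_identical(x, y));
        return 0;
    }

(In practice one would cache the partial hash alongside the page rather than recompute it for every comparison; whether any of this would run into the patent is a separate question.)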

See? Patents do help foster innovation!

Posted May 15, 2009 19:43 UTC (Fri) by nevyn (subscriber, #33129) [Link]

Speed improvement: don't bother hashing it; just use the first 8 bytes of the page as the first-pass value. I'd assume that couldn't be patented (given that it's a trivial optimization of what the memcmp() is doing) ... but who knows.

We can avoid reading the whole page for each comparison, even in the worst case.

Posted Nov 17, 2009 16:03 UTC (Tue) by gmatht (subscriber, #58961) [Link]

Say that we have a binary tree. Each left edge is labelled 0, and each right edge is labelled 1. Whenever we read a bit from the page we follow the edge with the label of that bit. So, for example, if we look for the zero page in the tree, we navigate to the leftmost child node.

This is clearly very inefficient, as this simple algorithm needs exactly (4096*8) branches to find any page. However, we read the page exactly once for all comparisons. And we can optimize it; e.g., if we have a chain of nodes with only single children, we can merge them into a single edge.

Another approach would be to make use of the fact that if, say, we know all child pages of the current node start with "Foo", we don't need to compare the first 3 characters anymore. As such, we'd usually only have to access each cache-line once.
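
A bare-bones version of the first idea (a toy, nobody's real code) is just a bit-trie: descend one edge per page bit, so the page is read exactly once no matter how many other pages are tracked. The chain-merging optimization described above would turn this into a radix-style tree:

    #include <stdio.h>
    #include <stdlib.h>

    #define PAGE_SIZE 4096
    #define PAGE_BITS (PAGE_SIZE * 8)

    struct trie {
        struct trie *child[2];         /* edge 0 and edge 1 */
        int is_page;                   /* a complete page ends here */
    };

    /* Returns 1 if an identical page was already present, else inserts it. */
    static int trie_lookup_insert(struct trie *root, const unsigned char *page)
    {
        struct trie *n = root;
        int i;

        for (i = 0; i < PAGE_BITS; i++) {
            int bit = (page[i / 8] >> (i % 8)) & 1;

            if (!n->child[bit])
                n->child[bit] = calloc(1, sizeof(*n));
            n = n->child[bit];
        }
        if (n->is_page)
            return 1;
        n->is_page = 1;
        return 0;
    }

    int main(void)
    {
        struct trie *root = calloc(1, sizeof(*root));
        static unsigned char a[PAGE_SIZE], b[PAGE_SIZE];

        b[10] = 0xff;
        printf("a:       %s\n", trie_lookup_insert(root, a) ? "dup" : "new");
        printf("a again: %s\n", trie_lookup_insert(root, a) ? "dup" : "new");
        printf("b:       %s\n", trie_lookup_insert(root, b) ? "dup" : "new");
        return 0;
    }

Without the chain merging, of course, the memory overhead (one node per bit) would be hopeless; the point is only that each lookup needs a single pass over the page.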

KSM tries again

Posted May 1, 2009 19:33 UTC (Fri) by tack (guest, #12542) [Link]

I just want to point out that VMware does a first-pass comparison using hashes, and, should the hashes match, a second pass byte-by-byte comparison in order to rule out hash collisions.

http://www.vmware.com/pdf/esx3_memory.pdf
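
For the curious, the approach the paper describes can be illustrated with a toy like the following (not VMware's code; every name here is invented). A hash indexes the candidates, and a byte-by-byte comparison confirms any hit, so a collision can never cause a false merge:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE 4096
    #define BUCKETS   1024

    struct entry {
        uint32_t hash;
        unsigned char *page;
        struct entry *next;
    };

    static struct entry *table[BUCKETS];

    /* 32-bit FNV-1a over the whole page. */
    static uint32_t page_hash(const unsigned char *p)
    {
        uint32_t h = 2166136261u;
        size_t i;

        for (i = 0; i < PAGE_SIZE; i++) {
            h ^= p[i];
            h *= 16777619u;
        }
        return h;
    }

    /* Return an existing identical page, or register this one and return NULL. */
    static unsigned char *find_or_add(unsigned char *page)
    {
        uint32_t h = page_hash(page);
        struct entry *e;

        for (e = table[h % BUCKETS]; e; e = e->next)
            if (e->hash == h && memcmp(e->page, page, PAGE_SIZE) == 0)
                return e->page;            /* verified duplicate */

        e = malloc(sizeof(*e));
        e->hash = h;
        e->page = page;
        e->next = table[h % BUCKETS];
        table[h % BUCKETS] = e;
        return NULL;
    }

    int main(void)
    {
        static unsigned char a[PAGE_SIZE], b[PAGE_SIZE], c[PAGE_SIZE];

        c[0] = 1;                          /* c differs from a and b */
        printf("a: %s\n", find_or_add(a) ? "dup" : "new");
        printf("b: %s\n", find_or_add(b) ? "dup" : "new");
        printf("c: %s\n", find_or_add(c) ? "dup" : "new");
        return 0;
    }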

vmware memory sharing numbers

Posted May 14, 2009 8:47 UTC (Thu) by robbe (guest, #16131) [Link]

For reference: on my vmware ESX 3.5 installation, Linux VMs have between 5 and 25% of their allotted memory marked as shared. The factor is higher for idle machines than for busy ones -- as expected.

I've seen 35% for Windows VMs, as these tend to more aggressively zero out unused pages. These are of course trivially shareable.


Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds