ALS: Linux interprocess communication and kdbus

By Jake Edge
May 30, 2013

As part of the developer track at this year's Automotive Linux Summit Spring, Greg Kroah-Hartman talked about interprocess communication (IPC) in the kernel with an eye toward the motivations behind kdbus. The work on kdbus is progressing well and Kroah-Hartman expressed optimism that it would be merged before the end of the year. Beyond just providing a faster D-Bus (which could be accomplished without moving it into the kernel, he said), it is his hope that kdbus can eventually replace Android's binder IPC mechanism.

Survey of IPC

[Greg Kroah-Hartman]

There are a lot of different ways to communicate between processes available in Linux (and, for many of the mechanisms, more widely in Unix). Kroah-Hartman strongly recommended Michael Kerrisk's book, The Linux Programming Interface, as a reference to these IPC mechanisms (and most other things in the Linux API). Several of his slides [PDF] were taken directly from the book. All of the different IPC mechanisms fall into one of three categories, he said: signals, synchronization, or communication. He used diagrams from Kerrisk's book (page 878) to show the categories and their members.

There are two types of signals in the kernel, standard and realtime, though the latter doesn't see much use, he said.
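The practical difference between the two types is that multiple pending instances of a standard signal collapse into one, while realtime signals are queued. A minimal Python sketch of that behavior, assuming Linux and Python 3; the helper name `count_deliveries` is made up for illustration and is not from the talk:

```python
import signal
import threading


def count_deliveries(signum, times=3):
    """Send `signum` to the calling thread `times` times while it is blocked,
    then count how many instances are actually waiting to be delivered."""
    # Block the signal so it stays pending instead of being delivered.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        me = threading.get_ident()
        for _ in range(times):
            signal.pthread_kill(me, signum)
        delivered = 0
        # A timeout of 0 polls: returns None once nothing is pending.
        while signal.sigtimedwait({signum}, 0) is not None:
            delivered += 1
        return delivered
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    print("standard:", count_deliveries(signal.SIGUSR1))
    print("realtime:", count_deliveries(signal.SIGRTMIN))
```

The standard signal is reported once no matter how many times it was sent, while each realtime instance is retrieved separately.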

Synchronization methods are numerous, including futexes and eventfd(), which are both relatively new. Semaphores are also available, both as the "old style" System V semaphores and as "fixed up" by POSIX. The latter come in both named and unnamed varieties. There is also file locking, which has two flavors: record locks to lock a portion of a file and file locks to prevent access to the whole file. However, the code that implements file locking is "scary", he said. Threads have four separate types of synchronization methods (mutexes, condition variables, barriers, and read/write locks) available as well.
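Record locks can be illustrated with Python's standard `fcntl` module. The sketch below assumes Linux and a local filesystem; the helper names `child_can_lock` and `demo_record_locks` are hypothetical. The parent locks the first ten bytes of a file, then forks children that probe an overlapping range and a disjoint one. A separate process is needed because record locks are owned per process.

```python
import fcntl
import os
import tempfile


def child_can_lock(path, start, length):
    """Fork a child and report whether it could lock bytes [start, start+length)."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            # Non-blocking request: fails at once if the range is locked.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start)
            status = 0
        except OSError:
            status = 1
        finally:
            os._exit(status)  # never fall back into the parent's code
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo_record_locks():
    """Return (overlapping_ok, disjoint_ok) as seen by a second process."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 100)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX, 10, 0)  # lock bytes 0..9 only
            overlapping = child_can_lock(tmp.name, 0, 10)
            disjoint = child_can_lock(tmp.name, 50, 10)
        finally:
            os.close(fd)  # closing the descriptor drops the lock
        return overlapping, disjoint


if __name__ == "__main__":
    print(demo_record_locks())
```

Only the overlapping request is refused; the rest of the file remains available to other processes, which is the point of locking a portion rather than the whole file.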

For communication, there are many different kernel services available too. For data transfer, one can use pseudo-terminals. For byte-stream-oriented data, there are pipes, FIFOs, and stream sockets. For communicating via messages, there are both POSIX and System V flavored message queues. Lastly, there is shared memory which also comes in POSIX and System V varieties along with mmap() for anonymous and file mappings. Anonymous mappings with mmap() were not something Kroah-Hartman knew about until recently; they ended up using them in kdbus.
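As an illustration of the anonymous mappings mentioned above, here is a minimal Python sketch, assuming a Unix-like system. It is not a description of how kdbus uses them, which the talk did not cover; the function name is made up. Passing -1 as the file descriptor to `mmap.mmap()` asks for an anonymous mapping, which is shared by default on Unix, so a forked child and its parent see the same memory.

```python
import mmap
import os


def share_via_anonymous_mapping(message: bytes) -> bytes:
    """Child writes `message` into an anonymous shared mapping; parent reads it."""
    if len(message) > mmap.PAGESIZE:
        raise ValueError("message does not fit in one page")
    # fileno -1: no backing file, just memory shared across fork().
    region = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()
    if pid == 0:
        try:
            region[:len(message)] = message
        finally:
            os._exit(0)
    os.waitpid(pid, 0)  # wait until the child has finished writing
    data = bytes(region[:len(message)])
    region.close()
    return data


if __name__ == "__main__":
    print(share_via_anonymous_mapping(b"hello from the child"))
```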

Android IPC

"That is everything we have today, except for Android", Kroah-Hartman said. All of the existing IPC mechanisms were "not enough for Android", so that project added ashmem, pmem, and binder. Ashmem is "POSIX shared memory for the lazy" in his estimation. The Android developers decided to write kernel code rather than user-space code, he said. Ashmem uses virtual memory and can discard memory segments when the system is under memory pressure. Currently, ashmem lives in the staging tree, but he thinks that Google is moving to other methods, so it may get deleted from the tree soon.

Pmem is a mechanism to share physical memory. It was used to talk to GPUs. Newer versions of Android don't use pmem, so it may also go away. Instead, Android is using the ION memory allocator now.

Binder is "weird", Kroah-Hartman said. It came from BeOS and its developers were from academia. It was developed and used on systems without the System V IPC APIs available and, via Palm and Danger, came to Android. It is "kind of like D-Bus", and some (including him) would argue that Android should have used D-Bus, but it didn't. It has a large user-space library that must be used to perform IPC with binder.

Binder has a number of serious security issues when used outside of an Android environment, he said, so he stressed that it should never be used by other Linux-based systems.

In Android, binder is used for intents and app separation; it is good for passing around small messages, not pictures or streams of data. You can also use it to pass file descriptors to other processes. It is not particularly efficient, as sending a message makes lots of hops through the library. A presentation [YouTube] at this year's Android Builders Summit showed that one message required eight kernel-to-user-space transitions.

More IPC

A lot of developers in the automotive world have used QNX, which has a nice message-passing model. You can send a message and pass control to another process, which is good for realtime and single processor systems, Kroah-Hartman said. Large automotive companies have built huge systems on top of QNX messages, creating large libraries used by their applications. They would like to be able to use those libraries on Linux, but often don't know that there is a way to get the QNX message API for Linux. It is called SIMPL and it works well.

Another solution, though it is not merged into the kernel, is KBUS, which was created by some students in England. It provides simple message passing through the kernel, but cannot pass file descriptors. Its implementation involves multiple data copies, but for 99% of use cases, that's just fine, he said. Multiple copies are still fast on today's fast processors. The KBUS developers never asked for it to be merged, as far as he knows, but if they did, there is "no reason not to take it".

D-Bus is a user-space messaging solution with strong typing and process lifecycle handling. Applications subscribe to messages or message types they are interested in. They can also create an application bus to listen for messages sent to them. It is widely used on Linux desktops and servers, is well-tested, and well-documented too. It uses the operating system IPC services and can run on Unix-like systems as well as Windows.

The D-Bus developers have always said that it is not optimized for speed. The original developer, Havoc Pennington, created a list of ideas on how to speed it up if that was of interest, but speed was not the motivation behind its development. In the automotive industry, there have been numerous efforts to speed D-Bus up.

One of those efforts was the AF_BUS address family, which came about because in-vehicle infotainment (IVI) systems needed better D-Bus performance. Collabora was sponsored by GENIVI to come up with a solution and AF_BUS was the result. Instead of the four system calls required for a D-Bus message, AF_BUS reduced that to two, which made it "much faster". But that solution was rejected by the kernel network maintainers.

The systemd project rewrote libdbus in an effort to simplify the code, but it turned out to significantly increase the performance of D-Bus as well. In preliminary benchmarks, BMW found [PPT] that the systemd D-Bus library increased performance by 360%. That was unexpected, but the rewrite did take some shortcuts and listened to what Pennington had said about D-Bus performance. Kroah-Hartman's conclusion is that "if you want a faster D-Bus, rewrite the daemon, don't mess with the kernel". For example, there is a Go implementation of D-Bus that is "really fast". The Linux kernel IPC mechanisms are faster than any other operating system, he said, though it may "fight" with some of the BSDs for performance supremacy on some IPC types.

kdbus

In the GNOME project, there is a plan for something called "portals" that will containerize GNOME applications. That would allow running applications from multiple versions of GNOME at the same time while also providing application separation so that misbehaving or malicious applications could not affect others. Eventually, something like Android's intents will also be part of portals, but the feature is still a long way out, he said. Portals provides one of the main motivations behind kdbus.

So there is a need for an enhanced D-Bus that has some additional features. At a recent GNOME hackfest, Kay Sievers, Lennart Poettering, Kroah-Hartman, and some other GNOME developers sat down to discuss a new messaging scheme, which is what kdbus is. It will support multicast and single-endpoint messages, without any extra wakeups from the kernel, he said. There will be no blocking calls to kdbus, unlike binder which can sleep, as the API for kdbus is completely asynchronous.

Instead of doing the message filtering in user space, kdbus will do it in the kernel using Bloom filters, which will allow the kernel to only wake up the destination process, unlike D-Bus. Bloom filters have been publicized by Google engineers recently, and they are an "all math" scheme that uses hashes to make searching very fast. There are hash collisions, so there is still some searching that needs to be done, but the vast majority of the non-matches are eliminated immediately.
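The general Bloom filter technique can be sketched in a few lines of Python. This is a generic illustration only: the bit-array size, the number of hashes, the use of SHA-256, and the example bus names are all assumptions, not the parameters or hash functions kdbus uses.

```python
import hashlib


class BloomFilter:
    """Set-membership test that may give false positives but never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # Any zero bit proves the item was never added.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


if __name__ == "__main__":
    subscriptions = BloomFilter()
    subscriptions.add("org.example.Battery")  # hypothetical name
    print(subscriptions.might_contain("org.example.Battery"))
    print(subscriptions.might_contain("org.example.Radio"))
```

A negative answer is definitive, so most non-matching receivers can be skipped without further work; a positive answer may be a collision and still needs an exact check.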

Kdbus ended up with a naming database in the kernel to track the message types and bus names, which "scared the heck out of me", Kroah-Hartman said. But it turned out to be "tiny" and worked quite well. In some ways, it is similar to DNS, he said.

Kdbus will provide reliable order guarantees, so that messages will be received in the order they were sent. Only the kernel can make that guarantee, he said, and the current D-Bus does a lot of extra work to try to ensure the ordering. The guarantee only applies to messages sent from a single process; the order of "simultaneous" messages from multiple processes is not guaranteed.

Passing file descriptors over kdbus will be supported. There is also a one-copy message passing mechanism that Tejun Heo and Sievers came up with. Heo actually got zero-copy working, but it was "even scarier", so they decided against using it. Effectively, with one-copy, the kernel copies the message from user space directly into the receive buffer for the destination process. Kdbus might be fast enough to handle data streams as well as messages, but Kroah-Hartman does not know if that will be implemented.
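The kdbus interface for passing file descriptors was not shown in the talk. For comparison, the existing way to hand a descriptor to another process on Linux is an SCM_RIGHTS message over a Unix domain socket, which Python 3.9 and later expose as `socket.send_fds()` and `socket.recv_fds()`. The sketch below stays inside one process for brevity, and the function name is hypothetical.

```python
import os
import socket


def pass_descriptor(payload: bytes) -> bytes:
    """Send a pipe's read end across a socket pair, then read through the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        # At least one byte of ordinary data must accompany the descriptor.
        socket.send_fds(left, [b"fd"], [read_end])
        _msg, fds, _flags, _addr = socket.recv_fds(right, 16, 1)
        received = fds[0]  # new descriptor number referring to the same pipe
        try:
            os.write(write_end, payload)
            return os.read(received, len(payload))
        finally:
            os.close(received)
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()


if __name__ == "__main__":
    print(pass_descriptor(b"sent through a passed descriptor"))
```

The receiver gets its own descriptor number, but it refers to the same underlying open file as the sender's.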

Because it is in the kernel, kdbus gets a number of attributes almost for free. It is namespace aware, which was easy to add because the namespace developers have made it straightforward to do so. It is also integrated with the audit subsystem, which is important to the enterprise distributions. For D-Bus, getting SELinux support was a lot of work, but kdbus is Linux Security Module (LSM) aware, so it got SELinux (Smack, TOMOYO, AppArmor, ...) support for free.

Current kdbus status

As a way to test kdbus, the systemd team has replaced D-Bus in systemd with kdbus. The code is available in the systemd tree, but it is still a work in progress. The kdbus developers are not even looking at speed yet, but some rudimentary tests suggest that it is "very fast". Kdbus will require a recent kernel as it uses control groups (cgroups); it also requires some patches that were only merged into 3.10-rc kernels.

The plan is to merge kdbus when it is "ready", which he hopes will be before the end of the year. His goal, though it is not a general project goal, is to replace Android's binder with kdbus. He has talked to the binder people at Google and they are amenable to that, as it would allow them to delete a bunch of code they are currently carrying in their trees.

Kdbus will not "scale to the cloud", Kroah-Hartman said in answer to a question from the audience, because it only sends messages on a single system. There are already inter-system messaging protocols that can be used for that use case. In addition, the network maintainers placed a restriction on kdbus: don't touch the networking code. That makes sense because it is an IPC mechanism, and that is where AF_BUS ran aground.

The automotive industry will be particularly interested because it is used to using the QNX message passing, which it mapped to libdbus. It chose D-Bus because it is well-documented, well-understood, and is as easy to use as QNX. But, it doesn't just want a faster D-Bus (which could be achieved by rewriting it), it wants more: namespace support, audit support, SELinux, application separation, and so on.

Finally, someone asked whether Linus Torvalds was "on board" with kdbus. Kroah-Hartman said that he didn't know, but that kdbus is self-contained, so he doesn't think Torvalds will block it. Marcel Holtmann said that Torvalds was "fine with it" six years ago when another, similar idea had been proposed. Kroah-Hartman noted that getting it past Al Viro might be more difficult than getting it past Torvalds, but binder is "hairy code" and Viro is the one who found the security problems there.

Right now, they are working on getting the system to boot with systemd using kdbus. There are some tests for kdbus, but booting with systemd will give them a lot of confidence in the feature. The kernel side of the code is done, he thinks, but they thought that earlier and then Heo came up with zero-copy and one-copy message passing. He would be happy if it is merged by the end of the year, but if it isn't, it shouldn't stretch much past that, and he encouraged people to start looking at kdbus for their messaging needs in the future.

[ I would like to thank the Linux Foundation for travel assistance so that I could attend the Automotive Linux Summit Spring and LinuxCon Japan. ]

Index entries for this article
Kernel: kdbus
Kernel: Message passing
Conference: Automotive Linux Summit Spring/2013


ALS: Linux interprocess communication and kdbus

Posted May 31, 2013 6:55 UTC (Fri) by sorokin (guest, #88478) [Link]

> In the GNOME project, there is plan for something called "portals"
> that will containerize GNOME applications. That would allow running
> applications from multiple versions of GNOME at the same time while
> also providing application separation so that misbehaving or malicious
> applications could not affect others.
> ...
> Portals provides one of the main motivations behind kdbus.

Could someone clarify why portals in GNOME require a new transport for D-Bus in the kernel? Why is it not possible to continue to use UNIX domain sockets and provide separation in userspace?

ALS: Linux interprocess communication and kdbus

Posted May 31, 2013 7:13 UTC (Fri) by alexl (subscriber, #19068) [Link]

True app sandboxing should be kernel enforced, but if you use a userspace dbus daemon implementation then the sandboxed application will be talking directly to a process that is not sandboxed (with no kernel involvement). This would "work", but it's not really considered a robust sandbox.

ALS: Linux interprocess communication and kdbus

Posted May 31, 2013 17:14 UTC (Fri) by Tobu (subscriber, #24111) [Link]

The trusted part could be in-kernel, or it could be the dbus daemon's code that deals with message serialisation and filtering. It's academic because we seem to be wed to C, but userspace has access to languages, tools and type systems that could implement the sensitive parts of the daemon with much higher safety.

That said kdbus looks like a useful primitive, and getting rid of most of the copies and context switches could be a great enabler; users could put some extra security boundaries without worrying about throughput or latency.

ALS: Linux interprocess communication and kdbus

Posted May 31, 2013 18:06 UTC (Fri) by brouhaha (subscriber, #1698) [Link]

The networking people are opposed to AF_BUS because it sends messages on a single system? Sounds like a specious argument to me. Why aren't they removing AF_UNIX?

ALS: Linux interprocess communication and kdbus

Posted May 31, 2013 22:52 UTC (Fri) by gregkh (subscriber, #8) [Link]

That's not why they rejected AF_BUS, see the archives of the netdev mailing list for the full details as to why it was rejected if you are curious.

Google invented Bloom filters all the way back in 1970

Posted Jun 1, 2013 0:45 UTC (Sat) by Tester (guest, #40675) [Link]

The really interesting bit here is that Google has leaked that they have time travel, since they placed Bloom filters into Burton Howard Bloom's head back in 1970.

Google invented Bloom filters all the way back in 1970

Posted Jun 18, 2013 9:11 UTC (Tue) by pixelpapst (guest, #55301) [Link]

Maybe Google don't have time travel per se, but have positronic thought inducers capable of planting an idea into a past individual's brain. Or they will have.

Good thing Burton wasn't wearing a tinfoil hat at the time, or KDBUS would be stuck with boring old hash tables as delivered into Ershov's head by the USSR/ASIEN empire.

ALS: Linux interprocess communication and kdbus

Posted Jun 7, 2013 5:27 UTC (Fri) by swetland (guest, #63414) [Link]

pmem is obsolete and I'm not entirely sure why it found its way into staging in the first place.

ashmem provides anonymous, shareable (via fd passing and mmap) memory without requiring /tmp, without having to worry about cleaning up after processes that leave stuff in /tmp, etc. It also provides a mechanism for processes to indicate which chunks of these regions may be reclaimed by the kernel in low memory situations via the unpin/pin mechanism. Android relies on it in a number of places and that is not likely to change, but if all the same semantics were provided via a slightly different mechanism, it could be migrated to.

ALS: Linux interprocess communication and kdbus

Posted Jun 14, 2013 19:51 UTC (Fri) by jstultz (subscriber, #212) [Link]

Yea, I don't know why he brought up pmem. And it's not in staging (as far as I'm aware).

And yea, I also don't understand the "lazy" characterization of ashmem either.

I'm hoping the volatile range work will provide the same functionality as pinning/unpinning (which I think is a really cool feature of ashmem) in a more generic way.

But there is still the need for getting passable fds to anonymous memory without having a tmpfs mount. And until that is solved I suspect we'll have to keep ashmem (although with the pin/unpin logic removed, it will be much simplified).

ALS: Linux interprocess communication and kdbus

Posted Jun 7, 2013 11:35 UTC (Fri) by tibsnjoan (guest, #3800) [Link]

As to KBUS.

I'm one of the principal developers, and current maintainers, of KBUS. It was developed in-house at Kynesim (my employers) and is still used by us and our customers. Sadly, we are no longer students. We did make some attempt to submit it to the kernel (and Jonathan Corbet even wrote "The interface is ... creative"). Various people were helpful with their comments, but the impression we got was that if we didn't use sockets it wasn't going in.

A lack of an existing visible user base also (quite correctly) made the LKML sceptical of our version of sliced bread.

If anyone is interested in merging it, we'd be happy to look to that, but I'm now going to go away and read up on KDBUS.

ALS: Linux interprocess communication and kdbus

Posted Jun 9, 2013 18:07 UTC (Sun) by raalkml (guest, #72852) [Link]

kdbus has a somewhat interesting interface:

https://plus.google.com/109267255479262056080/posts/eN8Sd...

Why such a strange namespaces design for the API?
The entry points of lower level API (bus) are higher up in the /dev/kdbus hierarchy than the higher level API (namespaces)!
The bus API implementation even seems to have had to add the uid to avoid name clashes with the namespace API (namely the "ns" directory, under which the namespaces are collected).
I.e. why not /dev/kdbus/ns/{namespaces..}/bus/{buses...}/entries/{entries}? This way there may be no clash between the interface and user-affected elements in the API: namespaces and buses would be in directories completely for themselves.

ALS: Linux interprocess communication and kdbus

Posted Jun 10, 2013 17:16 UTC (Mon) by vrfy (guest, #13362) [Link]

> Higher level, lower level API?

A set of buses *is* a namespace. What's high and low here?

> Clashes with the "namespace API"?

What namespace API? There is none besides the one "control" node in
every namespace.

Namespaces should just look exactly like the host's /dev/kdbus/ interface,
there is no separation of buses and namespaces, buses are just *in* a namespace.

I don't really understand anything in that text. :)

ALS: Linux interprocess communication and kdbus

Posted Oct 16, 2013 8:18 UTC (Wed) by mirabilos (subscriber, #84359) [Link]

Urgh!

This all assumes someone would want to use D-Bus *at all*. /me shudders

http://mid.gmane.org/20131016072853.GA1077@dinah

ALS: Linux interprocess communication and kdbus

Posted Oct 16, 2013 14:38 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

The responses on that thread don't seem to agree with your feelings. Personally, I run a system bus and not a session bus (until kid3 kicks one into existence), but kdbus doesn't seem like the end of the world. So my questions to you are: what sucks so much about kdbus, how would you fix it, and if you'd remove it, how would you do what it allows you to do now just as easily (mainly the signal notification and RPC calls)? Raw socket communication? Please, then we'd need some kind of parser and serializer anyways to avoid all kinds of bugs at which point you're half way to dbus anyways.

SIMPL is a userspace implementation which may not be as efficient as the original QNX implementation

Posted Mar 3, 2015 15:25 UTC (Tue) by dirco (guest, #92179) [Link]

> A lot of developers in the automotive world have
> used QNX, which has a nice message-passing model.
> You can send a message and pass control to another
> process, which is good for realtime and single
> processor systems, Kroah-Hartman said. Large
> automotive companies have built huge systems on
> top of QNX messages, creating large libraries used by
> their applications. They would like to be able to use
> those libraries on Linux, but often don't know that there
> is a way to get the QNX message API for Linux. It is
> called SIMPL and it works well.

QNX message passing needs kernel support, as we can see in Wikipedia
QNX article:
https://en.wikipedia.org/w/index.php?title=QNX&oldid=...
"QNX interprocess communication consists of sending a message from one
process to another and waiting for a reply. This is a single
operation, called MsgSend. The message is copied, by the kernel, from
the address space of the sending process to that of the receiving
process.
If the receiving process is waiting for the message, control of the
CPU is transferred at the same time, without a pass through the CPU
scheduler. Thus, sending a message to another process and waiting for
a reply does not result in "losing one's turn" for the CPU.
This tight integration between message passing and CPU scheduling is
one of the key mechanisms that makes QNX message passing broadly
usable. Most Unix and Linux interprocess communication mechanisms lack
this tight integration ... Mishandling of this subtle issue is a
primary reason for the disappointing performance of some other
microkernel systems such as early versions of Mach."
It points out that transferring CPU control directly from the sending process
to the receiving process is the key step in improving QNX IPC performance,
which in general cannot be implemented by a purely userspace
program on Linux.

Through mail to the SIMPL group, I have confirmed that SIMPL is
just a userspace implementation:
"The main SIMPL source tree is all in user space. For most
SIMPL users the performance is adequate and the codebase is much more portable
and easy to maintain.

There are kernel implementations of SRR that are compatible with SIMPL but
none of those are recent.

There is also a SIPC variation. I know the developer who wrote this one.
If you are interested I could connect you two and see if he has more recent
stuff."
So what we need is a kernel implementation (like the aforementioned SRR
or SIPC) that integrates the special scheduling into the kernel if we want
performance comparable to the original implementation.


Copyright © 2013, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds