
Parallel page rendering with Mozilla Servo

By Nathan Willis
June 17, 2015
LinuxCon Japan

The free-software community may not have an overabundance of browsers to choose from these days, but there are even fewer choices for developers in search of a web rendering engine—WebKit and Blink dominate the category. Thus, Mozilla's Servo project is interesting for providing another option to consider and, as Lars Bergstrom and Mike Blumenkrantz explained in their LinuxCon Japan talk [PDF], Servo also takes on some new challenges. It is a new engine written with the needs of modern web usage in mind—namely, security and extensive parallelism.

[Lars Bergstrom]

Bergstrom works at Mozilla Research, while Blumenkrantz is from Samsung; Servo is a joint venture of the two organizations. Bergstrom started the session by addressing the fundamental "why" question: why create a new web-rendering engine from scratch when there are several robust engines already (namely WebKit, Blink, and Gecko)? The answer, he said, is that all of the existing engines were designed prior to 2000. The state of the web has changed significantly since then.

One of the biggest such shifts can be seen in the fact that existing browser engines provide parallelism only at an extremely coarse level: separate processes for each tab. In recent years, the main trend in CPU development has been adding more cores, so there are important gains to be made by parallelizing more stages in each page-rendering task. These gains are particularly noticeable on mobile devices, he said, which often run at lower clock speeds and have multiple CPU cores sitting idle. The existing browser engines also tend to be rather monolithic in design, with tightly coupled components that cannot be improved or replaced easily.

The second justification for the project is that the team believes it can improve security by working in the Rust language, taking advantage of Rust's inherent memory safety. Memory handling is the source of most browser security issues, they said, stemming largely from C++'s memory model. Finally, he said, while Gecko used to be maintained as an embeddable component, Mozilla long ago dropped that feature and focused on making Gecko a highly tuned engine for Firefox. Mozilla's commitment, he said, is to making the web better, and that means exploring a rendering engine designed for embedding in other projects.

Bergstrom then spent a few minutes discussing Rust and, in particular, its memory-safety features. Rust is designed to provide memory safety without overhead, he said: no reference counting is required, no garbage collection is needed, and none of the "smart" pointer classes "that litter most C++ codebases" are necessary. Rust's memory safety comes through its type system: every value has an owner, and if ownership changes hands, the previous owner can no longer access the object through references. Thus, there are no race conditions and no need to guard against them with locks.

The Rust toolchain provides static checks that will catch attempts at violating ownership. While Rust's model means there are some types of code you can't write, he said, it uses essentially the same syntax as C++, except that it is safe for concurrency.
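The ownership rules Bergstrom described can be sketched in a few lines of Rust. This is a minimal illustration (the variable names are hypothetical, not Servo code): once a value moves into another thread, the compiler rejects any further use of it from the original thread, which is how data races are ruled out without locks.

```rust
use std::thread;

fn main() {
    let page_fragment = String::from("<div>independent block</div>");

    // Ownership of `page_fragment` moves into the spawned thread;
    // after this point the main thread can no longer touch it.
    let handle = thread::spawn(move || page_fragment.len());

    // Uncommenting the next line is a compile-time error, not a crash:
    // println!("{}", page_fragment); // error[E0382]: borrow of moved value

    let length = handle.join().unwrap();
    println!("fragment is {} bytes", length);
}
```

The violation is caught by the static checks mentioned above: the program simply does not compile, rather than failing at runtime.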

Rendering in parallel

Utilizing Rust's concurrency, of course, is the goal of using it in Servo. Bergstrom showed a block diagram of the page-rendering process: a page's HTML, CSS, and JavaScript are first parsed, then a document object model (DOM) constructed. The DOM is handed to the layout engine, where the individual pieces are rendered and displayed. The Servo team investigated where would be the best point in that process to speed matters up with parallelism—and it turned out that the answer was "anywhere."

The reason this is the case is that most modern web pages are composed of multiple independent blocks: <iframe>s and <div>s, for example, many of which are loaded separately. Consider Pinterest and Reddit, he said: the sites load dozens of separate page elements that are independent, and generally of predictable size. Rendering each in a separate thread allows Servo to complete the entire DOM tree in minimal time.
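The idea of laying out independent blocks concurrently can be sketched as follows. This is a toy illustration, not Servo's layout code; `layout_block` is a hypothetical stand-in for real layout work.

```rust
use std::thread;

// Pretend "layout" for one independent block: here we just measure it.
fn layout_block(html: &str) -> usize {
    html.len()
}

fn main() {
    let blocks = vec![
        "<div>header</div>".to_string(),
        "<div>sidebar</div>".to_string(),
        "<iframe>ad</iframe>".to_string(),
        "<div>article body</div>".to_string(),
    ];

    // One thread per independent block; each thread takes ownership
    // of its input, so no synchronization between blocks is needed.
    let handles: Vec<_> = blocks
        .into_iter()
        .map(|b| thread::spawn(move || layout_block(&b)))
        .collect();

    let results: Vec<usize> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("laid out {} independent blocks: {:?}", results.len(), results);
}
```

Because the blocks share no data, each thread's result can be computed without reference to the others and joined back into the finished page.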

In Servo's rendering architecture, each tab is a "constellation" that gets assigned a number of rendering pipeline threads. The tab's contents are broken down into a DOM tree, then some number of pipelines are started and begin running a work-stealing algorithm to process the nodes in the tree. Each of the pipelines has its own renderer, script engine, and layout engine, although they necessarily share some items that are reused in the page. Bergstrom and Blumenkrantz showed a live demo of Servo keeping an animated image looping smoothly on one side of a page while it continued to dynamically lay out and render the remainder of the page.
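The work-stealing scheme can be sketched in simplified form. This is deliberately not Servo's actual scheduler (which uses more efficient lock-free deques); it only shows the shape of the idea, with integers standing in for DOM nodes: each worker drains its own queue and, when that runs dry, steals from a sibling so no core sits idle.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    const WORKERS: usize = 4;

    // One queue per worker; node n starts on queue n % WORKERS.
    let queues: Arc<Vec<Mutex<VecDeque<u32>>>> =
        Arc::new((0..WORKERS).map(|_| Mutex::new(VecDeque::new())).collect());
    for n in 0..100u32 {
        queues[n as usize % WORKERS].lock().unwrap().push_back(n);
    }

    let processed = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..WORKERS)
        .map(|id| {
            let queues = Arc::clone(&queues);
            let processed = Arc::clone(&processed);
            thread::spawn(move || loop {
                // Take from our own queue first; the lock is released at
                // the end of this statement, so we never hold two locks.
                let mut node = queues[id].lock().unwrap().pop_back();
                if node.is_none() {
                    // Our queue is empty: steal from a sibling's front.
                    for victim in (0..WORKERS).filter(|&v| v != id) {
                        node = queues[victim].lock().unwrap().pop_front();
                        if node.is_some() {
                            break;
                        }
                    }
                }
                match node {
                    Some(_n) => {
                        // "Lay out" the node (placeholder for real work).
                        processed.fetch_add(1, Ordering::SeqCst);
                    }
                    None => break, // every queue is empty; we are done
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    println!("laid out {} nodes", processed.load(Ordering::SeqCst));
}
```

The payoff is load balancing: if one subtree of the DOM turns out to be much larger than the others, idle workers pull nodes from the busy worker's queue instead of waiting.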

The speed-up from using Servo instead of Gecko depends on how many pipeline threads are started, and comparisons can be difficult at present because Servo does not yet handle as many HTML and CSS elements as Gecko. But the results are already impressive; the team said that Servo provides a 2x speed increase when rendering the CNN homepage and a 3x speed-up on Reddit. The tricky part is determining which parts of the DOM can be split off to be rendered separately, but even "worst-case scenario" sites like Wikipedia (which is one long page of text) can be parallelized to some degree—sidebars, headers, and image blocks can all be handled independently.

Through testing, the team discovered another benefit that was not obvious at the outset: parallelism results in power savings. Multiple threads working in parallel on a page-rendering job allow the CPU to complete the entire page in the same amount of time while running at a lower frequency.

Making it embeddable

[Mike Blumenkrantz]

Blumenkrantz then discussed the embeddability of Servo. Project members decided they wanted to avoid the mistakes that other browser-engine projects have made, he said. The most oft-cited complaint about WebKit and Blink is that they have rapidly changing APIs and ABIs. "So you have to either be constantly updating your code or you have to package up the full engine with your application." Consequently, the Servo team wanted to establish a commitment to a reliable API.

They discovered Chromium Embedded Framework (CEF), a small project attempting to make an API-stable engine from Google's Chromium. CEF does not have many users (at least, not many compared to WebKit and Blink), but the users it does have are rather large ones, such as Adobe and Steam. The team decided to commit to supporting the same API as CEF. That would drastically simplify testing, since Servo could be a drop-in replacement for CEF in real workloads and with unit tests.

The implementation was a long and rough process, he said, starting with a massive list of issues in the GitHub repository, and grinding through the list one at a time—using LD_PRELOAD hacks to test individual components. Servo now has full symbol and ABI coverage for CEF's interfaces, he said, although some interfaces are placeholders. The functions are implemented as foreign function interfaces in Rust and all of the structs and memory alignments are identical. The work has now turned toward enforcing identical behavior, he said, like ensuring that two callbacks always fire in the same order in Servo that they do in CEF.
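The mechanics of matching a C ABI from Rust look roughly like this. The function name below is hypothetical (not part of CEF's real interface), but the attributes are the standard Rust tools for the job: `#[repr(C)]` forces C-compatible struct layout and alignment, and `extern "C"` plus `#[no_mangle]` exports a symbol a C caller (or an LD_PRELOAD shim) can link against.

```rust
// C-compatible layout: fields are ordered and aligned as a C compiler
// would order and align them, so the struct can cross the FFI boundary.
#[repr(C)]
pub struct cef_size_t {
    pub width: i32,
    pub height: i32,
}

// A stable, unmangled symbol with the C calling convention. The name
// `servo_view_size` is illustrative, not an actual CEF entry point.
#[no_mangle]
pub extern "C" fn servo_view_size(width: i32, height: i32) -> cef_size_t {
    cef_size_t { width, height }
}

fn main() {
    let s = servo_view_size(800, 600);
    println!("view is {}x{}", s.width, s.height);
}
```

Getting the structs byte-identical is what makes symbol-level substitution possible; behavioral parity, as noted above, is the harder follow-on problem.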

Bergstrom noted that Servo now passes the Acid1 and Acid2 tests and has implemented about 85% of the CSS2 specification. For those tracking feature parity on such matters, he said, "we're creeping right up on Internet Explorer 6." Progress is just a matter of knocking out more and more DOM and CSS features. Over the course of the next quarter, the team is focusing on features that demonstrably break real-world sites—such as the invisible pop-ups that CNN.com uses for trackers. There is still work to be done improving performance, especially in the graphics pipeline, but the team hopes to have an alpha release before the end of 2015. "I wouldn't recommend logging in to your bank with it," he said, "but it should be usable as a basic browser."

Both speakers encouraged interested volunteers to get involved. The entire project is developed in the open at GitHub, they said, and the team prides itself on maintaining a list of "easy bugs" (such as fixing function names) that are actually easy. As the roadmap indicates, while there is no plan to port desktop Firefox over to Servo, individual Servo components could make their way into Gecko. Bergstrom and Blumenkrantz also noted that Servo may have a future with Firefox for Android and Firefox OS; it is already testing on those platforms, via a Java wrapper on Android and over Firefox OS's interprocess communication (IPC) system.

There were a few technical questions from the audience at the end of the session. One attendee asked whether or not all of Servo was written in Rust. The speakers replied that Servo uses Mozilla's existing JavaScript engine, SpiderMonkey, as well as its Azure library for 2D graphics drawing (both of which are C++ projects). Another audience member asked if they had explored other forms of parallelism, such as GPU parallelism. The team had explored it, they replied, but found that as of now, the I/O overhead required to move page contents in and out of the GPU erased all performance gains. They had, however, found some additional performance gains through using SIMD instructions to compute bounding boxes, but that implementation was on hold for now because it would require altering a lot of data structures.

On a final note, Bergstrom and Blumenkrantz answered one audience question about the suitability of Rust for new projects. They responded that the tool support is good—the compiler could use some speed improvements, but it works well with Valgrind, GDB, and other standard tools. For someone used to writing C++ today, Rust is well worth considering. Developers who are already working well in Go or Python may find it a hard adjustment to make, though, since the ownership-based memory model is not as easy to use as garbage collection. In any case, they said, "don't bet the whole team on it. Start it as an experiment first." That seems to be the approach Mozilla is taking, although the signs point to the Servo experiment as having a bright future, at least on some platforms.

[The author would like to thank the Linux Foundation for travel assistance to attend LCJ 2015.]




Parallel page rendering with Mozilla Servo

Posted Jun 18, 2015 15:09 UTC (Thu) by mcatanzaro (subscriber, #93033) [Link]

I should note that WebKitGTK+ provides API and ABI stability for WebKit (with the exception of the removal of the original WebKit1 API last year, a major one-time event).

Wishing the best of luck to Servo -- it's hard to overstate how impressive it will be to have a rendering engine that's immune to all the most common security vulnerabilities, once it's matured enough for use by a major browser.

Parallel page rendering with Mozilla Servo

Posted Jun 18, 2015 16:25 UTC (Thu) by marcH (subscriber, #57642) [Link]

> but even "worst-case scenario" sites like Wikipedia (which is one long page of text) can be parallelized to some degree

Well, maybe this is partly because Wikipedia, using very few bells and whistles, is also some kind of "best case" for today's single-threaded renderers, no? As in: the type of pages that is *already* rendered blazing fast and the least needing optimizations.

(Fascinating work and great article - thx)

Parallel page rendering with Mozilla Servo

Posted Jun 18, 2015 18:02 UTC (Thu) by metajack (guest, #90500) [Link]

It's not that it's one long page of text. It's the floats that kill parallel performance. Wikipedia's sidebar is one big unclosed float.

For contrast, the mobile wikipedia page parallelizes amazingly well, and it's the same text :)

Parallel page rendering with Mozilla Servo

Posted Jun 27, 2015 17:04 UTC (Sat) by excors (subscriber, #95769) [Link]

> Another audience member asked if they had explored other forms of parallelism, such as GPU parallelism. The team had explored it, they replied, but found that as of now, the I/O overhead required to move page contents in and out of the GPU erased all performance gains.

I wonder if they explored it on mobile SoC GPUs, or just discrete GPUs on PCs?

My understanding is that discrete GPUs suffer from high latency (~10usec or more?) for accesses over PCI Express. That latency is negligible if you can copy a very large batch of data into the GPU's VRAM at once and process it all on the GPU, but some tasks can't be easily collected into large batches and will be limited by the latency; and designing data structures to be efficiently copied can be hard (you probably need to pack all the data for one task into as few 4KB pages as possible and avoid pointers).

Mobile SoCs can't afford dedicated VRAM, so their GPUs just use system RAM (plus a wide variety of small caches inside the GPU), and that means they are designed with relatively efficient access to RAM (latency <1usec?). There's rarely any need to copy data, since a page of physical memory can be mapped into CPU and GPU at the same time. Some hardware has full cache coherency between the GPU and CPU so you don't even have the flush/invalidate cost.

Modern hardware and compute APIs (OpenCL 2.0+, CUDA, HSA) support shared virtual memory, where the GPU and CPU essentially use the same page tables, so the CPU can construct a complex data structure full of pointers and the GPU can access it directly with no special pointer translation and no copying.

In principle, all those features should reduce the cost and difficulty of offloading work onto the GPU. And the GPU typically has much higher FLOPS and higher FLOPS-per-watt than the CPU, so it's worth using it when you can.

In practice I suspect support for those features on mobile devices is currently somewhere between spotty and non-existent. But judging by most phone reviews, web browser benchmarks help sell phones, so a browser that benefits from these new features might encourage the SoC vendors to support the features better, which would be nice...

Parallel page rendering with Mozilla Servo

Posted Jun 27, 2015 18:19 UTC (Sat) by raven667 (subscriber, #5198) [Link]

Pretty much every x86_64 CPU has a GPU on-die as well, just like ARM SoC usually do, even if you are using a remote GPU over PCIe to actually drive the displays. If you treat your OpenCL programs as just an extension of your x86_64 programs, like the FPU or SSE, you can use whole sections of the CPU hardware that would otherwise be sitting idle. There is no requirement that you have a discrete GPU.

Parallel page rendering with Mozilla Servo

Posted Jun 27, 2015 23:49 UTC (Sat) by zlynx (guest, #2285) [Link]

Oddly this is true of the mid and low-end x86_64 chips but not the high-end CPUs.

For example, the Xeon CPUs and the 5960x do not contain an Intel GPU. The AMD FX chips don't have GPUs either. I suppose that they give up the GPU for more cores and cache.

In the future though, I think the high-end CPUs will pick up the GPU as well, just for the features like accelerated video codecs for remote desktop access.


Copyright © 2015, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds