Reusing Previous Computations | Web Browser Engineering

Compositing (see Chapter 13) makes animations smoother, but it doesn’t help with interactions that affect layout, like text editing or DOM modifications. Luckily, we can avoid redundant layout work by treating the layout tree as a kind of cache, and only recomputing the parts that change. This invalidation technique is traditionally complex and bug-prone, but we’ll use a principled approach and simple abstractions to make it manageable.

Editing Content

In Chapter 13, we used compositing to smoothly animate CSS properties like transform or opacity. But we couldn’t animate layout-inducing properties like width or font-size this way because they change not only the display list but also the layout tree. And while it’s best to avoid animating layout-inducing properties, many user interactions that change the layout tree need to be responsive.

One good example is editing text. People type pretty quickly, so even a few frames’ delay is distracting. But editing changes the HTML tree and therefore the layout tree. Rebuilding the layout tree from scratch, which our browser currently does, can be very slow on complex pages. Try, for example, loading the web version of this chapter in our browser and typing into the input box that appears after this paragraph … You’ll find that it is much too slow: 1.7 seconds just in render (see Figure 1)!

Typing into input elements could be special-cased (the input element doesn’t change size as you type, and the text in the input element doesn’t get its own layout object, so typing into an input element doesn’t really have to induce layout, just paint), but there are other text editing APIs that can’t be. For example, the contenteditable attribute makes any element editable. (The contenteditable attribute can turn any element on any page into a living document. It’s how we implemented the “typo” feature for this book: type Ctrl-E, or Cmd-E on a Mac, to turn it on. The source code is on the website; see the typo_mode function for the contenteditable attribute.)

Let’s implement the most basic possible version of contenteditable in our browser; it’s a useful feature and also a good test of invalidation. To begin with, we need to make elements with a contenteditable attribute focusable (in real browsers, contenteditable can be set to true or false, and false is useful in case you want to have a non-editable element inside an editable one, but I’m not going to implement that in our browser):

Once we’re focused on an editable node, typing should edit it. A real browser would handle cursor movement and all kinds of complications, but I’ll keep it simple and just add each character to the last text node in the editable element. First we need to find that text node:

Note that if the editable element has no text children, we create a new one. Then we add the typed character to this element:
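As a minimal sketch of these two steps (not the browser’s actual code), the example below uses simplified stand-ins for the Text and Element classes. The helper names last_text_child and type_character are hypothetical, and for brevity the sketch only looks at direct children of the editable element.

```python
class Text:
    def __init__(self, text, parent):
        self.text = text
        self.parent = parent
        self.children = []

class Element:
    def __init__(self, tag, attributes, parent):
        self.tag = tag
        self.attributes = attributes
        self.parent = parent
        self.children = []

def last_text_child(elt):
    # Hypothetical helper: return the last Text child of elt,
    # creating an empty one if elt has no text children.
    for child in reversed(elt.children):
        if isinstance(child, Text):
            return child
    text = Text("", elt)
    elt.children.append(text)
    return text

def type_character(elt, char):
    # Append the typed character to the last text node.
    last_text_child(elt).text += char

div = Element("div", {"contenteditable": ""}, None)
type_character(div, "h")
type_character(div, "i")
```

After the two calls, the element has a single text child whose text is the two typed characters.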

This is enough to make editing work, but it’s convenient to also draw a cursor to confirm that the element is focused and show where edits will go. Let’s do that in BlockLayout:

You can now edit the examples on this chapter’s page in your browser—but each key stroke will take more than a second, making for a frustrating editing experience. So let’s work on speeding that up.

Why Invalidation?

Fundamentally, the reason editing this page is slow in our browser is that it’s pretty big. After all, it’s not handling the keypress that’s slow: appending a character to a Text node takes almost no time. What takes time is re-rendering the whole page afterward.

We want interactions to be fast, even on large, complex pages, so we want re-rendering the page to take time proportional to the size of the change, and not proportional to the size of the page. I call this the principle of incremental performance, and it’s crucial for handling large and complex web applications. Not only does it make text editing fast, it also means that developers can think about performance one change at a time, without considering the contents of the whole page. Incremental performance is therefore necessary for complex applications.

But the principle of incremental performance also really constrains our browser implementation. For example, even traversing the whole layout tree would take time proportional to the whole page, not the change being made, so we can’t even afford to do that.

To achieve incremental performance, we’re going to need to think of the initial render and later re-renders differently. (While initial and later renders are in some ways conceptually different, they’ll use the same code path. Basically, the initial render will be one big change from no page to the initial page, while later re-renders will handle smaller changes. After all, a page could use innerHTML to replace the whole page; that would be a big change, and rendering it would take time proportional to the whole page, because the change is the size of the whole page!) When the page is first loaded, rendering will take time proportional to the size of the page. But we’ll treat that initial render as a cache. Later renders will invalidate and recompute parts of that cache, taking time proportional to the size of the change, but won’t touch most of the page. (I’m sure there are all sorts of performance improvements possible without implementing the invalidation techniques from this chapter, but invalidation is still essential for incremental performance, which is a kind of asymptotic guarantee that micro-optimization alone won’t achieve.) In a real browser, every step of the rendering pipeline needs to be incremental, but this chapter focuses on layout, because layout is both important and complex enough to demonstrate most of the core challenges and techniques.

The key to this cache-and-invalidate approach will be tracking the effects of changes. When one part of the page, like a style attribute, changes, other things that depend on it, like that element’s size, change as well. So we’ll need to construct a detailed dependency graph, down to the level of each layout field, and use that graph to determine what to recompute. It will be similar to our needs_style and needs_layout flags, scaled way up. Most of this chapter is thus about tracking dependencies in the dependency graph, and building abstractions to help us do that. To use those abstractions, we’ll need to refactor our layout engine significantly. But incrementalizing layout will allow us to skip the two most expensive parts of layout: building the layout tree and traversing it to compute layout fields. When we’re done, re-layout will take under a millisecond for small changes like text editing.

Idempotence

If we want to implement this caching-and-invalidation idea, the first roadblock is that our browser rebuilds the layout tree from scratch every time the layout phase runs:

By starting over with a new DocumentLayout, we ignore all of the old layout information and start from scratch; we are essentially invalidating the whole tree. So our first optimization has to be avoiding that, reusing as many layout objects as possible. That both saves time allocating memory and makes the caching-and-invalidation approach possible by keeping around the old layout information.

But before jumping right to coding, let’s review how layout objects are created. Search your browser code for Layout, which all layout class names end with. You should see that layout objects are created in just a few places:

Let’s start with DocumentLayout. It’s created in render, and its two parameters, nodes and self, are the same every time. This means that identical DocumentLayouts are created each time. (This wouldn’t be true if the DocumentLayout constructor had side effects or read global state, but it doesn’t do that.) That’s wasteful; let’s create the DocumentLayout just once, in load:

Once again, the constructor parameters cannot change, so again we can skip reconstructing this layout object, like so:

But don’t run your browser with these changes just yet! By reusing layout objects, we end up running layout multiple times on the same object. That’s not how layout is intended to work, and it causes all sorts of weird behavior. For example, after the DocumentLayout creates its child BlockLayout, it appends it to the children array:

The issue here is called idempotence: repeated calls to layout shouldn’t repeatedly change state. More formally, a function is idempotent if calling it twice in a row with the same inputs and dependencies yields the same result. Assigning a field is idempotent: assigning the same value for a second time is a no-op. But methods like append aren’t idempotent.

We’ll need to fix any non-idempotent method calls. In DocumentLayout, we can switch from append to assignment:
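To see the difference in isolation, here is a small sketch with two hypothetical stand-in classes (they are not the browser’s real layout classes). Both call layout twice on the same object; only the one that assigns ends up in the same state after the second call.

```python
class AppendingLayout:
    def __init__(self):
        self.children = []

    def layout(self):
        # Not idempotent: every call adds another child.
        self.children.append("child")

class AssigningLayout:
    def __init__(self):
        self.children = []

    def layout(self):
        # Idempotent: every call leaves exactly one child.
        self.children = ["child"]

appending = AppendingLayout()
appending.layout()
appending.layout()

assigning = AssigningLayout()
assigning.layout()
assigning.layout()
```

After two calls, the appending version holds two children while the assigning version still holds one.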

BlockLayout also calls append on its children array. We can fix that by resetting the children array in layout. I’ll put separate reset code in the block and inline cases:

This makes the BlockLayout’s layout function idempotent because each call will start over from a new children array.

Before we try running our browser, let’s read through all of the other layout methods, noting any subroutine calls that might not be idempotent. (If you’ve been doing exercises throughout this book, you might find more such calls. In any case, the core idea is replacing non-idempotent calls with idempotent ones.) I found:

The new_line and add_inline_child methods are only called through layout, which resets the children array, so they don’t break idempotency. The get_font function acts as a cache, so multiple calls return the same font object, maintaining idempotency. And dpx just does math, so it always returns the same result given the same inputs. In other words all of our layout methods are now idempotent.

It’s therefore safe to call layout multiple times on the same object—which is exactly what we’re now doing. More generally, since it doesn’t matter how many times an idempotent function is called, we can skip redundant calls! That makes idempotency the foundation for the rest of this chapter, which is all about skipping redundant work.

Dependencies

So far, we’re only reusing two layout objects: the DocumentLayout, and the root BlockLayout. Let’s look at the other BlockLayouts, created here:

This code is a little more complicated than the code that creates the root BlockLayout: the child and previous arguments come from node.children, and that children array can change as a result of contenteditable edits or innerHTML calls (or any other exercises and extensions that you’ve implemented). Moreover, in order to even run this code, the node’s layout_mode has to be block, and layout_mode itself also reads the node’s children. (It also looks at the node’s tag and the node’s children’s tags, but tags can’t change, so we don’t need to think about them as dependencies. In invalidation we care only about dependencies that can change.) This makes it harder to know when we need to recreate the BlockLayouts.

Recall that idempotency means that calling a function again with the same inputs and dependencies yields the same result. Here, the inputs can change, so we can only avoid redundant re-execution if the node’s children field hasn’t changed. So we need a way of knowing whether that children field has changed. We’re going to use a dirty flag:

We’ve seen dirty flags before—like needs_layout and needs_draw—but layout is more complex and we’re going to need to think about dirty flags a bit more rigorously.

Every dirty flag protects a certain field; this one protects a BlockLayout’s children field. A dirty flag has a certain life cycle: it can be set, checked, and reset. The dirty flag starts out True, and is set to True when an input or dependency of the field changes, marking the protected field as unusable. Then, before using the protected field, the dirty flag must be checked. The flag is reset to False only when the protected field is recomputed.

So let’s analyze the children_dirty flag in this way. Dirty flags have to be set if any dependencies of the fields they protect change. In this case, the dirty flag protects the children field of a BlockLayout, which in turn depends on the children field of the associated Element. That means that any time an Element’s children field is modified, we need to set the dirty flag for the associated BlockLayout:

Likewise, we need to set the dirty flag any time we edit a contenteditable element, since that can also affect the children of a node:

It’s important that all dependencies of the protected field set the dirty bit. This can be challenging, since it requires being vigilant about which fields depend on which others. But if we do forget to set the dirty bit, we’ll sometimes fail to recompute the protected fields, which means we’ll display the page incorrectly. Typically these bugs look like unpredictable layout glitches, and they can be very hard to debug—so we need to be careful.

Anyway, now that we’re setting the dirty flag, the next step is checking it before using the protected field. BlockLayout uses its children field in three places: to recursively call layout on all its children, to compute its height, and to paint itself. Let’s add a check in each place:

It’s tempting to skip these assertions, since they should never be triggered, but coding defensively like this catches bugs earlier and makes them easier to debug. It’s very easy to invalidate fields in the wrong order, or skip a computation when it’s actually important, and you’d want that to trigger a crash rather than a subtly incorrect rendering, at least when debugging a toy browser! (Real browsers prefer not to crash, however: better a slightly wrong page than a browser that is crashing all the time. So in release mode browsers turn off these assertions, or at least make them not crash the browser.)

Finally, when the field is recomputed we need to reset the dirty flag. Here, we reset the flag when we’ve recomputed the children array:

Now that we have all three parts of the dirty flag done, you should be able to run your browser and test it on this chapter’s page. Even when you edit text or call innerHTML, you shouldn’t see any assertion failures. Work incrementally and test often—it makes debugging easier.

Now that the children_dirty flag works correctly, we can rely on it to avoid redundant work. If children isn’t dirty, we don’t need to recreate the BlockLayout children:

If you add a print statement inside that inner-most if, you’ll see console output every time BlockLayout children are created. Try that out while editing text: it shouldn’t happen at all, and editing will be slightly smoother.
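The whole life cycle (set, check, reset, and then skip) can be sketched in a few lines. This is an illustration under simplifying assumptions, not the browser’s code: Node and Block are hypothetical stand-ins, add_child stands in for any edit to a node’s children, and the rebuilds counter exists only so the skipping is observable.

```python
class Node:
    def __init__(self, children):
        self.children = children
        self.layout_object = None

class Block:
    def __init__(self, node):
        self.node = node
        node.layout_object = self
        self.children = []
        self.children_dirty = True  # the flag starts out True
        self.rebuilds = 0           # demo-only counter

    def layout(self):
        if self.children_dirty:
            # Recompute the protected field, then reset the flag.
            self.children = [Block(c) for c in self.node.children]
            self.children_dirty = False
            self.rebuilds += 1
        # Check the flag before using the protected field.
        assert not self.children_dirty
        for child in self.children:
            child.layout()

def add_child(node, child):
    # Any change to a dependency must set the flag.
    node.children.append(child)
    if node.layout_object:
        node.layout_object.children_dirty = True

root = Node([Node([]), Node([])])
block = Block(root)
block.layout()
block.layout()  # second call skips rebuilding
rebuilds_before_edit = block.rebuilds
add_child(root, Node([]))
block.layout()  # the edit forces one rebuild
```

Calling layout twice rebuilds the children only once; the rebuild counter goes up again only after add_child sets the flag.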

Protected Fields

Dirty flags like children_dirty are the traditional approach to layout invalidation, but they have downsides. Using them correctly means paying attention to the dependencies between fields and knowing when each field is read from and written to. And it’s easy to forget to check or set a dirty flag, which leads to hard-to-find bugs. In our simple browser it could probably be done, but a real browser’s layout system is much more complex, and mistakes become almost impossible to avoid.

A better approach exists. First of all, let’s try to combine the dirty flag and the field it protects into a single object:

That clarifies which dirty flag protects which field. Let’s replace our existing dirty flag with a ProtectedField:

Next, let’s add methods for each step of the dirty flag life cycle. I’ll say that we mark a protected field to set its dirty flag:

Note the early return: marking an already dirty field doesn’t do anything. That’ll become relevant later. Now call mark in innerHTML_set and keypress:

Before “get”-ting a ProtectedField’s value, let’s check the dirty flag:

Now we can use get to read the children field in layout and in lots of other places besides:

The nice thing about get is that it makes the dirty flag operations automatic, and therefore impossible to forget. It also makes the code a little nicer to read.

Finally, to reset the dirty flag, let’s make the caller pass in a new value when “set”-ting the field. This guarantees that the dirty flag and the value are updated together:

Unfortunately, using set will require a bit of refactoring. For example, in BlockLayout, we’ll need to build the children array in a local variable and then set the children field at the end:

But the benefit is that set, much like get, automates the dirty flag operations, making them hard to mess up. That makes it possible to think about more complex and ambitious invalidation algorithms in order to make layout faster.
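Putting the three operations together, a ProtectedField along the lines described in this section might look like the following. Treat it as a minimal sketch that follows the prose; the exact field names (value, dirty) are assumptions.

```python
class ProtectedField:
    def __init__(self):
        self.value = None
        self.dirty = True

    def mark(self):
        # Marking an already dirty field does nothing.
        if self.dirty: return
        self.dirty = True

    def get(self):
        # Check the flag before handing out the value.
        assert not self.dirty
        return self.value

    def set(self, value):
        # The value and the flag are updated together.
        self.value = value
        self.dirty = False

children = ProtectedField()

# Build the new value in a local variable, then set it once.
new_children = []
new_children.append("line 1")
new_children.append("line 2")
children.set(new_children)
value_after_set = children.get()

# After marking, the field must be recomputed before it is read.
children.mark()
try:
    children.get()
    read_while_dirty = True
except AssertionError:
    read_while_dirty = False
```

Reading the field after mark trips the assertion, which is the behavior we want while debugging.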

Recursive Invalidation

Let’s leverage the ProtectedField class to avoid recreating all of the LineLayouts and their children every time inline layout happens. It all starts here:

The new_line and recurse methods, and the helpers they call like word, input, iframe, image, and add_inline_child, handle line wrapping: they check widths, create new lines, and so on. We’d like to skip all that if the children field isn’t dirty, but this will be a bit more challenging than for block layout mode: lots of different fields are read during line wrapping, and the children field depends on all of them.

Converting all of those fields into ProtectedFields will be a challenging project. We’ll take it bit by bit, starting with zoom, which almost every method reads.

However, in BlockLayout, the zoom value comes from its parent’s zoom field. We might be tempted to write something like this:

However, recall that with dirty flags we must always think about invalidating them (with mark), checking them (with get), and resetting them (with set). We’ve added get and set, but who marks the zoom dirty flag? (Without marking them when they change, we will incorrectly skip too much layout work.)

We mark a field’s dirty flag when its dependency changes. For example, innerHTML_set and keypress change the HTML tree, which the layout tree’s children field depends on, so those handlers call mark on the children field. Since a child’s zoom field depends on its parent’s zoom field, we need to mark all the children when the zoom field changes. So in DocumentLayout, we have to do:

But now we’re back to manually calling methods and trying to make sure we don’t forget a call. What we need is something seamless: set-ting a field should automatically mark all the fields that depend on it.

To do that, each ProtectedField will need to track all fields that depend on it, called its invalidations:

For example, we can add the child’s zoom field to its parent’s zoom field’s invalidations:

Then, to automate the mark call, let’s add a notify method to mark each invalidation:

That’s progress, but it’s still possible to forget to add the invalidation in the first place. We can automate it a little further. Think: why does the child’s zoom need to depend on its parent’s? It’s because we get the parent’s zoom when computing the child’s. So adding the invalidation can happen as part of get! Let’s make a variant of get called read with a notify parameter for the field to invalidate if the field being read changes:

Now the zoom computation just needs to use read, and all of the marking and dependency logic will be handled automatically:

In fact, this pattern where we just copy our parent’s value is pretty common, so let’s add a shortcut for it:
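Here is one way the pieces described so far (invalidations, notify, read, and a copy shortcut) could fit together. It is a sketch based on the prose rather than a definitive implementation; in particular, having set call notify unconditionally is an assumption about how set-ting a field marks its dependents.

```python
class ProtectedField:
    def __init__(self):
        self.value = None
        self.dirty = True
        self.invalidations = set()

    def mark(self):
        if self.dirty: return
        self.dirty = True

    def notify(self):
        # Mark every field that depends on this one.
        for field in self.invalidations:
            field.mark()

    def get(self):
        assert not self.dirty
        return self.value

    def set(self, value):
        self.notify()
        self.value = value
        self.dirty = False

    def read(self, notify):
        # Record that `notify` depends on this field, then read it.
        self.invalidations.add(notify)
        return self.get()

    def copy(self, field):
        # Shortcut for "my value is just that field's value".
        self.set(field.read(notify=self))

parent_zoom = ProtectedField()
child_zoom = ProtectedField()

parent_zoom.set(1.0)
child_zoom.copy(parent_zoom)
dirty_before = child_zoom.dirty

parent_zoom.set(1.5)  # automatically marks child_zoom
dirty_after = child_zoom.dirty

child_zoom.copy(parent_zoom)
```

Note that nothing in the calling code mentions mark: the dependency is registered as a side effect of read, and set-ting the parent’s zoom then marks the child’s.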

BlockLayout also reads from the zoom field inside the input, image, iframe, word, and add_inline_child methods, which are all part of computing the children field. In those methods, we can use read to both get the zoom value and also invalidate the children field if the zoom value ever changes:

Do the same in each of the other methods mentioned above. Also, go and protect the zoom field on every other layout object type (there are now quite a few!) using copy in place of writes and read in place of gets. Run your browser and make sure that nothing crashes, even when you increase or decrease the zoom level, to make sure you got it right.

Now—protecting the zoom field did not speed our browser up. We’re still copying the zoom level around, plus we’re now doing some extra work checking dirty flags and updating invalidations. But protecting the zoom field means we can invalidate children, and other fields that depend on it, when the zoom level changes, which will help tell us when we have to rebuild LineLayout and TextLayout elements.

Protecting Widths

Another field that line wrapping depends on is width. Let’s convert that to a ProtectedField, using the new read method along the way. Like zoom, width is initially set in DocumentLayout:

The width field is read during line wrapping. For example, add_inline_child needs it to determine whether to add a new line. We’ll use read to set up that dependency:

While we’re here, note that the decision for whether or not to add a new line also depends on w, which is an input to add_inline_child. If you look through add_inline_child’s callers, you’ll see that most of the time, this argument just depends on zoom, but in word it depends on a font object:

Note that the font depends on the node’s style, which can change, for example via the style_set function. To handle this, we’ll need to protect style:

The style field is computed in the style method, which computes a new style dictionary over multiple phases. Let’s build that new dictionary in a local variable, and set it at the end:

Inside style, one code path reads from the parent node’s style. We need to mark dependencies in these cases:

Then style_set can mark the style field (ideally we would make the style attribute a protected field, and have the style field depend on it, but I’m taking a shortcut in the interest of simplicity):

Finally, in word (and also in similar code in add_inline_child) we can depend on the style field:

Make sure all other uses of the style field use either read or get; it should be pretty clear which is which.

We’ve now protected all of the fields read during line wrapping. That means the children field’s dirty flag now correctly tracks whether line-wrapping can be skipped. Let’s make use of that:

We also need to make sure we now only modify children via set. That’s a problem for add_inline_child and new_line, which currently append to the children field. There are a couple of possible fixes (perhaps the nicest design would thread a local children variable through all of the methods involved in line layout, similar to tree_to_list), but in the interests of expediency, I’m going to use a second, unprotected field, temp_children, to build the list of children, and then set it as the new value of the children field at the end:

Note that I reset temp_children once we’re done with it, to make sure that no other part of the code accidentally uses it. This way, new_line can modify temp_children, which will eventually become the value of children:
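The temp_children pattern can be illustrated with a toy line breaker. Everything here other than the names children, temp_children, and new_line is hypothetical: the InlineBlock class, the add_word method, and the words-per-line limit (a stand-in for real width checks) are assumptions made to keep the sketch short.

```python
class ProtectedField:
    def __init__(self):
        self.value = None
        self.dirty = True

    def get(self):
        assert not self.dirty
        return self.value

    def set(self, value):
        self.value = value
        self.dirty = False

class InlineBlock:
    def __init__(self, words, max_per_line):
        self.words = words
        self.max_per_line = max_per_line
        self.children = ProtectedField()
        self.temp_children = None

    def layout(self):
        if self.children.dirty:
            self.temp_children = []
            self.new_line()
            for word in self.words:
                self.add_word(word)
            # Publish the finished list, then drop the scratch field.
            self.children.set(self.temp_children)
            self.temp_children = None

    def new_line(self):
        self.temp_children.append([])

    def add_word(self, word):
        if len(self.temp_children[-1]) >= self.max_per_line:
            self.new_line()
        self.temp_children[-1].append(word)

block = InlineBlock(["a", "b", "c"], max_per_line=2)
block.layout()
lines = block.children.get()
```

The helper methods only ever touch temp_children, so the protected children field is still written exactly once, through set.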

Thanks to these fixes, our browser now avoids rebuilding any part of the layout tree unless it changes, and that should make re-layout somewhat faster. If you’ve been going through and adding the appropriate read and get calls, your browser should be close to working. There’s one tricky case: tree_to_list, which might deal with both protected and unprotected children fields. I fixed this with a type test:
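A type test of this kind might look like the sketch below. The PlainNode and ProtectedNode classes are hypothetical stand-ins for tree nodes whose children field is, respectively, a plain list and a ProtectedField.

```python
class ProtectedField:
    def __init__(self):
        self.value = None
        self.dirty = True

    def get(self):
        assert not self.dirty
        return self.value

    def set(self, value):
        self.value = value
        self.dirty = False

class PlainNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

class ProtectedNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = ProtectedField()
        self.children.set(children or [])

def tree_to_list(tree, results):
    results.append(tree)
    children = tree.children
    # Unwrap protected children; plain lists are used directly.
    if isinstance(children, ProtectedField):
        children = children.get()
    for child in children:
        tree_to_list(child, results)
    return results

root = ProtectedNode("block", [
    PlainNode("line", [PlainNode("text")]),
    ProtectedNode("block2"),
])
names = [node.name for node in tree_to_list(root, [])]
```

The traversal visits both kinds of node in the usual preorder.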

With all of these changes made, your browser should work again, and it should now skip line layout for most elements.

Note that we have quite a few protected fields now, but we only skip recomputing children based on dirty flags. That’s because recomputing children is slow, but most other fields are really fast to compute. Checking dirty flags takes time and adds code clutter, so we only want to do it when it’s worth it.

Widths for Inline Elements

At this point, BlockLayout has a protected width field, but other layout object types do not. Let’s fix that, because we’ll need it later. LineLayout is pretty easy:

In TextLayout, we again need to handle font (and hence have width depend on style):

There’s also a reference to width in the layout method for computing x positions. For now you can just use get here.

Finally, there are the various types of replaced content. In InputLayout, the width only depends on the zoom level:

IframeLayout and ImageLayout are very similar, with the width depending on the zoom level and also the element’s width and height attributes. So, we’ll need to invalidate the width field if those attributes are changed from JavaScript:

Otherwise, IframeLayout and ImageLayout are handled just like InputLayout. Search your code to make sure you’re always interacting with width via methods like get and read, and check that your browser works, including testing user interactions like contenteditable.

Invalidating Layout Fields

While we’re here, let’s take a moment to protect all of the other layout fields, including x, y, and height. Once we’ve done that, we’ll be ready to talk about speeding up layout even further by skipping unnecessary traversals.

As with width, let’s start with DocumentLayout and BlockLayout. First, x and y positions. In DocumentLayout, just use set:

A BlockLayout’s x position is just its parent’s x position, so we can just copy it over:

Let’s also do heights. For DocumentLayout, we just read the child’s height:

Note that in this last code block, we first read the children field, then iterate over the list of children and read each of their height fields. The height field, unlike the previous layout fields, depends on the children’s fields, not the parent’s (see Figure 2).
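This upward dependency can be sketched as follows, assuming a ProtectedField with the read method from earlier in the chapter. The Box class and its compute_height method are hypothetical, and summing the children’s heights is an assumption made for illustration.

```python
class ProtectedField:
    def __init__(self):
        self.value = None
        self.dirty = True
        self.invalidations = set()

    def mark(self):
        if self.dirty: return
        self.dirty = True

    def notify(self):
        for field in self.invalidations:
            field.mark()

    def get(self):
        assert not self.dirty
        return self.value

    def set(self, value):
        self.notify()
        self.value = value
        self.dirty = False

    def read(self, notify):
        self.invalidations.add(notify)
        return self.get()

class Box:
    def __init__(self):
        self.children = ProtectedField()
        self.height = ProtectedField()

    def compute_height(self):
        # Height depends on the children list and on each child's height.
        children = self.children.read(notify=self.height)
        total = sum(child.height.read(notify=self.height)
                    for child in children)
        self.height.set(total)

first, second, parent = Box(), Box(), Box()
first.height.set(10)
second.height.set(20)
parent.children.set([first, second])
parent.compute_height()
height_before = parent.height.get()

second.height.set(25)  # a child's height changes
dirty_after_child_change = parent.height.dirty
parent.compute_height()
```

Changing a child’s height marks the parent’s height dirty, so invalidation here flows up the tree rather than down.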

So that’s all the layout fields on BlockLayout and DocumentLayout. Do go through and fix up these layout types’ paint methods (and also the DrawCursor helper)—but note that the browser won’t quite run right now, because the BlockLayout assumes its children’s height fields are protected, but if those fields are LineLayouts they aren’t. Let’s get to that next.

Protecting Inline Layout

We need to protect LineLayouts’, TextLayouts’, and EmbedLayouts’ fields too, and their layout methods work a little differently. Yes, each of these layout objects has x, y, and height fields, but they also compute font, ascent, and descent fields that are used by other layout objects. We’ll have to protect all of these. Since we now have quite a bit of ProtectedField experience, we’ll do all the fields in one go.

We’ll need to compute these fields in layout. All of the font-related ones are fairly straightforward:

Note that I’ve changed width to read the font field instead of directly reading zoom and style. It does look a bit odd to compute f repeatedly, but remember that each of those read calls establishes a dependency for one layout field upon another. I like to think of each f as being scoped to its field’s computation.

We also need to compute the x position of a TextLayout. That can use the previous sibling’s font, x position, and width:

EmbedLayout is basically identical. As for its subclasses, here’s InputLayout:

And here’s ImageLayout; it has an img_height field, which I’m going to treat as an intermediate step in computing height and not protect:

Finally, here’s how IframeLayout computes its height, which is straightforward:

We also need to invalidate the height field if the height attribute changes:

So that covers all of the inline layout objects. All that’s left is LineLayout. Here are x and y:

However, height is a bit complicated: it computes the maximum ascent and descent across all children and uses that to set the height and the children’s y. I think the simplest way to handle this code is to add ascent and descent fields to the LineLayout to store the maximum ascent and descent, and then have the height and the children’s y field depend on those.

Note that we don’t need to read the children field because in LineLayout it isn’t protected; it’s filled in by BlockLayout when the LineLayout is created, and then never modified.

As a result of these changes, every layout object field is now protected. Just like before, make sure all uses of these fields use read and get and that your browser still runs, including during contenteditable. You will likely now need to fix a few uses of height and y inside Frame and Tab, like for clamping scroll offsets.

Skipping No-op Updates

We’ve got quite a number of layout fields now, so let’s see how much invalidation is actually going on. Add a print statement inside the set method on ProtectedFields to see which fields are getting recomputed:

The if check avoids printing during initial page layout, so it will only show how well our invalidation optimizations are working. The fewer prints you see, the fewer fields change and the more work we should be able to skip.

Try editing some text with contenteditable on a large web page (like this chapter) and you’ll see a screenful of output, thousands of lines of printed nonsense. It’s a little hard to understand why, so let’s add a nice printable form for ProtectedFields, plus a new name parameter for debugging purposes (note that I print the node, not the layout object, because layout objects’ printable forms print layout field values, which might be dirty and unreadable):

If you look at your output again, you should now see two phases. First, there’s a lot of style re-computation:

Let’s fix these. First, let’s tackle style. The reason style is being recomputed repeatedly is just that we recompute it even if it isn’t dirty. Let’s skip if it’s not:

There should now be barely any style re-computation at all. But what about those layout field re-computations? Why are those happening? Well, the very first field being recomputed here is zoom, which itself traces back to DocumentLayout:

Every time we lay out the page, we set the zoom parameter, and we have to do that because the user might have zoomed in or out. But every time we set a field, that notifies every dependent field. The combination of these two things means we are recomputing the zoom field, and everything that depends on zoom, on every frame.

What makes this all wasteful is that zoom usually doesn’t change. So we should notify dependents only if the value actually changed:

This change is safe, because if the new value is the same as the old value, any downstream computations don’t actually need to change. This small tweak should reduce the number of field changes down to the minimum:
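As a sketch of this tweak, the set method below compares the new value against the old one and only notifies when they differ. The surrounding class repeats the ProtectedField outline from earlier; the example fields are made up for illustration.

```python
class ProtectedField:
    def __init__(self):
        self.value = None
        self.dirty = True
        self.invalidations = set()

    def mark(self):
        if self.dirty: return
        self.dirty = True

    def notify(self):
        for field in self.invalidations:
            field.mark()

    def get(self):
        assert not self.dirty
        return self.value

    def set(self, value):
        # Only invalidate dependents when the value really changes.
        if value != self.value:
            self.notify()
        self.value = value
        self.dirty = False

    def read(self, notify):
        self.invalidations.add(notify)
        return self.get()

zoom = ProtectedField()
children = ProtectedField()

zoom.set(1.0)
children.set(["laid out at zoom", zoom.read(notify=children)])

zoom.set(1.0)  # same value: dependents stay clean
dirty_after_same_value = children.dirty

zoom.set(2.0)  # new value: dependents are marked
dirty_after_new_value = children.dirty
```

Setting zoom to the value it already has leaves children clean, while a genuinely new zoom level marks it dirty.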

All that’s happening here is recreating the contenteditable element’s children (which we have to do, to incorporate the new text) and checking that its height didn’t change (necessary in case we wrapped onto more lines).

Editing should also now feel snappier: about 0.6 seconds instead of the original 1.7 (see Figure 3). Better, but still not good.

Skipping Traversals

Now that all of the layout fields are protected, we can check if any of them need to be recomputed by checking their dirty bits. But to check all of those dirty bits, we’d need to visit every layout object, which can take a long time. Instead, we should use dirty bits to minimize the number of layout objects we need to visit.

The basic idea revolves around the question: do we even need to call layout on a given node? The layout method does three things: create child layout objects, compute layout properties, and recurse into more calls to layout. Those steps can be skipped if:

There’s no dirty flag yet for the last condition, so let’s add one. I’ll call it has_dirty_descendants because it tracks whether any descendant has a dirty ProtectedField (in some code bases, you will see these called ancestor dirty flags instead; it’s the same thing, just following the flow of dirty bits instead of the flow of control):

Now we need to set the has_dirty_descendants flag if any dirty flag is set. We can do that with an additional, optional parent parameter to a ProtectedField. (It’s optional because only ProtectedFields on layout objects need this feature.)

Make sure to pass this parameter for each ProtectedField in each layout object type. Here’s BlockLayout, for example:

Then, whenever mark or notify is called, we set the descendant bits by walking the parent chain:
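A minimal sketch of that walk, assuming each layout object has a parent pointer and a has_dirty_descendants flag. The Node class and the returned count are hypothetical additions so the early exit can be observed:

```python
class ProtectedField:
    """Pared-down stand-in for the chapter's ProtectedField (an assumption)."""
    def __init__(self, obj, name, parent=None):
        self.obj = obj
        self.name = name
        self.parent = parent   # the parent layout object, if any
        self.dirty = False     # starts clean here to keep the demo short
        self.invalidations = set()

    def set_ancestor_dirty_bits(self):
        # Returns how many flags were newly set; the return value is a
        # hypothetical addition, not something layout code would need.
        newly_set = 0
        parent = self.parent
        while parent and not parent.has_dirty_descendants:
            parent.has_dirty_descendants = True
            newly_set += 1
            parent = parent.parent
        return newly_set

    def mark(self):
        if self.dirty:
            return 0
        self.dirty = True
        return self.set_ancestor_dirty_bits()

    def notify(self):
        for field in self.invalidations:
            field.mark()
        return self.set_ancestor_dirty_bits()

class Node:
    """Hypothetical layout object with one protected field."""
    def __init__(self, parent=None):
        self.parent = parent
        self.has_dirty_descendants = False
        self.width = ProtectedField(self, "width", parent)

root = Node()
middle = Node(root)
leaf_a = Node(middle)
leaf_b = Node(middle)

first = leaf_a.width.mark()   # flags middle and root
second = leaf_b.width.mark()  # middle is already flagged, so the walk stops
```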

Note that the while loop exits early if the descendants bit is already set. That’s because whoever set that bit already set all the ancestors’ descendant dirty bits. This optimization is important in real browsers. Without it, repeatedly invalidating the same object would walk up the tree to the root repeatedly, violating the principle of incremental performance.

Now that we have descendant dirty flags, let’s use them to skip layout, including recursive calls:
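One way this could look, as a sketch under simplifying assumptions: Field is a bare dirty-bit holder standing in for ProtectedField, Box is a hypothetical layout object with a single height field, and layout_calls exists only to count how much real work happens:

```python
class Field:
    """Hypothetical, minimal dirty-bit holder standing in for ProtectedField."""
    def __init__(self):
        self.value = None
        self.dirty = True

    def set(self, value):
        self.value = value
        self.dirty = False

class Box:
    """Hypothetical layout object with a single protected field."""
    def __init__(self, children=()):
        self.children = list(children)
        self.height = Field()
        self.has_dirty_descendants = True
        self.layout_calls = 0  # counts real layout work, for illustration

    def layout_needed(self):
        if self.height.dirty: return True
        if self.has_dirty_descendants: return True
        return False

    def layout(self):
        # Skip this object, and the whole subtree below it, when nothing is dirty.
        if not self.layout_needed(): return
        self.layout_calls += 1
        for child in self.children:
            child.layout()
        self.height.set(10 + sum(c.height.value for c in self.children))
        self.has_dirty_descendants = False

leaf1, leaf2 = Box(), Box()
root = Box([leaf1, leaf2])
root.layout()   # first layout: every object does real work
root.layout()   # nothing is dirty: returns immediately

# Simulate an invalidation of leaf1 only.
leaf1.height.dirty = True
root.has_dirty_descendants = True
root.layout()   # visits root and leaf1, but skips leaf2
```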

Do the same for every other type of layout object. In DocumentLayout, you do need to be a little careful, since it receives the frame width and zoom level as arguments; you have to mark those fields of DocumentLayout if the corresponding Frame variables change. (We need to mark the root layout object’s width because the frame_width is passed into DocumentLayout’s layout method as the width parameter. We could have protected the frame_width field instead, and then this mark would happen automatically; I’m skipping that for expediency, but it would have been a bit safer.)

Skipping unneeded layout methods should provide a noticeable speed bump, with small layouts now taking about 7 ms to update layout and editing now substantially smoother. (It might also be pretty laggy on large pages due to the composite–raster–draw cycle being fairly slow, depending on which exercises you implemented in Chapter 13.)

However, Figure 4 shows that paint is still slow, and render overall is still about 230 ms. Making a browser fast requires optimizing everything! I won’t implement it, but paint can be made a lot faster too—see Exercise 16-10.

Granular Style Invalidation

Unfortunately, in the process of adding invalidation, we have inadvertently broken smooth animations. Here’s the basic issue: suppose an element’s opacity or transform property changes, for example through JavaScript. That property isn’t layout-inducing, so it should be animated entirely through compositing. However, changing any style property invalidates the Element’s style field, and that in turn invalidates the children field, causing the layout tree to be rebuilt. That’s no good.

Ultimately the core problem here is over-invalidation caused by ProtectedFields that are too coarse-grained. The children field, for example, doesn’t depend on the whole style dictionary, just a few font-related fields in it. We need style to be a dictionary of ProtectedFields, not a ProtectedField of a dictionary:
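A minimal sketch of the difference, assuming a pared-down ProtectedField and an abbreviated, illustrative property table (the real table is longer and its default values may differ):

```python
class ProtectedField:
    """Pared-down stand-in for the chapter's ProtectedField (an assumption)."""
    def __init__(self, obj, name):
        self.obj = obj
        self.name = name
        self.value = None
        self.dirty = True
        self.invalidations = set()

    def mark(self):
        self.dirty = True

    def notify(self):
        for field in self.invalidations:
            field.mark()

    def set(self, value):
        if value != self.value:
            self.notify()
        self.value = value
        self.dirty = False

    def get(self):
        assert not self.dirty
        return self.value

    def read(self, notify):
        self.invalidations.add(notify)
        return self.get()

# Abbreviated, illustrative property table.
CSS_PROPERTIES = {"font-size": "16px", "opacity": "1.0"}

class Element:
    def __init__(self, tag):
        self.tag = tag
        # One field per property, instead of one field holding a dictionary.
        self.style = {prop: ProtectedField(self, prop)
                      for prop in CSS_PROPERTIES}

node = Element("div")
for prop, default in CSS_PROPERTIES.items():
    node.style[prop].set(default)

children = ProtectedField(node, "children")
node.style["font-size"].read(notify=children)  # depends on font-size only
children.set([])

node.style["opacity"].set("0.5")
dirty_after_opacity = children.dirty     # still clean
node.style["font-size"].set("20px")
dirty_after_font_size = children.dirty   # now dirty
```

Because children never read the opacity field, changing opacity leaves it clean, which is the behavior smooth animations rely on.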

Make the same change in Text. The CSS_PROPERTIES dictionary contains each CSS property that we support, plus their default value:

When setting the style property from JavaScript, I’ll invalidate all of the fields by calling a new dirty_style function:
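Such a function can be very short. Here is a sketch, assuming node.style is a dictionary of fields that each have a mark method; the Element and ProtectedField classes below are minimal stand-ins:

```python
class ProtectedField:
    """Minimal stand-in: just a value and a dirty bit (an assumption)."""
    def __init__(self, name):
        self.name = name
        self.value = None
        self.dirty = True

    def mark(self):
        self.dirty = True

    def set(self, value):
        self.value = value
        self.dirty = False

class Element:
    """Hypothetical node whose style is a dictionary of fields."""
    def __init__(self, properties):
        self.style = {prop: ProtectedField(prop) for prop in properties}

def dirty_style(node):
    # Mark every per-property field so the next style pass recomputes them.
    for field in node.style.values():
        field.mark()

node = Element(["color", "opacity"])
for field in node.style.values():
    field.set("initial")
clean_before = not any(field.dirty for field in node.style.values())
dirty_style(node)
```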

But that’s not all. There is also other code that invalidates style, in particular code that can affect a pseudo-class such as :focus.

Similarly, in style, we will need to recompute a node’s style if any of its style properties are dirty:

To match the existing code, I’ll make old_style and new_style just map properties to values:

Then, when we resolve inheritance, we specifically have one field of our style depend on one field of the parent’s style:

Then, once the new_style is all computed, we individually set every field of the node’s style:
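The two steps above, resolving inheritance property by property and then setting each field, can be sketched together. This assumes a pared-down ProtectedField and an abbreviated INHERITED_PROPERTIES table; resolve_style and its own_rules argument are hypothetical names for illustration:

```python
class ProtectedField:
    """Pared-down stand-in for the chapter's ProtectedField (an assumption)."""
    def __init__(self, obj, name):
        self.obj = obj
        self.name = name
        self.value = None
        self.dirty = True
        self.invalidations = set()

    def mark(self):
        self.dirty = True

    def notify(self):
        for field in self.invalidations:
            field.mark()

    def set(self, value):
        if value != self.value:
            self.notify()
        self.value = value
        self.dirty = False

    def get(self):
        assert not self.dirty
        return self.value

    def read(self, notify):
        self.invalidations.add(notify)
        return self.get()

INHERITED_PROPERTIES = {"font-size": "16px", "color": "black"}  # abbreviated

class Element:
    def __init__(self, parent=None):
        self.parent = parent
        self.style = {prop: ProtectedField(self, prop)
                      for prop in INHERITED_PROPERTIES}

def resolve_style(node, own_rules):
    # Hypothetical helper; own_rules is a plain dict of declared values.
    new_style = {}
    for prop, default in INHERITED_PROPERTIES.items():
        if node.parent:
            # One field of our style depends on one field of the parent's.
            parent_field = node.parent.style[prop]
            new_style[prop] = parent_field.read(notify=node.style[prop])
        else:
            new_style[prop] = default
    new_style.update(own_rules)
    # Set every field individually; unchanged values notify nobody.
    for prop, field in node.style.items():
        field.set(new_style[prop])

root = Element()
child = Element(root)
resolve_style(root, {})
resolve_style(child, {})
resolve_style(root, {"color": "red"})  # only the color field changes
```

After the last call, only the child's color field is dirty; its font-size field is untouched because the parent's font-size did not change.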

Now we just need to update the rest of the browser to use the granular style fields. Mostly, this means replacing style.get()[property] with style[property].get():

However, the font method needs a little bit of work. Until now, we’ve read the node’s style and passed that to font:

That won’t work anymore, because now we need to read three different properties of style. To keep things compact, I’m going to rewrite font to pass in the field to invalidate as an argument:

Now we can simply pass self.children in for the notify parameter when requesting a font during line breaking:
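A sketch of what this interface might look like, assuming the font(css_style, zoom, notify) signature listed in the outline. Here get_font is a hypothetical stand-in that returns a tuple instead of a real font object, and the size conversion is an assumption:

```python
class ProtectedField:
    """Pared-down stand-in for the chapter's ProtectedField (an assumption)."""
    def __init__(self, obj, name):
        self.obj = obj
        self.name = name
        self.value = None
        self.dirty = True
        self.invalidations = set()

    def mark(self):
        self.dirty = True

    def notify(self):
        for field in self.invalidations:
            field.mark()

    def set(self, value):
        if value != self.value:
            self.notify()
        self.value = value
        self.dirty = False

    def get(self):
        assert not self.dirty
        return self.value

    def read(self, notify):
        self.invalidations.add(notify)
        return self.get()

def dpx(css_px, zoom):
    return css_px * zoom

def get_font(size, weight, style):
    # Hypothetical stand-in: a real helper would return a cached font object.
    return (size, weight, style)

def font(css_style, zoom, notify):
    # Each read registers `notify` as depending on that one style field.
    weight = css_style["font-weight"].read(notify)
    style = css_style["font-style"].read(notify)
    try:
        size = float(css_style["font-size"].read(notify)[:-2]) * 0.75
    except ValueError:
        size = 16
    return get_font(dpx(size, zoom), weight, style)

defaults = {"font-size": "16px", "font-weight": "normal",
            "font-style": "normal", "color": "black"}
node_style = {}
for prop, value in defaults.items():
    node_style[prop] = ProtectedField(None, prop)
    node_style[prop].set(value)

children = ProtectedField(None, "children")
result = font(node_style, 1.0, notify=children)
children.set([])

node_style["color"].set("red")
dirty_after_color = children.dirty    # color was never read: still clean
node_style["font-weight"].set("bold")
dirty_after_weight = children.dirty   # a font property changed: dirty
```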

Make sure to update all other uses of the font method to this new interface. This “destination-passing style” is a common way to add invalidation to helper methods.

Finally, now that we’ve added granular invalidation to style, we can invalidate just the animating property when handling animations:

When a property like opacity or transform is changed, it won’t invalidate any layout fields (because these properties don’t affect any layout fields) and so animations will once again skip layout entirely.

Analyzing Dependencies

Layout is now pretty fast and correct thanks to the ProtectedField abstraction. However, because most of our dependencies are established implicitly, by read, it’s hard to tell which fields will ultimately get invalidated from any given operation. That makes it hard to understand which operations are fast and which are slow, especially as we add new style and layout features. This auditability concern happens in real browsers, too. After all, real browsers are millions, not thousands, of lines long, and support thousands of CSS properties. Their dependency graphs are dramatically more complex than our browser’s.

We’d therefore like to make it easier to see the dependency graph, though see Figure 5 for an idea of the scale of the task. And along the way we can centralize invariants about the shape of that graph. That will harden our browser against accidental bugs in the future and also improve performance.

An easy first step is explicitly listing the dependencies of each ProtectedField. We can make this an optional constructor parameter:

Moreover, if the dependencies are passed in the constructor, we can “freeze” the ProtectedField, so that read no longer adds new dependencies, just checks that they were declared:
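Both steps can be sketched together, assuming a pared-down ProtectedField. The field names in the demo (parent_width, other) are hypothetical:

```python
class ProtectedField:
    """Pared-down stand-in for the chapter's ProtectedField (an assumption)."""
    def __init__(self, obj, name, parent=None, dependencies=None):
        self.obj = obj
        self.name = name
        self.parent = parent
        self.value = None
        self.dirty = True
        self.invalidations = set()
        # Passing a list, even an empty one, freezes the dependencies.
        self.frozen_dependencies = dependencies is not None
        if dependencies is not None:
            for dependency in dependencies:
                dependency.invalidations.add(self)

    def mark(self):
        self.dirty = True

    def notify(self):
        for field in self.invalidations:
            field.mark()

    def set(self, value):
        if value != self.value:
            self.notify()
        self.value = value
        self.dirty = False

    def get(self):
        assert not self.dirty
        return self.value

    def read(self, notify):
        if notify.frozen_dependencies:
            # Frozen: only check that this dependency was declared.
            assert notify in self.invalidations, "undeclared dependency"
        else:
            self.invalidations.add(notify)
        return self.get()

parent_width = ProtectedField(None, "parent_width", dependencies=[])
other = ProtectedField(None, "other", dependencies=[])
width = ProtectedField(None, "width", dependencies=[parent_width])

parent_width.set(800)
other.set(5)
width.set(parent_width.read(notify=width))  # declared, so allowed

try:
    other.read(notify=width)                 # never declared
    undeclared_read_caught = False
except AssertionError:
    undeclared_read_caught = True
```

Note that the check relies on assert, so it disappears when Python runs with asserts disabled.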

For example, in DocumentLayout, we can now be explicit about the fact that its fields have no external dependencies, and thus have to be marked explicitly (I didn’t even notice that myself until I wrote this section!):

But note that height is missing the dependencies parameter. A DocumentLayout’s height depends on its child’s height, and that child doesn’t exist until layout is called. “Downward” dependencies like this mean we can’t freeze every ProtectedField when it’s constructed. But every protected field we freeze makes the dependency graph easier to audit.

We can also freeze the zoom, width, x, and y fields in BlockLayout. For y, the dependencies differ based on whether or not the layout object has a previous sibling:
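A sketch of how the dependencies for y might be chosen, assuming a stripped-down ProtectedField that only records dependencies. Block is a hypothetical stand-in for BlockLayout, and its parentless case exists only so the demo has a root:

```python
class ProtectedField:
    """Stand-in that only records declared dependencies (an assumption)."""
    def __init__(self, obj, name, parent=None, dependencies=None):
        self.obj = obj
        self.name = name
        self.parent = parent
        self.invalidations = set()
        self.frozen_dependencies = dependencies is not None
        for dependency in dependencies or []:
            dependency.invalidations.add(self)

class Block:
    """Hypothetical stand-in for BlockLayout."""
    def __init__(self, parent=None, previous=None):
        self.parent = parent
        self.previous = previous
        self.height = ProtectedField(self, "height", parent)
        if previous:
            # Stacked below the previous sibling.
            y_dependencies = [previous.y, previous.height]
        elif parent:
            # First child: starts at the parent's y position.
            y_dependencies = [parent.y]
        else:
            # Only for the demo's root object, which has no parent.
            y_dependencies = []
        self.y = ProtectedField(self, "y", parent, y_dependencies)

root = Block()
first = Block(root)
second = Block(root, previous=first)
```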

We can’t freeze height for BlockLayout, for the same reason as DocumentLayout, in the constructor. But we can freeze it as soon as the children field is computed. Let’s add a set_dependencies method to do that. (This is dynamic, just like calls to read, but at least we’re centralizing dependencies in one place. Plus, listing the dependencies explicitly and then checking them later is a kind of defense in depth against invalidation bugs.)
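A minimal sketch of such a set_dependencies method, assuming a pared-down ProtectedField. The Box class is a hypothetical layout object with only a height field:

```python
class ProtectedField:
    """Pared-down stand-in for the chapter's ProtectedField (an assumption)."""
    def __init__(self, obj, name, parent=None, dependencies=None):
        self.obj = obj
        self.name = name
        self.parent = parent
        self.value = None
        self.dirty = True
        self.invalidations = set()
        self.frozen_dependencies = dependencies is not None
        if dependencies is not None:
            self.set_dependencies(dependencies)

    def set_dependencies(self, dependencies):
        # Declare the dependencies late, then freeze them.
        for dependency in dependencies:
            dependency.invalidations.add(self)
        self.frozen_dependencies = True

    def mark(self):
        self.dirty = True

    def notify(self):
        for field in self.invalidations:
            field.mark()

    def set(self, value):
        if value != self.value:
            self.notify()
        self.value = value
        self.dirty = False

    def get(self):
        assert not self.dirty
        return self.value

    def read(self, notify):
        if notify.frozen_dependencies:
            assert notify in self.invalidations, "undeclared dependency"
        else:
            self.invalidations.add(notify)
        return self.get()

class Box:
    """Hypothetical layout object with only a height field."""
    def __init__(self):
        self.height = ProtectedField(self, "height")

parent = Box()
children = [Box(), Box()]  # children only exist once layout has run
parent.height.set_dependencies([child.height for child in children])

for child, h in zip(children, [20, 30]):
    child.height.set(h)
total = sum(child.height.read(notify=parent.height) for child in children)
parent.height.set(total)
clean_after_layout = not parent.height.dirty

children[0].height.set(25)  # invalidates the parent's height
```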

The other layout objects can also freeze their fields. In TextLayout, EmbedLayout, and its subclasses we can freeze everything:

In LineLayout, due to the somewhat complicated way a line is created and then laid out, we need to delay freezing ascent and descent until the first time layout is called:

The last layout class is EmbedLayout. The dependencies there are straightforward except for two things: first, just like for TextLayout, x depends on the previous x if present, and second, height depends on width because of aspect ratios:

We can even freeze all of the style fields! The only complication is that innerHTML changes an element’s parent, so let’s create the style dictionary dynamically. Initialize it to None in the constructor:

Inside init_style, we need to freeze the dependencies of each style field. That’s easy: only inherited fields have any dependencies:
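A sketch of what init_style could look like, assuming abbreviated CSS_PROPERTIES and INHERITED_PROPERTIES tables and a ProtectedField stand-in that only records declared dependencies:

```python
class ProtectedField:
    """Stand-in that only records declared dependencies (an assumption)."""
    def __init__(self, obj, name, parent=None, dependencies=None):
        self.obj = obj
        self.name = name
        self.parent = parent
        self.invalidations = set()
        self.frozen_dependencies = dependencies is not None
        for dependency in dependencies or []:
            dependency.invalidations.add(self)

# Abbreviated, illustrative tables.
CSS_PROPERTIES = {"font-size": "inherit", "color": "inherit", "opacity": "1.0"}
INHERITED_PROPERTIES = {"font-size": "16px", "color": "black"}

class Element:
    def __init__(self, parent=None):
        self.parent = parent
        self.style = None  # created later, by init_style

def init_style(node):
    node.style = {}
    for prop in CSS_PROPERTIES:
        if node.parent and prop in INHERITED_PROPERTIES:
            # An inherited field depends on the same field of the parent.
            dependencies = [node.parent.style[prop]]
        else:
            dependencies = []
        node.style[prop] = ProtectedField(node, prop, None, dependencies)

root = Element()
child = Element(root)
init_style(root)   # parents must be initialized before their children
init_style(child)
```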

By freezing every layout and style field, except children, we can get a good sense of our browser’s dependency graph just by looking at layout object type constructors. That’s nice, and helps us avoid cycles and long dependency chains as we add more style and layout features.

But to obtain maximum performance, the kind you would need for a real browser, there’s an additional benefit. All these fancy ProtectedFields add a lot of overhead, mostly because they take up more memory and require more function calls. In fact, this chapter likely made your browser quite a bit slower on an initial page load (for me, it’s about twice as slow). Some of that can be improved by skipping asserts (if you run Python with the -O command-line flag, Python will automatically skip asserts), but it’s definitely not ideal.

Luckily, techniques like compile-time code generation and macros can be used to turn ProtectedField objects into straight-line code behind the scenes. Setting a particular ProtectedField can set the dirty bits on statically known invalidations, the dirty bits can be inlined into the layout objects, and the read function can check that the dependency was declared at compile time. (Real browsers pull tricks like that all the time, in order to be super fast but still maintainable and readable. For example, Chromium has a fancy way of generating optimized code for all of the style properties.) Such techniques are beyond the scope of this book, but I’ve left exploring them to an advanced exercise.

Summary

This chapter introduces the concept of partial style and layout through optimized cache invalidation. The main takeaways are:

Outline

The complete set of functions, classes, and methods in our browser should now look something like this:

COOKIE_JAR

class URL:
    def __init__(url)

    def request(referrer, payload)

    def resolve(url)

    def origin()

    def __str__()

class Text:
    def __init__(text, parent)

    def __repr__()

class Element:
    def __init__(tag, attributes, parent)

    def __repr__()

def print_tree(node, indent)

def tree_to_list(tree, list)

def is_focusable(node)

def get_tabindex(node)

class HTMLParser:
    SELF_CLOSING_TAGS

    HEAD_TAGS

    def __init__(body)

    def parse()

    def get_attributes(text)

    def add_text(text)

    def add_tag(tag)

    def implicit_tags(tag)

    def finish()

class CSSParser:
    def __init__(s)

    def whitespace()

    def literal(literal)

    def word()

    def ignore_until(chars)

    def pair(until)

    def selector()

    def body()

    def parse()

    def until_chars(chars)

    def simple_selector()

    def media_query()

class TagSelector:
    def __init__(tag)

    def matches(node)

class DescendantSelector:
    def __init__(ancestor, descendant)

    def matches(node)

class PseudoclassSelector:
    def __init__(pseudoclass, base)

    def matches(node)

FONTS

def get_font(size, weight, style)

def font(css_style, zoom, notify)

def linespace(font)

NAMED_COLORS

def parse_color(color)

def parse_blend_mode(blend_mode_str)

def parse_transition(value)

def parse_transform(transform_str)

def parse_outline(outline_str)

def parse_image_rendering(quality)

REFRESH_RATE_SEC

class MeasureTime:
    def __init__()

    def time(name)

    def stop(name)

    def finish()

class Task:
    def __init__(task_code)

    def run()

class TaskRunner:
    def __init__(tab)

    def schedule_task(task)

    def set_needs_quit()

    def clear_pending_tasks()

    def start_thread()

    def run()

    def handle_quit()

DEFAULT_STYLE_SHEET

CSS_PROPERTIES

INHERITED_PROPERTIES

def init_style(node)

def style(node, rules, frame)

def cascade_priority(rule)

def diff_styles(old_style, new_style)

class NumericAnimation:
    def __init__(old_value, new_value, num_frames)

    def animate()

def dirty_style(node)

class ProtectedField:
    def __init__(obj, name, parent, dependencies, invalidations)

    def set_dependencies(dependencies)

    def set_ancestor_dirty_bits()

    def mark()

    def notify()

    def set(value)

    def get()

    def read(notify)

    def copy(field)

    def __repr__()

def dpx(css_px, zoom)

WIDTH, HEIGHT

HSTEP, VSTEP

INPUT_WIDTH_PX

IFRAME_WIDTH_PX, IFRAME_HEIGHT_PX

BLOCK_ELEMENTS

class DocumentLayout:
    def __init__(node, frame)

    def layout(width, zoom)

    def should_paint()

    def paint()

    def paint_effects(cmds)

    def layout_needed()

class BlockLayout:
    def __init__(node, parent, previous, frame)

    def layout_mode()

    def layout()

    def recurse(node)

    def add_inline_child(node, w, child_class, frame, word)

    def new_line()

    def word(node, word)

    def input(node)

    def image(node)

    def iframe(node)

    def self_rect()

    def should_paint()

    def paint()

    def paint_effects(cmds)

    def layout_needed()

class LineLayout:
    def __init__(node, parent, previous)

    def layout()

    def should_paint()

    def paint()

    def paint_effects(cmds)

    def layout_needed()

class TextLayout:
    def __init__(node, word, parent, previous)

    def layout()

    def should_paint()

    def paint()

    def paint_effects(cmds)

    def self_rect()

    def layout_needed()

class EmbedLayout:
    def __init__(node, parent, previous, frame)

    def layout()

    def should_paint()

    def layout_needed()

class InputLayout:
    def __init__(node, parent, previous, frame)

    def layout()

    def paint()

    def paint_effects(cmds)

    def self_rect()

class ImageLayout:
    def __init__(node, parent, previous, frame)

    def layout()

    def paint()

    def paint_effects(cmds)

class IframeLayout:
    def __init__(node, parent, previous, parent_frame)

    def layout()

    def paint()

    def paint_effects(cmds)

BROKEN_IMAGE

class PaintCommand:
    def __init__(rect)

class DrawText:
    def __init__(x1, y1, text, font, color)

    def execute(canvas)

class DrawRect:
    def __init__(rect, color)

    def execute(canvas)

class DrawRRect:
    def __init__(rect, radius, color)

    def execute(canvas)

class DrawLine:
    def __init__(x1, y1, x2, y2, color, thickness)

    def execute(canvas)

class DrawOutline:
    def __init__(rect, color, thickness)

    def execute(canvas)

class DrawCompositedLayer:
    def __init__(composited_layer)

    def execute(canvas)

class DrawImage:
    def __init__(image, rect, quality)

    def execute(canvas)

def DrawCursor(elt, offset)

class VisualEffect:
    def __init__(rect, children, node)

class Blend:
    def __init__(opacity, blend_mode, node, children)

    def execute(canvas)

    def map(rect)

    def unmap(rect)

    def clone(child)

class Transform:
    def __init__(translation, rect, node, children)

    def execute(canvas)

    def map(rect)

    def unmap(rect)

    def clone(child)

def local_to_absolute(display_item, rect)

def absolute_bounds_for_obj(obj)

def absolute_to_local(display_item, rect)

def map_translation(rect, translation, reversed)

def paint_tree(layout_object, display_list)

def paint_visual_effects(node, cmds, rect)

def paint_outline(node, cmds, rect, zoom)

def add_parent_pointers(nodes, parent)

class CompositedLayer:
    def __init__(skia_context, display_item)

    def can_merge(display_item)

    def add(display_item)

    def composited_bounds()

    def absolute_bounds()

    def raster()

SPEECH_FILE

class AccessibilityNode:
    def __init__(node, parent)

    def compute_bounds()

    def build()

    def build_internal(child_node)

    def contains_point(x, y)

    def hit_test(x, y)

    def map_to_parent(rect)

    def absolute_bounds()

class FrameAccessibilityNode:
    def __init__(node, parent)

    def build()

    def hit_test(x, y)

    def map_to_parent(rect)

def speak_text(text)

EVENT_DISPATCH_JS

SETTIMEOUT_JS

XHR_ONLOAD_JS

POST_MESSAGE_DISPATCH_JS

RUNTIME_JS

class JSContext:
    def __init__(tab, url_origin)

    def run(script, code, window_id)

    def add_window(frame)

    def wrap(script, window_id)

    def dispatch_event(type, elt, window_id)

    def dispatch_post_message(message, window_id)

    def dispatch_settimeout(handle, window_id)

    def dispatch_xhr_onload(out, handle, window_id)

    def dispatch_RAF(window_id)

    def throw_if_cross_origin(frame)

    def get_handle(elt)

    def querySelectorAll(selector_text, window_id)

    def getAttribute(handle, attr)

    def setAttribute(handle, attr, value, window_id)

    def innerHTML_set(handle, s, window_id)

    def style_set(handle, s, window_id)

    def XMLHttpRequest_send(...)

    def setTimeout(handle, time, window_id)

    def requestAnimationFrame()

    def parent(window_id)

    def postMessage(target_window_id, message, origin)

SCROLL_STEP

class Frame:
    def __init__(tab, parent_frame, frame_element)

    def allowed_request(url)

    def load(url, payload)

    def render()

    def clamp_scroll(scroll)

    def set_needs_render()

    def set_needs_layout()

    def advance_tab()

    def focus_element(node)

    def activate_element(elt)

    def submit_form(elt)

    def keypress(char)

    def scrolldown()

    def scroll_to(elt)

    def click(x, y)

class Tab:
    def __init__(browser, tab_height)

    def load(url, payload)

    def run_animation_frame(scroll)

    def render()

    def get_js(url)

    def allowed_request(url)

    def raster(canvas)

    def clamp_scroll(scroll)

    def set_needs_render()

    def set_needs_layout()

    def set_needs_paint()

    def set_needs_render_all_frames()

    def set_needs_accessibility()

    def scrolldown()

    def click(x, y)

    def go_back()

    def submit_form(elt)

    def keypress(char)

    def focus_element(node)

    def activate_element(elt)

    def scroll_to(elt)

    def enter()

    def advance_tab()

    def zoom_by(increment)

    def reset_zoom()

    def set_dark_mode(val)

    def post_message(message, target_window_id)

class Chrome:
    def __init__(browser)

    def tab_rect(i)

    def paint()

    def click(x, y)

    def keypress(char)

    def enter()

    def blur()

    def focus_addressbar()

class CommitData:
    def __init__(...)

class Browser:
    def __init__()

    def schedule_animation_frame()

    def commit(tab, data)

    def render()

    def composite_raster_and_draw()

    def composite()

    def get_latest(effect)

    def paint_draw_list()

    def raster_tab()

    def raster_chrome()

    def update_accessibility()

    def draw()

    def speak_node(node, text)

    def speak_document()

    def set_needs_accessibility()

    def set_needs_animation_frame(tab)

    def set_needs_raster_and_draw()

    def set_needs_raster()

    def set_needs_composite()

    def set_needs_draw()

    def clear_data()

    def new_tab(url)

    def new_tab_internal(url)

    def set_active_tab(tab)

    def schedule_load(url, body)

    def clamp_scroll(scroll)

    def handle_down()

    def handle_click(e)

    def handle_key(char)

    def handle_enter()

    def handle_tab()

    def handle_hover(event)

    def handle_quit()

    def toggle_dark_mode()

    def increment_zoom(increment)

    def reset_zoom()

    def focus_content()

    def focus_addressbar()

    def go_back()

    def cycle_tabs()

    def toggle_accessibility()

def mainloop(browser)

Exercises

16-1 Emptying an element. Implement the replaceChildren DOM method when called with no arguments. This method should delete all the children of a given element. Make sure to handle invalidation properly.

16-2 Protecting layout phases. Replace the needs_style and needs_layout dirty flags by making the document field on Frames a ProtectedField. Make sure animations still work correctly: animations of opacity or transform shouldn’t trigger layout, while animations of other properties should.

16-3 Transferring children. Implement the replaceChildren DOM method when called with multiple arguments. Here the arguments are elements from elsewhere in the document (unless you’ve implemented Exercises 9-2 and 9-3, in which case they can also be “detached” elements), which are then removed from their current parent and then attached to this one. Make sure to handle invalidation properly.

16-4 Descendant bits for style. Add descendant dirty flags for style information, so that the style phase doesn’t need to traverse nodes whose styles are unchanged.

16-5 Resizing the browser. Perhaps, back in Exercise 2-3, you implemented support for resizing the browser. (And, most likely, you dropped support for it when we switched to SDL.) Reimplement support for resizing your browser; you’ll need to pass the SDL_WINDOW_RESIZABLE flag to SDL_CreateWindow and listen for SDL_WINDOWEVENT_RESIZED events. Make sure invalidation works: resizing the window should resize the page. How much does invalidation help make resizing fast? Test both vertical and horizontal resizing.

16-6 Matching children. Add support for the appendChild method if you haven’t already in Exercise 9-2. What’s interesting about appendChild is that, while it does change a layout object’s children field, it only does so by adding new children to the end. In this case, you can keep all of the existing layout object children. Apply this optimization, at least in the case of block-mode BlockLayouts.

16-7 Invalidating previous. Add support for the insertBefore method if you haven’t already in Exercise 9-2. Like with appendChild, we want to skip rebuilding layout objects if we can. However, this method can also change the previous field of layout objects; protect that field on all block-mode BlockLayouts and then avoid rebuilding as much of the layout tree as possible.

16-8 :hover pseudo-class. There is a :hover pseudo-class that identifies elements the mouse is hovering over. Implement it by sending mouse hover events to the active Tab and hit testing to find out which element is being hovered over. Try to avoid forcing a layout in this hit test; one way to do that is to store a pending_hover on the Tab and run the hit test after layout during render, and then perform another render to invalidate the hovered element’s style.

16-9 Optimizing away ProtectedField. As mentioned in the last section of this chapter, creating all these ProtectedField objects is way too expensive for a real browser. See if you can find a way to avoid creating the objects entirely. Depending on the language you’re using to implement your browser, you might have compile-time macros available to help; in Python, this might require refactoring to change the API shape of ProtectedField to be functional rather than object-oriented.

16-10 Optimizing paint. Even after making layout fast for text input, paint is still painfully slow. Fix that by storing the display list between frames, adding dirty bits for whether paint is needed for each layout object, and mutating the display list rather than recreating it every time.