Our little web browser now renders web pages that change. That means our browser is now doing styling and layout multiple times per page. Most of that styling and layout, however, is a waste: even when the page changes, it usually doesn't change much, and layout is expensive. In this chapter we'll put the breaks on new features and implement some speed-ups instead.
Before we start working on speeding up our browser, let's find out what's taking up so much time. Let's take a moment to list the stuff our browser does. First, on the initial load:
And then every time it does layout:
I'd like to get timing measurements on both of these, so that we know what to work on. To do that, I'm basically going to check the wall clock time at various points in the process and do some subtraction. To keep it all well-contained, I'm going to make a
Timer class to store that wall-clock time, with a method to report how long a phase took.
That wacky string in the
timer field on browsers:
Then we call
start every time we start doing something useful. For example, in
browse, I start the
I'm just going to go ahead and insert one of these for each of the bullet points above. Then, at the end of
render, I stop the timer:
Your results may not match mine,These results were recorded on a 2019 13-inch MacBook Pro with a 2.4GHz i5, 8 GB of LPDDR3 memory, and an Intel Iris Plus graphics 655 with 1.5 GB of video memory, running macOS 10.14.6. but here's what I saw on my console on a full page load for this web page.
[ 0.225331] Download [ 0.028922] Parse HTML [ 0.001835] Parse CSS [ 0.003599] Run JS [ 0.007073] Style [ 0.517131] Layout [ 0.022553] Display List [ 0.275645] Rendering [ 0.008225] Chrome
The overall process takes about one second (60 frames), with layout consuming half and then rendering and network consuming the rest. Moreover, the downloading only takes place on initial load, so it's really layout and rendering that we're going to optimize. By the way, keep in mind that while networking in a real web browser is similar enough to our toy version,Granted with caching, keep-alive, parallel connections, and incremental parsing to hide the delay… rendering is more complex in real web browsers (since real browsers can apply many more stylistic effects) and layout is much more complex in real browsers!Of course they're also not written in Python…Now is a good time to mention that benchmarking a real browser is a lot harder than benchmarking ours. A real browser will run most of these phases simultaneously, and may also split the work over multiple CPU and GPU processes. Counting how much time everything takes is a chore. Real browsers are also memory hogs, so optimizing memory usage is just as important for them, which is a real pain to measure in Python. Luckily our browser doesn't have tabs so it's unlikely to strain for memory!
By the way, this might be the point in the book where you realize you accidentally implemented something super-inefficiently. If you something other than the network, layout, or rendering is taking a long time, look into that.The exact speeds of each of these phases can vary quite a bit between implementations, and might depend (for example) on the exercises you ended up implementing, so don't sweat the details too much. Whatever help it is your browser needs, this chapter will only address layout and rendering.
But to really drive home the need for faster layout and rendering, let's implement a little browser feature that really taxes layout and rendering: hover styles. CSS has a selector, called
:hover, which applies to whichever element you are currently hovering over. In real browsers this is especially useful in combination with other selectors: you might have the
a:hover selector for links that you're hovering over, or
section:hover h1 for headings inside the section you're hoving over. We have a pretty limited set of selectors, but we can at least try out the style
This should draw a box around the element we are currently hovering over. Let's add this line to our browser stylesheet and try to implement it. First, we need to parse the
:hover selector. To do that, we create a new kind of selector:
Note that this expects an
ElementNode.pseudoclasses field, which I'll initialize to an empty set:
Now that we have this class, we need to parse these pseudo-class selectors. That just involves copying the bit of code we have for class selectors and replacing the period with a colon:
Finally we need to handle hover events. In Tk, you do that by binding the
handle_hover method is pretty simple; it calls
find_element, walks up the tree until it finds an
ElementNode, and then sets its
hover pseudoclass. Also, it has to unset the
hover pseudoclass on the previously-hovered element. I store a reference to that element in the
hovered_elt field, initialized to
None in the constructor.
class Browser: def handle_hover(self, e): x, y = e.x, e.y - 60 + self.scrolly elt = find_element(x, y, self.nodes) while elt and not isinstance(elt, ElementNode): elt = elt.parent if self.hovered_elt: self.hovered_elt.pseudoclasses.remove("hover") if not elt: self.hovered_elt = None return elt.pseudoclasses.add("hover") self.hovered_elt = elt self.relayout()
Note that hover calls
relayout, because by changing the pseudoclasses it potentially changes which rules apply to which elements and thus which borders are applied where.
Try this out! You should see black rectangles appear over every element you hover over, except of course that it'll take about a second to do so and the refresh rate will be really really bad. Now it's really clear we need to speed up layout.
Actually, this is not how
:hover works, because in normal CSS if you hover over an element you probably also hover over its parent, and both get the
:hover style. Because of how limited our selector language is, there's no style change that incrementalizes well that I can apply on hover. So I'm instead bowdlerizing how
:hover works. In other words, this chapter is a good guide to incremental reflow but a bad guide to hover selectors. My deepest apologies. Please learn how to use CSS from some other source.
My goal will be to leverage this intuition to skip as much work as possible on reflow. The idea is going to be to split layout into two phases. In the first, we'll compute relative positions for each element; in the second, we'll compute the absolute positions by adding the parent offset to the relative position. This will also involve splitting the
layout function into two, which I will call
layout2,Look, I've got a bit of a cold, I'm not feeling very creative. for each of our five layout types (block, line, text, input, and inline).
I'll start with block layout. In block layout, the x position is computed from the parent's left content edge and then changed in
layout to account for the margin. Likewise, the y position is initialized from the argument in
layout and then changed once to account for margins. In other words, neither changes much, so it's safe to move these absolute position fields to
layout1 doesn't take any arguments any more, since it doesn't need to know its position to figure out the relative positions of its contents. Meanwhile,
layout2 still needs a
y parameter from its parent.
layout1 creates children and then makes a recursive calls to
child.layout to lay out its children. But since we've split layout into two phases, we need to change those recursive calls to call either
layout2. Actually we need to do both—but let's do a recursive call to
layout1, and add a loop to
layout2 to do the recursive
We also need to make sure that our layout objects don't do anything interesting in the constructor. The hope, after all, is to keep layout objects around between reflows, and that means we won't call the constructor on reflow. If we do anything interesting in the constructor, it won't get updated as the page changes. In block layout, the margin, padding, and border values are computed in the constructor; let's move that computation into
layout1. Plus, the
children array is initialized in the constructor, but it's only modified in
layout1. Let's move that
children array to
These few changes are complex, and we'll be doing the same thing again for each of the other four layout types, so let's review what the code we've written guarantees:
This method is responsible for computing the twelve margin, padding, and border fields, and for creating the child layouts, and for assigning the
h fields on the
BlockLayout. Plus, it must call
layout1 on its children. It may not read the
y fields on anything.
This method may read the
h fields and is responsible for assigning the
y fields on the
BlockLayout. Plus, it must call
layout2 on its children.
Now that we've got
BlockLayout sorted, let's move on to text, input, and line layout working. I'll save inline layout to the very end.
layout function does basically nothing, because everything happens in the constructor. We'll rename
layout2 and move the computation of the
h fields from the constructor into
InputLayout, the width and height are computed in the constructor (so let's move them to
y but also creates a child
InlineLayout for textareas. Let's put the child layout creation in
layout1 and leave the
y computation in
layout2 will need to call
layout2 on the child layout, if there is one.
Now let's get to
LineLayout. Like in
BlockLayout, we'll need to split
layout into two functions. But one quirk of
LineLayout is that it does not create its own children (
InlineLayout is in charge of that) and it also does not compute its own width (its children do that in their
attach methods). So unlike the other layout modes,
LineLayout will initialize the
w fields in its constructor, and its
layout1 won't recursively call
layout1 on its children or compute the
layout2 method looks more normal, however:
Finally, inline layout. For inline layout we want to accomplish the same split as above. The current
layout function computes the
w fields, then creates its children with the
recurse method, and then calls
layout on each child. If we peek inside
recurse and its helper methods
input, we'll find that it only reads the
w field, which is allowed in
layout1. So all of
recurse can happen in
As mentioned above, there's a quirk with inline layout, which is that it is responsible not only for its children (line layouts) but their children (text and input layouts) as well. So, we need update the
input helpers to call
layout1 on the new layout objects they create.
y and will need a new loop that calls
layout2 on the children:
Note that I'm accepting a
y argument in
layout2, even though I don't use it. That's because
BlockLayout passes one. Probably it would be good to accept both
y in every
layout2 method, though that's not how I've written my code. Also don't forget to update
InputLayout to pass a
y argument in
layout2 when it handles its child layout.
I want to emphasize that
recurse is by far the slowest part of our browser's layout algorithm,Because it calls
font.measure, which has to do a slow and expensive text rendering to get the right size. Text is crazy. and since it happens in
layout1 we will be able to mostly skip it. This will be our the big performance win.
Finally, we need to update our browser to actually call all of these functions. In
Browser.relayout, where we used to call
layout, let's call
layout1 followed by
I cannot overemphasize how it important it is right now to stop and debug. If you're anything like me, you will have a long list of very minor bugs, including forgetting to add
self to stuff when moving it out of constructors, not moving the
children array, and passing the wrong number of arguments to one of the two layout phases. If you don't get things working bug-free now, you'll spend three times as long debugging the next phase, where we make things yet more complicated by only calling
layout1 sometimes. Read through all of the constructors for the layout classes to make sure they're not doing anything interesting.
If you time the browser, now that we've split layout into two phases, you'll find that not much has changed; for me, layout got about five percent slower. To find out more, I made
layout2 separate phases in the timer. Here's what it looks like:
[ 0.498251] Layout1 [ 0.006418] Layout2
layout1 is taking all of the time and
layout2 is near-instantaneous.Heh heh, near-instantaneous. Anyone that's worked on a high-performance application would like you to know that 6.418 milliseconds is almost 40% of your one-frame time budget! But, this is a toy web browser written in Python. Cut me some slack. That validates our next step: avoiding
layout1 whenever we can.
The idea is simple, but the implementation will be tricky, because sometimes you do need to lay an element out again; for example, the element you hover over gains a border, and that changes its width and therefore potentially its line breaking behavior. So you may need to run
layout1 on that element. But you won't need to do so on its siblings. The guide to this nonsense will be the responsibilities of
layout2 outlined above. Because
layout1 reads the node style, it only needs to be called when the node or its style changes.
Let's start by making a plan. Look over your browser and make a list of all of the places where
relayout is called. For me, these are:
parse, on initial page load.
js_innerHTML, when new elements are added to the page (and old ones removed).
edit_input, when the contents of an input element is changed.
handle_hover, when an element is hovered over.
Each of these needs a different approach:
parse, we are doing the initial page load so we don't even have an existing layout. We need to create one, and that involves calling
js_innerHTML, the new elements and their parent are the only elements that have had their nodes or style changed.
edit_input, only the input element itself has had a change to node or style.
handle_hover, only the newly-hovered and newly-not-hovered elements have had a change to node or style.
relayout into pieces to reflect the above. First, let's move the construction of
parse. Then let's create a new
reflow function that calls
layout1 on an element of your choice. And finally,
relayout will just contain the call to
layout2 and the computation of the display list. Here's
class Browser: def parse(self, body): # ... self.page = Page() self.layout = BlockLayout(self.page, self.nodes) self.reflow(self.nodes) self.relayout() def relayout(self): self.start("Layout2") self.layout.layout2(0) self.max_h = self.layout.h self.timer.start("Display List") self.display_list = self.layout.display_list() self.render()
reflow method is a little more complex. Here's what it looks like after a simple reorganization:
Note that while
reflow takes an element as an argument, it ignores it, and restyles and re-lays-out the whole page. That's clearly silly, so let's fix that. First, the
style call only needs to be passed
Second, instead of calling
self.layout, we only want to call it on the layout object corresponding to
elt. The easiest way to find that is with a big loop:There is a subtlety in the code below. It's important to check the current node before recursing, because some nodes have two layout objects, in particular block layout elements that contain text and thus have both a
BlockLayout and an
InlineLayout. We want the parent, and doing the check before recursing guarantees us that.
This is definitely inefficient, because we could store the element-layout correspondence on the node itself, but let's run with it for the sake of simplicity. We can now change the
self.layout.layout1() line to:
This mostly works, but
find_layout won't be happy on initial page load because at that point some of the layout objects don't have children yet. Let's add a line for that:I recognize that in many languages, unlike in Python, you can't just add fields in some method without declaring them earlier on. I assume that in all of those cases you've been initializing the fields with dummy values, and then you'd check for that dummy value instead of using
hasattr. Just make sure your dummy value for
children isn't the empty list, since that is also a valid value. Better to use a null pointer or something like that, whatever your language provides.
The logic of returning any layout object without a
children field is that if some layout object does not have such a field, it definitely hasn't had
layout1 called on it and therefore we should, which we do by returning it from
Finally, let's go back to place where we call
relayout and add a call to
reflow. The only complicated case is
handle_hover, where you need to call
reflow both on the old
hovered_elt and on the new one. (You only need to call
layout once, though.)
With this tweak, you should see
layout1 taking up almost no time, except on initial page load, and hovering should be much more responsible. For me the timings now look like this:
So now rendering takes up roughly 89% of the runtime when hovering, and everything else takes up 32 milliseconds total. That's not one frame, but it's not bad for a Python application!
Let's put a bow on this lab by speeding up
render. It's actually super easy: we just need to avoid drawing stuff outside the browser window; in the graphics world this is called clipping. Now, we need to make sure to draw text that starts outside the browser window but has a part inside the window, so I'm going to update the
DrawText constructor to compute where the text ends:
Ok, wait, that's not the code you expected. Why 50? Why not use
font.metrics are quite slow: they actually execute text layout, and that takes a long time! So I'll be using only y position for clipping, and I'll be using an overapproximation to
font.metrics. The 50 is not a magic value; it just needs to be bigger than any actual line height. If it's too big, we render a few too many
DrawText objects, but it won't change the resulting page.
DrawRect objects have top-left and bottom-right coordinates and we can check those in
That takes rendering down from a quarter-second to a hundredth of a second for me, and makes the hover animation fairly smooth. A hover reflow now takes roughly 25 milliseconds, with the display list computation 44% of that, rendering 39%, and
layout2 13%. We could continue optimizing (for example, tracking invalidation rectangles in rendering), but I'm going to call this a success. We've made interacting with our browser more than 30 times faster, and in the process made the
:hover selector perfectly usable.
With the changes in this chapter, my toy browser became roughly 30× faster, to the point that it is now reacts to changes fast enough to make simple animations. The cost of that is a more complex, two-pass layout algorithm.
TextLayout.display_list, you already know the width and height of the text to be laid out. Use that to compute
DrawText, and update the clipping implementation to clip horizontally as well as vertically.
font.measurefunction is quite slow! Change
TextLayout.add_spaceto use a cache for the size of a space in a given font.
BlockLayoutobjects, how much in
InlineLayout, and so on. What percentage of the
layout1time is spent handling inline layouts? If you did the first exercise, measure its effect.
styleattribute, which can change which styles apply to the affected element; make sure to handle that with reflow. Furthermore, you can use
setAttributeto update the
hrefattribute of a
<link>element, which means you must download a new CSS file and recompute the set of CSS rules. Make sure to handle that edge case as well. If you change the
srcattribute of a
setTimeout(ms, f)should run the function
msmilliseconds. In Python, you can use the
Timerclass from the
setTimeoutto implement a simple animation; for example, you might implement a "typewriter effect" where a paragraph is typed out letter-by-letter. Check that your browser handles the animation relatively smoothly.