Supporting Embedded Content | Web Browser Engineering

While our browser can render complex styles, visual effects, and animations, all of those basically apply only to text. Yet web pages contain a variety of non-text embedded content, from images to other web pages. Support for embedded content has powerful implications for browser architecture, performance, security, and open information access, and has played a key role throughout the web’s history.

Images

Images are certainly the most popular kind of embedded content on the web, dating back to early 1993. (So it’s a little ironic that images only make their appearance in Chapter 15 of this book! That’s because Tkinter doesn’t support many image formats or proper sizing and clipping, so I had to wait for the introduction of Skia.) They’re included on web pages via the <img> tag, whose src attribute gives the URL of the image to display. (This long history is also the reason behind a lot of inconsistencies, like src versus href or img versus image.)

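Luckily, implementing images isn’t too hard, so let’s just get started. There are four steps to displaying images in our browser: downloading the image, decoding it into a Skia Image, adding it to the layout tree, and painting it into the display list.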

Let’s start with downloading images from a URL. Naturally, that happens over HTTP, which we already have a request function for. However, while all of the content we’ve downloaded so far—HTML, CSS, and JavaScript—has been textual, images typically use binary data formats. We’ll need to extend request to support binary data.

The change is pretty minimal: instead of passing the "r" flag to makefile, pass a "b" flag indicating binary mode:

Now every time we read from response, we will get bytes of binary data, not a str with textual data, so we’ll need to change some HTTP parser code to explicitly decode the data:

Note that I didn’t add a decode call when we read the body; that’s because the body might actually be binary data, and we want to return that binary data directly to the browser. Now, every existing call to request, which wants textual data, needs to decode the response. For example, in load, you’ll want to do something like this:
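Here is a minimal sketch of that split, assuming a simplified, hypothetical parse_response helper that reads from a binary file-like object (the real request function also handles sockets, redirects, and caching). The status line and headers are decoded as UTF-8, the body stays as bytes, and a caller that wants text decodes the body itself:

```python
import io

def parse_response(response):
    """Parse an HTTP response from a binary file-like object (sketch).

    The status line and headers are decoded to str; the body is returned
    as raw bytes, since it may be text or an encoded image.
    """
    statusline = response.readline().decode("utf8")
    version, status, explanation = statusline.split(" ", 2)

    headers = {}
    while True:
        line = response.readline().decode("utf8")
        if line == "\r\n":
            break
        header, value = line.split(":", 1)
        headers[header.casefold()] = value.strip()

    body = response.read()  # bytes, not str
    return int(status), headers, body

# Simulate what makefile("b") would return for a small HTML response.
raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<p>caf\xe9</p>")  # deliberately invalid UTF-8 byte
status, headers, body = parse_response(io.BytesIO(raw))

# A caller that expects text (like load) decodes explicitly.
text = body.decode("utf8", "replace")
```

Image loading would skip that last step and hand body straight to the decoder.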

By passing replace as the second argument to decode, I tell Python to replace any invalid characters with a special � character instead of throwing an exception.

Make sure to make this change everywhere in your browser that you call request, including inside XMLHttpRequest_send and in several other places in load.

When we download images, however, we won’t call decode; we’ll just use the binary data directly.

Once we’ve downloaded the image, we need to turn it into a Skia Image object. That requires the following code:

There are two tricky steps here: the requested data is turned into a Skia Data object using the MakeWithoutCopy method, and then into an image with MakeFromEncoded.

Because we used MakeWithoutCopy, the Data object just stores a reference to the existing body and doesn’t own that data. That’s essential, because encoded image data can be large (maybe megabytes), and copying that data wastes memory and time. But that also means that the data will become invalid if body is ever garbage-collected; that’s why I save the body in an encoded_data field. (This is a bit of a hack. Perhaps a better solution would be to write the response directly into a Skia Data object using the writable_data API. That would require some refactoring of the rest of the browser, which is why I’m choosing to avoid it.)

These download and decode steps can both fail; if that happens we’ll load a “broken image” placeholder (I used one from Wikipedia):
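Skia does the real decoding, but a small stdlib-only sketch shows the shape of the decode-with-fallback logic. This is an illustrative stand-in, not the book’s code: it reads only the PNG signature and IHDR chunk to learn the image’s intrinsic size, and returns a hypothetical BROKEN_IMAGE placeholder when parsing fails. A download error would be caught the same way.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Hypothetical placeholder standing in for the "broken image" icon.
BROKEN_IMAGE = ("broken", 16, 16)

def png_size(data):
    """Return (width, height) from a PNG's IHDR chunk, or raise ValueError."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    # IHDR begins with 4-byte width and 4-byte height.
    return struct.unpack(">II", data[16:24])

def load_image(body):
    """Sketch of decode-with-fallback; a real browser decodes all pixels."""
    try:
        width, height = png_size(body)
        return ("png", width, height)
    except ValueError:
        return BROKEN_IMAGE
```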

Now that we’ve downloaded and saved the image, we need to use it. That just requires calling Skia’s drawImageRect function:

The internals of drawImageRect, however, are a little complicated and worth expanding on. Recall that the Image object is created using a MakeFromEncoded method. That name reminds us that the image we’ve downloaded isn’t raw pixel data. In fact, all of the image formats you know (JPG, PNG, and the many more obscure ones) encode the image data using various sophisticated algorithms. The image therefore needs to be decoded before it can be used, and with much more complicated algorithms than just UTF-8 conversion.

Skia applies a variety of clever optimizations to decoding, such as directly decoding the image to its eventual size and caching the decoded image as long as possible. (There’s also an HTML API to control decoding, so that the web page author can indicate when to pay that cost.) That’s because raw image data can be quite large: a pixel is usually stored as 4 bytes, so a 12 megapixel camera (as you can find on phones these days) produces 48 megabytes of raw data for a single image. (Decoding costs both a lot of memory and a lot of time, since just writing out all of those bytes can take a big chunk of our render budget. Optimizing image handling is essential to a performant browser.)
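The arithmetic behind that figure, assuming 4 bytes per pixel (one each for red, green, blue, and alpha) and a hypothetical 4000 by 3000 pixel sensor:

```python
def decoded_bytes(width, height, bytes_per_pixel=4):
    """Memory needed to hold a fully decoded image, e.g. 8-bit RGBA."""
    return width * height * bytes_per_pixel

# A 12-megapixel photo: 4000 x 3000 = 12,000,000 pixels.
size = decoded_bytes(4000, 3000)
print(size / 1_000_000, "MB")  # 48.0 MB
```

The encoded JPEG for such a photo is typically only a few megabytes, which is why decoding late, at the final display size, pays off.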

Because image decoding can be so expensive, Skia also has several algorithms available for decoding, some of which are faster but result in a worse-looking image. For example, there’s the fast, simple “nearest neighbor” algorithm and the slower but higher-quality “bilinear” or even “Lanczos” algorithms. (Specifically, these algorithms decide how to decode an image when the image size and the destination size are different and the image therefore needs to be resized. The faster algorithms tend to result in choppier, more jagged images.) Image formats like JPEG are also lossy, meaning that they don’t faithfully represent all of the information in the original picture, so there’s a time/quality trade-off going on before the file is even saved. Typically these formats try to drop “noisy details” that a human is unlikely to notice, just like different resizing algorithms might.
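To make the trade-off concrete, here is a hedged sketch of the two simplest resizing algorithms on a tiny grayscale image stored as a list of rows. It is not how Skia implements them (Skia works on real pixel buffers, often on the GPU), and mapping output pixel centers back to source coordinates is one common convention among several; Lanczos is left out for brevity.

```python
def resize_nearest(img, new_w, new_h):
    """Nearest-neighbor resize of a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

def resize_bilinear(img, new_w, new_h):
    """Bilinear resize: blend the (up to) four nearest source pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        # Map the output pixel's center back into source coordinates,
        # clamped to the image.
        sy = min(max((y + 0.5) * h / new_h - 0.5, 0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            sx = min(max((x + 0.5) * w / new_w - 0.5, 0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Doubling the width of [[0, 100], [0, 100]] gives rows of 0, 0, 100, 100 with nearest neighbor (a hard, jagged step) but 0, 25, 75, 100 with bilinear (a smooth ramp), at the cost of more arithmetic per pixel.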

To give web page authors control over this performance bottleneck, there’s an image-rendering CSS property that indicates which algorithm to use. Let’s add that as an argument to DrawImage:
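A sketch of how the keyword might be mapped to an algorithm, assuming DrawImage receives the element’s computed image-rendering value as a string. The constants here are hypothetical placeholders; in a Skia-based browser this function would instead build the corresponding Skia sampling options.

```python
# Hypothetical names for resampling choices; a Skia-based browser would
# construct sampling-option objects here instead of strings.
NEAREST = "nearest"
BILINEAR = "bilinear"
CUBIC = "cubic"

def parse_image_rendering(value):
    """Map a CSS image-rendering keyword to a resampling algorithm."""
    if value == "high-quality":
        return CUBIC
    elif value in ("crisp-edges", "pixelated"):
        return NEAREST
    else:  # "auto", "smooth", or anything unrecognized
        return BILINEAR

class DrawImage:
    """Display-list command (sketch) that records how to resample."""
    def __init__(self, image, rect, quality):
        self.image = image
        self.rect = rect
        self.quality = parse_image_rendering(quality)
```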

But to talk about where this argument comes from, or more generally to actually see downloaded images in our browser, we first need to add images into our browser’s layout tree.

Embedded layout

Based on your experience with prior chapters, you can probably guess how to add images to our browser’s layout and paint process. We’ll need to create an ImageLayout class; add a new image case to BlockLayout’s recurse method; and generate a DrawImage command from ImageLayout’s paint method.

As we do this, you might recall doing something very similar for <input> elements. In fact, text areas and buttons are very similar to images: all of them are leaf nodes of the DOM, placed into lines, affected by text baselines, and paint custom content. (Images aren’t quite like text, because a text node is potentially an entire run of text, split across multiple lines, while an image is an atomic inline. The other types of embedded content in this chapter are also atomic inlines.) Since they are so similar, let’s try to reuse the same code for both.

Let’s split the existing InputLayout into a superclass called EmbedLayout, containing most of the existing code, and a new subclass with the input-specific code, InputLayout. (In a real browser, input elements are usually called widgets, because they have a lot of special rendering rules that sometimes involve CSS.)

The idea is that EmbedLayout should provide common layout code for all kinds of embedded content, while its subclasses like InputLayout should provide the custom code for that type of content. Different types of embedded content might have different widths and heights, so that should happen in each subclass, as should the definition of paint:

ImageLayout can now inherit most of its behavior from EmbedLayout, but take its width and height from the image itself:

Notice that the height of the image depends on the font size of the element. Though odd, this is how image layout actually works: a line with a single, very small image on it will still be tall enough to contain text. (In fact, a page with only a single image and no text or CSS at all still has its layout affected by a font: the default font. This is a common source of confusion for web developers. In a real browser, it can be avoided by forcing an image into a block or other layout mode via the display CSS property.) The underlying reason is that, as a type of inline layout, images are designed to flow along with related text, which means the bottom of the image should line up with the text baseline. That’s also why we save img_height in the code above.
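A simplified sketch of the EmbedLayout/ImageLayout split, assuming the font’s line height is passed in as a plain number (linespace) instead of being measured from a Skia font, and leaving out zoom, the width and height attributes, and horizontal positioning. The field names follow the text; the constructor signatures are illustrative.

```python
class EmbedLayout:
    """Common layout for atomic inline content (simplified sketch)."""
    def __init__(self, node, x, linespace):
        self.node = node
        self.x = x                  # assumed already computed by line layout
        self.linespace = linespace  # stands in for the element's font metrics
        self.width = self.height = 0
        self.ascent = self.descent = 0

    def layout(self):
        # Each subclass decides its own width, height, ascent, and descent.
        raise NotImplementedError

class ImageLayout(EmbedLayout):
    def __init__(self, node, x, linespace, image_width, image_height):
        super().__init__(node, x, linespace)
        self.image_width = image_width
        self.image_height = image_height

    def layout(self):
        self.width = self.image_width
        self.img_height = self.image_height
        # Never shorter than a line of text in the element's font.
        self.height = max(self.img_height, self.linespace)
        # Everything sits above the baseline (ascent is negative), so the
        # bottom of the image lines up with the text baseline.
        self.ascent = -self.height
        self.descent = 0
```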

Also, in the code above I introduced new ascent and descent fields on EmbedLayout subclasses. These are meant to be used in LineLayout’s layout method in place of the existing code that computes ascent and descent. That also requires introducing those fields on TextLayout:
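Once every child exposes ascent and descent, line layout can treat text and embedded content uniformly. A minimal sketch, assuming the sign convention of Skia font metrics (ascent negative, descent positive) and a hypothetical Child class standing in for TextLayout and EmbedLayout objects:

```python
class Child:
    """Minimal stand-in for a TextLayout or EmbedLayout (sketch)."""
    def __init__(self, ascent, descent):
        self.ascent = ascent    # negative: extent above the baseline
        self.descent = descent  # positive: extent below the baseline
        self.y = None

def layout_line(y, children):
    """Place children on a shared baseline; return (baseline, line height)."""
    max_ascent = max(-child.ascent for child in children)
    baseline = y + max_ascent
    for child in children:
        child.y = baseline + child.ascent
    max_descent = max(child.descent for child in children)
    return baseline, max_ascent + max_descent
```

With a word (ascent -12, descent 4) next to a 40-pixel-tall image, the line’s baseline ends up 40 pixels down and the word’s bottom hangs 4 pixels below it.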

Now we need to create ImageLayouts in BlockLayout. Input elements are created in an input method, so we could create a largely similar image method. But input is itself largely a duplicate of word, so this would be a lot of duplication. The only part of these methods that differs is the part that computes the width of the new inline child; most of the rest of the logic is shared.

Let’s instead refactor the shared code into new methods which text, image, and input can call. First, all of these methods need a font to determine how much space to leave after the inline (yes, this is how real browsers do it too); let’s make a function for that:

There’s also shared code that handles line layout; let’s put that into a new add_inline_child method. We’ll need to pass in the HTML node, the element, and the layout class to instantiate (plus a word parameter that’s just for TextLayouts):
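A stripped-down sketch of just the line-breaking part of add_inline_child, assuming the caller has already computed the child’s width w and the trailing space from its font. The real method also instantiates the layout class and passes word for TextLayouts; BlockLayoutSketch is a hypothetical name.

```python
class BlockLayoutSketch:
    """Only the line-breaking logic shared by word, input, and image."""
    def __init__(self, width):
        self.width = width
        self.lines = [[]]
        self.cursor_x = 0

    def new_line(self):
        self.lines.append([])
        self.cursor_x = 0

    def add_inline_child(self, node, w, space):
        # Wrap to a new line if this child would overflow the block.
        if self.cursor_x + w > self.width:
            self.new_line()
        self.lines[-1].append(node)
        self.cursor_x += w + space
```

Only w differs between text, inputs, and images, which is exactly why this logic can be shared.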

Now that we have ImageLayout nodes in our layout tree, we’ll be painting DrawImage commands to our display list and showing the image on the screen!

But what about our second output modality, screen readers? That’s what the alt attribute is for. It works like this:
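A sketch of how an image’s spoken text might be derived from its alt attribute, assuming nodes expose a tag name and an attributes dictionary as in earlier chapters; the class name and the exact “Image: …” wording are illustrative.

```python
class AccessibilityNodeSketch:
    """Hypothetical minimal accessibility node that handles <img>."""
    def __init__(self, tag, attributes):
        self.role = "image" if tag == "img" else "unknown"
        self.text = ""
        if self.role == "image":
            alt = attributes.get("alt")
            # Fall back to a generic label when no alt text is given.
            self.text = "Image: " + alt if alt else "Image"
```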

As we continue to implement new features for the web platform, we’ll always need to think about how to make features work in multiple modalities.

Modifying Image Sizes

So far, an image’s size on the screen is its size in pixels, possibly zoomed. (Note that zoom may already cause an image to render at a size different from its regular size, even before introducing the features in this section.) But in fact it’s generally valuable for authors to control the size of embedded content. There are a number of ways to do this, such as the width and height CSS properties we met in Exercise 6-2 (not to be confused with the width and height attributes!), but one way is the special width and height attributes. (Images have these mostly for historical reasons: they were invented before CSS existed.)

If both those attributes are present, things are pretty easy: we just read from them when laying out the element, both in BlockLayout’s image method and in ImageLayout:

This works great, but it has a major flaw: if the ratio of width to height isn’t the same as the underlying image’s aspect ratio, the image ends up stretched in weird ways. Sometimes that’s on purpose, but usually it’s a mistake. So browsers let authors specify just one of width and height, and compute the other using the image’s aspect ratio. (Despite being easy to implement, this feature only reached all real web browsers in 2021. Before that, developers resorted to things like the padding-top hack. Sometimes design oversights take a long time to fix.)
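The size computation can be sketched as a pure function, assuming attribute values arrive as strings and that zoom is applied as a plain multiplier standing in for the browser’s CSS-to-device-pixel conversion. The function name image_size is hypothetical.

```python
def image_size(image_width, image_height, width_attr=None, height_attr=None,
               zoom=1):
    """Layout size (width, height) of an image from its attributes.

    A missing attribute is filled in from the intrinsic aspect ratio.
    """
    aspect_ratio = image_width / image_height
    if width_attr and height_attr:
        w = int(width_attr) * zoom
        h = int(height_attr) * zoom
    elif width_attr:
        w = int(width_attr) * zoom
        h = w / aspect_ratio
    elif height_attr:
        h = int(height_attr) * zoom
        w = h * aspect_ratio
    else:
        w = image_width * zoom
        h = image_height * zoom
    return w, h
```

For a 400 by 200 image, width="100" alone yields a 100 by 50 box, preserving the 2:1 shape, while specifying both attributes can distort it.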

Your browser should now be able to render the following example page correctly, as shown in Figure 2. When it’s scrolled down a bit you should see what’s shown in Figure 3 (notice the different aspect ratios). And scrolling to the end will show what appears in Figure 4, including the “broken image” icon.

Interactive Widgets

So far, our browser has two kinds of embedded content: images and input elements. While both are important and widely used, they don’t offer quite the customizability and flexibility that complex embedded content use cases like maps, PDFs, ads, and social media controls require. (Variations like the <canvas> element are important and widely used too: instead of loading an image from the network, JavaScript draws on a <canvas> element via an API. Unlike images, <canvas> elements don’t have intrinsic sizes, but besides that they are pretty similar in terms of layout. There’s also ongoing work aimed at allowing web pages to customize what input elements look like, building on earlier work supporting custom elements and forms. This problem is quite challenging, since it interacts with platform independence, accessibility, scripting, and styling.) So in modern browsers, these use cases are handled by embedding one web page within another using the <iframe> element. (Content like PDFs can also be embedded via the embed and object tags; I won’t discuss those here.)

Semantically, an <iframe> is similar to a Tab inside a Tab—it has its own HTML document, CSS, and scripts. And layout-wise, an <iframe> is a lot like the <img> tag, with width and height attributes. So implementing basic iframes just requires handling these three significant differences:

We’ll get to these differences, but for now, let’s start working on the idea of a Tab within a Tab. What we’re going to do is split the Tab class into two pieces: Tab will own the event loop and script environments, and Frames will do the rest.

It’s good to plan out complicated refactors like this in some detail. A Tab will:

Naturally, every Frame will need a reference to its Tab; it’s also convenient to have access to the parent frame and the corresponding <iframe> element:

Now let’s look at how Frames are created. The first place is in Tab’s load method, which needs to create the root frame:

Note that the guts of load now live in the Frame, because the Frame owns the HTML tree. The Frame can also construct child Frames, for <iframe> elements:

Since iframes can have subresources (and subframes!) and therefore be slow to load, we should load them asynchronously, just like scripts:

And since they are asynchronous, we need to record whether they have loaded yet, to avoid trying to render an unloaded iframe:

So we’ve now got a tree of frames inside a single tab. But because we will sometimes need direct access to an arbitrary frame, let’s also give each frame an identifier, which I’m calling a window ID:
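Putting the pieces so far together, here is a hedged sketch of the frame tree: parent pointers, a tab-wide window ID index, and asynchronous child loading with a loaded flag. The child_urls parameter is hypothetical and stands in for the src of each <iframe> found while parsing; a real Frame fetches and parses the page, and the ID allocator is an assumption.

```python
import itertools

class Tab:
    """Owns the task queue and an index of all frames by window ID (sketch)."""
    def __init__(self):
        self.tasks = []  # stand-in for the tab's event-loop task queue
        self.window_id_to_frame = {}
        self.next_window_id = itertools.count()  # hypothetical ID allocator
        self.root_frame = None

    def load(self, url, child_urls=()):
        self.root_frame = Frame(self, None, None)
        self.root_frame.load(url, child_urls)

class Frame:
    def __init__(self, tab, parent_frame, frame_element):
        self.tab = tab
        self.parent_frame = parent_frame    # None for the root frame
        self.frame_element = frame_element  # the <iframe> element, if any
        self.url = None
        self.loaded = False
        self.window_id = next(tab.next_window_id)
        tab.window_id_to_frame[self.window_id] = self

    def load(self, url, child_urls=()):
        self.url = url
        for child_url in child_urls:
            child = Frame(self.tab, self, "<iframe>")
            # Load the child later, as a task, just like async scripts.
            self.tab.tasks.append(
                lambda child=child, child_url=child_url: child.load(child_url))
        self.loaded = True
```

Until its task runs, the child frame exists (and has a window ID) but is not loaded, so rendering code can skip it.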

Now that we have frames being created, let’s work on rendering those frames to the screen.

Iframe Rendering

Rendering is split between the Tab and its Frames: the Frame does style and layout, while the Tab does accessibility and paint. (Why split the rendering pipeline this way? Because the accessibility tree and display list are ultimately transferred from the main thread to the browser thread, so they get combined anyway. DOM, style, and layout trees, meanwhile, don’t get passed between threads, so they don’t intermingle.) We’ll need to implement that split, and also add code to trigger each Frame’s rendering from the Tab.

Let’s start with splitting the rendering pipeline. The main methods here are still the Tab’s run_animation_frame and render, which iterate over all loaded iframes:

Note that the needs_accessibility, pending_hover, and other flags are all still on the Tab, because they relate to the Tab’s part of rendering. Meanwhile, style and layout happen in the Frame now:

Again, these dirty bits move to the Frame because they relate to the frame’s part of rendering.

Unlike images, iframes have no intrinsic size: the layout size of an <iframe> element does not depend on its content. (There was an attempt to provide iframes with intrinsic sizing in the past, but it was removed from the HTML specification when no browser implemented it. This may change in the future, as there are good use cases for a “seamless” iframe whose layout is coordinated with its parent frame.) That means there’s a crucial extra bit of communication that needs to happen between the parent and child frames: how wide and tall should a frame be laid out? This is defined by the attributes and CSS of the iframe element:

The IframeLayout layout code is similar, inheriting from EmbedLayout, but without the aspect ratio code:

The extra two pixels provide room for a border, one pixel on each side, later on.

Note that if its width isn’t specified, an iframe uses a default value, chosen a long time ago based on the average screen sizes of the day:
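The sizing rule can be sketched as a small function, assuming only the width and height attributes matter (CSS sizing is ignored) and zoom is a plain multiplier. The 300 by 150 CSS pixel default is the one browsers use; the function name is hypothetical.

```python
IFRAME_WIDTH_PX = 300   # default iframe size: 300 x 150 CSS pixels
IFRAME_HEIGHT_PX = 150

def iframe_size(attributes, zoom=1):
    """Layout size of an <iframe>: attributes or defaults, plus a 1px
    border on each side. Unlike images, the content never affects it."""
    width = int(attributes.get("width", IFRAME_WIDTH_PX))
    height = int(attributes.get("height", IFRAME_HEIGHT_PX))
    return (width + 2) * zoom, (height + 2) * zoom
```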

Now, this code is run in the parent frame. We need to get this width and height over to the child frame, so that it can know its width and height during layout. So let’s add a field for that in the child frame:

The conditional is only there to handle the (unusual) case of an iframe blocked by CSP.

You might be surprised that I’m not calling set_needs_render on the child frame here. That’s a shortcut: the width and height attributes can only change through setAttribute, while zoom can only change in zoom_by and reset_zoom. All of those handlers, however, need to invalidate all frames using a new method for that purpose, since the old set_needs_render on Tab is now gone. Update all of these call sites to call it (plus the dark mode toggle, which affects style for all frames):

Note that there’s a tricky dependency order here. We need the parent frame to do layout before the child frame, so the child frame has an up-to-date width and height when it does layout. That order is guaranteed for us by Python (3.7 or later), where dictionaries preserve insertion order and each child frame is created, and therefore inserted, after its parent. But if you’re following along in another language, you might need to sort frames before rendering them.
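If your language’s maps don’t preserve insertion order, one option is to sort frames by their depth in the frame tree before rendering. This sketch assumes each frame has a parent_frame attribute (None for the root), as in the text; sorting by depth is one simple choice, and a preorder walk of the frame tree would work just as well.

```python
def render_order(frames):
    """Return frames ordered so every parent comes before its children."""
    depth = {}

    def get_depth(frame):
        if frame not in depth:
            parent = frame.parent_frame
            depth[frame] = 0 if parent is None else get_depth(parent) + 1
        return depth[frame]

    # sorted() is stable, so siblings keep their original relative order.
    return sorted(frames, key=get_depth)
```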

We’ve now got frames styled and laid out, and just need to paint them. Unlike layout and style, all the frames in a tab produce a single, unified display list, so we’re going to need to work recursively. We’ll have the Tab paint the root Frame:

Most of the layout tree’s paint methods don’t need to change, but to paint an IframeLayout, we’ll need to paint the child frame in paint_tree:

Before putting those commands in the display list, though, we need to add a border, clip iframe content that exceeds the visual area available, and transform the coordinate system:

The Transform shifts over the child frame contents so that its top-left corner starts in the right place, ClipRRect clips the contents of the iframe to the inside of the border, and paint_outline adds the border. (This book doesn’t go into the details of the CSS box model, but the width and height attributes of an iframe refer to the content box, and adding the border width yields the border box. As a result, what we’ve implemented is somewhat incorrect.) To trigger the outline, just add this to the browser CSS file:
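The geometry behind the Transform and ClipRRect can be sketched without Skia, assuming a 1-pixel border and a hypothetical Rect dataclass in place of skia.Rect. The offset moves the child frame’s origin just inside the border, and the clip is the iframe’s rectangle inset by the border.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

def iframe_paint_geometry(iframe_rect, border=1):
    """Return (dx, dy, clip) for painting a child frame (sketch).

    (dx, dy) translates the child's display list so its (0, 0) lands just
    inside the border; clip is the area the child may draw into.
    """
    dx = iframe_rect.left + border
    dy = iframe_rect.top + border
    clip = Rect(iframe_rect.left + border, iframe_rect.top + border,
                iframe_rect.right - border, iframe_rect.bottom - border)
    return dx, dy, clip

def to_parent(x, y, dx, dy):
    """Map a point from child-frame coordinates into the parent's."""
    return x + dx, y + dy
```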

Finally, let’s also add iframes to the accessibility tree. Like the display list, the accessibility tree is global across all frames. We can have iframes create iframe nodes:

So we’ve now got iframes showing up on the screen. The next step is interacting with them.

Iframe Input Events

Now that we’ve got iframes rendering to the screen, let’s close the loop with user input. We want to add support for clicking on things inside an iframe, and also for tabbing around or scrolling inside one.

When an iframe is clicked, it passes the click through to the child frame, and immediately returns afterward, because iframes capture click events. Note how I subtracted the absolute x and y offsets of the iframe from the (absolute) x and y click positions when recursing into the child frame:
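Here is a self-contained sketch of that recursion, assuming a hypothetical IframeBox that stands in for an iframe’s layout object in page coordinates; the border offset is ignored for brevity. Clicks that miss every iframe are simply recorded, where a real Frame would go on to find and activate the clicked element.

```python
class IframeBox:
    """Hypothetical stand-in for an IframeLayout: position, size, child."""
    def __init__(self, x, y, width, height, child):
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.child = child

    def contains(self, x, y):
        return (self.x <= x < self.x + self.width
                and self.y <= y < self.y + self.height)

class FrameSketch:
    def __init__(self, iframes=()):
        self.iframes = list(iframes)  # IframeBoxes, in page coordinates
        self.scroll = 0
        self.clicks = []              # clicks that ended up in this frame

    def click(self, x, y):
        y += self.scroll  # viewport coordinates -> this frame's page coords
        for iframe in self.iframes:
            if iframe.contains(x, y):
                # The iframe captures the click: recurse in the child's own
                # coordinate space, then stop.
                iframe.child.click(x - iframe.x, y - iframe.y)
                return
        self.clicks.append((x, y))
```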

Now, clicking on <a> elements will work, which means that you can now cause a frame to navigate to a new page. And because a Frame has all the loading and navigation logic that Tab used to have, it just works without any more changes!

You should now be able to load an iframe example. It should look like the image shown in Figure 5.

Repeatedly clicking on the link on that page will add another recursive iframe. After clicking twice it should look like Figure 6.

Let’s get the other interactions working as well, starting with focusing an element. You can focus on only one element per tab, so we will still store the focus on the Tab, but we’ll need to store the iframe the focused element is on too:

When an iframe tries to focus on an element, it sets itself as the focused iframe, but before it does that, it needs to un-focus the previously focused iframe:

We need to re-render the previously focused iframe so that it stops drawing the focus outline.
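A minimal sketch of this tab-wide focus bookkeeping, assuming a needs_render flag in place of the real invalidation method; the class names are hypothetical. Focusing in one frame clears the previous frame’s focus and marks it for re-rendering.

```python
class TabFocus:
    """Tab-level record of which frame holds focus (sketch)."""
    def __init__(self):
        self.focused_frame = None

class FrameFocus:
    def __init__(self, tab):
        self.tab = tab
        self.focus = None
        self.needs_render = False

    def focus_element(self, element):
        previous = self.tab.focused_frame
        if previous and previous is not self:
            previous.focus = None
            previous.needs_render = True  # so it stops drawing the outline
        self.tab.focused_frame = self
        self.focus = element
        self.needs_render = True
```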

Another interaction is pressing Tab to cycle through focusable elements in the current frame. (This is not a particularly user-friendly implementation of tab cycling when multiple frames are involved; see Exercise 15-9 for a better version.) Let’s move the advance_tab logic into Frame and just dispatch to it from the Tab:

Do the same thing for keypress and enter, which are used for interacting with text inputs and buttons.

Another big interaction we need to support is scrolling. We’ll store the scroll offset in each Frame:

Now, as you might recall from Chapter 13, scrolling happens both inside Browser and inside Tab, to improve responsiveness. That was already quite complicated, so to keep things simple we’ll only support threaded scrolling on the root frame. We’ll need a new commit parameter so the browser thread knows whether the root frame is focused:

The Browser thread will save this information in commit and use it when the user requests a scroll:

If a frame other than the root frame is scrolled, we’ll just set needs_composite so the browser has to re-raster from scratch:

There’s one more subtlety to scrolling. After we scroll, we want to clamp the scroll position, to prevent the user scrolling past the last thing on the page. Right now clamp_scroll uses the window height to determine the maximum scroll amount; let’s move that function inside Frame so it can use the current frame’s height:
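The per-frame clamping logic, sketched with plain numbers: the frame’s own visible height replaces the window height, and the document height is its laid-out content height. This ignores any extra padding the real method may add, and ScrollFrame is a hypothetical name.

```python
class ScrollFrame:
    """Per-frame scroll state with clamping (sketch)."""
    def __init__(self, frame_height, document_height):
        self.frame_height = frame_height        # visible height of the frame
        self.document_height = document_height  # height of laid-out content
        self.scroll = 0

    def clamp_scroll(self, scroll):
        max_scroll = max(self.document_height - self.frame_height, 0)
        return max(0, min(scroll, max_scroll))

    def scroll_to(self, y):
        self.scroll = self.clamp_scroll(y)
```

A 150-pixel-tall iframe showing a 1000-pixel document can scroll at most 850 pixels, no matter how large the window is.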

Make sure to use the clamp_scroll method everywhere. For example, in scroll_to:

There are also a number of accessibility hover interactions that we need to support. This is hard, because the accessibility interactions happen in the browser thread, which has limited information:

Hit testing FrameAccessibilityNodes will use the frame’s bounds to ignore clicks outside the frame bounds, and adjust clicks against the frame’s coordinates (note how we subtract off the zoomed border of the frame):

Hit testing should now work, but the bounds of the hovered node when drawn to the screen are still wrong. For that, we’ll need a method that returns the absolute screen rect of an AccessibilityNode. And that method in turn needs parent pointers to walk up the accessibility tree, so let’s add that first:

And now we’re ready for the method to map to absolute coordinates. This loops over all bounds Rects and maps them up to the root. Note that there is a special case for FrameAccessibilityNode, because its self-bounds are in the coordinate space of the frame containing the iframe.

This method calls map_to_parent to adjust the bounds. For most accessibility nodes we don’t need to do anything, because they are in the same coordinate space as their parent:

A FrameAccessibilityNode, on the other hand, adjusts for the iframe’s position and clipping:
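The whole mapping can be sketched in plain Python, assuming one bounding Rect per node (rather than a list), a 1-pixel iframe border, and no scrolling inside the child frame; the Rect class is a stand-in for skia.Rect.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def offset(self, dx, dy):
        return Rect(self.left + dx, self.top + dy,
                    self.right + dx, self.bottom + dy)

    def intersect(self, other):
        return Rect(max(self.left, other.left), max(self.top, other.top),
                    min(self.right, other.right), min(self.bottom, other.bottom))

class AccessibilityNode:
    def __init__(self, bounds, parent=None):
        self.bounds = bounds  # in the coordinate space of the enclosing frame
        self.parent = parent

    def map_to_parent(self, rect):
        return rect  # same coordinate space as the parent

    def absolute_bounds(self):
        rect = self.bounds
        # An iframe node's own bounds are already in its parent's space.
        node = self.parent if isinstance(self, FrameAccessibilityNode) else self
        while node:
            rect = node.map_to_parent(rect)
            node = node.parent
        return rect

class FrameAccessibilityNode(AccessibilityNode):
    def __init__(self, bounds, parent=None, border=1):
        super().__init__(bounds, parent)
        self.border = border

    def map_to_parent(self, rect):
        # Shift child-frame coordinates to the iframe's inner corner, then
        # clip to the area inside the border.
        b = self.border
        inner = Rect(self.bounds.left + b, self.bounds.top + b,
                     self.bounds.right - b, self.bounds.bottom - b)
        return rect.offset(inner.left, inner.top).intersect(inner)
```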

You should now be able to hover on nodes and have them read out by our accessibility subsystem.

Alright, we’ve now got all of our browser’s forms of user interaction properly recursing through the frame tree. It’s time to add more capabilities to iframes.

Iframe Scripts

We’ve now got users interacting with iframes, but what about scripts interacting with them? Of course, each frame can already run scripts, but right now each Frame has its own JSContext, so these scripts can’t really interact with each other. Instead, same-origin iframes should run in the same JavaScript context and should be able to access each other’s globals, call each other’s functions, and modify each other’s DOMs, as shown in Figure 7. Let’s implement that.

For two frames’ JavaScript environments to interact, we’ll need to put them in the same JSContext. So, instead of each Frame having a JSContext of its own, we’ll want to store JSContexts on the Tab, in a dictionary that maps origins to JavaScript contexts:
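A sketch of that per-origin lookup, assuming origins are compared as scheme, host, and port (with default ports filled in) and using a placeholder JSContext class in place of the DukPy-backed one; the method name get_js is an assumption.

```python
from urllib.parse import urlsplit

def origin(url):
    """scheme://host:port, the key deciding which JSContext is shared."""
    parts = urlsplit(url)
    port = parts.port or {"http": 80, "https": 443}.get(parts.scheme)
    return f"{parts.scheme}://{parts.hostname}:{port}"

class JSContext:
    """Placeholder for the DukPy-backed JSContext of earlier chapters."""
    def __init__(self, tab, origin):
        self.tab = tab
        self.origin = origin

class Tab:
    def __init__(self):
        self.origin_to_js = {}

    def get_js(self, url):
        """Return the JSContext for url's origin, creating it if needed."""
        key = origin(url)
        if key not in self.origin_to_js:
            self.origin_to_js[key] = JSContext(self, key)
        return self.origin_to_js[key]
```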

So we’ve got multiple pages’ scripts using one JavaScript context. But now we’ve got to keep their variables in their own namespaces somehow. The key is going to be the window global, of type Window. In the browser, this refers to the global object, and instead of writing a global variable like a, you can always write window.a instead. (There are various proposals to expose multiple global namespaces as a JavaScript API. It would definitely be convenient to have that capability in this chapter, to avoid having to write window everywhere!) To keep our implementation simple, in our browser, scripts will always need to reference variables and functions via window. (This also means that all global variables in a script need to do the same, even if they are not browser APIs.) We’ll need to do the same in our runtime:

Do the same for every function or variable in the runtime.js file. If you miss one, you’ll get errors like this:

If you see this error, it means you need to find where you need to write window.Node instead of Node. You’ll also need to modify EVENT_DISPATCH_JS to prefix classes with window:

To get multiple frames’ scripts to play nice inside one JavaScript context, we’ll create multiple Window objects: window_1, window_2, and so on. Before running a frame’s scripts, we’ll set window to that frame’s Window object, so that the script uses the correct Window. (Some JavaScript engines support an API for changing the global object, but the DukPy library that we’re using isn’t one of them. There is a standard JavaScript statement called with which sort of does this, but the rules are complicated and not quite what we need here. It’s also not recommended these days.)
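On the Python side, this mostly comes down to generating small JavaScript prefixes. A sketch, assuming each frame’s Window is stored in a global named window_<id> and that the runtime’s Window constructor takes the window ID (both assumptions). These helpers only build strings; evaljs runs them.

```python
def make_window(window_id):
    """JavaScript that creates a frame's Window global (sketch)."""
    return "var window_{0} = new Window({0});".format(window_id)

def wrap(script, window_id):
    """Prefix script so `window` refers to this frame's Window first."""
    return "window = window_{}; {}".format(window_id, script)
```

Every evaljs call for a frame then becomes something like interp.evaljs(wrap(code, window_id)), where interp is the shared DukPy interpreter for that origin.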

So to begin with, let’s define the Window class when we create a JSContext:

Now, when a frame is created and wants to use a JSContext, it needs to ask for a window object to be created first:

Before running any JavaScript, we’ll want to change which window the window global refers to:

We can use this to, for example, set up the initial runtime environment for each Frame:

We’ll need to call wrap any time we use evaljs, which also means we’ll need to add a window ID argument to a lot of methods. For example, in run we’ll add a window_id parameter:

The same holds for various dispatching APIs. For example, to dispatch an event, we’ll need the window_id:

Likewise, we’ll need to pass a window ID argument in click, submit_form, and keypress; I’ve omitted those code fragments. Note that you should have modified your runtime.js file to store the LISTENERS on the window object, meaning each Frame will have its own set of event listeners to dispatch to:

Do the same for requestAnimationFrame, passing around a window ID and wrapping the code so that it correctly references window.

For calls from JavaScript into the browser, we’ll need JavaScript to pass in the window ID it’s calling from:

We’ll need something similar in innerHTML and style because we need to call set_needs_render on the relevant Frame.

Finally, for setTimeout and XMLHttpRequest, which involve a call from JavaScript into the browser and later a call from the browser into JavaScript, we’ll likewise need to pass in a window ID from JavaScript, and use that window ID when calling back into JavaScript. I’ve omitted many of the code changes in this section because they are quite repetitive. You can find all of the needed locations by searching your codebase for evaljs.

Communicating Between Frames

We’ve now managed to run multiple Frames’ worth of JavaScript in a single JSContext, and isolated them somewhat so that they don’t mess with each other’s state. But the whole point of this exercise is to allow some interaction between same-origin frames. Let’s do that now.

The simplest way two frames can interact is that they can get access to each other’s state via the parent attribute on the Window object. If the two frames have the same origin, that lets one frame call methods, access variables, and modify browser state for the other frame. Because we’ve had these same-origin frames share a JSContext, this isn’t too hard to implement. Basically, we’ll need a way to go from a window ID to its parent frame’s window ID:

On the JavaScript side, we now need to look up the Window object given its window ID. There are lots of ways you could do this, but the easiest is to have a global map:

Note that it’s possible for the lookup in WINDOWS to fail, if the parent frame is not in the same origin as the current one and therefore isn’t running in the same JSContext. In that case, this code returns a fresh Window object with that id. But iframes are not allowed to access each other’s documents across origins (or call various other APIs that are unsafe), so add a method that checks for this situation and raises an exception:

Then use this method in all JSContext methods that access documents. (Note that in a real browser this is woefully inadequate security. A real browser would need to very carefully lock down the entire runtime.js code and audit every single JavaScript API with a fine-toothed comb.)
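A sketch of such a check, assuming it compares the calling frame’s URL with the target frame’s URL by scheme, host, and port. The names throw_if_cross_origin and CrossOriginError are hypothetical.

```python
from urllib.parse import urlsplit

def origin(url):
    parts = urlsplit(url)
    port = parts.port or {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

class CrossOriginError(Exception):
    pass

def throw_if_cross_origin(caller_url, target_url):
    """Refuse document access between frames of different origins."""
    if origin(caller_url) != origin(target_url):
        raise CrossOriginError("Cross-origin access disallowed from script")
```

Each document-touching method (querySelectorAll, innerHTML, and so on) would call this before doing any work.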

So same-origin iframes can communicate via parent. But what about cross-origin iframes? It would be insecure to let them access each other’s variables or call each other’s methods, so instead browsers allow a form of message passing, a technique for structured communication between two different event loops that doesn’t require any shared state or locks.

Message-passing in JavaScript works like this: you call the postMessage API on the Window object you’d like to talk to, with the message itself as the first parameter and * as the second. (The second parameter has to do with origin restrictions; see Exercise 15-8.)

This will send the first argument to the parent frame. (In a real browser, you can also pass data that is not a string, such as numbers and objects. This works via a serialization algorithm called structured cloning, which converts most JavaScript objects, though not, for example, DOM nodes, to a sequence of bytes that the receiver frame can convert back into a JavaScript object. DukPy doesn’t support structured cloning natively for objects, so our browser won’t support this either.) The parent frame can receive the message by handling the message event on its Window object:

Note that in this second code snippet, window is the receiving Window, a different Window from the window in the first snippet.

Let’s implement postMessage, starting on the receiver side. Since this event happens on the Window, not on a Node, we’ll need a new WINDOW_LISTENERS array:

The event listener and dispatching code is the same as for Node, except it’s on Window and uses WINDOW_LISTENERS. You can just duplicate those methods:

That’s everything on the receiver side; now let’s do the sender side. First, let’s implement the postMessage API itself. Note that inside postMessage, the JavaScript this value is the receiver, or target, window:

Scheduling the task is necessary because postMessage is an asynchronous API; sending a synchronous message might involve synchronizing multiple JSContexts or even multiple processes, which would add a lot of overhead and probably result in deadlocks.
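The asynchronous delivery path can be modeled in plain Python, assuming string messages (since we don’t support structured cloning) and listeners registered per window ID. The method names are hypothetical; in the browser, each listener would be a JavaScript function invoked through evaljs on the target frame’s JSContext.

```python
from collections import deque

class Tab:
    """Just enough of a Tab to show postMessage's asynchronous delivery."""
    def __init__(self):
        self.task_queue = deque()
        self.window_listeners = {}  # window_id -> list of callbacks

    def add_window_listener(self, window_id, callback):
        self.window_listeners.setdefault(window_id, []).append(callback)

    def post_message(self, message, target_window_id):
        # Don't deliver now: queue a task, so the sender never blocks on,
        # or shares state with, the receiving frame.
        self.task_queue.append(
            lambda: self.dispatch_message(message, target_window_id))

    def dispatch_message(self, message, window_id):
        for callback in self.window_listeners.get(window_id, []):
            callback({"type": "message", "data": message})

    def run_tasks(self):
        while self.task_queue:
            self.task_queue.popleft()()
```

Nothing is delivered until the event loop runs the queued task, which mirrors the asynchronous behavior described above.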

You should now be able to use postMessage to send messages between frames, including cross-origin frames running in different JSContexts, in a secure way. (In the iframe demo, for example, you should see “Message received from iframe: This is the contents of postMessage.” printed to the console. This particular example uses a same-origin postMessage. You can test cross-origin locally by starting two local HTTP servers on different ports, then changing the URL of the example15-img.html iframe document to point to the second port.)

Isolation and Timing

Iframes add a whole new layer of security challenges atop what we discussed in Chapter 10. The power to embed one web page into another creates a commensurate security risk when the two pages don’t trust each other, both when you embed an untrusted page into your own page and in the reverse case, where an attacker embeds your page into their own malicious one. In both cases, we want to protect your page from any security or privacy risks caused by the other frame. (Websites can protect themselves from being iframed via the X-Frame-Options header.)

The starting point is that cross-origin iframes can’t access each other directly through JavaScript. That’s good, but what if a bug in the JavaScript engine, like a buffer overrun, lets an iframe circumvent those protections? Unfortunately, bugs like this are common enough that browsers have to defend against them. For example, browsers these days run frames from different sites (roughly, different registrable domains) in different operating system processes, and use operating system features to limit how much access those processes have.
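In practice, Chromium's site isolation groups frames by site (the scheme plus the registrable domain, like https://example.com) rather than by full origin. Part of the reason is that the legacy document.domain feature lets same-site pages script each other synchronously, so such pages must share a process. The sketch below shows the bookkeeping under two stated assumptions. First, a site is approximated as the scheme plus the last two host-name labels. Second, each new site simply gets the next process id. Both names, approximate_site and ProcessAllocator, are hypothetical.

```python
from typing import Dict
from urllib.parse import urlsplit

def approximate_site(url: str) -> str:
    """Scheme plus the last two labels of the host name.

    Real browsers use the Public Suffix List instead; this shortcut would,
    for example, wrongly lump every *.co.uk host into a single site.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    registrable = ".".join(host.split(".")[-2:])
    return f"{parts.scheme}://{registrable}"

class ProcessAllocator:
    """Hypothetical helper: one renderer process id per site."""

    def __init__(self) -> None:
        self.process_for_site: Dict[str, int] = {}

    def process_for(self, url: str) -> int:
        site = approximate_site(url)
        if site not in self.process_for_site:
            # First frame from this site: give it a fresh process.
            self.process_for_site[site] = len(self.process_for_site)
        return self.process_for_site[site]

allocator = ProcessAllocator()
frame_urls = ["https://news.example.com/", "https://ads.tracker.net/frame",
              "https://www.example.com/login"]
print([allocator.process_for(url) for url in frame_urls])  # [0, 1, 0]
```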

Other parts of the browser mix content from multiple frames, like our browser’s Tab-wide display list. That means that a bug in the rasterizer could allow one frame to take over the rasterizer and then read data that ultimately came from another frame. This might seem like a rather complex attack, but it has happened before, so modern browsers use sandboxing techniques to prevent it. For example, Chromium can place the rasterizer in its own process and use a Linux feature called seccomp to limit what system calls that process can make. Even if a bug compromised the rasterizer, that rasterizer wouldn’t be able to exfiltrate data over the network, preventing private data from leaking.

These isolation and sandboxing features may seem “straightforward”, in the same sense that the browser thread we added in Chapter 12 is “straightforward”. In practice, the many browser APIs mean the implementation is full of subtleties and ends up being extremely complex. Chromium, for example, took many years to ship the first implementation of site isolation.

Site isolation has become much more important in recent years due to the CPU timing attacks known as Spectre and Meltdown, which exploit speculative execution and the CPU cache. In short, these attacks let an attacker read arbitrary locations in memory by measuring how long certain CPU operations take. That includes another frame’s data if the two frames share a process. Placing sensitive content in different operating system processes, each with its own memory address space, is a good protection against these attacks.

That said, these kinds of timing attacks can be subtle, and there are doubtless more that haven’t been discovered yet. To blunt this threat, browsers currently restrict access to the high-precision timers that timing attacks typically require. For example, browsers reduce the accuracy of APIs like performance.now, Date.now, and setTimeout.
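Here is a minimal sketch of timer clamping. The 100 µs resolution and the 20 µs and 70 µs operation timings are made-up values for illustration, not what any particular browser uses. The point is that two operations whose durations differ by less than the clamp can become indistinguishable.

```python
def coarsen_us(timestamp_us: int, resolution_us: int = 100) -> int:
    """Round a timestamp in microseconds down to a multiple of resolution_us."""
    return timestamp_us - timestamp_us % resolution_us

def measured_duration(start_us: int, end_us: int, resolution_us: int = 100) -> int:
    """What a page computes by reading a coarsened clock before and after an operation."""
    return coarsen_us(end_us, resolution_us) - coarsen_us(start_us, resolution_us)

start = 1_000_010
cache_hit_end = start + 20    # hypothetical fast operation
cache_miss_end = start + 70   # hypothetical slow operation

# With a precise (1 µs) clock the two are easy to tell apart...
print(measured_duration(start, cache_hit_end, 1),
      measured_duration(start, cache_miss_end, 1))  # 20 70
# ...but with a 100 µs clock both read as 0.
print(measured_duration(start, cache_hit_end),
      measured_duration(start, cache_miss_end))  # 0 0
```

Clamping alone isn't a complete defense. An attacker can repeat a measurement many times and watch how often the clamped value ticks over, which is one reason browsers also add jitter to their timers.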

Worse yet, some browser APIs don’t look like timers but can be used as such. For example, the SharedArrayBuffer API lets two JavaScript threads run concurrently and share memory. One thread can then increment a counter in a loop while another reads it, which works as a clock. These APIs are useful, so browsers don’t want to remove them, but there is also no way to make them “less accurate”, since they aren’t clocks to begin with. Instead, browsers now allow SharedArrayBuffer only if the page opts in to “cross-origin isolation” via the Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy HTTP headers, with embedded frames opting in as well. This is not a perfect solution, though.
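A page that sends these headers appropriately is called cross-origin isolated, and it can check self.crossOriginIsolated from JavaScript. The sketch below is a simplified version of the top-level check. The function names are hypothetical, and secure contexts are approximated as https URLs or localhost. It also leaves out the additional opt-in that embedded frames and subresources must provide.

```python
from typing import Dict
from urllib.parse import urlsplit

def header_value(headers: Dict[str, str], name: str) -> str:
    """Case-insensitive header lookup, dropping parameters like '; report-to=...'."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value.split(";")[0].strip().lower()
    return ""

def is_cross_origin_isolated(url: str, headers: Dict[str, str]) -> bool:
    """Rough check of whether a top-level page may use SharedArrayBuffer."""
    parts = urlsplit(url)
    # Approximation: COOP and COEP are only honored in secure contexts.
    secure = parts.scheme == "https" or parts.hostname in ("localhost", "127.0.0.1")
    coop = header_value(headers, "Cross-Origin-Opener-Policy")
    coep = header_value(headers, "Cross-Origin-Embedder-Policy")
    return secure and coop == "same-origin" and coep in ("require-corp", "credentialless")

isolated = {"Cross-Origin-Opener-Policy": "same-origin",
            "cross-origin-embedder-policy": "require-corp"}
print(is_cross_origin_isolated("https://example.com/", isolated))  # True
print(is_cross_origin_isolated("http://example.com/", isolated))   # False
print(is_cross_origin_isolated("https://example.com/", {}))        # False
```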

Summary

This chapter introduced how the browser handles embedded content use cases like images and iframes.

And, as we hope you saw in this chapter, none of these features are too difficult to implement, though—as you’ll see in the exercises—implementing them well requires a lot of attention to detail.

Outline

The complete set of functions, classes, and methods in our browser should now look something like this:

COOKIE_JAR

class URL:
    def __init__(url)

    def request(referrer, payload)

    def resolve(url)

    def origin()

    def __str__()

class Text:
    def __init__(text, parent)

    def __repr__()

class Element:
    def __init__(tag, attributes, parent)

    def __repr__()

def print_tree(node, indent)

def tree_to_list(tree, list)

def is_focusable(node)

def get_tabindex(node)

class HTMLParser:
    SELF_CLOSING_TAGS

    HEAD_TAGS

    def __init__(body)

    def parse()

    def get_attributes(text)

    def add_text(text)

    def add_tag(tag)

    def implicit_tags(tag)

    def finish()

class CSSParser:
    def __init__(s)

    def whitespace()

    def literal(literal)

    def word()

    def ignore_until(chars)

    def pair(until)

    def selector()

    def body()

    def parse()

    def until_chars(chars)

    def simple_selector()

    def media_query()

class TagSelector:
    def __init__(tag)

    def matches(node)

class DescendantSelector:
    def __init__(ancestor, descendant)

    def matches(node)

class PseudoclassSelector:
    def __init__(pseudoclass, base)

    def matches(node)

FONTS

def get_font(size, weight, style)

def font(style, zoom)

def linespace(font)

NAMED_COLORS

def parse_color(color)

def parse_blend_mode(blend_mode_str)

def parse_transition(value)

def parse_transform(transform_str)

def parse_outline(outline_str)

def parse_image_rendering(quality)

REFRESH_RATE_SEC

class MeasureTime:
    def __init__()

    def time(name)

    def stop(name)

    def finish()

class Task:
    def __init__(task_code)

    def run()

class TaskRunner:
    def __init__(tab)

    def schedule_task(task)

    def set_needs_quit()

    def clear_pending_tasks()

    def start_thread()

    def run()

    def handle_quit()

DEFAULT_STYLE_SHEET

INHERITED_PROPERTIES

def style(node, rules, frame)

def cascade_priority(rule)

def diff_styles(old_style, new_style)

class NumericAnimation:
    def __init__(old_value, new_value, num_frames)

    def animate()

def dpx(css_px, zoom)

WIDTH, HEIGHT

HSTEP, VSTEP

INPUT_WIDTH_PX

IFRAME_WIDTH_PX, IFRAME_HEIGHT_PX

BLOCK_ELEMENTS

class DocumentLayout:
    def __init__(node, frame)

    def layout(width, zoom)

    def should_paint()

    def paint()

    def paint_effects(cmds)

class BlockLayout:
    def __init__(node, parent, previous, frame)

    def layout_mode()

    def layout()

    def recurse(node)

    def add_inline_child(node, w, child_class, frame, word)

    def new_line()

    def word(node, word)

    def input(node)

    def image(node)

    def iframe(node)

    def self_rect()

    def should_paint()

    def paint()

    def paint_effects(cmds)

class LineLayout:
    def __init__(node, parent, previous)

    def layout()

    def should_paint()

    def paint()

    def paint_effects(cmds)

class TextLayout:
    def __init__(node, word, parent, previous)

    def layout()

    def should_paint()

    def paint()

    def paint_effects(cmds)

    def self_rect()

class EmbedLayout:
    def __init__(node, parent, previous, frame)

    def layout()

    def should_paint()

class InputLayout:
    def __init__(node, parent, previous, frame)

    def layout()

    def paint()

    def paint_effects(cmds)

    def self_rect()

class ImageLayout:
    def __init__(node, parent, previous, frame)

    def layout()

    def paint()

    def paint_effects(cmds)

class IframeLayout:
    def __init__(node, parent, previous, parent_frame)

    def layout()

    def paint()

    def paint_effects(cmds)

BROKEN_IMAGE

class PaintCommand:
    def __init__(rect)

class DrawText:
    def __init__(x1, y1, text, font, color)

    def execute(canvas)

class DrawRect:
    def __init__(rect, color)

    def execute(canvas)

class DrawRRect:
    def __init__(rect, radius, color)

    def execute(canvas)

class DrawLine:
    def __init__(x1, y1, x2, y2, color, thickness)

    def execute(canvas)

class DrawOutline:
    def __init__(rect, color, thickness)

    def execute(canvas)

class DrawCompositedLayer:
    def __init__(composited_layer)

    def execute(canvas)

class DrawImage:
    def __init__(image, rect, quality)

    def execute(canvas)

class VisualEffect:
    def __init__(rect, children, node)

class Blend:
    def __init__(opacity, blend_mode, node, children)

    def execute(canvas)

    def map(rect)

    def unmap(rect)

    def clone(child)

class Transform:
    def __init__(translation, rect, node, children)

    def execute(canvas)

    def map(rect)

    def unmap(rect)

    def clone(child)

def local_to_absolute(display_item, rect)

def absolute_bounds_for_obj(obj)

def absolute_to_local(display_item, rect)

def map_translation(rect, translation, reversed)

def paint_tree(layout_object, display_list)

def paint_visual_effects(node, cmds, rect)

def paint_outline(node, cmds, rect, zoom)

def add_parent_pointers(nodes, parent)

class CompositedLayer:
    def __init__(skia_context, display_item)

    def can_merge(display_item)

    def add(display_item)

    def composited_bounds()

    def absolute_bounds()

    def raster()

SPEECH_FILE

class AccessibilityNode:
    def __init__(node, parent)

    def compute_bounds()

    def build()

    def build_internal(child_node)

    def contains_point(x, y)

    def hit_test(x, y)

    def map_to_parent(rect)

    def absolute_bounds()

class FrameAccessibilityNode:
    def __init__(node, parent)

    def build()

    def hit_test(x, y)

    def map_to_parent(rect)

def speak_text(text)

EVENT_DISPATCH_JS

SETTIMEOUT_JS

XHR_ONLOAD_JS

POST_MESSAGE_DISPATCH_JS

RUNTIME_JS

class JSContext:
    def __init__(tab, url_origin)

    def run(script, code, window_id)

    def add_window(frame)

    def wrap(script, window_id)

    def dispatch_event(type, elt, window_id)

    def dispatch_post_message(message, window_id)

    def dispatch_settimeout(handle, window_id)

    def dispatch_xhr_onload(out, handle, window_id)

    def dispatch_RAF(window_id)

    def throw_if_cross_origin(frame)

    def get_handle(elt)

    def querySelectorAll(selector_text, window_id)

    def getAttribute(handle, attr)

    def setAttribute(handle, attr, value, window_id)

    def innerHTML_set(handle, s, window_id)

    def style_set(handle, s, window_id)

    def XMLHttpRequest_send(...)

    def setTimeout(handle, time, window_id)

    def requestAnimationFrame()

    def parent(window_id)

    def postMessage(target_window_id, message, origin)

SCROLL_STEP

class Frame:
    def __init__(tab, parent_frame, frame_element)

    def allowed_request(url)

    def load(url, payload)

    def render()

    def clamp_scroll(scroll)

    def set_needs_render()

    def set_needs_layout()

    def advance_tab()

    def focus_element(node)

    def activate_element(elt)

    def submit_form(elt)

    def keypress(char)

    def scrolldown()

    def scroll_to(elt)

    def click(x, y)

class Tab:
    def __init__(browser, tab_height)

    def load(url, payload)

    def run_animation_frame(scroll)

    def render()

    def get_js(url)

    def allowed_request(url)

    def raster(canvas)

    def clamp_scroll(scroll)

    def set_needs_render()

    def set_needs_layout()

    def set_needs_paint()

    def set_needs_render_all_frames()

    def set_needs_accessibility()

    def scrolldown()

    def click(x, y)

    def go_back()

    def submit_form(elt)

    def keypress(char)

    def focus_element(node)

    def activate_element(elt)

    def scroll_to(elt)

    def enter()

    def advance_tab()

    def zoom_by(increment)

    def reset_zoom()

    def set_dark_mode(val)

    def post_message(message, target_window_id)

class Chrome:
    def __init__(browser)

    def tab_rect(i)

    def paint()

    def click(x, y)

    def keypress(char)

    def enter()

    def blur()

    def focus_addressbar()

class CommitData:
    def __init__(...)

class Browser:
    def __init__()

    def schedule_animation_frame()

    def commit(tab, data)

    def render()

    def composite_raster_and_draw()

    def composite()

    def get_latest(effect)

    def paint_draw_list()

    def raster_tab()

    def raster_chrome()

    def update_accessibility()

    def draw()

    def speak_node(node, text)

    def speak_document()

    def set_needs_accessibility()

    def set_needs_animation_frame(tab)

    def set_needs_raster_and_draw()

    def set_needs_raster()

    def set_needs_composite()

    def set_needs_draw()

    def clear_data()

    def new_tab(url)

    def new_tab_internal(url)

    def set_active_tab(tab)

    def schedule_load(url, body)

    def clamp_scroll(scroll)

    def handle_down()

    def handle_click(e)

    def handle_key(char)

    def handle_enter()

    def handle_tab()

    def handle_hover(event)

    def handle_quit()

    def toggle_dark_mode()

    def increment_zoom(increment)

    def reset_zoom()

    def focus_content()

    def focus_addressbar()

    def go_back()

    def cycle_tabs()

    def toggle_accessibility()

def mainloop(browser)

Exercises

15-1 Canvas element. Implement the <canvas> element, the 2D aspect of the getContext API, and some of the drawing commands on CanvasRenderingContext2D. Canvas layout works just like iframe layout, including its default width and height. You should allocate a Skia surface of an appropriate size when getContext("2d") is called, and implement some of the APIs that draw to the canvas. It should be straightforward to translate most API methods to their Skia equivalent. (Note that the Canvas APIs raster each drawing command immediately, instead of waiting until the rest of the page is rastered. This is called immediate mode rendering, as opposed to the retained mode used by HTML. Immediate mode means the web developer decides when to incur the rasterization time.)

15-2 Background images. Elements can have a background-image. Implement the basics of this CSS property: a url(...) value for the background-image property. Avoid loading the image unless some element actually ends up with that background-image value. For a bigger challenge, also allow the web page to set the size of the background image with the background-size CSS property.

15-3 object-fit. Implement the object-fit CSS property. It determines how the image within an <img> element is sized relative to its container element. This will require clipping images with a different aspect ratio.

15-4 Lazy loading. Downloading images can use quite a bit of data. (In the early days of the web, computer networks were slow enough that browsers had a user setting to disable downloading of images until the user expressly asked for them.) While browsers default to downloading all images on the page immediately, the loading attribute on img elements can instruct a browser to only download images if they are close to the visible area of the page. This kind of optimization is generally called lazy loading. Implement the loading attribute. Make sure the page is laid out correctly both before and after the image finishes loading.

15-5 Iframe aspect ratio. Implement the aspect-ratio CSS property and use it to provide an implicit sizing to iframes and images when only one of width or height is specified (or when the image is not yet loaded, if you do Exercise 15-4).

15-6 Image placeholders. Building on top of lazy loading, implement placeholder styling of images that haven’t loaded yet. This is done by giving them a 0×0 size, unless width or height is specified. Also add support for hiding the “broken image” icon if the alt attribute is missing or empty. (That’s because if alt text is provided, the browser can assume the image is important to the meaning of the website, and so it should tell the user that they are missing out on some of the content if it fails to load. But otherwise, the broken image icon is probably just ugly clutter.)

15-7 Media queries. Implement the width media query. Make sure it works inside iframes. Also make sure it works even when the width of an iframe is changed by its parent frame.

15-8 Target origin for postMessage. Implement the targetOrigin parameter to postMessage. This parameter is a string indicating which origin the receiving frame must have for the message to be delivered; the special value "*" allows any origin.

15-9 Multi-frame focus. In our browser, pressing Tab cycles through the elements in the focused frame. But this means it’s impossible to access focusable elements in other frames by keyboard alone. Fix it to move between frames after iterating through all focusable elements in one frame.

15-10 Iframe history. Ensure that iframes affect browser history. For example, if you click on a link inside an iframe, and then hit the back button, it should go back inside the iframe. Make sure that this works even when the user clicks links in multiple frames in various orders. (It’s debatable whether this is a good feature of iframes, as it causes a lot of confusion for web developers who embed iframes they don’t plan on navigating.)

15-11 Iframes added or removed by script. The innerHTML API can cause iframes to be added or removed, but our browser doesn’t load or unload them when this happens. Fix this: new iframes should be loaded and old ones unloaded.

15-12 X-Frame-Options. Implement this header, which disallows a web page from appearing in an iframe.