Supporting Embedded Content

While our toy browser can render complex styles, visual effects, and animations, all of those apply basically just to text. Yet web pages contain a variety of non-text embedded content, from images to other web pages. Support for embedded content has powerful implications for browser architecture, performance, security, and open information access, and has played a key role throughout the web’s history.

Images

Images are certainly the most popular kind of embedded content on the web, dating back to early 1993. (So it’s a little ironic that images only make their appearance in chapter 15 of this book! It’s because Tkinter doesn’t support many image formats or proper sizing and clipping, so I had to wait for the introduction of Skia. This history is also the reason behind a lot of inconsistencies, like src versus href or img versus image.) They’re included on web pages via the <img> tag, which looks like this:

<img src="https://browser.engineering/im/hes.jpg">

And which renders something like this:

[Figure: A computer operator using a hypertext editing system in 1969. Caption: Hypertext Editing System (Gregory Lloyd from Wikipedia, CC BY 2.0)]

Luckily, implementing images isn’t too hard, so let’s just get started. There are four steps to displaying images in our browser:

  1. Download the image from a URL.
  2. Decode the image into a buffer in memory.
  3. Lay the image out on the page.
  4. Paint the image in the display list.

Let’s start with downloading images from a URL. Naturally, that happens over HTTP, which we already have a request function for. However, while all of the content we’ve downloaded so far—HTML, CSS, and JavaScript—has been textual, images typically use binary data formats. We’ll need to extend request to support binary data.

The change is pretty minimal: instead of passing the "r" flag to makefile, pass a "b" flag indicating binary mode:

def request(url, top_level_url, payload=None):
    # ...
    response = s.makefile("b")
    # ...

Now every time we read from response, we will get bytes of binary data, not a str with textual data, so we’ll need to change some HTTP parser code to explicitly decode the data:

def request(url, top_level_url, payload=None):
    # ...
    statusline = response.readline().decode("utf8")
    # ...
    while True:
        line = response.readline().decode("utf8")
        # ...
    # ...

Note that I didn’t add a decode call when we read the body; that’s because the body might actually be binary data, and we want to return that binary data directly to the browser. Now, every existing call to request, which wants textual data, needs to decode the response. For example, in load, you’ll want to do something like this:

class Tab:
    def load(self, url, body=None):
        # ...
        headers, body = request(url, self.url, body)
        body = body.decode("utf8")
        # ...

Make sure to make this change everywhere in your browser that you call request, including inside XMLHttpRequest_send and in several other places in load.
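To make the text/binary split concrete, here is a minimal, self-contained sketch of the same idea. It is not the browser's actual request function: parse_response is a hypothetical helper, and io.BytesIO stands in for the file object that makefile("b") would return.

```python
import io

def parse_response(response):
    """Split a binary HTTP response into (headers, body)."""
    # The status line and headers are text, so decode them explicitly.
    statusline = response.readline().decode("utf8")
    version, status, explanation = statusline.split(" ", 2)
    assert status == "200", "{}: {}".format(status, explanation)

    headers = {}
    while True:
        line = response.readline().decode("utf8")
        if line == "\r\n":
            break
        header, value = line.split(":", 1)
        headers[header.lower()] = value.strip()

    # The body stays as bytes; callers decode it only if it is text.
    body = response.read()
    return headers, body

# io.BytesIO is an assumption standing in for a real socket file.
raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<p>caf\xc3\xa9</p>")
headers, body = parse_response(io.BytesIO(raw))
text = body.decode("utf8")
```

The caller that wants text decodes the body itself, while a caller that wants an image would pass the bytes along untouched.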

When we download images, however, we won’t call decode, and just use the binary data directly.

class Tab:
    def load(self, url, body=None):
        # ...
        images = [node
            for node in tree_to_list(self.nodes, [])
            if isinstance(node, Element)
            and node.tag == "img"]
        for img in images:
            src = img.attributes.get("src", "")
            image_url = resolve_url(src, self.url)
            assert self.allowed_request(image_url), \
                "Blocked load of " + image_url + " due to CSP"
            header, body = request(image_url, self.url)

Once we’ve downloaded the image, we need to turn it into a Skia Image object. That requires the following code:

class Tab:
    def load(self, url, body=None):
        for img in images:
            # ...
            img.encoded_data = body
            data = skia.Data.MakeWithoutCopy(body)
            img.image = skia.Image.MakeFromEncoded(data)

There are two tricky steps here: the requested data is turned into a Skia Data object using the MakeWithoutCopy method, and then into an image with MakeFromEncoded.

Because we used MakeWithoutCopy, the Data object just stores a reference to the existing body and doesn’t own that data. That’s essential, because encoded image data can be large (maybe megabytes), and copying that data wastes memory and time. But that also means that the data will become invalid if body is ever garbage-collected; that’s why I save the body in an encoded_data field. (This is a bit of a hack. Perhaps a better solution would be to write the response directly into a Skia Data object using the writable_data API. It would require some refactoring of the rest of the browser, which is why I’m choosing to avoid it.)

These download and decode steps can both fail; if that happens we’ll load a “broken image” placeholder (I used this one):

BROKEN_IMAGE = skia.Image.open("Broken_Image.png")

class Tab:
    def load(self, url, body=None):
        for img in images:
            try:
                # ...
            except Exception as e:
                print("Exception loading image: url="
                    + image_url + " exception=" + str(e))
                img.image = BROKEN_IMAGE

Now that we’ve downloaded and saved the image, we need to use it. Recall that the Image object is created using a MakeFromEncoded method. That name reminds us that the image we’ve downloaded isn’t raw image bytes. In fact, all of the image formats you know—JPG, PNG, and the many more obscure ones—encode the image data using various sophisticated algorithms. The image therefore needs to be decoded before it can be used.

Luckily, Skia will automatically do the decoding for us, so drawing the image is pretty simple:

class DrawImage(DisplayItem):
    def __init__(self, image, rect):
        super().__init__(rect)
        self.image = image

    def execute(self, canvas):
        canvas.drawImageRect(self.image, self.rect)

Skia applies a variety of clever optimizations to decoding, such as directly decoding the image to its eventual size and caching the decoded image as long as possible. (There’s also an HTML API to control decoding, so that the web page author can indicate when to pay that cost.) That’s because raw image data can be quite large: a pixel is usually stored as four bytes, so a 12 megapixel camera (as you can find on phones these days) produces 48 megabytes of raw data. (Decoding costs both a lot of memory and also a lot of time, since just writing out all of those bytes can take a big chunk of our render budget. Optimizing image handling is essential to a performant browser.)
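The 48 megabyte figure is easy to check. The sketch below assumes four bytes per pixel (for example one each for red, green, blue, and alpha) and a 4000×3000 photo as one possible shape for a 12 megapixel image; both the helper name and the dimensions are illustrative.

```python
BYTES_PER_PIXEL = 4  # assumption: one byte per channel, four channels

def raw_image_bytes(width, height):
    """Size in bytes of a fully decoded image."""
    return width * height * BYTES_PER_PIXEL

# A 4000x3000 photo has 12 million pixels (12 megapixels).
size = raw_image_bytes(4000, 3000)
print(size / 1_000_000, "megabytes")
```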

But because image decoding can be so expensive, Skia actually has several algorithms for decoding to different sizes, some of which are faster but result in a worse-looking image. For example, just for resizing an image, there’s fast, simple, “nearest neighbor” resizing and the slower but higher-quality “bilinear” or even “Lanczos” resizing algorithms. (Image formats like JPEG are also lossy, meaning that they don’t faithfully represent all of the information in the original picture, so there’s a time/quality trade-off going on before the file is saved. Typically these formats try to drop “noisy details” that a human is unlikely to notice, just like different resizing algorithms might.)
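To see why the algorithms differ in cost and quality, here is a rough pure-Python sketch of nearest-neighbor and bilinear resizing on a grayscale image stored as a list of rows. This is not how Skia implements them; the function names and the corner-aligned coordinate mapping are assumptions chosen to keep the example short.

```python
def resize_nearest(pixels, new_w, new_h):
    """Each output pixel copies a single input pixel."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [[pixels[y * old_h // new_h][x * old_w // new_w]
             for x in range(new_w)]
            for y in range(new_h)]

def resize_bilinear(pixels, new_w, new_h):
    """Each output pixel blends the four input pixels around it."""
    old_h, old_w = len(pixels), len(pixels[0])
    out = []
    for y in range(new_h):
        # Map the output row back to a fractional input row.
        fy = y * (old_h - 1) / (new_h - 1) if new_h > 1 else 0
        y0 = int(fy)
        y1 = min(y0 + 1, old_h - 1)
        ty = fy - y0
        row = []
        for x in range(new_w):
            fx = x * (old_w - 1) / (new_w - 1) if new_w > 1 else 0
            x0 = int(fx)
            x1 = min(x0 + 1, old_w - 1)
            tx = fx - x0
            # Blend horizontally on both rows, then vertically.
            top = pixels[y0][x0] * (1 - tx) + pixels[y0][x1] * tx
            bottom = pixels[y1][x0] * (1 - tx) + pixels[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out

# A 2x1 grayscale image: one black pixel, one white pixel.
image = [[0, 255]]
nearest = resize_nearest(image, 4, 1)
bilinear = resize_bilinear(image, 4, 1)
```

Nearest-neighbor does one lookup per output pixel and produces a hard edge, while bilinear does four lookups and several multiplications per output pixel and produces a gradient.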

To give web page authors control over this performance bottleneck, there’s an image-rendering CSS property that indicates which algorithm to use. Let’s add that as an argument to DrawImage:

class DrawImage(DisplayItem):
    def __init__(self, image, rect, quality):
        # ...
        if quality == "high-quality":
            self.quality = skia.FilterQuality.kHigh_FilterQuality
        elif quality == "crisp-edges":
            self.quality = skia.FilterQuality.kLow_FilterQuality
        else:
            self.quality = skia.FilterQuality.kMedium_FilterQuality

    def execute(self, canvas):
        paint = skia.Paint(FilterQuality=self.quality)
        canvas.drawImageRect(self.image, self.rect, paint)

With the images downloaded and decoded, all that’s left to display them is to add images to our browser’s layout tree.

The HTTP Content-Type header lets the web server tell the browser whether a document contains text or binary data. The header contains a value called a MIME type, such as text/html, text/css, and text/javascript for HTML, CSS, and JavaScript; image/png and image/jpeg for PNG and JPEG images; and many others for various font, video, audio, and data formats. (“MIME” stands for Multipurpose Internet Mail Extensions, and was originally intended for enumerating all of the acceptable data formats for email attachments. These days the loop has basically closed: most email clients are now “webmail” clients, accessed through your browser, and most emails are now HTML, encoded with the text/html MIME type. Many mail clients do still have an option to encode the email in text/plain, however.) Interestingly, we didn’t need to specify the image format in the code above. That’s because many image formats start with “magic bytes”; for example, PNG files always start with byte 137 followed by the letters “PNG”. These magic bytes are often more reliable than web-server-provided MIME types, so such “format sniffing” is common inside browsers and their supporting libraries.
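As an illustration of format sniffing, the sketch below checks the leading bytes of a download against a few well-known signatures. The helper name sniff_image_type and the short table are illustrative assumptions; real browsers and libraries recognize many more formats and apply more careful rules.

```python
# Leading "magic bytes" for a few common image formats.
MAGIC_BYTES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),   # 0x89 is byte 137
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

def sniff_image_type(data):
    """Guess a MIME type from the first bytes, or return None."""
    for magic, mime_type in MAGIC_BYTES:
        if data.startswith(magic):
            return mime_type
    return None

png_type = sniff_image_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
jpeg_type = sniff_image_type(b"\xff\xd8\xff\xe0" + b"\x00" * 8)
html_type = sniff_image_type(b"<!doctype html>")
```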

Embedded layout

Based on your experience with prior chapters, you can probably guess how to add images to our browser’s layout and paint process. We’ll need to create an ImageLayout class; add a new image case to BlockLayout’s recurse method; and generate a DrawImage command from ImageLayout’s paint method.

As we do this, you might recall doing something very similar for <input> elements. In fact, text areas and buttons are very similar to images: both are leaf nodes of the DOM, placed into lines, affected by text baselines, and painting custom content. (Images aren’t quite like text because a text node is potentially an entire run of text, split across multiple lines, while an image is an atomic inline. The other types of embedded content in this chapter are also atomic inlines.) Since they are so similar, let’s try to reuse the same code for both.

Let’s split the existing InputLayout into a superclass called EmbedLayout, containing most of the existing code, and a new subclass with the input-specific code, InputLayout (in a real browser, input elements are usually called widgets because they have a lot of special rendering rules that sometimes involve CSS):

class EmbedLayout:
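    def __init__(self, node, parent, previous, frame=None):
        # frame defaults to None so that the three-argument
        # super().__init__ calls in the subclasses below still work.
        # ...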

    def get_ascent(self, font_multiplier=1.0):
        return -self.height

    def get_descent(self, font_multiplier=1.0):
        return 0

    def layout(self):
        self.zoom = self.parent.zoom
        self.font = font(self.node.style, self.zoom)
        if self.previous:
            space = self.previous.font.measureText(" ")
            self.x = \
                self.previous.x + space + self.previous.width
        else:
            self.x = self.parent.x

class InputLayout(EmbedLayout):
    def __init__(self, node, parent, previous):
        super().__init__(node, parent, previous)

    def layout(self):
        super().layout()

The idea is that EmbedLayout should provide common layout code for all kinds of embedded content, while its subclasses like InputLayout should provide the custom code for that type of content. Different types of embedded content might have different widths and heights, so each subclass, like InputLayout here, should compute those itself, and each subclass also has its own unique definition of paint:

class InputLayout(EmbedLayout):
    def layout(self):
        # ...
        self.width = device_px(INPUT_WIDTH_PX, self.zoom)
        self.height = linespace(self.font)

    def paint(self, display_list):
        # ...

ImageLayout can now inherit most of its behavior from EmbedLayout, but take its width and height from the image itself:

class ImageLayout(EmbedLayout):
    def __init__(self, node, parent, previous):
        super().__init__(node, parent, previous)

    def layout(self):
        super().layout()
        self.width = device_px(self.node.image.width(), self.zoom)
        self.img_height = device_px(self.node.image.height(), self.zoom)
        self.height = max(self.img_height, linespace(self.font))

Notice that the height of the image depends on the font size of the element. Though odd, this is how image layout actually works: a line with a single, very small, image on it will still be tall enough to contain text. (In fact, a page with only a single image and no text or CSS at all still has its layout affected by a font: the default font. This is a common source of confusion for web developers. In a real browser, it can be avoided by forcing an image into a block or other layout mode via the display CSS property.) The underlying reason is that, as a type of inline layout, images are designed to flow along with related text, which means the bottom of the image should line up with the text baseline (in fact, img_height is saved in the code above to ensure they line up).

Painting an image is also straightforward:

class ImageLayout(EmbedLayout):
    def paint(self, display_list):
        cmds = []
        rect = skia.Rect.MakeLTRB(
            self.x, self.y + self.height - self.img_height,
            self.x + self.width, self.y + self.height)
        quality = self.node.style.get("image-rendering", "auto")
        cmds.append(DrawImage(self.node.image, rect, quality))
        display_list.extend(cmds)
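The rectangle arithmetic in paint can be checked in isolation. The sketch below is a simplified stand-in, not the browser's code: image_rect is a hypothetical helper that returns a plain (left, top, right, bottom) tuple instead of a Skia rectangle, and line_height plays the role of linespace(self.font).

```python
def image_rect(x, y, width, img_height, line_height):
    """Rectangle for an image whose bottom sits at the line's bottom."""
    # The layout object is at least as tall as a line of text.
    height = max(img_height, line_height)
    # Push the image down so its bottom edge lines up with the box's.
    top = y + height - img_height
    return (x, top, x + width, y + height)

# A 10px-tall icon on a 20px line is shifted down by 10px.
small = image_rect(0, 100, 16, 10, 20)
# A 50px-tall image is taller than the line, so it starts at y.
tall = image_rect(0, 100, 16, 50, 20)
```

In both cases the bottom edge of the rectangle is y plus the layout height, which is what keeps images aligned with neighboring text.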

Now we need to create ImageLayouts in BlockLayout. Input elements are created in an input method, so we could duplicate it and call the copy image…but input is itself a duplicate of text, so this would be a lot of almost-identical methods. The only part of these methods that differs is the part that computes the width of the new inline child; most of the rest of the logic is shared.

Let’s instead refactor the shared code into new methods which text, input, and image can call. First, all of these methods need a font to determine how big of a space to leave after the inline (yes, this is how real browsers do it too); let’s make a function for that:

def font(style, zoom):
    weight = style["font-weight"]
    variant = style["font-style"]
    size = float(style["font-size"][:-2])
    font_size = device_px(size, zoom)
    return get_font(font_size, weight, variant)

There’s also shared code that handles line layout; let’s put that into a new add_inline_child method. We’ll need parameters for the layout class to instantiate and a word parameter that is only passed for some layout classes.

class BlockLayout:
    def add_inline_child(self, node, w, child_class, word=None):
        if self.cursor_x + w > self.x + self.width:
            self.new_line()
        line = self.children[-1]
        if word:
            child = child_class(node, line, self.previous_word, word)
        else:
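            # No extra argument here: frame is not defined in this
            # method, and InputLayout and ImageLayout take three arguments.
            child = child_class(node, line, self.previous_word)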
        line.children.append(child)
        self.previous_word = child
        self.cursor_x += w + font(node.style, self.zoom).measureText(" ")

We can redefine text and input in a satisfying way now:

class BlockLayout:
    def text(self, node):
        node_font = font(node.style, self.zoom)
        for word in node.text.split():
            w = node_font.measureText(word)
            self.add_inline_child(node, w, TextLayout, word)

    def input(self, node):
        w = device_px(INPUT_WIDTH_PX, self.zoom)
        self.add_inline_child(node, w, InputLayout) 
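The wrapping rule shared by all of these methods can be tried out on its own. This is a simplified sketch rather than the browser's code: wrap_inlines is a hypothetical function that works on bare widths, assumes every line starts at x = 0, and uses a fixed space width instead of measuring a font.

```python
def wrap_inlines(widths, line_width, space):
    """Group inline widths into lines, wrapping when one won't fit."""
    lines = [[]]
    cursor_x = 0
    for w in widths:
        # Same test as add_inline_child: does this child overflow?
        if cursor_x + w > line_width:
            lines.append([])
            cursor_x = 0
        lines[-1].append(w)
        # Leave a space after every inline, whatever its type.
        cursor_x += w + space
    return lines

# Two 40px items fit on a 100px line; the third wraps.
wrapped = wrap_inlines([40, 40, 40], 100, 5)
```

Because the rule only looks at widths, it makes no difference whether a given width came from a word, an input element, or an image.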

Adding image is now also straightforward:

class BlockLayout:
    def recurse(self, node):
            # ...
            elif node.tag == "img":
                self.image(node)
    
    def image(self, node):
        w = device_px(node.image.width(), self.zoom)
        self.add_inline_child(node, w, ImageLayout)

Now that we have ImageLayout nodes in our layout tree, we’ll be painting DrawImage commands to our display list and showing the image on the screen!

But what about our second output modality, screen readers? That’s what the alt attribute is for. It works like this:

<img src="https://browser.engineering/im/hes.jpg"
alt="A computer operator using a hypertext editing system in 1969">

Implementing this in AccessibilityNode is very easy:

class AccessibilityNode:
    def __init__(self, node):
        else:
            # ...
            elif node.tag == "img":
                self.role = "image"

    def build(self):
        # ...
        elif self.role == "image":
            if "alt" in self.node.attributes:
                self.text = "Image: " + self.node.attributes["alt"]
            else:
                self.text = "Image"

As we continue to implement new features for the web platform, we’ll always need to think about how to make features work in multiple modalities.

Videos are similar to images, but demand more bandwidth, time, and memory; they also have complications like Digital Rights Management (DRM). The <video> tag addresses some of that, with built-in support for advanced video codecs, DRM, and hardware acceleration. (In video, it’s called a “codec”, but in images it’s called a “format”; go figure.) It also provides media controls like a play/pause button and volume controls.

Modifying Image Sizes

So far, an image’s size on the screen is its size in pixels, possibly zoomed. (Note that zoom already may cause an image to render at a size different than its regular size, even before introducing the features in this section.) But in fact it’s generally valuable for authors to control the size of embedded content. There are a number of ways to do this (for example, the width and height CSS properties, not to be confused with the width and height attributes, which were an exercise in Chapter 13), but one way is the special width and height attributes. (Images have these mostly for historical reasons, because these attributes were invented before CSS existed.)

If both those attributes are present, things are pretty easy: we just read from them when laying out the element, both in image:

class BlockLayout:
    def image(self, node):
        if "width" in node.attributes:
            w = device_px(int(node.attributes["width"]), self.zoom)
        else:
            w = device_px(node.image.width(), self.zoom)
        # ...

And in ImageLayout:

class ImageLayout(EmbedLayout):
    def layout(self):
        # ...
        width_attr = self.node.attributes.get("width")
        height_attr = self.node.attributes.get("height")
        image_width = self.node.image.width()
        image_height = self.node.image.height()

        if width_attr and height_attr:
            self.width = device_px(int(width_attr), self.zoom)
            self.img_height = device_px(int(height_attr), self.zoom)
        else:
            self.width = device_px(image_width, self.zoom)
            self.img_height = device_px(image_height, self.zoom)
        # ...

This works great, but it has a major flaw: if the ratio of width to height isn’t the same as the underlying image size, the image ends up stretched in weird ways. Sometimes that’s on purpose but usually it’s a mistake. So browsers let authors specify just one of width and height, and compute the other using the image’s aspect ratio. (Despite it being easy to implement, this feature of real web browsers only appeared in 2021. Before that, developers resorted to things like the padding-top hack. Sometimes design oversights take a long time to fix.)

Implementing this aspect ratio tweak is easy:

class ImageLayout(EmbedLayout):
    # ...
    def layout(self):
        # ...
        aspect_ratio = image_width / image_height

        if width_attr and height_attr:
            # ...
        elif width_attr:
            self.width = device_px(int(width_attr), self.zoom)
            self.img_height = self.width / aspect_ratio
        elif height_attr:
            self.img_height = device_px(int(height_attr), self.zoom)
            self.width = self.img_height * aspect_ratio
        else:
            # ...
        # ...
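The four cases are easier to see with concrete numbers. The sketch below pulls the sizing rules into a standalone function; image_size is a hypothetical helper, and it ignores zoom (so there is no device_px call) to keep the arithmetic obvious.

```python
def image_size(image_width, image_height,
               width_attr=None, height_attr=None):
    """Return (width, height) in CSS pixels, ignoring zoom."""
    aspect_ratio = image_width / image_height
    if width_attr and height_attr:
        # Both given: use them as is, even if that stretches the image.
        return int(width_attr), int(height_attr)
    elif width_attr:
        width = int(width_attr)
        return width, width / aspect_ratio
    elif height_attr:
        height = int(height_attr)
        return height * aspect_ratio, height
    else:
        # Neither given: fall back to the image's own size.
        return image_width, image_height

# A 400x200 image has a 2:1 aspect ratio.
only_width = image_size(400, 200, width_attr="100")
only_height = image_size(400, 200, height_attr="100")
stretched = image_size(400, 200, width_attr="100", height_attr="100")
natural = image_size(400, 200)
```

Only the case with both attributes can distort the image; the other three preserve the 2:1 ratio.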

Your browser should now be able to render this example page correctly.

Our browser computes an aspect ratio from the loaded image dimensions, but that’s not available before an image loads, which is a problem in real browsers where images are loaded asynchronously and where the image size can respond to layout parameters. Not knowing the aspect ratio can cause the layout to shift when the image loads, which can be frustrating for users. The aspect-ratio property is one way web pages can address this issue.

Interactive widgets

So far, our browser has two kinds of embedded content: images and input elements. While both are important and widely used, they don’t offer quite the customizability and flexibility that complex embedded content use cases like maps, PDFs, ads, and social media controls require. So in modern browsers, these are handled by embedding one web page within another using the <iframe> element. (Variations like the <canvas> element are also important and widely used. Instead of loading an image from the network, JavaScript can draw on a <canvas> element via an API. Unlike images, <canvas> elements don’t have intrinsic sizes, but besides that they are pretty similar. There’s also ongoing work aimed at allowing web pages to customize what input elements look like, and it builds on earlier work supporting custom elements and forms. This problem is quite challenging, interacting with platform independence, accessibility, scripting, and styling.)

Semantically, an <iframe> is almost exactly a Tab inside a Tab—it has its own HTML document, CSS, and scripts. And layout-wise, an <iframe> is a lot like the <img> tag, with width and height attributes. So implementing basic iframes just requires handling three significant differences:

We’ll get to these differences, but for now, let’s start working on the idea of a Tab within a Tab. What we’re going to do is split the Tab class into two pieces: Tab will own the event loop and script environments, while Frames will do the rest.

It’s good to plan out complicated refactors like this in some detail. A Tab will:

And the new Frame class will:

Create these two classes and split the methods between them accordingly.

Naturally, every Frame will need a reference to its Tab; it’s also convenient to have access to the parent frame and the corresponding <iframe> element:

class Frame:
    def __init__(self, tab, parent_frame, frame_element):
        self.tab = tab
        self.parent_frame = parent_frame
        self.frame_element = frame_element
        # ...

Now let’s look at how Frames are created. The first place is in Tab’s load method, which needs to create the root frame:

class Tab:
    def __init__(self, browser):
        # ...
        self.root_frame = None

    def load(self, url, body=None):
        self.history.append(url)
        # ...
        self.root_frame = Frame(self, None, None)
        self.root_frame.load(url, body)

Note that the guts of load now live in the Frame, because the Frame owns the DOM tree. The Frame can also construct child Frames, for <iframe> elements:

class Frame:
    def load(self, url, body=None):
        # ...
        iframes = [node
                   for node in tree_to_list(self.nodes, [])
                   if isinstance(node, Element)
                   and node.tag == "iframe"
                   and "src" in node.attributes]
        for iframe in iframes:
            document_url = resolve_url(iframe.attributes["src"],
                self.tab.root_frame.url)
            if not self.allowed_request(document_url):
                print("Blocked iframe", document_url, "due to CSP")
                iframe.frame = None
                continue
            iframe.frame = Frame(self.tab, self, iframe)
            iframe.frame.load(document_url)
        # ...

So we’ve now got a tree of frames inside a single tab. But because we will sometimes need direct access to an arbitrary frame, let’s also give each frame an identifier, which I’m calling a window ID:

class Frame:
    def __init__(self, tab, parent_frame, frame_element):
        # ...
        self.window_id = len(self.tab.window_id_to_frame)
        self.tab.window_id_to_frame[self.window_id] = self

class Tab:
    def __init__(self, browser):
        # ...
        self.window_id_to_frame = {}
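Here is a stripped-down sketch of the registry in action. The Tab and Frame classes below are minimal stand-ins that keep only the fields relevant to window IDs, and the "<iframe>" strings are placeholders for real element nodes.

```python
class Tab:
    def __init__(self):
        self.window_id_to_frame = {}
        self.root_frame = None

class Frame:
    def __init__(self, tab, parent_frame, frame_element):
        self.tab = tab
        self.parent_frame = parent_frame
        self.frame_element = frame_element
        # The next ID is the number of frames registered so far.
        self.window_id = len(tab.window_id_to_frame)
        tab.window_id_to_frame[self.window_id] = self

tab = Tab()
tab.root_frame = Frame(tab, None, None)
child = Frame(tab, tab.root_frame, "<iframe>")
grandchild = Frame(tab, child, "<iframe>")
```

One thing to keep in mind with this scheme: using the dictionary's length as the next ID only stays unique as long as frames are never removed from the dictionary.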

Now that we have frames being created, let’s work on rendering those frames to the screen.

For quite a while, browsers also supported embedded content in the form of plugins like Java applets or Flash. But there were performance, security, and accessibility problems because plugins typically implemented their own rendering, sandboxing, and UI primitives. Over time, new APIs have closed the gap between web-native content and “non-web” plugins, and plugins have therefore become less common. (For example, in the last decade the <canvas> element has gained support for hardware-accelerated 3D content, while WebAssembly can run at near-native speed.) Personally, I think that’s a good thing: the web is about making information accessible to everyone, and that requires open standards, including for embedded content.

Iframe rendering

Rendering is split between the Tab and its Frames: the Frame does style and layout, while the Tab does accessibility and paint. (Why split the rendering pipeline this way? Because the output of accessibility and paint is combined across all frames, into a single display list and a single accessibility tree, while the DOMs and layout trees don’t intermingle.) We’ll need to implement that split, and also add code to trigger each Frame’s rendering from the Tab.

Let’s start with splitting the rendering pipeline. The main method here is still the Tab’s render method, which first calls render on each frame to do style and layout:

class Tab:
    def render(self):
        self.measure_render.start_timing()

        for id, frame in self.window_id_to_frame.items():
            frame.render()

        if self.needs_accessibility:
            # ...

        if self.pending_hover:
            # ...

Note that the needs_accessibility, pending_hover, and other flags are all still on the Tab, because they relate to the Tab’s part of rendering. Meanwhile, style and layout happen in the Frame now:

class Frame:
    def __init__(self, tab, parent_frame, frame_element):
        # ...
        self.needs_style = False
        self.needs_layout = False

    def set_needs_render(self):
        self.needs_style = True
        self.tab.set_needs_accessibility()
        self.tab.set_needs_paint()

    def set_needs_layout(self):
        self.needs_layout = True
        self.tab.set_needs_accessibility()
        self.tab.set_needs_paint()

    def render(self):
        if self.needs_style:
            # ...

        if self.needs_layout:
            # ...

Again, these dirty bits move to the Frame because they relate to the frame’s part of rendering.

Yet unlike images, iframes have no intrinsic size: the layout size of an <iframe> element does not depend on its content. (There was an attempt to provide iframes with intrinsic sizing in the past, but it was removed from the HTML specification when no browser implemented it. This may change in the future, as there are good use cases for a “seamless” iframe whose layout coordinates with its parent frame.) That means there’s a crucial extra bit of communication that needs to happen between the parent and child frames: how wide and tall should a frame be laid out? This is defined by the attributes and CSS of the iframe element:

class BlockLayout:
    # ...
    def recurse(self, node):
        # ...
            elif node.tag == "iframe" and \
                 "src" in node.attributes:
                self.iframe(node)
    # ...
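    def iframe(self, node):
        # Read the attribute from the <iframe> node itself, not from
        # self.node, which is the enclosing block's node. The width
        # includes 2 pixels for the border, matching IframeLayout.
        if "width" in node.attributes:
            w = device_px(int(node.attributes["width"]) + 2,
                self.zoom)
        else:
            w = device_px(IFRAME_WIDTH_PX + 2, self.zoom)
        self.add_inline_child(node, w, IframeLayout, self.frame)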

The IframeLayout layout code is also similar, inheriting from EmbedLayout, but without the aspect ratio code:

class IframeLayout(EmbedLayout):
    def __init__(self, node, parent, previous, parent_frame):
        super().__init__(node, parent, previous, parent_frame)

    def layout(self):
        # ...
        if width_attr:
            self.width = device_px(int(width_attr) + 2, self.zoom)
        else:
            self.width = device_px(IFRAME_WIDTH_PX + 2, self.zoom)

        if height_attr:
            self.height = device_px(int(height_attr) + 2, self.zoom)
        else:
            self.height = device_px(IFRAME_HEIGHT_PX + 2, self.zoom)

Note that if the width or height isn’t specified, it uses a default value, chosen a long time ago based on the average screen sizes of the day:

IFRAME_WIDTH_PX = 300
IFRAME_HEIGHT_PX = 150

The extra 2 pixels (corrected for zoom, of course) provide room for a border later on.

Now, note that this code is run in the parent frame. We need to get this width and height over to the child frame, so it can know its width and height during layout. So let’s add fields for that in the child frame:

class Frame:
    def __init__(self, tab, parent_frame, frame_element):
        # ...
        self.frame_width = 0
        self.frame_height = 0

And we can set those when the parent frame is laid out:

class IframeLayout(EmbedLayout):
    def layout(self):
        # ...
        if self.node.frame:
            self.node.frame.frame_height = \
                self.height - device_px(2, self.zoom)
            self.node.frame.frame_width = \
                self.width - device_px(2, self.zoom)

The conditional is only there to handle the (unusual) case of an iframe blocked by CSP.
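The bookkeeping around the 2-pixel border is easy to get wrong, so here is the arithmetic on its own. This sketch assumes device_px simply multiplies CSS pixels by the zoom factor, and iframe_sizes is a hypothetical helper that handles only the width (height works the same way).

```python
IFRAME_WIDTH_PX = 300

def device_px(css_px, zoom):
    # Assumption: zoom is a plain multiplier on CSS pixels.
    return css_px * zoom

def iframe_sizes(width_attr, zoom):
    """Return (layout width in the parent, width of the child frame)."""
    content_px = int(width_attr) if width_attr else IFRAME_WIDTH_PX
    # The parent lays out the content plus a 1px border on each side.
    layout_width = device_px(content_px + 2, zoom)
    # The child frame gets only the part inside the border.
    frame_width = layout_width - device_px(2, zoom)
    return layout_width, frame_width

default_sizes = iframe_sizes(None, 1)
zoomed_sizes = iframe_sizes("400", 2)
```

So the child frame's width always equals the requested content width times the zoom, whatever the border adds in the parent.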

The root frame, of course, fills the whole window:

class Tab:
    def load(self, url, body=None):
        # ...
        self.root_frame.frame_width = WIDTH
        self.root_frame.frame_height = HEIGHT - CHROME_PX

Note that there’s a tricky dependency order here. We need the parent frame to do layout before the child frame, so the child frame has an up-to-date width and height when it does layout. That order is guaranteed for us by Python (3.7 or later), where dictionaries preserve insertion order and each parent frame is created, and so registered, before its children. But if you’re following along in another language, you might need to sort frames before rendering them.
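If you can't rely on insertion order, one option is to sort frames by their depth in the frame tree before rendering. The sketch below is one way to do that, not something the browser in this chapter needs: the Frame class is a minimal stand-in with an added name field, and frame_depth and frames_in_render_order are hypothetical helpers.

```python
class Frame:
    def __init__(self, name, parent_frame=None):
        self.name = name
        self.parent_frame = parent_frame

def frame_depth(frame):
    """Number of ancestors between this frame and the root."""
    depth = 0
    while frame.parent_frame:
        frame = frame.parent_frame
        depth += 1
    return depth

def frames_in_render_order(frames):
    # Shallower frames come first, so every parent is laid out
    # before any of its children.
    return sorted(frames, key=frame_depth)

root = Frame("root")
child = Frame("child", root)
grandchild = Frame("grandchild", child)

ordered = frames_in_render_order([grandchild, root, child])
names = [frame.name for frame in ordered]
```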

Alright, we’ve now got frames styled and laid out, and just need to paint them. Unlike layout and style, all the frames in a tab produce a single, unified display list, so we’re going to need to work recursively. We’ll have the Tab paint the root Frame:

class Tab:
    def render(self):
        if self.needs_paint:
            self.display_list = []
            self.root_frame.paint(self.display_list)
            self.needs_paint = False

We’ll then have the Frame call the layout tree’s paint method:

class Frame:
    def paint(self, display_list):
        self.document.paint(display_list)

Most of the layout tree’s paint methods don’t need to change, but to paint an IframeLayout, we’ll need to paint the child frame:

class IframeLayout(EmbedLayout):
    def paint(self, display_list):
        frame_cmds = []

        rect = skia.Rect.MakeLTRB(
            self.x, self.y,
            self.x + self.width, self.y + self.height)
        bgcolor = self.node.style.get("background-color",
                                 "transparent")
        if bgcolor != "transparent":
            radius = device_px(float(
                self.node.style.get("border-radius", "0px")[:-2]),
                self.zoom)
            frame_cmds.append(DrawRRect(rect, radius, bgcolor))

        if self.node.frame:
            self.node.frame.paint(frame_cmds)

Note the last line, where we recursively paint the child frame.

Before putting those commands in the display list, though, we need to add a border, clip content outside of it, and transform the coordinate system:

class IframeLayout(EmbedLayout):
    def paint(self, display_list):
        # ...

        diff = device_px(1, self.zoom)
        offset = (self.x + diff, self.y + diff)
        cmds = [Transform(offset, rect, self.node, frame_cmds)]
        inner_rect = skia.Rect.MakeLTRB(
            self.x + diff, self.y + diff,
            self.x + self.width - diff, self.y + self.height - diff)
        cmds = paint_visual_effects(self.node, cmds, inner_rect)
        paint_outline(self.node, cmds, rect, self.zoom)
        display_list.extend(cmds)

The Transform shifts over the child frame contents so that its top-left corner starts in the right place, while paint_outline adds the border and paint_visual_effects clips content outside the viewable area of the iframe. (This book doesn’t go into the details of the CSS box model, but the width and height attributes of an iframe refer to the content box, and adding the border width yields the border box. Note also that the clip we’re applying is an overflow clip, which is not quite the same as an iframe clip, and the differences have to do with the box model as well. As a result, what we’ve implemented is somewhat incorrect with respect to all of those factors.) Conveniently, we’ve already implemented all of these features and can simply trigger them from our browser CSS file (perhaps another good reason to delay iframes and images until chapter 15):

iframe {
    outline: 1px solid black;
    overflow: clip;
}
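The offset and clip rectangle computed in the iframe's paint method come down to a little arithmetic. Here is a sketch of just that arithmetic, using plain tuples instead of Skia rectangles. The `iframe_geometry` helper is hypothetical, and `device_px` is assumed here to simply multiply by the zoom factor:

```python
def device_px(css_px, zoom):
    # Assumption for this sketch: device pixels are CSS pixels times zoom.
    return css_px * zoom

def iframe_geometry(x, y, width, height, zoom):
    diff = device_px(1, zoom)           # width of the 1px outline
    offset = (x + diff, y + diff)       # where the child's (0, 0) lands
    inner_rect = (x + diff, y + diff,   # left, top
                  x + width - diff,     # right
                  y + height - diff)    # bottom
    return offset, inner_rect

offset, inner_rect = iframe_geometry(20, 100, 300, 150, zoom=2)
print(offset)
print(inner_rect)
```

The child frame's contents are shifted by `offset` and clipped to `inner_rect`, so that nothing from the child draws over the outline.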

Finally, let’s also add iframes to the accessibility tree. Like the display list, the accessibility tree is global across all frames. We can have iframes create iframe nodes:

class AccessibilityNode:
    def __init__(self, node):
        # ...
        else:
            # ...
            elif node.tag == "iframe":
                self.role = "iframe"

To build such a node, we just recurse into the frame:

class AccessibilityNode:
    def build_internal(self, child_node):
        if isinstance(child_node, Element) \
            and child_node.tag == "iframe" and child_node.frame:
            child = AccessibilityNode(child_node.frame.nodes)
        # ...

So we’ve now got iframes showing up on the screen. The next step is interacting with them.

Before iframes, there were the <frameset> and <frame> elements. A <frameset> replaces the <body> tag and splits the browser window among multiple <frame>s; this was an early alternative layout algorithm to the one presented in this book. Frames had confusing navigation and accessibility, and lacked the flexibility of <iframe>s, so they aren’t used much these days.

Iframe input events

Now that we’ve got iframes rendering to the screen, let’s close the loop with user input. We want to add support for clicking on things inside an iframe, and also for tabbing around or scrolling inside one.

At a high level, event handlers just delegate to the root frame:

class Tab:
    def click(self, x, y):
        self.render()
        self.root_frame.click(x, y)

When an iframe is clicked, it passes the click through to the child frame, and immediately returns afterwards, because iframes capture click events:

class Frame:
    def click(self, x, y):
        # ...
        while elt:
            # ...
            elif elt.tag == "iframe":
                new_x = x - elt.layout_object.x
                new_y = y - elt.layout_object.y
                elt.frame.click(new_x, new_y)
                return
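To see how the coordinate translation composes across nested iframes, here is a small self-contained sketch. `SketchFrame` and `add_iframe` are hypothetical stand-ins, and the sketch ignores scrolling and the layout tree entirely; it only shows how each level subtracts the iframe's position before delegating:

```python
class SketchFrame:
    """Hypothetical stand-in: a frame with child frames at known positions."""
    def __init__(self, name):
        self.name = name
        self.children = []  # list of (x, y, width, height, child_frame)

    def add_iframe(self, x, y, width, height, child):
        self.children.append((x, y, width, height, child))

    def click(self, x, y):
        # If the click lands inside an iframe, translate into the child's
        # coordinate space and let the child handle it.
        for (cx, cy, w, h, child) in self.children:
            if cx <= x < cx + w and cy <= y < cy + h:
                return child.click(x - cx, y - cy)
        # Otherwise this frame handles the click itself.
        return (self.name, x, y)

root = SketchFrame("root")
inner = SketchFrame("inner")
innermost = SketchFrame("innermost")
root.add_iframe(100, 50, 300, 150, inner)
inner.add_iframe(20, 10, 100, 100, innermost)

print(root.click(10, 10))    # handled by the root frame
print(root.click(130, 70))   # passes through two levels of iframes
```

Each frame only needs to know where its own iframes are; the recursion takes care of the rest.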

Now, clicking on <a> elements will work, which means that you can now cause a frame to navigate to a new page. And because a Frame has all the loading and navigation logic that Tab used to have, it just works without any more changes!

You should now be able to load this example. Repeatedly clicking on the link will add another recursive iframe.

Let’s get the other interactions working as well, starting with focusing an element. You can focus on only one element per tab, so we will still store the focus on the Tab, but we’ll need to store the frame the focused element is on too:

class Tab:
    def __init__(self, browser):
        self.focus = None
        self.focused_frame = None

When a frame tries to focus on an element, it sets itself as the focused frame, but before it does that, it needs to un-focus the previously-focused frame:

class Frame:
    def focus_element(self, node):
        # ...
        if self.tab.focused_frame and self.tab.focused_frame != self:
            self.tab.focused_frame.set_needs_render()
        self.tab.focused_frame = self
        # ...

We need to re-render the previously-focused frame so that it stops drawing the focus outline.

Another interaction is pressing Tab to cycle through focusable elements in the current frame. Let’s move the advance_tab logic into Frame and just dispatch to it from the Tab:

class Tab:
    def advance_tab(self):
        frame = self.focused_frame or self.root_frame
        frame.advance_tab()

Do the same exact thing for keypress and enter, which are used for interacting with text inputs and buttons.

Another big interaction we need to support is scrolling. We’ll store the scroll offset in each Frame:

class Frame:
    def __init__(self, tab, parent_frame, frame_element):
        self.scroll = 0

Now, as you might recall from Chapter 13, scrolling happens both inside Browser and inside Tab, to reduce latency. That was already quite complicated, so to keep things simple, we won’t support both for non-root iframes. We’ll need a new commit parameter so the browser thread knows whether the root frame is focused:

class CommitData:
    def __init__(self, url, scroll, root_frame_focused, height,
        display_list, composited_updates, accessibility_tree, focus):
        # ...
        self.root_frame_focused = root_frame_focused

class Tab:
    def run_animation_frame(self, scroll):
        root_frame_focused = not self.focused_frame or \
                self.focused_frame == self.root_frame
        # ...
        commit_data = CommitData(
            # ...
            root_frame_focused,
            # ...
        )
        # ...

The Browser thread will save this information in commit and use it when the user requests a scroll:

class Browser:
    def commit(self, tab, data):
        # ...
            self.root_frame_focused = data.root_frame_focused

    def handle_down(self):
        self.lock.acquire(blocking=True)
        if self.root_frame_focused:
            # ...
        active_tab = self.tabs[self.active_tab]
        task = Task(active_tab.scrolldown)
        active_tab.task_runner.schedule_task(task)
        self.lock.release()

When a tab is asked to scroll, it then scrolls the focused frame:

class Tab:
    def scrolldown(self):
        frame = self.focused_frame or self.root_frame
        frame.scrolldown()
        self.set_needs_paint()

There’s one more subtlety to scrolling. After we scroll, we want to clamp the scroll position, to prevent the user scrolling past the last thing on the page. Right now clamp_scroll uses the window height to determine the maximum scroll amount; let’s move that function inside Frame so it can use the current frame’s height:

class Frame:
    def scrolldown(self):
        self.scroll = self.clamp_scroll(self.scroll + SCROLL_STEP)

    def clamp_scroll(self, scroll):
        height = math.ceil(self.document.height)
        maxscroll = height - self.frame_height
        return max(0, min(scroll, maxscroll))
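To make the clamping arithmetic concrete, here is a standalone version of the same computation. Unlike the method above, this sketch takes the document and frame heights as explicit parameters so that it can run on its own:

```python
import math

def clamp_scroll(scroll, document_height, frame_height):
    # The furthest we can scroll is the part of the document that
    # doesn't fit inside the frame.
    maxscroll = math.ceil(document_height) - frame_height
    return max(0, min(scroll, maxscroll))

print(clamp_scroll(250, 1000.5, 400))   # within range, unchanged
print(clamp_scroll(900, 1000.5, 400))   # past the end, pulled back
print(clamp_scroll(100, 300, 400))      # document shorter than the frame
```

Note the last case: when the document is shorter than the frame, the maximum scroll is negative, and the outer max pins the result to zero.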

Make sure to use the new clamp_scroll in place of the old one, everywhere in Frame:

class Frame:
    def scroll_to(self, elt):
        # ...
        self.scroll = self.clamp_scroll(new_scroll)

Scroll clamping can also come into play if a layout causes a page’s maximum height to shrink. You’ll need to move the scroll clamping logic out of Tab’s run_animation_frame method and into the Frame’s render to handle this:

class Frame:
    def render(self):
        clamped_scroll = self.clamp_scroll(self.scroll)
        if clamped_scroll != self.scroll:
            self.scroll_changed_in_frame = True
        self.scroll = clamped_scroll

There’s also a set of accessibility hover interactions that we need to support. This is hard, because the accessibility interactions happen in the browser thread, which has limited information about each iframe, such as where it is on the screen and how far it has scrolled.

We’ll make a subclass of AccessibilityNode to store this information:

class FrameAccessibilityNode(AccessibilityNode):
    pass

We’ll create one of those below each iframe node:

class AccessibilityNode:
    def build_internal(self, child_node):
        if isinstance(child_node, Element) \
            and child_node.tag == "iframe" and child_node.frame:
            child = FrameAccessibilityNode(child_node)

Hit testing now has to become recursive, so that FrameAccessibilityNode can adjust for the iframe location:

class AccessibilityNode:
    def hit_test(self, x, y):
        node = None
        if self.intersects(x, y):
            node = self
        for child in self.children:
            res = child.hit_test(x, y)
            if res: node = res
        return node

Hit testing FrameAccessibilityNodes will use the frame’s bounds to ignore clicks outside the frame bounds, and adjust clicks against the frame’s coordinates:

class FrameAccessibilityNode(AccessibilityNode):
    def hit_test(self, x, y):
        if not self.intersects(x, y): return
        new_x = x - self.bounds.x()
        new_y = y - self.bounds.y() + self.scroll
        node = self
        for child in self.children:
            res = child.hit_test(new_x, new_y)
            if res: node = res
        return node
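Here is a self-contained sketch of the two hit testing methods working together. It uses simple hypothetical stand-in classes, `SketchNode` and `SketchFrameNode`, that store plain numbers instead of Skia rectangles:

```python
class SketchNode:
    """Hypothetical stand-in for an accessibility node with a bounding box."""
    def __init__(self, name, x, y, width, height):
        self.name = name
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.children = []

    def intersects(self, x, y):
        return (self.x <= x < self.x + self.width and
                self.y <= y < self.y + self.height)

    def hit_test(self, x, y):
        node = self if self.intersects(x, y) else None
        for child in self.children:
            res = child.hit_test(x, y)
            if res: node = res
        return node

class SketchFrameNode(SketchNode):
    """Stand-in for a frame node: children use the frame's own coordinates."""
    def __init__(self, name, x, y, width, height, scroll=0):
        super().__init__(name, x, y, width, height)
        self.scroll = scroll

    def hit_test(self, x, y):
        if not self.intersects(x, y): return None
        # Shift into the frame's coordinate space and account for its scroll.
        new_x = x - self.x
        new_y = y - self.y + self.scroll
        node = self
        for child in self.children:
            res = child.hit_test(new_x, new_y)
            if res: node = res
        return node

root = SketchNode("root", 0, 0, 800, 600)
frame = SketchFrameNode("frame", 100, 100, 300, 200, scroll=50)
button = SketchNode("button", 10, 60, 80, 20)  # in frame coordinates
root.children.append(frame)
frame.children.append(button)

print(root.hit_test(120, 115).name)  # inside the button, via the frame
print(root.hit_test(120, 250).name)  # inside the frame, but not the button
print(root.hit_test(50, 50).name)    # outside the frame
```

The button's bounds are stored relative to the frame's document, so the same screen position hits different nodes depending on the frame's scroll offset.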

Hit testing should now work, but the bounds of the hovered node when drawn to the screen are still wrong. For that, we’ll need a method that returns the absolute screen rect of an AccessibilityNode. And that method in turn needs parent pointers to walk up the accessibility tree, so let’s add that first:

class AccessibilityNode:
    def __init__(self, node, parent = None):
        # ...
        self.parent = parent

    def build_internal(self, child_node):
        # ...
            child = FrameAccessibilityNode(child_node, self)
        else:
            child = AccessibilityNode(child_node, self)

And now the method to map to absolute coordinates:

class AccessibilityNode:
    def absolute_bounds(self):
        rect = skia.Rect.MakeXYWH(
            self.bounds.x(), self.bounds.y(),
            self.bounds.width(), self.bounds.height())
        obj = self
        while obj:
            obj.map_to_parent(rect)
            obj = obj.parent
        return rect

This method calls map_to_parent to adjust the bounds. For most accessibility nodes we don’t need to do anything, because they are in the same coordinate space as their parent:

class AccessibilityNode:
    def map_to_parent(self, rect):
        pass

A FrameAccessibilityNode, on the other hand, adjusts for the iframe’s position:

class FrameAccessibilityNode(AccessibilityNode):
    def map_to_parent(self, rect):
        rect.offset(self.bounds.x(), self.bounds.y() - self.scroll)
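Here is a sketch of the same parent-walking idea with a worked example. To stay self-contained it uses hypothetical stand-in classes and `(x, y, width, height)` tuples, and its map_to_parent returns a new tuple instead of mutating a Skia rectangle in place:

```python
class SketchNode:
    """Hypothetical stand-in; bounds are (x, y, width, height) tuples."""
    def __init__(self, bounds, parent=None):
        self.bounds = bounds
        self.parent = parent

    def map_to_parent(self, rect):
        # Same coordinate space as the parent: nothing to adjust.
        return rect

    def absolute_bounds(self):
        rect = self.bounds
        obj = self
        while obj:
            rect = obj.map_to_parent(rect)
            obj = obj.parent
        return rect

class SketchFrameNode(SketchNode):
    def __init__(self, bounds, parent=None, scroll=0):
        super().__init__(bounds, parent)
        self.scroll = scroll

    def map_to_parent(self, rect):
        # Shift by the iframe's position, and back up by how far it scrolled.
        (x, y, w, h) = rect
        return (x + self.bounds[0], y + self.bounds[1] - self.scroll, w, h)

root = SketchNode((0, 0, 800, 600))
frame = SketchFrameNode((100, 100, 300, 200), parent=root, scroll=50)
button = SketchNode((10, 60, 80, 20), parent=frame)
print(button.absolute_bounds())
```

A button at (10, 60) inside a frame at (100, 100) that has scrolled by 50 pixels ends up at (110, 110) on the screen.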

You should now be able to hover on nodes and have them read out by our accessibility subsystem.

Alright, we’ve now got all of our browser’s forms of user interaction properly recursing through the frame tree. It’s time to add more capabilities to iframes.

Our browser can only scroll the root frame on the browser thread, but real browsers have put in a lot of work to make scrolling happen on the browser thread as much as possible, including for iframes. The hard part is handling the many obscure combinations of containing blocks, stacking orders, scroll bars, transforms, and iframes: with scrolling on the browser thread, all of these complex interactions have to be communicated from the main thread to the browser thread, and correctly interpreted by both sides.

Iframe scripts

We’ve now got users interacting with iframes, but what about scripts interacting with them? Of course, each frame can already run scripts, but right now, each Frame has its own JSContext, so these scripts can’t really interact with each other. Instead, same-origin iframes should run in the same JavaScript context and should be able to access each other’s globals, call each other’s functions, and modify each other’s DOMs. Let’s implement that.

For two frames’ JavaScript environments to interact, we’ll need to put them in the same JSContext. So, instead of each Frame having a JSContext of its own, we’ll want to store JSContexts on the Tab, in a dictionary that maps origins to JS contexts:

class Tab:
    def __init__(self, browser):
        # ...
        self.origin_to_js = {}

    def get_js(self, origin):
        if origin not in self.origin_to_js:
            self.origin_to_js[origin] = JSContext(self, origin)
        return self.origin_to_js[origin]

Each Frame will then ask the Tab for its JavaScript context:

class Frame:
    def load(self, url, body=None):
        # ...
        self.js = self.tab.get_js(url_origin(url))
        # ...
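The get-or-create pattern in get_js means that two frames share a context exactly when their origins match. Here is a small sketch of that behavior. `SketchContext` and `SketchTab` are hypothetical stand-ins, and `origin_of` is an assumption about what a helper like url_origin computes (scheme plus host and port):

```python
from urllib.parse import urlsplit

class SketchContext:
    """Hypothetical stand-in for JSContext: just remembers its origin."""
    def __init__(self, origin):
        self.origin = origin

def origin_of(url):
    # Assumed definition of an origin: scheme plus host (and port, if any).
    parts = urlsplit(url)
    return parts.scheme + "://" + parts.netloc

class SketchTab:
    def __init__(self):
        self.origin_to_js = {}

    def get_js(self, origin):
        # Create a context the first time an origin is seen, then reuse it.
        if origin not in self.origin_to_js:
            self.origin_to_js[origin] = SketchContext(origin)
        return self.origin_to_js[origin]

tab = SketchTab()
a = tab.get_js(origin_of("http://localhost:8000/page.html"))
b = tab.get_js(origin_of("http://localhost:8000/iframe.html"))
c = tab.get_js(origin_of("http://localhost:9000/iframe.html"))
print(a is b, a is c)
```

The first two URLs differ only in their path, so they share a context; the third uses a different port and gets its own.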

So we’ve got multiple pages’ scripts using one JavaScript context. But now we’ve got to keep their variables in their own namespaces somehow. The key is going to be the window global, of type Window. In the browser, this refers to the global object, and instead of writing a global variable like a, you can always write window.a instead. To keep our implementation simple, in our browser, scripts will always need to reference variables and functions via window. (There are various proposals to expose multiple global namespaces as a JavaScript API. It would definitely be convenient to have that capability in this chapter, to avoid this restriction!) This also means that all global variables in a script need to do the same, even if they are not browser APIs. We’ll need to do the same in our runtime:

window.console = { log: function(x) { call_python("log", x); } }

// ...

window.Node = function(handle) { this.handle = handle; }

// ...

Do the same for every function or variable in the runtime.js file. If you miss one, you’ll get errors like this:

_dukpy.JSRuntimeError: ReferenceError: identifier 'Node' undefined
    duk_js_var.c:1258
    eval src/pyduktape.c:1 preventsyield

Then you’ll need to go find where you forgot to put window. in front of Node. You’ll also need to modify EVENT_DISPATCH_CODE to prefix classes with window:

EVENT_DISPATCH_CODE = \
    "new window.Node(dukpy.handle)" + \
    ".dispatchEvent(new window.Event(dukpy.type))"

Demos from previous chapters will need to be similarly fixed up before they work. For example, setTimeout might need to change to window.setTimeout.

To get multiple frames’ scripts to play nice inside one JavaScript context, we’ll create multiple Window objects: window_1, window_2, and so on. Before running a frame’s scripts, we’ll set window to that frame’s Window object, so that the script uses the correct Window. (Some JavaScript engines support a simple API for changing the global object, but the DukPy library that we’re using isn’t one of them. There is a standard JavaScript statement called with which sort of does this, but the rules are complicated and not quite what we need here. It’s also not recommended these days.)

So to begin with, let’s define the Window class when we create a JSContext:

class JSContext:
    def __init__(self, tab, url_origin):
        self.url_origin = url_origin
        # ...
        self.interp.evaljs("function Window(id) { this._id = id };")

Now, when a frame is created and wants to use a JSContext, it needs to ask for a window object to be created first:

class JSContext:
    def add_window(self, frame):
        code = "var window_{} = new Window({});".format(
            frame.window_id, frame.window_id)
        self.interp.evaljs(code)

Before running any JavaScript, we’ll want to change which window the window global refers to:

class JSContext:
    def wrap(self, script, window_id):
        return "window = window_{}; {}".format(window_id, script)
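To get a feel for why rebinding window is enough to separate frames, here is a toy analogue written entirely in Python. It is only an illustration: Python's exec and a shared dictionary stand in for the JavaScript interpreter and its global scope, SimpleNamespace stands in for Window, and the scripts are Python statements rather than JavaScript:

```python
import types

# Plays the role of the single interpreter's global scope.
shared_globals = {}

def add_window(window_id):
    # Analogous to creating window_N when a frame joins the context.
    name = "window_{}".format(window_id)
    shared_globals[name] = types.SimpleNamespace(_id=window_id)

def wrap(script, window_id):
    return "window = window_{}; {}".format(window_id, script)

def run(script, window_id):
    # The same script text runs against whichever window was swapped in.
    exec(wrap(script, window_id), shared_globals)

add_window(1)
add_window(2)
run("window.counter = 10", 1)
run("window.counter = 20", 2)
run("window.counter += 1", 1)
print(shared_globals["window_1"].counter,
      shared_globals["window_2"].counter)
```

Both frames' scripts write to window.counter, but because window points somewhere different each time, they never see each other's value.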

We can use this to, for example, set up the initial runtime environment for each Frame:

class JSContext:
    def add_window(self, frame):
        # ...
        with open("runtime15.js") as f:
            self.interp.evaljs(self.wrap(f.read(), frame.window_id))

We’ll need to call wrap any time we use evaljs, which also means we’ll need to add a window ID argument to a lot of methods. For example, in run we’ll add a window_id parameter:

class JSContext:
    def run(self, script, code, window_id):
        try:
            code = self.wrap(code, window_id)
            print("Script returned: ", self.interp.evaljs(code))
        except dukpy.JSRuntimeError as e:
            print("Script", script, "crashed", e)

And we’ll pass that argument from the load method:

class Frame:
    def load(self, url, body=None):
        for script in scripts:
            # ...
            task = Task(self.js.run, script_url, body,
                self.window_id)
            # ...

The same holds for various dispatching APIs. For example, to dispatch an event, we’ll need the window_id:

class JSContext:
    def dispatch_event(self, type, elt, window_id):
        # ...
        code = self.wrap(EVENT_DISPATCH_CODE, window_id)
        do_default = self.interp.evaljs(code,
            type=type, handle=handle)

Likewise, we’ll need to pass a window ID argument in click, submit_form, and keypress; I’ve omitted those code fragments. Note that you should have modified your runtime.js file to store the LISTENERS on the window object, meaning each Frame will have its own set of event listeners to dispatch to:

window.LISTENERS = {}

// ...


window.Node.prototype.dispatchEvent = function(evt) {
    var type = evt.type;
    var handle = this.handle
    var list = (window.LISTENERS[handle] &&
        window.LISTENERS[handle][type]) || [];
    for (var i = 0; i < list.length; i++) {
        list[i].call(this, evt);
    }
    return evt.do_default;
}

Do the same for requestAnimationFrame, passing around a window ID and wrapping the code so that it correctly references window.

For calls from JavaScript into the browser, we’ll need JavaScript to pass in the window ID it’s calling from:

window.document = { querySelectorAll: function(s) {
    var handles = call_python("querySelectorAll", s, window._id);
    return handles.map(function(h) { return new window.Node(h) });
}}

Then on the browser side we can use that window ID to get the Frame object:

class JSContext:
    def querySelectorAll(self, selector_text, window_id):
        frame = self.tab.window_id_to_frame[window_id]
        selector = CSSParser(selector_text).selector()
        nodes = [node for node
                 in tree_to_list(frame.nodes, [])
                 if selector.matches(node)]
        return [self.get_handle(node) for node in nodes]

We’ll need something similar in innerHTML and style because we need to set_needs_render on the relevant Frame.

Finally, for setTimeout and XMLHttpRequest, which involve a call from JavaScript into the browser and later a call from the browser into JavaScript, we’ll likewise need to pass in a window ID from JavaScript, and use that window ID when calling back into JavaScript.

I’ve omitted many of the code changes in this section because they are quite repetitive. You can find all of the needed locations by searching your codebase for evaljs; once you’ve got scripts working again, let’s make it possible for scripts in different frames to interact.

Same-origin iframes can access each other’s state, but cross-origin ones can’t. But the obscure domain property lets an iframe change its origin, moving itself in or out of same-origin status in some cases. I personally think it’s a misfeature: it’s hard to implement securely, and interferes with various sandboxing techniques; I hope it is eventually removed from the web. Instead, there are various headers where an iframe can opt into less sharing in order to get better security and performance.

Communicating between frames

We’ve now managed to run multiple Frames’ worth of JavaScript in a single JSContext, and isolated them somewhat so that they don’t mess with each other’s state. But the whole point of this exercise is to allow some interaction between same-origin frames. Let’s do that now.

The simplest way two frames can interact is that they can get access to each other’s state via the parent attribute on the Window object. If the two frames have the same origin, that lets one frame call methods, access variables, and modify browser state for the other frame. Because we’ve had these same-origin frames share a JSContext, this isn’t too hard to implement. Basically, we’ll need a way to go from a window ID to its parent frame’s window ID:

class JSContext:
    # ...
    def parent(self, window_id):
        parent_frame = \
            self.tab.window_id_to_frame[window_id].parent_frame
        if not parent_frame:
            return None
        return parent_frame.window_id

On the JavaScript side, we now need to look up the Window object given its window ID. There are lots of ways you could do this, but the easiest is to have a global map:

class JSContext:
    def __init__(self, tab, url_origin):
        # ...
        self.interp.evaljs("WINDOWS = {}")

We’ll add each window to the global map as it’s created:

class JSContext:
    def add_window(self, frame):
        # ...
        self.interp.evaljs("WINDOWS[{}] = window_{};".format(
            frame.window_id, frame.window_id))

Now window.parent can look up the correct Window object in this global map:

Object.defineProperty(Window.prototype, 'parent', {
  configurable: true,
  get: function() {
    var parent_id = call_python('parent', window._id);
    if (parent_id != undefined) {
        var parent = WINDOWS[parent_id];
        if (parent === undefined) parent = new Window(parent_id);
        return parent;
    }
  }
});

Note that it’s possible for the lookup in WINDOWS to fail, if the parent frame is not in the same origin as the current one and therefore isn’t running in the same JSContext. In that case, this code returns a fresh Window object with that id. But iframes are not allowed to access each other’s documents across origins (or call various other APIs that are unsafe), so add a method that checks for this situation and raises an exception:

class JSContext:
    def throw_if_cross_origin(self, frame):
        if url_origin(frame.url) != self.url_origin:
            raise Exception(
                "Cross-origin access disallowed from script")

Note that in a real browser this check alone would be woefully inadequate security: a real browser would need to very carefully lock down the entire runtime.js code and audit every single JavaScript API with a fine-toothed comb. For our browser, though, just use this method in all JSContext methods that access documents:

class JSContext:
    def querySelectorAll(self, selector_text, window_id):
        frame = self.tab.window_id_to_frame[window_id]
        self.throw_if_cross_origin(frame)

    def innerHTML_set(self, handle, s, window_id):
        frame = self.tab.window_id_to_frame[window_id]        
        self.throw_if_cross_origin(frame)

    def style_set(self, handle, s, window_id):
        frame = self.tab.window_id_to_frame[window_id]        
        self.throw_if_cross_origin(frame)
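Here is a self-contained sketch of the origin check in action. `SketchContext` and its `query` method are hypothetical stand-ins for JSContext and a document-accessing API, and `origin_of` is an assumed definition of an origin (scheme plus host and port):

```python
from urllib.parse import urlsplit

def origin_of(url):
    # Assumed definition of an origin: scheme plus host (and port, if any).
    parts = urlsplit(url)
    return parts.scheme + "://" + parts.netloc

class SketchContext:
    """Hypothetical stand-in for a JSContext tied to one origin."""
    def __init__(self, url_origin):
        self.url_origin = url_origin

    def throw_if_cross_origin(self, frame_url):
        if origin_of(frame_url) != self.url_origin:
            raise Exception("Cross-origin access disallowed from script")

    def query(self, frame_url):
        # Every document-touching API checks the origin before doing work.
        self.throw_if_cross_origin(frame_url)
        return "ok"

ctx = SketchContext("http://localhost:8000")
print(ctx.query("http://localhost:8000/a.html"))
try:
    ctx.query("http://localhost:9000/b.html")
except Exception as e:
    print("blocked:", e)
```

The important habit is that the check runs first, before the method touches anything belonging to the target frame.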

So via parent, same-origin iframes can communicate. But what about cross-origin iframes? It would be insecure to let them access each other’s variables or call each other’s methods, so instead browsers allow a form of message passing, a technique for structured communication between two different event loops that doesn’t require any shared state or locks.

Message-passing in JavaScript works like this: you call the postMessage API on the Window object you’d like to talk to, with the message itself as the first parameter and * as the second (the second parameter has to do with origin restrictions; see the exercises):

window.parent.postMessage("...", '*')

This will send the first argument to the parent frame. In a real browser, you can also pass data that is not a string, such as numbers and objects. It works via a serialization algorithm called structured cloning, which converts most JavaScript objects (though not, for example, DOM nodes) to a sequence of bytes that the receiver frame can convert back into a JavaScript object. DukPy doesn’t support structured cloning natively for objects, so our browser won’t support this either. The parent frame can receive the message by handling the message event on its Window object:

window.addEventListener("message", function(e) {
    console.log(e.data);
});

Note that in this second code snippet, window is the receiving Window, a different Window from the window in the first snippet.

Let’s implement postMessage, starting on the receiver side. Since this event happens on the Window, not on a Node, we’ll need a new WINDOW_LISTENERS object:

window.WINDOW_LISTENERS = {}

Each listener will be called with a MessageEvent object:

window.MessageEvent = function(data) {
    this.type = "message";
    this.data = data;
}

The event listener and dispatching code is the same as for Node, except it’s on Window and uses WINDOW_LISTENERS. You can just duplicate those methods:

Window.prototype.addEventListener = function(type, listener) {
    // ...
}

Window.prototype.dispatchEvent = function(evt) {
    // ...
}

That’s everything on the receiver side; now let’s do the sender side. First, let’s implement the postMessage API itself. Note that inside this method, this is the receiver or target window:

Window.prototype.postMessage = function(message, origin) {
    call_python("postMessage", this._id, message, origin)
}

In the browser, postMessage schedules a task on the Tab:

class JSContext:
    def postMessage(self, target_window_id, message, origin):
        task = Task(self.tab.post_message,
            message, target_window_id)
        self.tab.task_runner.schedule_task(task)

Scheduling the task is necessary because postMessage is an asynchronous API; sending a synchronous message might involve synchronizing multiple JSContexts or even multiple processes, which would add a lot of overhead and probably result in deadlocks.

The task finds the target frame and calls a dispatch method:

class Tab:
    def post_message(self, message, target_window_id):
        frame = self.window_id_to_frame[target_window_id]
        frame.js.dispatch_post_message(
            message, target_window_id)

Which then calls into the JavaScript dispatchEvent method we just wrote:

POST_MESSAGE_DISPATCH_CODE = \
    "window.dispatchEvent(new window.MessageEvent(dukpy.data))"

class JSContext:
    def dispatch_post_message(self, message, window_id):
        self.interp.evaljs(
            self.wrap(POST_MESSAGE_DISPATCH_CODE, window_id),
            data=message)
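The key property of this design is that posting a message only schedules work; the listener runs later, when the task does. Here is a minimal sketch of that behavior, using a plain queue in place of our task runner. `SketchWindow`, `SketchTab`, and their methods are hypothetical stand-ins:

```python
from collections import deque

class SketchWindow:
    """Hypothetical stand-in for a frame's Window and its listeners."""
    def __init__(self, window_id):
        self.window_id = window_id
        self.listeners = []

    def add_message_listener(self, listener):
        self.listeners.append(listener)

    def dispatch_message(self, data):
        for listener in self.listeners:
            listener(data)

class SketchTab:
    def __init__(self):
        self.windows = {}
        self.tasks = deque()

    def post_message(self, target_window_id, message):
        # Don't call the listeners now; queue a task for later.
        self.tasks.append((target_window_id, message))

    def run_tasks(self):
        while self.tasks:
            (target_id, message) = self.tasks.popleft()
            self.windows[target_id].dispatch_message(message)

tab = SketchTab()
parent = SketchWindow(1)
tab.windows[1] = parent
log = []
parent.add_message_listener(lambda data: log.append(data))

tab.post_message(1, "hello from the iframe")
print(log)       # still empty: the task hasn't run yet
tab.run_tasks()
print(log)
```

Because the sender never waits on the receiver, the two sides don't need to share any state or locks.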

You should now be able to use postMessage to send messages between frames, including cross-origin frames running in different JSContexts, in a secure way. In this demo, for example, you should see “Message received from iframe: This is the contents of postMessage.” printed to the console. (This particular example uses a same-origin postMessage. You can test cross-origin locally by starting two local HTTP servers on different ports, then changing the URL of the example15-img.html iframe document to point to the second port.)

Ads are commonly served with iframes and are big users of the web’s sandboxing, embedding, and animation primitives. This means they are a challenging source of performance and user experience problems. For example, ad analytics are important to the ad economy, but involve running a lot of code and measuring lots of data. Some web APIs, such as Intersection Observer, basically exist to make analytics computations more efficient. And, of course, the most popular browser extensions are probably ad blockers.

Isolation and timing

Iframes add a whole new layer of security challenges atop what we discussed in Chapter 10. The power to embed one web page into another creates a commensurate security risk when the two pages don’t trust each other—both in the case of embedding an untrusted page into your own page, and the reverse, where an attacker embeds your page into their own, malicious one. In both cases, we want to protect your page from any security or privacy risks caused by the other frame.

The starting point is that cross-origin iframes can’t access each other directly through JavaScript. That’s good—but what if a bug in the JavaScript engine, like a buffer overrun, lets an iframe circumvent those protections? Unfortunately, bugs like this are common enough that browsers have to defend against them. For example, browsers these days run frames from different origins in different operating system processes, and use operating system features to limit how much access those processes have.

Other parts of the browser mix content from multiple frames, like our browser’s Tab-wide display list. That means that a bug in the rasterizer could allow one frame to take over the rasterizer and then read data that ultimately came from another frame. This might seem like a rather complex attack, but it’s worth defending against, so modern browsers use sandboxing techniques to prevent it. For example, Chromium can place the rasterizer in its own process and use a Linux feature called seccomp to limit what system calls that process can make. Even if a bug compromised the rasterizer, that rasterizer wouldn’t be able to exfiltrate data over the network, preventing private data from leaking.

These isolation and sandboxing features may seem “straightforward”, in the same sense that the browser thread we added in Chapter 13 is “straightforward”. In practice, the many browser APIs mean the implementation is full of subtleties and ends up being extremely complex. Chromium, for example, took many years to ship the first implementation of site isolation.

Site isolation has become much more important in recent years, due to the CPU cache timing attacks called Spectre and Meltdown. In short, these attacks allow an attacker to read arbitrary locations in memory, including another frame’s data if the two frames are in the same process, by measuring the time certain operations take. Placing sensitive content in different operating system processes (which come with their own memory address spaces) is a good protection against these attacks.

That said, these kinds of timing attacks can be subtle, and there are doubtless more that haven’t been discovered yet. To try to dull this threat, browsers currently prevent access to high-precision timers that can provide the accurate timing data typically required for timing attacks. For example, browsers reduce the accuracy of APIs like Date.now or setTimeout.
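One simple way to reduce a timer's accuracy is to round its readings down to a coarse granularity. Here is a sketch of that idea; the `coarsen` helper is hypothetical, and the granularity is an arbitrary illustrative value, not what any particular browser uses:

```python
def coarsen(timestamp_us, granularity_us):
    # Round down to a multiple of the granularity, discarding the
    # fine-grained part that a timing attack would want to measure.
    return (timestamp_us // granularity_us) * granularity_us

# Two events 32 microseconds apart become indistinguishable.
print(coarsen(1_234_567, 100))
print(coarsen(1_234_599, 100))
```

An attacker who needs to tell a fast operation from a slow one now sees the same reading for both, unless the difference happens to straddle a multiple of the granularity.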

Worse yet, there are browser APIs that don’t seem like timers but can be used as such. For example, the SharedArrayBuffer API lets two JavaScript threads run concurrently and share memory, which can be used to construct a clock. This API is useful, so browsers don’t quite want to remove it, but there is also no way to make it “less accurate”, since it’s not primarily a clock anyway. Browsers now require certain optional HTTP headers to be present in the parent and child frames’ HTTP responses in order to allow use of SharedArrayBuffer, though this is not a perfect solution.

The SharedArrayBuffer issue caused problems when I added JavaScript support to the embedded browser widgets on this website. I was using SharedArrayBuffer to allow synchronous calls from a JSContext to the browser, and that required APIs that browsers restrict for security reasons. Setting the security headers wouldn’t work, because Chapter 14 embeds a YouTube video, and YouTube doesn’t send those headers. In the end, I worked around the issue by not embedding the browser widget and asking the reader to open a new browser window.

Summary

This chapter introduced how the browser handles embedded content use cases like images and iframes. Reiterating the main points:

And as we hope you saw in this chapter, none of these features are too difficult to implement, though—as you’ll see in the exercises below—implementing them well requires a lot of attention to detail.

Outline

The complete set of functions, classes, and methods in our browser should now look something like this:

def print_tree(node, indent)
class Text: def __init__(text, parent) def __repr__()
class Element: def __init__(tag, attributes, parent) def __repr__()
def resolve_url(url, current)
def tree_to_list(tree, list)
INHERITED_PROPERTIES
def layout_mode(node)
COOKIE_JAR
def url_origin(url)
def draw_text(canvas, x, y, text, font, color)
def get_font(size, weight, style)
def linespace(font)
def parse_blend_mode(blend_mode_str)
CHROME_PX SCROLL_STEP
class MeasureTime: def __init__(name) def start_timing() def stop_timing() def text()
def diff_styles(old_style, new_style)
class CompositedLayer: def __init__(skia_context, display_item) def can_merge(display_item) def add(display_item) def composited_bounds() def absolute_bounds() def raster() def __repr__()
def absolute_bounds(display_item)
def absolute_bounds_for_obj(obj)
class DrawCompositedLayer: def __init__(composited_layer) def execute(canvas) def __repr__()
class Task: def __init__(task_code) def run()
class TaskRunner: def __init__(tab) def schedule_task(task) def set_needs_quit() def clear_pending_tasks() def start_thread() def run() def handle_quit()
class SingleThreadedTaskRunner: def __init__(tab) def schedule_task(callback) def run_tasks() def clear_pending_tasks() def start_thread() def set_needs_quit() def run()
def clamp_scroll(scroll, tab_height)
def add_parent_pointers(nodes, parent)
class DisplayItem: def __init__(rect, children, node) def is_paint_command() def map(rect) def add_composited_bounds(rect)
class DrawText: def __init__(x1, y1, text, font, color) def is_paint_command() def execute(canvas) def __repr__()
class DrawLine: def __init__(x1, y1, x2, y2) def is_paint_command() def execute(canvas) def __repr__()
def paint_visual_effects(node, cmds, rect)
WIDTH HEIGHT INPUT_WIDTH_PX REFRESH_RATE_SEC HSTEP VSTEP SETTIMEOUT_CODE XHR_ONLOAD_CODE
class Transform: def __init__(translation, rect, node, children) def execute(canvas) def map(rect) def clone(children) def __repr__()
ANIMATED_PROPERTIES
class SaveLayer: def __init__(sk_paint, node, children, should_save) def execute(canvas) def clone(children) def __repr__()
def parse_color(color)
def draw_rect(canvas, l, t, r, b, fill_color, border_color, width)
class DrawRRect: def __init__(rect, radius, color) def is_paint_command() def execute(canvas) def print(indent) def __repr__()
def is_focused(node)
def paint_outline(node, cmds, rect, zoom)
def has_outline(node)
def device_px(css_px, zoom)
def cascade_priority(rule)
def style(node, rules, tab)
def is_focusable(node)
def get_tabindex(node)
def announce_text(node, role)
def speak_text(text)
class CSSParser: def __init__(s, internal) def whitespace() def literal(literal) def word() def until_char(chars) def pair(until) def ignore_until(chars) def body() def simple_selector() def selector() def media_query() def parse()
class DrawOutline: def __init__(rect, color, thickness) def is_paint_command() def execute(canvas) def __repr__()
def main_func(args)
class Browser: def __init__() def render() def commit(tab, data) def set_needs_animation_frame(tab) def set_needs_raster() def set_needs_composite() def set_needs_accessibility() def set_needs_draw() def composite() def clone_latest(visual_effect, current_effect) def paint_draw_list() def update_accessibility() def composite_raster_and_draw() def schedule_animation_frame() def handle_down() def handle_tab() def focus_addressbar() def clear_data() def set_active_tab(index) def go_back() def cycle_tabs() def toggle_accessibility() def speak_node(node, text) def speak_document() def toggle_mute() def is_muted() def toggle_dark_mode() def handle_click(e) def handle_hover(event) def handle_key(char) def schedule_load(url, body) def handle_enter() def increment_zoom(increment) def reset_zoom() def load(url) def load_internal(url) def raster_tab() def raster_chrome() def draw() def handle_quit()
def request(url, top_level_url, payload)
class DrawImage: def __init__(image, rect, quality) def execute(canvas) def __repr__()
class DocumentLayout: def __init__(node, frame) def layout(width, zoom) def paint(display_list, dark_mode, scroll) def __repr__()
def font(style, zoom)
class BlockLayout: def __init__(node, parent, previous, frame) def layout() def recurse(node) def new_line() def add_inline_child(node, w, child_class, frame, word) def text(node) def input(node) def image(node) def iframe(node) def paint(display_list) def __repr__()
class EmbedLayout: def __init__(node, parent, previous, frame) def get_ascent(font_multiplier) def get_descent(font_multiplier) def layout()
class InputLayout: def __init__(node, parent, previous, frame) def layout() def paint(display_list) def __repr__()
class LineLayout: def __init__(node, parent, previous) def layout() def paint(display_list) def role() def __repr__()
class TextLayout: def __init__(node, parent, previous, word) def get_ascent(font_multiplier) def get_descent(font_multiplier) def layout() def paint(display_list) def rect() def __repr__()
def filter_quality(node)
class ImageLayout: def __init__(node, parent, previous, frame) def layout() def paint(display_list) def __repr__()
IFRAME_WIDTH_PX IFRAME_HEIGHT_PX
class IframeLayout: def __init__(node, parent, previous, parent_frame) def layout() def paint(display_list) def __repr__()
class AttributeParser: def __init__(s) def whitespace() def literal(literal) def word(allow_quotes) def parse()
class HTMLParser: def __init__(body) def parse() def get_attributes(text) def add_text(text) SELF_CLOSING_TAGS def add_tag(tag) HEAD_TAGS def implicit_tags(tag) def finish()
INTERNAL_ACCESSIBILITY_HOVER EVENT_DISPATCH_CODE POST_MESSAGE_DISPATCH_CODE
class JSContext: def __init__(tab, url_origin) def throw_if_cross_origin(frame) def add_window(frame) def wrap(script, window_id) def run(script, code, window_id) def dispatch_event(type, elt, window_id) def get_handle(elt) def querySelectorAll(selector_text, window_id) def getAttribute(handle, attr) def parent(window_id) def dispatch_post_message(message, window_id) def postMessage(target_window_id, message, origin) def innerHTML_set(handle, s, window_id) def style_set(handle, s, window_id) def dispatch_settimeout(handle, window_id) def setTimeout(handle, time, window_id) def dispatch_xhr_onload(out, handle, window_id) def XMLHttpRequest_send(method, url, body, isasync, handle, window_id) def now() def dispatch_RAF(window_id) def requestAnimationFrame()
class AccessibilityNode: def __init__(node, parent) def build() def build_internal(child_node) def intersects(x, y) def hit_test(x, y) def map_to_parent(rect) def absolute_bounds() def __repr__()
class FrameAccessibilityNode: def __init__(node, parent) def build() def hit_test(x, y) def map_to_parent(rect) def __repr__()
BROKEN_IMAGE
class Frame: def __init__(tab, parent_frame, frame_element) def set_needs_render() def set_needs_layout() def allowed_request(url) def load(url, body) def render() def paint(display_list) def advance_tab() def focus_element(node) def activate_element(elt) def submit_form(elt) def keypress(char) def scrolldown() def scroll_to(elt) def click(x, y) def clamp_scroll(scroll)
class CommitData: def __init__(url, scroll, root_frame_focused, height, display_list, composited_updates, accessibility_tree, focus)
class Tab: def __init__(browser) def load(url, body) def get_js(origin) def set_needs_render_all_frames() def set_needs_accessibility() def set_needs_paint() def request_animation_frame_callback() def run_animation_frame(scroll) def render() def click(x, y) def keypress(char) def scrolldown() def enter() def get_tabindex(node) def advance_tab() def zoom_by(increment) def reset_zoom() def go_back() def toggle_accessibility() def toggle_dark_mode() def post_message(message, target_window_id)
def draw_line(canvas, x1, y1, x2, y2, color)
def add_main_args()
if __name__ == "__main__"

Exercises

Canvas element: Implement the <canvas> element, the 2D aspect of the getContext API, and some of the drawing commands on CanvasRenderingContext2D. Canvas layout is just like an iframe, including its default width and height. You should allocate a Skia canvas of an appropriate size when getContext("2d") is called, and implement some of the APIs that draw to the canvas. It should be straightforward to translate most API methods to their Skia equivalent. Note that once JavaScript draws to a canvas, the drawing persists until reset or similar is called. This allows a web developer to build up a display list with a sequence of commands, but also places the burden on them to decide when to do so and when to clear it. This approach is called an immediate mode of rendering, as opposed to the retained mode used by HTML, which does not have this complexity for developers. (Instead, the complexity is borne by the browser.)

Background images: Elements can have a background-image. Implement the basics of this CSS property: a url(...) value for the background-image property. Avoid loading the image if the background-image property does not actually end up used on any element. For a bigger challenge, also allow the web page to set the size of the background image with the background-size CSS property.

Object-fit: Implement the object-fit CSS property. It determines how the image within an <img> element is sized relative to its container element.
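As a starting point, the sizing arithmetic behind a few object-fit values can be sketched as follows. This is not a full solution: the function name `object_fit_size` is hypothetical, it ignores the scale-down value and positioning within the box, and how it hooks into ImageLayout is left to you.

```python
def object_fit_size(image_w, image_h, box_w, box_h, fit="fill"):
    """Return the (width, height) at which to draw an image inside a box.
    Hypothetical helper covering fill, contain, cover, and none only."""
    if fit == "fill":
        # Stretch to the box, ignoring the image's aspect ratio.
        return (box_w, box_h)
    scale_w = box_w / image_w
    scale_h = box_h / image_h
    if fit == "contain":
        # Largest size that fits entirely inside the box.
        scale = min(scale_w, scale_h)
    elif fit == "cover":
        # Smallest size that covers the whole box; the overflow is clipped.
        scale = max(scale_w, scale_h)
    elif fit == "none":
        scale = 1
    else:
        raise ValueError("unsupported object-fit value: " + fit)
    return (image_w * scale, image_h * scale)

contain_size = object_fit_size(200, 100, 100, 100, "contain")
cover_size = object_fit_size(200, 100, 100, 100, "cover")
```

For a 200×100 image in a 100×100 box, contain shrinks the image to fit the width, while cover keeps the full height and lets the sides overflow.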

Iframe aspect ratio: Implement the aspect-ratio CSS property and use it to provide an implicit sizing to iframes and images when only one of width or height is specified (or when the image is not yet loaded, if you did the lazy loading exercise).

Lazy loading: Even encoded images can be quite large. (In the early days of the web, computer networks were slow enough that browsers had a user setting to disable downloading of images until the user expressly asked for them.) Add support for the loading attribute on img elements. Your browser should only download images if they are close to the visible area of the page. This kind of optimization is generally called lazy loading. Implement a second optimization in your browser that only renders images that are within a certain number of pixels of being visible on the screen.
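The “close to the visible area” test is a simple interval check. Here is a minimal sketch, assuming vertical scrolling only. The function name `near_viewport` and the 500-pixel default threshold are illustrative assumptions, not values taken from any browser.

```python
def near_viewport(image_y, image_height, scroll, viewport_height,
                  threshold=500):
    """Return True if an image overlaps the visible area, extended by
    threshold pixels above and below. All values are in page pixels."""
    top = scroll - threshold
    bottom = scroll + viewport_height + threshold
    return image_y + image_height >= top and image_y <= bottom

# An image 3000 pixels down the page is skipped at first...
before = near_viewport(3000, 100, scroll=0, viewport_height=600)
# ...but qualifies once the user has scrolled most of the way there.
after = near_viewport(3000, 100, scroll=2000, viewport_height=600)
```

One thing to think about is that this check needs the image’s position, which is only known after layout, so the download has to be triggered later than it is for eagerly loaded images.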

Image placeholders: Building on top of lazy loading, implement placeholder styling of images that haven’t loaded yet. This is done by setting a 0x0 sizing, unless width or height is specified. Also add support for hiding the “broken image” if the alt attribute is missing or empty. That’s because if alt text is provided, the browser can assume the image is important to the meaning of the website, and so it should tell the user that they are missing out on some of the content if it fails to load. But otherwise, the broken image icon is probably just ugly clutter.

Media queries: Implement the width media query. Make sure it works inside iframes. Also make sure it works even when the width of an iframe is changed by its parent frame.

Target origin for postMessage: Implement the targetOrigin parameter to postMessage. This parameter is a string which indicates the frame origins that are allowed to receive the message.
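The core of this exercise is an origin comparison. The sketch below is one way to approach it. The helper names `origin_of` and `should_deliver` are hypothetical, and `origin_of` is a simplified stand-in built on `urllib.parse` rather than the browser’s own URL handling. The `"*"` wildcard is part of the web’s postMessage API, but default ports are not normalized here.

```python
from urllib.parse import urlsplit

def origin_of(url):
    """Simplified origin: scheme plus host (and port, if written out)."""
    parts = urlsplit(url)
    return parts.scheme + "://" + parts.netloc

def should_deliver(target_origin, frame_url):
    """Decide whether a message sent with target_origin may be delivered
    to a frame whose document was loaded from frame_url."""
    if target_origin == "*":
        # The wildcard allows delivery to a frame of any origin.
        return True
    return origin_of(target_origin) == origin_of(frame_url)

same = should_deliver("https://a.example", "https://a.example/x/y")
other = should_deliver("https://a.example", "https://b.example/")
```

When the check fails, the message should be dropped silently rather than delivered.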

Multi-frame focus: In our toy browser, pressing Tab cycles through the elements in the focused frame. But this means it’s impossible to access focusable elements in other frames via the keyboard alone. Fix it to move between frames after iterating through all focusable elements in one frame.

Iframe history: Ensure that iframes affect browser history. For example, if you click on a link inside an iframe, and then hit the back button, it should go back inside the iframe. Make sure that this works even when the user clicks links in multiple frames in various orders. It’s debatable whether this is a good feature of iframes, as it causes a lot of confusion for web developers who embed iframes they don’t plan on navigating.

Iframes under transforms: Painting an iframe that has a CSS transform on it or an ancestor should already work, but event targeting for clicks doesn’t work, because click doesn’t account for that transform. Fix this. Also make sure that accessibility handles iframes under transforms correctly in all cases.

Iframes added or removed by script: The innerHTML API can cause iframes to be added or removed, but our browser doesn’t load or unload them when this happens. Fix this: new iframes should be loaded and old ones unloaded.
