Web Browser Engineering

Pavel Panchekha & Chris Harrelson; one page edition

Twitter · Blog · Patreon · Discussions

Preface

A computer science degree traditionally includes courses in operating systems, compilers, and databases that replace mystery with code. These courses transform Linux, Postgres, and LLVM into improvements, additions, and optimizations of an understandable core architecture. The lesson transcends the specific system studied: all computer systems, no matter how big and seemingly complex, can be studied and understood.

But web browsers are still opaque, not just to students but to industry programmers and even to researchers. This book dissipates that mystery by systematically explaining all major components of a modern web browser.

Reading This Book

Parts 1–3 of this book construct a basic browser weighing in at around 1000 lines of code, twice that after exercises. The average chapter takes 4–6 hours to read, implement, and debug for someone with a few years’ programming experience. Part 4 of this book covers advanced topics; those chapters are longer and have more code. The final browser weighs in at about 3000 lines.

Your browser will “work” at each step of the way, and every chapter will build upon the last. (This book assumes that you will be building a web browser along the way while reading it. However, it does present nearly all the code—inlined into the book—for a working browser for every chapter. So most of the time, the book uses the term “our browser”, which refers to the conceptual browser we (you and us, the authors) have built so far. In cases where the book is referring specifically to the implementation you have built, the book says “your browser”. The chapter-by-chapter approach is an idea from J. R. Wilcox, inspired in turn by S. Zdancewic’s course on compilers.) That way, you will also practice growing and improving complex software. If you feel particularly interested in some component, please do flesh it out, complete the exercises, and add missing features. We’ve tried to arrange it so that this doesn’t make later chapters more difficult.

The code in this book uses Python 3, and we recommend you follow along in the same language. When the book shows Python command lines, it calls the Python binary python3. (This is for clarity: on some operating systems, python means Python 3, but on others it means Python 2. Check which version you have!) That said, the text avoids dependencies where possible, and you can try to follow along in another language. Make sure your language has libraries for TLS connections (Python has one built in), graphics (the text uses Tk, Skia, and SDL), and JavaScript evaluation (the text uses DukPy).
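To check which version a given binary runs, you can ask it directly (the exact output will vary with your installation):

python3 --version

If that prints a version starting with 3, you’re set; if the python3 command doesn’t exist, your Python 3 binary may simply be called python.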

This book’s browser is irreverent toward standards: it handles only a sliver of the full HTML, CSS, and JavaScript languages, mishandles errors, and isn’t resilient to malicious inputs. It is also quite slow. Despite that, its architecture matches that of real browsers, providing insight into those 10-million-line-of-code behemoths.

That said, we’ve tried to explicitly note when the book’s browser simplifies or diverges from standards. If you’re not sure how your browser should behave in some edge case, fire up your favorite web browser and try it out.

Acknowledgments

We’d like to recognize the countless people who built the web and the various web browsers. They are wonders of the modern world. Thank you! We learned a lot from the books and articles listed in this book’s bibliography—thank you to their authors. And we’re especially grateful to the many contributors to articles on Wikipedia (especially those on historic software, formats, and protocols). We are grateful for this amazing resource, one which in turn was made possible by the very thing this book is about.

Pavel: James R. Wilcox and I dreamed up this book during a late-night chat at ICFP 2018. Max Willsey proofread and helped sequence the chapters. Zach Tatlock encouraged me to develop the book into a course. And the students of CS 6968, CS 4962, and CS 4560 at the University of Utah found countless errors and suggested important simplifications. I am thankful to all of them. Most of all, I am thankful to my wife Sara, who supported my writing and gave me the strength to finish this many-year-long project.

Chris: I am eternally grateful to my wife Sara for patiently listening to my endless musings about the web, and encouraging me to turn my idea for a browser book into reality. I am also grateful to Dan Gildea for providing feedback on my browser-book concept on multiple occasions. Finally, I’m grateful to Pavel for doing the hard work of getting this project off the ground and allowing me to join the adventure. (Turns out Pavel and I had the same idea!)

A final note

This book is, and will remain, a work in progress. Please leave comments and mark typos; the book has built-in feedback tools, which you can enable with Ctrl-E (or Cmd-E on a Mac). The full source code is also available on GitHub, though we prefer to receive comments through the built-in tools.

Browsers and the Web

I—this is Chris speaking—have known the web for all of my adult life. (Broadly defined, the web is the interlinked network (“web”) of web pages on the internet. If you’ve never made a web page, I recommend MDN’s Learn Web Development series, especially the Getting Started guide. This book will be easier to read if you’re familiar with the core technologies.) The web for me is something of a technological companion, and I’ve never been far from it in my studies or my work. Perhaps it’s been the same for you. And using the web means using a browser. I hope, as you read this book, that you fall in love with web browsers, just like I did.

The Browser and Me

Since I first encountered the web and its predecessors in the early 1990s (for me, bulletin board systems (BBSs) over a dial-up modem connection; a BBS, like a browser, is a window into dynamic content somewhere else on the internet), I’ve been fascinated by browsers and the concept of networked user interfaces. When I surfed the web, even in its earliest form, I felt I was seeing the future of computing. In some ways, the web and I grew together—for example, 1994, the year the web went commercial, was the same year I started college; while there I spent a fair amount of time surfing the web, and by the time I graduated in 1999, the browser had fueled the famous dot-com speculation gold rush. Not only that, but the company for which I now work, Google, is a child of the web and was founded during that time.

In my freshman year at college, I attended a presentation by a RedHat salesman. The presentation was of course aimed at selling RedHat Linux, probably calling it the “operating system of the future” and speculating about the “year of the Linux desktop”. But when asked about challenges RedHat faced, the salesman mentioned not Linux but the web: he said that someone “needs to make a good browser for Linux”. (Netscape Navigator was available for Linux at that time, but it wasn’t viewed as especially fast or featureful compared to its implementation on other operating systems.) Even back then, in the first years of the web, the browser was already a necessary component of every computer. He even threw out a challenge: “How hard could it be to build a better browser?” Indeed, how hard could it be? What makes it so hard? That question stuck with me for a long time. (Meanwhile, the “better Linux browser than Netscape” took a long time to appear…)

How hard indeed! After eleven years in the trenches working on Chrome, I now know the answer to his question: building a browser is both easy and incredibly hard, both intentional and accidental. And everywhere you look, you see the evolution and history of the web wrapped up in one codebase. It’s fun and endlessly interesting.

So that’s how I fell in love with web browsers. Now let me tell you why you will, too.

The Web in History

The web is a grand, crazy experiment. It’s natural, nowadays, to watch videos, read news, and connect with friends on the web. That can make the web seem simple and obvious, finished, already built. But the web is neither simple nor obvious (and is certainly not finished). It is the result of experiments and research, reaching back to nearly the beginning of computing, about how to help people connect and learn from each other. (The web also needed rich computer displays, powerful user-interface-building libraries, fast networks, and sufficient computing power and information storage capacity. As so often happens with technology, the web had many similar predecessors, but only took its modern form once all the pieces came together.)

In the early days, the internet was a world-wide network of computers, largely at universities, labs, and major corporations, linked by physical cables and communicating over application-specific protocols. The (very) early web mostly built on this foundation. Web pages were files in a specific format stored on specific computers. The addresses for web pages named the computer and the file, and early servers did little besides read files from a disk. The logical structure of the web mirrored its physical structure.

A lot has changed. The HyperText Markup Language (HTML) for web pages is now usually dynamically assembled on the fly and sent on demand to your browser. (“Server-side rendering” is the process of assembling HTML on the server when loading a web page. Server-side rendering can use web technologies like JavaScript and even headless browsers. Yet one more place browsers are taking over!) The pieces being assembled are themselves filled with dynamic content—news, inbox contents, and advertisements adjusted to your particular tastes. Even the addresses no longer identify a specific computer—content distribution networks route requests to any of thousands of computers all around the world. At a higher level, most web pages are served not from someone’s home computer (people actually did this, and when their website became popular, it often ran out of bandwidth or computing power and became inaccessible) but from a major corporation’s social media platform or cloud computing service.

With all that’s changed, some things have stayed the same, the core building blocks that are the essence of the web:

As a philosophical matter, perhaps one or another of these principles is secondary. One could try to distinguish between the networking and rendering aspects of the web. One could abstract linking and networking from the particular choice of protocol and data format. One could ask whether the browser is necessary in theory, or argue that HTTP, URLs, and hyperlinking are the only truly essential parts of the web.

Perhaps. (It is indeed true that one or more of the implementation choices could be replaced, and perhaps that will happen over time. For example, JavaScript might eventually be replaced by another language or technology, HTTP by some other protocol, or HTML by a successor. Yet the web will stay the web, because any successor format is sure to support a superset of functionality and have the same fundamental structure.) The web is, after all, an experiment; the core technologies evolve and grow. But the web is not an accident; its original design reflects truths not just about computing, but about how human beings can connect and interact. The web not only survived but thrived during the virtualization of hosting and content, specifically due to the elegance and effectiveness of this original design.

The key thing to understand is that this grand experiment is not over. The essence of the web will stay, but by building web browsers you have the chance to shape its future.

Real Browser Codebases

So let me tell you what it’s like to contribute to a browser. Some time during my first few months of working on Chrome, I came across the code implementing the <br> tag—look at that, the good old <br> tag, which I’ve used many times to insert newlines into web pages! And the implementation turns out to be barely any code at all, both in Chrome and in this book’s simple browser.

But Chrome as a whole—its features, speed, security, reliability—wow. Thousands of person-years went into it. There is constant pressure to do more—to add more features, to improve performance, to keep up with the “web ecosystem”—for the thousands of businesses, millions of developers, and billions of users on the web. (I usually prefer “engineer”—hence the title of this book—but “developer” or “web developer” is much more common on the web. One important reason is that anyone can build a web page—not just trained software engineers and computer scientists. “Web developer” also is more inclusive of additional, critical roles like designers, authors, editors, and photographers. A web developer is anyone who makes web pages, regardless of how.)

Working on such a codebase can feel daunting. I often find lines of code last touched 15 years ago by someone I’ve never met; or even now discover files and classes that I never knew existed; or see lines of code that don’t look necessary, yet turn out to be important. What does that 15-year-old code do? What is the purpose of these new-to-me files? Is that code there for a reason?

Every browser has thousands of unfixed bugs, from the smallest of mistakes to myriad mix-ups and mismatches. Every browser must be endlessly tuned and optimized to squeeze out that last bit of performance. Every browser requires painstaking work to continuously refactor the code to reduce its complexity, often through the careful introduction of modularization and abstraction. (Browsers are so performance-sensitive that, in many places, merely the introduction of an abstraction—a function call or branching overhead—can have an unacceptable performance cost!)

What makes browsers different from most massive codebases is their urgency. Browsers are nearly as old as any “legacy” codebase, but are not legacy, not abandoned or half-deprecated, not slated for replacement. On the contrary, they are vital to the world’s economy. Browser engineers must therefore fix and improve rather than abandon and replace. And since the character of the web itself is highly decentralized, the use cases met by browsers are to a significant extent not determined by the companies “owning” or “controlling” a particular browser. Other people—including you—can and do contribute ideas, proposals, and implementations.

What’s amazing is that, despite the scale and the pace and the complexity, there is still plenty of room to contribute. Every browser today is open source, which opens up its implementation to the whole community of web developers. Browsers evolve like giant research projects, where new ideas are constantly being proposed and tested out. As you would expect, some features fail and some succeed. The ones that succeed end up in specifications and are implemented by other browsers. Every web browser is open to contributions—whether fixing bugs or proposing new features or implementing promising optimizations.

And it’s worth contributing, because working on web browsers is a lot of fun.

Browser Code Concepts

HTML and CSS are meant to be black boxes—declarative application programming interfaces (APIs)—where one specifies what outcome to achieve, and the browser itself is responsible for figuring out how to achieve it. Web developers don’t, and mostly can’t, draw their web pages’ pixels on their own.

That can make the browser magical or frustrating—depending on whether it is doing the right thing! But that also makes a browser a pretty unusual piece of software, with unique challenges, interesting algorithms, and clever optimizations. Browsers are worth studying for the pure pleasure of it.

What makes that all work is the web browser’s implementations of inversion of control, constraint programming, and declarative programming. The web inverts control, with an intermediary—the browser—handling most of the rendering, and the web developer specifying rendering parameters and content to this intermediary. For example, in HTML there are many built-in form control elements that take care of the various ways the user of a web page can provide input. The developer need only specify parameters such as button names, sizing, and look-and-feel, or JavaScript extension points to handle form submission to the server. The rest of the implementation is taken care of by the browser. Further, these parameters usually take the form of constraints between the relative sizes and positions of on-screen elements instead of specifying their values directly; the browser solves the constraints to find those values. (Constraint programming is clearest during web page layout, where font and window sizes, desired positions and sizes, and the relative arrangement of widgets are rarely specified directly.) The same idea applies to actions: web pages mostly require that actions take place without specifying when they do. This declarative style means that from the point of view of a developer, changes “apply immediately”, but under the hood, the browser can be lazy and delay applying the changes until they become externally visible, either due to subsequent API calls or because the page has to be displayed to the user. (For example, when exactly does the browser compute HTML element styles? Any change to the styles is visible to all subsequent API calls, so in that sense it applies “immediately”. But it is better for the browser to delay style recalculation, avoiding redundant work if styles change twice in quick succession.) Maximally exploiting the opportunities afforded by declarative programming makes real-world browsers very complex.
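To make the laziness concrete, here is a minimal Python sketch of the dirty-flag idea, using an invented LazyStyle class as a stand-in for a browser’s real style system: writes only mark the cached result stale, and the expensive recomputation happens when something actually reads it.

class LazyStyle:
    # Minimal sketch of deferred ("lazy") recomputation; real browsers
    # are structured very differently, but the idea is the same.
    def __init__(self):
        self.declared = {}     # what the page asked for
        self.computed = None   # cached result of the expensive work
        self.dirty = True      # does computed need recomputing?

    def set_property(self, name, value):
        # From the page's point of view the change applies immediately...
        self.declared[name] = value
        self.dirty = True      # ...but we only mark the cache as stale.

    def get_computed(self):
        # Only when someone observes the result do we do the real work.
        if self.dirty:
            self.computed = dict(self.declared)  # stand-in for real style computation
            self.dirty = False
        return self.computed

With this structure, two writes in a row cost only one recomputation, which is exactly the redundant work the parenthetical above says browsers try to avoid.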

There are practical reasons for the unusual design of a browser. Yes, developers lose some control and agency—when pixels are wrong, developers cannot fix them directly. (Loss of control is not necessarily specific to the web—much of computing these days relies on mountains of other people’s code.) But they gain the ability to deploy content on the web without worrying about the details, to make that content instantly available on almost every computing device in existence, and to keep it accessible in the future, mostly avoiding software’s inevitable obsolescence.

To me, browsers are where algorithms come to life. A browser contains a rendering engine more complex and powerful than any computer game; a full networking stack; clever data structures and parallel programming techniques; a virtual machine, an interpreted language, and a just-in-time compiler; a world-class security sandbox; and a uniquely dynamic system for storing data.

And the truth is—you use a browser all the time, maybe for reading this book! That makes the algorithms more approachable in a browser than almost anywhere else, because the web is already familiar.

The Role of the Browser

The web is at the center of modern computing. Every year the web expands its reach to more and more of what we do with computers. It now goes far beyond its original use for document-based information sharing: many people now spend their entire day in a browser, not using a single other application! Moreover, desktop applications are now often built and delivered as web apps: web pages loaded by a browser but used like installed applications. (Related to the notion of a web app is a Progressive Web App, which is a web app that becomes indistinguishable from a native app through progressive enhancement.) Even on mobile devices, apps often embed a browser to render parts of the application user interface (UI). (The fraction of such “hybrid” apps that are shown via a “web view” is likely increasing over time. In some markets like China, “super-apps” act like a mobile web browser for web-view-based games and widgets.) Perhaps in the future both desktop and mobile devices will largely be containers for web apps. Already, browsers are a critical and indispensable part of computing.

So given this centrality, it’s worth knowing how the web works. And in particular, it’s worth focusing on the browser, which is the user agent and the mediator of the web’s interactions, and which ultimately is what makes the web’s principles real. (The user agent concept views a computer, or software within the computer, as a trusted assistant and advocate of the human user.) The browser is also the implementer of the web: its sandbox keeps web browsing safe; its algorithms implement the declarative document model; its UI navigates links. Web pages load fast and react smoothly only when the browser is hyper-efficient.

Browsers and You

This book explains how to build a simple browser, one that can—despite its simplicity—display interesting-looking web pages and support many interesting behaviors. As you’ll see, it’s surprisingly easy, and it demonstrates all the core concepts you need to understand a real-world browser. The browser stops being a mystery when it becomes code.

The intention is for you to build your own browser as you work through the early chapters. Once it is up and running, there are endless opportunities to improve performance or add features, some of which are suggested as exercises. Many of these exercises are features implemented in real browsers, and I encourage you to try them—adding features is one of the best parts of browser development!

The book then moves on to details and advanced features that flesh out the architecture of a real browser’s rendering engine, based on my experiences with Chrome. After finishing the book, you should be able to dig into the source code of Chromium, Gecko, or WebKit and understand it without too much trouble.

I hope the book lets you appreciate a browser’s depth, complexity, and power. I hope the book passes along a browser’s beauty—its clever algorithms and data structures, its co-evolution with the culture and history of computing, its centrality in our world. But most of all, I hope the book lets you see in yourself someone building the browser of the future.

History of the Web

This chapter dives into the history of the web itself: where it came from, and how the web and browsers have evolved to date. This history is not exhaustive; the focus is the key events and ideas that led to the web, and the goals and motivations of its inventors. (For example, there is nothing much about Standard Generalized Markup Language (SGML) or other predecessors to HTML. Except in this footnote!)

The Memex Concept

Figure 1: The original publication of “As We May Think”. (Dunkoman from Wikipedia, CC BY 2.0.)

An influential early exploration of how computers might revolutionize information is a 1945 essay by Vannevar Bush entitled “As We May Think”. This essay envisioned a machine called a memex that helps an individual human see and explore all the information in the world (see Figure 1). It was described in terms of the microfilm screen technology of the time, but its purpose and concept have some clear similarities to the web as we know it today, even if the user interface and technology details differ.

The web is, at its core, organized around the Memex-like goal of representing and displaying information, providing a way for humans to effectively learn and explore. The collective knowledge and wisdom of the species long ago exceeded the capacity of a single mind, organization, library, country, culture, group or language. However, while we as humans cannot possibly know even a tiny fraction of what it is possible to know, we can use technology to learn more efficiently than before, and, in particular, to quickly access information we need to learn, remember, or recall. Consider this imagined research session described by Vannevar Bush—one that is remarkably similar to how we sometimes use the web:

The owner of the memex, let us say, is interested in the origin and properties of the bow and arrow. […] He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items.

Computers, and the internet, allow us to process and store the information we want. But it is the web that helps us organize and find that information, that knowledge, making it useful. (Google’s well-known mission statement to “organize the world’s information and make it universally accessible and useful” is almost exactly the same. This is not a coincidence—the search engine concept is inherently connected to the web, and was inspired by the design of the web and its antecedents.)

“As We May Think” highlighted two features of the memex: information record lookup, and associations between related records. In fact, the essay emphasizes the importance of the latter—we learn by making previously unknown connections between known things:

When data of any sort are placed in storage, they are filed alphabetically or numerically. […] The human mind does not work that way. It operates by association.

By “association”, Bush meant a trail of thought leading from one record to the next via a human-curated link. He imagined not just a universal library, but a universal way to record the results of what we learn.

The Web Emerges

The concept of hypertext documents linked by hyperlinks was invented in 1964–65 by Project Xanadu, led by Ted Nelson. (He was inspired by the long tradition of citation and criticism in academic and literary communities. The Project Xanadu research papers were heavily motivated by this use case.) Hypertext is text that is marked up with hyperlinks to other text. (A successor called the Hypertext Editing System was the first to introduce the back button, which all browsers now have. Since the system only had text, the “button” was itself text. Sound familiar?) A web page is hypertext, and links between web pages are hyperlinks. The format for writing web pages is HTML and the protocol for loading web pages is HTTP, both of which abbreviations contain “HyperText”. See Figure 2 for an example of the early Hypertext Editing System.

Figure 2: A computer operator using the Hypertext Editing System in 1969. (Gregory Lloyd from Wikipedia, CC BY-SA 4.0 International.)

Independently of Project Xanadu, the first hyperlink system appeared for scrolling within a single document; it was later generalized to linking between multiple documents. And just like those original systems, the web has linking within documents as well as between them. For example, the URL http://browser.engineering/history.html#the-web-emerges refers to a document called “history.html”, and specifically to the element in it with the name “the-web-emerges”: this section. Visiting that URL will load this chapter and scroll to this section.

This work also formed and inspired one of the key parts of Douglas Engelbart’s mother of all demos, perhaps the most influential technology demonstration in the history of computing (see Figure 3). That demo not only showcased the key concepts of the web, but also introduced the computer mouse and graphical user interface, both of which are central components of a browser UI. (That demo went beyond even this. There are some parts of it that have not yet been realized in any computer system. Watch it!)

Figure 3: Doug Engelbart presenting the mother of all demos. (SRI International, via the Doug Engelbart Institute.)

There is of course a very direct connection between this research and the document–URL–hyperlink setup of the web, which built on the hypertext idea and applied it in practice. The HyperTIES system, for example, had highlighted hyperlinks and was used to develop the world’s first electronically published academic journal, the 1988 issue of the Communications of the ACM. Tim Berners-Lee cites that 1988 issue as inspiration for the World Wide Web (nowadays called just “the web”, or “the web ecosystem”—ecosystem being another way to capture the same concept as “World Wide”; the original wording lives on in the “www” in many website domain names), in which he joined the link concept with the availability of the internet, thus realizing many of the original goals of all this work from previous decades. (Just as the web itself is a realization of previous ambitions and dreams, today we strive to realize the vision laid out by the web. No, it’s not done yet!)

The word “hyperlink” may have been coined in 1987, in connection with the HyperCard system on Apple computers. This system was also one of the first, or perhaps the first, to introduce the concept of augmenting hypertext with scripts that handle user events like clicks and perform actions that enhance the UI—just like JavaScript on a web page! It also had graphical UI elements, not just text, unlike most predecessors.

In 1989–1990, the first web browser (named WorldWideWeb, see Figure 4) and web server (named httpd, for HTTP Daemon, according to UNIX naming conventions) were born, written by Tim Berners-Lee. Interestingly, while that browser’s capabilities were in some ways inferior to the browser you will implement in this book (no CSS! no JS! not even images!), in other ways they go beyond the capabilities available even in modern browsers. For example, the first browser included the concept of an index page meant for searching within a site (vestiges of which exist today in the “index.html” convention when a URL path ends in “/”), and had a WYSIWYG web page editor (the “contenteditable” HTML attribute on DOM elements (see Chapter 16) has similar semantic behavior, but built-in file saving is gone). Today, the index is replaced with a search engine, and web page editors as a concept are somewhat obsolete due to the highly dynamic nature of today’s web page rendering. On December 20, 1990, the first web page was created. The browser we will implement in this book is easily able to render this web page, even today. (Also, as you can see clearly, that web page has not been updated in the meantime, and retains its original aesthetics!) In 1991, Berners-Lee advertised his browser and the concept on the alt.hypertext Usenet group.

Figure 4: Screenshot of the WorldWideWeb browser. (Communications of the ACM, August 1994.)

Berners-Lee’s Brief History of the Web highlights a number of other key factors that led to the World Wide Web becoming the web we know today. One key factor was its decentralized nature, which he describes as arising from the academic culture of CERN, where he worked. The decentralized nature of the web is a key feature that distinguishes it from many systems that came before or after, and his explanation of it is worth quoting here (the italics are mine):

There was clearly a need for something like Enquire [a predecessor web-like database system, also written by Berners-Lee] but accessible to everyone. I wanted it to scale so that if two people started to use it independently, and later started to work together, they could start linking together their information without making any other changes. This was the concept of the web.

This quote captures one of the key value propositions of the web: its decentralized nature. The web was successful for several reasons, but they all had to do with decentralization:

Browsers

The first widely distributed browser may have been ViolaWWW (see Figure 5); this browser also pioneered multiple interesting features such as applets and images. It was in turn the inspiration for NCSA Mosaic (see Figure 6), which launched in 1993. One of the two original authors of Mosaic went on to co-found Netscape, which built Netscape Navigator (see Figure 7), the first commercial browser, which launched in 1994. (By commercial I mean built by a for-profit entity. Netscape’s early versions were also not free software—you had to buy them from a store. They cost about $50.) Feeling threatened, Microsoft launched Internet Explorer (see Figure 8) in 1995 and soon bundled it with Windows 95.

Figure 5: ViolaWWW. (Viola in a Nutshell.)
Figure 6: Mosaic. (Wikipedia, CC0 1.0.)
Figure 7: Netscape Navigator 1.22. (Wikipedia.)
Figure 8: Internet Explorer 1.0. (Wikipedia, used with permission from Microsoft.)

The era of the “first browser war” ensued: a competition between Netscape Navigator and Internet Explorer. There were also other browsers with smaller market shares; one notable example is Opera. The WebKit project began in 1999; Safari and Chromium-based browsers, such as Chrome and newer versions of Edge, descend from this codebase. Likewise, the Gecko rendering engine was originally developed by Netscape starting in 1997; the Firefox browser is descended from that codebase. During the first browser war, nearly all of the core features of this book’s simple browser were added, including CSS, DOM, and JavaScript.

The “second browser war”, which according to Wikipedia was 2004–2017, was fought between a variety of browsers, in particular Internet Explorer, Firefox, Safari, and Chrome. Initially, Safari and Chrome used the same rendering engine, but Chrome forked it into Blink in 2013, which Microsoft Edge adopted by 2020. The second browser war saw the development of many features of the modern web, including widespread use of AJAX (Asynchronous JavaScript and XML, where XML stands for eXtensible Markup Language), HTML5 features like <canvas>, and a huge explosion in third-party JavaScript libraries and frameworks.

Web Standards

In parallel with these developments was another, equally important, one—the standardization of web APIs. In October 1994, the World Wide Web Consortium (W3C) was founded to provide oversight and standards for web features. Prior to this point, browsers would often introduce new HTML elements or APIs, and competing browsers would have to copy them. With a standards organization, those elements and APIs could subsequently be agreed upon and documented in specifications. (These days, an initial discussion, design, and specification precedes any new feature.) Later on, the HTML specification ended up moving to a different standards body called the WHATWG, but CSS and other features are still standardized at the W3C. JavaScript is standardized at yet another standards body, TC39 (Technical Committee 39) at ECMA. HTTP is standardized by the IETF. The important point is that the standards process set up in the mid-1990s is still with us.

In the first years of the web, it was not so clear that browsers would remain standard and that one browser might not end up “winning” and becoming another proprietary software platform. There are multiple reasons this didn’t happen, among them the egalitarian ethos of the computing community and the presence and strength of the W3C. Another important reason was the networked nature of the web, and therefore the necessity for web developers to make sure their pages worked correctly in most or all of the browsers (otherwise they would lose customers), leading them to avoid proprietary extensions. On the contrary, browsers worked hard to carefully reproduce each other’s undocumented behaviors—even bugs—to make sure they continued supporting the whole web.

There never really was a point where any browser openly attempted to break away from the standard, despite fears that that might happen. (Perhaps the closest the web came to fragmenting was with the late-1990s introduction of features for DHTML—early versions of the Document Object Model you’ll learn about in this book. Netscape and Internet Explorer at first had incompatible implementations of these features, and it took years, the development of a common specification, and significant pressure campaigns on the browsers before standardization was achieved. You can read about this story in much more depth from Jay Hoffman.) Instead, intense competition for market share was channeled into very fast innovation and an ever-expanding set of APIs and capabilities for the web, which we nowadays refer to as the web platform, not just the “World Wide Web”. This recognizes the fact that the web is no longer a document viewing mechanism, but has evolved into a fully realized computing platform and ecosystem. (There have even been operating systems built around the web! Examples include webOS, which powered some Palm smartphones, Firefox OS (that today lives on in KaiOS-based phones), and ChromeOS, which is a desktop operating system. All of these operating systems are based on using the web as the UI layer for all applications, with some JavaScript-exposed APIs on top for system integration.)

Given the outcomes—multiple competing browsers and well-developed standards—it is in retrospect not that relevant which browser “won” or “lost” each of the browser “wars”. In each case the web won, because it gained users and grew in capability.

Open Source

Another important and interesting outcome of the second browser war was that all mainstream browsers today are based on three open-source web rendering / JavaScript engines: Chromium, Gecko, and WebKit. (Examples of Chromium-based browsers include Chrome, Edge, Opera (which switched to Chromium from the Presto engine in 2013), Samsung Internet, Yandex Browser, UC Browser, and Brave. In addition, there are many “embedded” browsers, based on one or another of the three engines, for a wide variety of automobiles, phones, TVs, and other electronic devices.) The JavaScript engines are actually in different repositories (as are various other subcomponents), and can and do get used outside the browser as JavaScript virtual machines. One important application is the use of V8 to power node.js. However, each of the three rendering engines does have a corresponding JavaScript implementation, so conflating the two is reasonable. Since Chromium and WebKit have a common ancestral codebase, while Gecko is an open-source descendant of Netscape, all three date back to the 1990s—almost to the beginning of the web.

This is not an accident, and in fact tells us something quite interesting about the most cost-effective way to implement a rendering engine based on a commodity set of platform APIs. For example, it’s common for independent developers, not paid by the company nominally controlling the browser, to contribute code and features. There are even companies and individuals that specialize in implementing browser features! It’s also common for features in one browser to copy code from another. And every major browser being open source feeds back into the standards process, reinforcing the web’s decentralized nature.

Summary

In summary, the history went like this:

  1. Basic research was performed into ways to represent and explore information.

  2. Once the necessary technology became mature enough, the web proper was proposed and implemented.

  3. The web became popular quite quickly, and many browsers appeared in order to capitalize on the web’s opportunity.

  4. Standards organizations were introduced in order to negotiate between the browsers and avoid proprietary control.

  5. Competition between browsers grew their power and complexity at a rapid pace.

  6. Browsers appeared on all devices and operating systems, from desktop to mobile to embedded.

  7. Eventually, all web rendering engines became open source, as a recognition of their being a shared effort larger than any single entity.

The web has come a long way! But one thing seems clear: it isn’t done yet.

Exercises

iii-1 What comes next? Based on what you learned about how the web came about and took its current form, what trends do you predict for its future evolution? For example, do you think it’ll compete effectively against other non-web technologies and platforms?

iii-2 What became of the original ideas? The way the web works in practice is significantly different than the memex; one key difference is that there is no built-in way for the user of the web to add links between pages or notate them. Why do you think this is? Can you think of other goals from the original work that remain unrealized?

Downloading Web Pages

A web browser displays information identified by a URL. And the first step is to use that URL to connect to and download information from a server somewhere on the internet.

Connecting to a Server

Browsing the internet starts with a URL, a short string that identifies a particular web page that the browser should visit. (“URL” stands for “uniform resource locator”, meaning that it is a portable (uniform) way to identify web pages (resources) and also that it describes how to access those files (locator).)

http://example.org/index.html

Figure 1: The syntax of URLs.

A URL has three parts (see Figure 1): the scheme explains how to get the information; the host name explains where to get it; and the path explains what information to get. There are also optional parts to the URL, like ports, queries, and fragments, which we’ll see later.

From a URL, the browser can start the process of downloading the web page. The browser first asks the local operating system (OS) to put it in touch with the server described by the host name. The OS then talks to a Domain Name System (DNS) server which converts a host name like example.org into a destination IP address like 93.184.216.34. (You can use a DNS lookup tool like nslookup.io or the dig command to do this conversion yourself. Also, today there are two versions of IP (Internet Protocol): IPv4 and IPv6. IPv6 addresses are a lot longer and are usually written in hexadecimal, but otherwise the differences don’t matter here.) Then the OS decides which hardware is best for communicating with that destination IP address (say, wireless or wired) using what is called a routing table, and then uses device drivers to send signals over a wire or over the air. (I’m skipping steps here. On wires you first have to wrap communications in ethernet frames; on wireless you have to do even more. I’m trying to be brief.) Those signals are picked up and transmitted by a series of routers (or a switch, or an access point; there are a lot of possibilities, but eventually there is a router), each of which chooses the best direction to send your message so that it eventually gets to the destination. (They may also record where the message came from so they can forward the reply back.) When the message reaches the server, a connection is created. Anyway, the point of this is that the browser tells the OS, “Hey, put me in touch with example.org”, and it does.
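You can ask the OS to do this conversion from Python, too. Here is a small sketch using the standard socket module (an example only, not part of our browser’s code; the addresses you get back depend on the DNS records at the moment you run it):

import socket

# Ask the OS resolver which addresses example.org currently maps to.
infos = socket.getaddrinfo("example.org", 80, proto=socket.IPPROTO_TCP)
for family, type, proto, canonname, sockaddr in infos:
    print(sockaddr)   # e.g. ('93.184.216.34', 80)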

On many systems, you can set up this kind of connection using the telnet program, like this (the “80” is the port, discussed below):

telnet example.org 80

(Note: When you see a gray outline, it means that the code in question is an example only, and not actually part of our browser’s code.)

You might need to install telnet; it is often disabled by default. On Windows, go to Programs and Features / Turn Windows features on or off in the Control Panel; you’ll need to reboot. When you run it, it’ll clear the screen instead of printing something, but other than that it works normally. On macOS, you can use the nc -v command as a replacement for telnet:

nc -v example.org 80

The output is a little different but it works in the same way. On most Linux systems, you can install telnet or nc from the package manager, usually from packages called telnet and netcat.

You’ll get output that looks like this:

Trying 93.184.216.34...
Connected to example.org.
Escape character is '^]'.

This means that the OS converted the host name example.org into the IP address 93.184.216.34 and was able to connect to it. (The line about escape characters is just instructions for using obscure telnet features.) You can now talk to example.org.

The URL syntax is defined in RFC 3986, whose first author is Tim Berners-Lee—no surprise there! The second author is Roy Fielding, a key contributor to the design of HTTP and also well known for describing the Representational State Transfer (REST) architecture of the web in his Ph.D. thesis, which explains how REST allowed the web to grow in a decentralized way. Today, many services provide “RESTful APIs” that also follow these principles, though there does seem to be some confusion about it.

Requesting Information

Once it’s connected, the browser requests information from the server by giving its path, the path being the part of a URL that comes after the host name, like /index.html. The structure of the request is shown in Figure 2. Type this into telnet to try it.

GET /index.html HTTP/1.0
Host: example.org

Figure 2: An annotated HTTP GET request.

Here, the word GET means that the browser would like to receive information (it could say POST if it intended to send information, plus there are some other, more obscure, options), then comes the path, and finally there is the word HTTP/1.0, which tells the host that the browser speaks version 1.0 of HTTP. There are several versions of HTTP (0.9, 1.0, 1.1, 2.0, and 3.0). The HTTP 1.1 standard adds a variety of useful features, like keep-alive, but in the interest of simplicity our browser won’t use them. We’re also not implementing HTTP 2.0; it is much more complex than the 1.x series, and is intended for large and complex web applications, which our browser can’t run anyway.

After the first line, each line contains a header, which has a name (like Host) and a value (like example.org). Different headers mean different things; the Host header, for example, tells the server who you think it is. (This is useful when the same IP address corresponds to multiple host names and hosts multiple websites (for example, example.com and example.org). The Host header tells the server which of multiple websites you want. These websites basically require the Host header to function properly. Hosting multiple domains on a single computer is very common.) There are lots of other headers one could send, but let’s stick to just Host for now.

Finally, after the headers comes a single blank line; that tells the host that you are done with headers. So type a blank line into telnet (hit Enter twice after typing the two lines of the request) and you should get a response from example.org.

HTTP/1.0 is standardized in RFC 1945, and HTTP/1.1 in RFC 2616. HTTP was designed to be simple to understand and implement, making it easy for any kind of computer to adopt it. It’s no coincidence that you can type HTTP directly into telnet! Nor is it an accident that HTTP is a “line-based protocol”, using plain text and newlines, similar to the Simple Mail Transfer Protocol (SMTP) for email. Ultimately, the whole pattern derives from early computers only having line-based text input. In fact, one of the first two browsers had a line-mode UI.

The Server’s Response

The server’s response starts with the line in Figure 3.

HTTP/1.0 200 OK

Figure 3: Annotated first line of an HTTP response.

This tells you that the host confirms that it, too, speaks HTTP/1.0, and that it found your request to be “OK” (which has a numeric code of 200). You may be familiar with 404 Not Found; that’s another numeric code and response, as are 403 Forbidden or 500 Server Error. There are lots of these codes, and they have a pretty neat organization scheme (the status text like OK can actually be anything and is just there for humans, not for machines): the 100s are informational messages, the 200s mean you got what you wanted, the 300s request follow-up action (usually a redirect), the 400s mean you sent a bad request, and the 500s mean something went wrong on the server.

Note the genius of having two sets of error codes (400s and 500s) to tell you who is at fault, the server or the browser (more precisely, who the server thinks is at fault). You can find a full list of the different codes on Wikipedia, and new ones do get added here and there.

After the 200 OK line, the server sends its own headers. When I did this, I got these headers (but yours will differ):

Age: 545933
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Mon, 25 Feb 2019 16:49:28 GMT
Etag: "1541025663+gzip+ident"
Expires: Mon, 04 Mar 2019 16:49:28 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (sec/96EC)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1270
Connection: close

There is a lot here, about the information you are requesting (Content-Type, Content-Length, and Last-Modified), about the server (Server, X-Cache), about how long the browser should cache this information (Cache-Control, Expires, Etag), and about all sorts of other stuff. Let’s move on for now.

After the headers there is a blank line followed by a bunch of HTML code. This is called the body of the server’s response, and your browser knows that it is HTML because of the Content-Type header, which says that it is text/html. It’s this HTML code that contains the content of the web page itself.

The HTTP request/response transaction is summarized in Figure 4. Let’s now switch gears from making manual connections to Python.

Figure 4: An HTTP request and response pair are how a web browser gets web pages from a web server.

Wikipedia has nice lists of HTTP headers and response codes. Some of the HTTP response codes are almost never used, like 402 “Payment Required”. This code was intended to be used for “digital cash or (micro) payment systems”. While e-commerce is alive and well without the response code 402, micropayments have not (yet?) gained much traction, even though many people (including me!) think they are a good idea.

Telnet in Python

So far we’ve communicated with another computer using telnet. But it turns out that telnet is quite a simple program, and we can do the same programmatically. It’ll require extracting the host name and path from the URL, creating a socket, sending a request, and receiving a response. (In Python, there’s a library called urllib.parse for parsing URLs, but I think implementing our own will be good for learning. Plus, it makes this book less Python-specific.)

Let’s start with parsing the URL. I’m going to make parsing a URL return a URL object, and I’ll put the parsing code into the constructor:

class URL:
    def __init__(self, url):
        # ...

The __init__ method is Python’s peculiar syntax for class constructors, and the self parameter, which you must always make the first parameter of any method, is Python’s analog of this in C++ or Java.

Let’s start with the scheme, which is separated from the rest of the URL by ://. Our browser only supports http, so let’s check that, too:

class URL:
    def __init__(self, url):
        self.scheme, url = url.split("://", 1)
        assert self.scheme == "http"

Now we must separate the host from the path. The host comes before the first /, while the path is that slash and everything after it:

class URL:
    def __init__(self, url):
        # ...
        if "/" not in url:
            url = url + "/"
        self.host, url = url.split("/", 1)
        self.path = "/" + url

(When you see a code block with a # ..., like this one, that means you’re adding code to an existing method or block.) The split(s, n) method splits a string at the first n copies of s. Note that there’s some tricky logic here for handling the slash between the host name and the path. That (optional) slash is part of the path.
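For example, constructing a URL object for the address we’ve been using should produce these fields (an example only, not part of our browser’s code):

>>> url = URL("http://example.org/index.html")
>>> url.scheme, url.host, url.path
('http', 'example.org', '/index.html')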

Now that the URL has the host and path fields, we can download the web page at that URL. We’ll do that in a new method, request:

class URL:
    def request(self):
        # ...

Note that you always need to write the self parameter for methods in Python. In the future, I won’t always make such a big deal out of defining a method—if you see a code block with code in a method or function that doesn’t exist yet, that means we’re defining it.

The first step to downloading a web page is connecting to the host. The operating system provides a feature called “sockets” for this. When you want to talk to other computers (either to tell them something, or to wait for them to tell you something), you create a socket, and then that socket can be used to send information back and forth. Sockets come in a few different kinds, because there are multiple ways to talk to other computers:

  1. A socket has an address family, which tells you how to find the other computer; its name begins with AF. We want AF_INET (the internet), though there are other options, like AF_BLUETOOTH.

  2. A socket has a type, which describes the sort of conversation that’s going to happen; its name begins with SOCK. We want SOCK_STREAM, which lets each computer send data of arbitrary length, as opposed to SOCK_DGRAM, where the computers exchange individual packets.

  3. A socket has a protocol, which describes the steps the two computers follow to establish a connection; its name depends on the address family but begins with IPPROTO. We want IPPROTO_TCP.

By picking all of these options, we can create a socket like so. (While this code uses the Python socket library, your favorite language likely contains a very similar library; the API is basically standardized. In Python, the flags we pass are defaults, so you can actually call socket.socket(); I’m keeping the flags here in case you’re following along in another language.)

import socket

class URL:
    def request(self):
        s = socket.socket(
            family=socket.AF_INET,
            type=socket.SOCK_STREAM,
            proto=socket.IPPROTO_TCP,
        )

Once you have a socket, you need to tell it to connect to the other computer. For that, you need the host and a port. The port depends on the protocol you are using; for now it should be 80.

class URL:
    def request(self):
        # ...
        s.connect((self.host, 80))

This talks to example.org to set up the connection and prepare both computers to exchange data.

Naturally this won’t work if you’re offline. It also might not work if you’re behind a proxy, or in a variety of more complex networking environments. The workaround will depend on your setup—it might be as simple as disabling your proxy, or it could be much more complex.

Note that there are two parentheses in the connect call: connect takes a single argument, and that argument is a pair of a host and a port. This is because different address families have different numbers of arguments.

The “sockets” API, which Python more or less implements directly, derives from the original “Berkeley sockets” API design for 4.2 BSD Unix in 1983. Of course, Windows and Linux merely reimplement the API, but macOS and iOS actually do still use large amounts of code descended from BSD Unix.

Request and Response

Now that we have a connection, we make a request to the other server. To do so, we send it some data using the send method:

class URL:
    def request(self):
        # ...
        request = "GET {} HTTP/1.0\r\n".format(self.path)
        request += "Host: {}\r\n".format(self.host)
        request += "\r\n"
        s.send(request.encode("utf8"))

The send method just sends the request to the server. (It actually returns a number, in this case 47, telling you how many bytes of data you sent to the other computer; if, say, your network connection failed midway through sending the data, you might want to know how much you sent before the connection failed.) There are a few things in this code that have to be exactly right. First, it’s very important to use \r\n instead of \n for newlines. It’s also essential that you put two \r\n newlines at the end, so that you send that blank line at the end of the request. If you forget that, the other computer will keep waiting on you to send that newline, and you’ll keep waiting on its response. (Computers are endlessly literal-minded.)

Also note the encode call. When you send data, it’s important to remember that you are sending raw bits and bytes; they could form text or an image or video. But a Python string is specifically for representing text. The encode method converts text into bytes, and there’s a corresponding decode method that goes the other way. (When you call encode and decode you need to tell the computer what character encoding you want it to use. This is a complicated topic. I’m using utf8 here, which is a common character encoding and will work on many pages, but in the real world you would need to be more careful.) Python reminds you to be careful by giving different types to text and to bytes:

>>> type("text")
<class 'str'>
>>> type("text".encode("utf8"))
<class 'bytes'>

If you see an error about str versus bytes, it’s because you forgot to call encode or decode somewhere.

To read the server’s response, you could use the read function on sockets, which gives whatever bits of the response have already arrived. Then you write a loop to collect those bits as they arrive. However, in Python you can use the makefile helper function, which hides the loop (if you’re in another language, you might only have socket.read available, and you’ll need to write the loop, checking the socket status, yourself):

class URL:
    def request(self):
        # ...
        response = s.makefile("r", encoding="utf8", newline="\r\n")

Here, makefile returns a file-like object containing every byte we receive from the server. I am instructing Python to turn those bytes into a string using the utf8 encoding, or method of associating bytes to letters. (Hard-coding utf8 is not correct, but it’s a shortcut that will work alright on most English-language websites. In fact, the Content-Type header usually contains a charset declaration that specifies the encoding of the body. If it’s absent, browsers still won’t default to utf8; they’ll guess, based on letter frequencies, and you will see ugly � strange áççêñ£ß when they guess wrong.) I’m also informing Python of HTTP’s weird line endings.

Let’s now split the response into pieces. The first line is the status line:I could have asserted that 200 is required, since that’s the only code our browser supports, but it’s better to just let the browser render the returned body, because servers will generally output a helpful and user-readable HTML error page even for error codes. This is another way in which the web is easy to implement incrementally.

class URL:
    def request(self):
        # ...
        statusline = response.readline()
        version, status, explanation = statusline.split(" ", 2)

Note that I do not check that the server’s version of HTTP is the same as mine; this might sound like a good idea, but there are a lot of misconfigured servers out there that respond in HTTP 1.1 even when you talk to them in HTTP 1.0.Luckily the protocols are similar enough to not cause confusion.

After the status line come the headers:

class URL:
    def request(self):
        # ...
        response_headers = {}
        while True:
            line = response.readline()
            if line == "\r\n": break
            header, value = line.split(":", 1)
            response_headers[header.casefold()] = value.strip()

For the headers, I split each line at the first colon and fill in a map of header names to header values. Headers are case-insensitive, so I normalize them to lower case.I used casefold instead of lower, because it works better for more languages. Also, whitespace is insignificant in HTTP header values, so I strip off extra whitespace at the beginning and end.
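
For example, a typical header line comes apart like this:

>>> line = "Content-Type: text/html; charset=UTF-8\r\n"
>>> header, value = line.split(":", 1)
>>> header.casefold(), value.strip()
('content-type', 'text/html; charset=UTF-8')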

Headers can describe all sorts of information, but a couple of headers are especially important because they tell us that the data we’re trying to access is being sent in an unusual way. Let’s make sure none of those are present.Exercise 1-9 describes how your browser should handle these headers if they are present.

class URL:
    def request(self):
        # ...
        assert "transfer-encoding" not in response_headers
        assert "content-encoding" not in response_headers

In the usual case, then, the data is just everything after the headers, so we read the rest of the response:

class URL:
    def request(self):
        # ...
        content = response.read()
        s.close()

It’s the body that we’re going to display, so let’s return that:

class URL:
    def request(self):
        # ...
        return content

Now let’s actually display the text in the response body.

The Content-Encoding header lets the server compress web pages before sending them. Large, text-heavy web pages compress well, and as a result the page loads faster. The browser needs to send an Accept-Encoding header in its request to list the compression algorithms it supports. Transfer-Encoding is similar and also allows the data to be “chunked”, which many servers seem to use together with compression.
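
If you later attempt Exercise 1-9, the decompression side boils down to something like this sketch, assuming you’ve already collected the raw, compressed body bytes into body_bytes (handling chunking is a separate step):

import gzip
body = gzip.decompress(body_bytes).decode("utf8")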

Displaying the HTML

The HTML code in the response body defines the content you see in your browser window when you go to http://example.org/index.html. I’ll be talking much, much more about HTML in future chapters, but for now let me keep it very simple.

In HTML, there are tags and text. Each tag starts with a < and ends with a >; generally speaking, tags tell you what kind of thing some content is, while text is the actual content.That said, some tags, like img, are content, not information about it. Most tags come in pairs of a start and an end tag; for example, the title of the page is enclosed in a pair of tags: <title> and </title>. Each tag, inside the angle brackets, has a tag name (like title here), and then optionally a space followed by attributes, and its pair has a / followed by the tag name (and no attributes).

So, to create our very, very simple web browser, let’s take the page HTML and print all the text, but not the tags, in it.If this example causes Python to produce a SyntaxError pointing to the end argument on the last line, it is likely because you are running Python 2 instead of Python 3. Make sure you are using Python 3. I’ll do this in a new function, show:Note that this is a global function, not a method on the URL class.

def show(body):
    in_tag = False
    for c in body:
        if c == "<":
            in_tag = True
        elif c == ">":
            in_tag = False
        elif not in_tag:
            print(c, end="")

This code is pretty complex. It goes through the request body character by character, and it has two states: in_tag, when it is currently between a pair of angle brackets, and not in_tag. When the current character is an angle bracket, it changes between those states; normal characters, not inside a tag, are printed.The end argument tells Python not to print a newline after the character, which it otherwise would.
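
For example:

show("<p>Hello <b>world</b>!</p>")
# prints: Hello world!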

We can now load a web page just by stringing together request and show:Like show, this is a global function.

def load(url):
    body = url.request()
    show(body)

Add the following code to run load from the command line:

if __name__ == "__main__":
    import sys
    load(URL(sys.argv[1]))

The first line is Python’s version of a main function, run only when executing this script from the command line. The code reads the first argument (sys.argv[1]) from the command line and uses it as a URL. Try running this code on the URL http://example.org/:

python3 browser.py http://example.org/

You should see some short text welcoming you to the official example web page. You can also try using it on this chapter!

HTML, just like URLs and HTTP, is designed to be very easy to parse and display at a basic level. And in the beginning there were very few features in HTML, so it was possible to code up something not so much more fancy than what you see here, yet still display the content in a usable way. Even our super simple and basic HTML parser can already print out the text of the browser.engineering website.

Encrypted Connections

So far, our browser supports the http scheme. That’s a pretty common scheme. But more and more websites are migrating to the https scheme, and many websites require it.

The difference between http and https is that https is more secure—but let’s be a little more specific. The https scheme, or more formally HTTP over TLS (Transport Layer Security), is identical to the normal http scheme, except that all communication between the browser and the host is encrypted. There are quite a few details to how this works: which encryption algorithms are used, how a common encryption key is agreed to, and of course how to make sure that the browser is connecting to the correct host. The difference in the protocol layers involved is shown in Figure 5.

Figure 5: The difference between HTTP and HTTPS is the addition of a TLS layer.

Luckily, the Python ssl library implements all of these details for us, so making an encrypted connection is almost as easy as making a regular connection. That ease of use comes with accepting some default settings which could be inappropriate for some situations, but for teaching purposes they are fine.

Making an encrypted connection with ssl is pretty easy. Suppose you’ve already created a socket, s, and connected it to example.org. To encrypt the connection, you use ssl.create_default_context to create a context ctx and use that context to wrap the socket s:

import ssl
ctx = ssl.create_default_context()
s = ctx.wrap_socket(s, server_hostname=host)

Note that wrap_socket returns a new socket, which I save back into the s variable. That’s because you don’t want to send any data over the original socket; it would be unencrypted and also confusing. The server_hostname argument is used to check that you’ve connected to the right server. It should match the Host header.

On macOS, you’ll need to run a program called “Install Certificates” before you can use Python’s ssl package on most websites.

Let’s try to take this code and add it to request. First, we need to detect which scheme is being used:

class URL:
    def __init__(self, url):
        self.scheme, url = url.split("://", 1)
        assert self.scheme in ["http", "https"]
        # ...

(Note that here you’re supposed to replace the existing scheme parsing code with this new code. It’s usually clear from context, and the code itself, what you need to replace.)

Encrypted HTTP connections usually use port 443 instead of port 80:

class URL:
    def __init__(self, url):
        # ...
        if self.scheme == "http":
            self.port = 80
        elif self.scheme == "https":
            self.port = 443

We can use that port when creating the socket:

class URL:
    def request(self):
        # ...
        s.connect((self.host, self.port))
        # ...

Next, we’ll wrap the socket with the ssl library:

class URL:
    def request(self):
        # ...
        if self.scheme == "https":
            ctx = ssl.create_default_context()
            s = ctx.wrap_socket(s, server_hostname=self.host)
        # ...

Your browser should now be able to connect to HTTPS sites.

While we’re at it, let’s add support for custom ports, which are specified in a URL by putting a colon after the host name, as in Figure 6.

http://example.org:8080/index.html

Figure 6: Where the port goes in a URL.

If the URL has a port we can parse it out and use it:

class URL:
    def __init__(self, url):
        # ...
        if ":" in self.host:
            self.host, port = self.host.split(":", 1)
            self.port = int(port)
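
As a quick sanity check, a URL with an explicit port should now parse like this:

>>> url = URL("http://example.org:8080/index.html")
>>> url.scheme, url.host, url.port
('http', 'example.org', 8080)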

Custom ports are handy for debugging. Python has a built-in web server you can use to serve files on your computer. For example, if you run

python3 -m http.server 8000 -d /some/directory

then going to http://localhost:8000/ should show you all the files in that directory. This is a good way to test your browser.
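
For example, with the server above running in one terminal, and assuming /some/directory contains a file named test.html (a made-up name; use any file you have), you could point your browser at it from another terminal:

python3 browser.py http://localhost:8000/test.html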

TLS is pretty complicated. You can read the details in RFC 8446, but implementing your own is not recommended. It’s very difficult to write a custom TLS implementation that is not only correct but secure.

At this point you should be able to run your program on any web page. Here is what it should output for a simple example:


  
    This is a simple
    web page with some
    text in it.
  

Summary

This chapter went from an empty file to a rudimentary web browser that can:

  1. parse a URL into a scheme, host, port, and path;
  2. connect to that host using the sockets and ssl libraries;
  3. send an HTTP request to it, including a Host header;
  4. split the HTTP response into a status line, headers, and a body;
  5. print the text (and not the tags) in the body.

Yes, this is still more of a command-line tool than a web browser, but it already has some of the core capabilities of a browser.

Outline

The complete set of functions, classes, and methods in our browser should look something like this:

class URL:
    def __init__(url)
    def request()

def show(body)
def load(url)

Exercises

1-1 HTTP/1.1. Along with Host, send the Connection header in the request function with the value close. Your browser can now declare that it is using HTTP/1.1. Also add a User-Agent header. Its value can be whatever you want—it identifies your browser to the host. Make it easy to add further headers in the future.

1-2 File URLs. Add support for the file scheme, which allows the browser to open local files. For example, file:///path/goes/here should refer to the file on your computer at location /path/goes/here. Also make it so that, if your browser is started without a URL being given, some specific file on your computer is opened. You can use that file for quick testing.

1-3 data. Yet another scheme is data, which allows inlining HTML content into the URL itself. Try navigating to data:text/html,Hello world! in a real browser to see what happens. Add support for this scheme to your browser. The data scheme is especially convenient for making tests without having to put them in separate files.

1-4 Entities. Implement support for the less-than (&lt;) and greater-than (&gt;) entities. These should be printed as < and >, respectively. For example, if the HTML response was &lt;div&gt;, the show method of your browser should print <div>. Entities allow web pages to include these special characters without the browser interpreting them as tags.

1-5 view-source. Add support for the view-source scheme; navigating to view-source:http://example.org/ should show the HTML source instead of the rendered page. Your browser should print the entire HTML file as if it was text. You’ll want to have also implemented Exercise 1-4.

1-6 Keep-alive. Implement Exercise 1-1; however, do not send the Connection: close header. Instead, when reading the body from the socket, only read as many bytes as given in the Content-Length header and don’t close the socket afterward. Instead, save the socket, and if another request is made to the same server reuse the same socket instead of creating a new one. This will speed up repeated requests to the same server, which are common.

1-7 Redirects. Error codes in the 300 range request a redirect. When your browser encounters one, it should make a new request to the URL given in the Location header. Sometimes the Location header is a full URL, but sometimes it skips the host and scheme and just starts with a / (meaning the same host and scheme as the original request). The new URL might itself be a redirect, so make sure to handle that case. You don’t, however, want to get stuck in a redirect loop, so make sure to limit how many redirects your browser can follow in a row. You can test this with the URL http://browser.engineering/redirect, which redirects back to this page, and its /redirect2 and /redirect3 cousins which do more complicated redirect chains.

1-8 Caching. Typically, the same images, styles, and scripts are used on multiple pages; downloading them repeatedly is a waste. It’s generally valid to cache any HTTP response, as long as it was requested with GET and received a 200 response.Some other status codes like 301 and 404 can also be cached. Implement a cache in your browser and test it by requesting the same file multiple times. Servers control caches using the Cache-Control header. Add support for this header, specifically for the no-store and max-age values. If the Cache-Control header contains any value other than these two, it’s best not to cache the response.

1-9 Compression. Add support for HTTP compression, in which the browser informs the server that compressed data is acceptable. Your browser must send the Accept-Encoding header with the value gzip. If the server supports compression, its response will have a Content-Encoding header with value gzip. The body is then compressed. Add support for this case. To decompress the data, you can use the decompress method in the gzip module. GZip data is not utf8-encoded, so pass "rb" to makefile to work with raw bytes instead. Most web servers send compressed data in a Transfer-Encoding called chunked.There are also a couple of Transfer-Encodings that compress the data. They aren’t commonly used. You’ll need to add support for that, too.

Drawing to the Screen

A web browser doesn’t just download a web page; it also has to show that page to the user. In the twenty-first century, that means a graphical application.There are some obscure text-based browsers: I used w3m as my main browser for most of 2011. I don’t anymore. So in this chapter we’ll equip our browser with a graphical user interface.

Creating Windows

Desktop and laptop computers run operating systems that provide desktop environments: windows, buttons, and a mouse. So responsibility ends up split: programs control their windows, but the desktop environment controls the screen. Therefore:

  1. the program asks the desktop environment to create a window for it;
  2. when the program wants to change what’s shown, it draws into its window and the desktop environment puts that window on the screen;
  3. and when the user clicks or types, the desktop environment forwards those events to the program.

Doing all of this by hand is a bit of a drag, so programs usually use a graphical toolkit to simplify these steps. Python comes with a graphical toolkit called Tk in the Python package tkinter.The library is called Tk, and it was originally written for a different language called Tcl. Python contains an interface to it, hence the name. Using it is quite simple:

import tkinter
window = tkinter.Tk()
tkinter.mainloop()

Here, tkinter.Tk() asks the desktop environment to create a window and returns an object that you can use to draw to the window. The tkinter.mainloop() call enters a loop that looks like this:This pseudocode may look like an infinite loop that locks up the computer, but it’s not. Either the operating system will multitask among threads and processes, or the pendingEvents call will sleep until events are available, or both; in any case, other code will run and create events for the loop to respond to.

while True:
    for evt in pendingEvents():
        handleEvent(evt)
    drawScreen()

Figure 1: Flowchart of an event-handling cycle.

Here, pendingEvents first asks the desktop environment for recent mouse clicks or key presses, then handleEvent calls your application to update state, and then drawScreen redraws the window. This event loop pattern (see Figure 1) is common in many applications, from web browsers to video games, because in complex graphical applications it ensures that all events are eventually handled and the screen is eventually updated.

Although you’re probably writing your browser on a desktop computer, many people access the web through mobile devices such as phones or tablets. On mobile devices there’s still a screen, a rendering loop, and most other things discussed in this book.For example, most real browsers have both desktop and mobile editions, and the rendering engine code is almost exactly the same for both.

But there are several differences worth noting. Applications are usually full-screen, with only one application drawing to the screen at a time. There’s no mouse and only a virtual keyboard, so the main form of interaction is touch. There is the concept of a “visual viewport” that is not present on a desktop, to accommodate “desktop-only” and “mobile-ready” sites, as well as pinch zoom.Look at the source of this chapter’s webpage. In the <head> you’ll see a “viewport” <meta> tag. This tag tells the browser that the page supports mobile devices; without it, the browser assumes that the site is “desktop-only” and renders it differently, such as allowing the user to use a pinch-zoom or double-tap gesture to focus in on one part of the page. Once zoomed in, the part of the page visible on the screen is the “visual viewport” and the whole document’s bounds are the “layout viewport”. This is kind of a mix between zooming and scrolling that’s usually absent on desktop. And screen pixel density is much higher, but the total screen resolution is usually lower. Supporting all of these differences is doable, but quite a bit of work. This book won’t go further into implementing them, except in some cases as exercises.

Also, power efficiency is much more important, because the device runs on a battery, while at the same time the central processing unit (CPU) and memory are significantly slower and less capable. That makes it much more important to take advantage of any graphical processing unit (GPU)—the slow CPU makes good performance harder to achieve. Mobile browsers are challenging!

Drawing to the Window

Our browser will draw the web page text to a canvas, a rectangular Tk widget that you can draw circles, lines, and text on. For example, you can create a canvas with Tk like this:You may be familiar with the HTML <canvas> element, which is a similar idea: a two-dimensional rectangle in which you can draw shapes.

window = tkinter.Tk()
canvas = tkinter.Canvas(window, width=800, height=600)
canvas.pack()

The first line creates the window, and the second creates the Canvas inside that window. We pass the window as an argument, so that Tk knows where to display the canvas. The other arguments define the canvas’s size; I chose 800 × 600 because that was a common old-timey monitor size.This size, called Super Video Graphics Array (SVGA), was standardized in 1987, and probably did seem super back then. The third line is a Tk peculiarity, which positions the canvas inside the window. Tk also has widgets like buttons and dialog boxes, but our browser won’t use them: we will need finer-grained control over appearance, which a canvas provides.This is why desktop applications are more uniform than web pages: desktop applications generally use widgets provided by a common graphical toolkit, which makes them look similar.

To keep it all organized let’s put this code in a class:

WIDTH, HEIGHT = 800, 600

class Browser:
    def __init__(self):
        self.window = tkinter.Tk()
        self.canvas = tkinter.Canvas(
            self.window, 
            width=WIDTH,
            height=HEIGHT
        )
        self.canvas.pack()

Once you’ve made a canvas, you can call methods that draw shapes on the canvas. Let’s do that inside load, which we’ll move into the new Browser class:

class Browser:
    def load(self, url):
        # ...
        self.canvas.create_rectangle(10, 20, 400, 300)
        self.canvas.create_oval(100, 100, 150, 150)
        self.canvas.create_text(200, 150, text="Hi!")

To run this code, create a Browser, call load, and then start the Tk mainloop:

if __name__ == "__main__":
    import sys
    Browser().load(URL(sys.argv[1]))
    tkinter.mainloop()

You ought to see: a rectangle, starting near the top-left corner of the canvas and ending at its center; then a circle inside that rectangle; and then the text “Hi!” next to the circle, as in Figure 2.

Figure 2: The expected example output with a rectangle, circle, and text.

Coordinates in Tk refer to x positions from left to right and y positions from top to bottom. In other words, the bottom of the screen has larger y values, the opposite of what you might be used to from math. Play with the coordinates above to figure out what each argument refers to.The answers are in the online documentation.
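
One quick way to see which way the axes point is to temporarily draw a line along each edge (a throwaway sketch to add to load):

self.canvas.create_line(0, 0, 100, 0)  # along the top edge: x grows to the right
self.canvas.create_line(0, 0, 0, 100)  # down the left edge: y grows downward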

The Tk canvas widget is quite a bit more powerful than what we’re using it for here. As you can see from the tutorial, you can move the individual things you’ve drawn to the canvas, listen to click events on each one, and so on. I’m not using those features in this book, because I want to teach you how to implement them yourself.

Laying Out Text

Let’s draw a simple web page on this canvas. So far, our browser steps through the web page source code character by character and prints the text (but not the tags) to the console window. Now we want to draw the characters on the canvas instead.

To start, let’s change the show function from the previous chapter into a function that I’ll call lexForeshadowing future developments… which just returns the textual content of an HTML document without printing it:

def lex(body):
    text = ""
    # ...
    for c in body:
        # ...
        elif not in_tag:
            text += c
    return text

Then, load will draw that text, character by character:

def load(self, url):
    # ...
    for c in text:
        self.canvas.create_text(100, 100, text=c)

Let’s test this code on a real web page. For reasons that might seem inscrutable,It’s to delay a discussion of basic typography to the next chapter. let’s test it on the first chapter of 西游记 or Journey to the West, a classic Chinese novel about a monkey. Run this URLRight click on the link and "Copy URL". through request, lex, and load. You should see a window with a big blob of black pixels inset a little from the top left corner of the window.

Why a blob instead of letters? Well, of course, because we are drawing every letter in the same place, so they all overlap! Let’s fix that:

HSTEP, VSTEP = 13, 18
cursor_x, cursor_y = HSTEP, VSTEP
for c in text:
    self.canvas.create_text(cursor_x, cursor_y, text=c)
    cursor_x += HSTEP

The variables cursor_x and cursor_y point to where the next character will go, as if you were typing the text into a word processor. I picked the magic numbers—13 and 18—by trying a few different values and picking one that looked most readable.In Chapter 3, we’ll replace the magic numbers with font metrics.

The text now forms a line from left to right. But with an 800-pixel-wide canvas and 13 pixels per character, one line only fits about 60 characters. You need more than that to read a novel, so we also need to wrap the text once we reach the edge of the screen:

for c in text:
    # ...
    if cursor_x >= WIDTH - HSTEP:
        cursor_y += VSTEP
        cursor_x = HSTEP

The code increases cursor_y and resets cursor_xIn the olden days of typewriters, increasing y meant feeding in a new line, and resetting x meant returning the carriage that printed letters to the left edge of the page. So the American Standard Code for Information Interchange (ASCII) standardized two separate characters—“carriage return” and “line feed”—for these operations, so that ASCII could be directly executed by teletypewriters. That’s why headers in HTTP are separated by \r\n, even though modern computers have no mechanical carriage. once cursor_x goes past 787 pixels.Not 800, because we started at pixel 13 and I want to leave an even gap on both sides. The sequence is shown in Figure 3. Wrapping the text this way makes it possible to read more than a single line.

At this point you should be able to load up our example page in your browser and have it look something like Figure 4.

Figure 4: The first chapter of Journey to the West rendered in our browser.

Now we can read a lot of text, but still not all of it: if there’s enough text, not all of the lines will fit on the screen. We want users to scroll the page to look at different parts of it.

In English text, you can’t wrap to the next line in the middle of a word (without hyphenation at least), but in Chinese that’s the default, even for words made up of multiple characters. For example, 开关, meaning “switch”, is composed of 开 (“on”) and 关 (“off”), but it’s just fine to line-break after 开. You can change the default with the word-break CSS property: break-all allows line breaks anywhere, while auto-phrase prevents them even inside Chinese or Japanese words or phrases such as 开关. The “auto” part here refers to the fact that the words aren’t identified by the author but instead auto-detected, often using dynamic programming based on a word frequency table.

Scrolling Text

Scrolling introduces a layer of indirection between page coordinates (this text is 132 pixels from the top of the page) and screen coordinates (since you’ve scrolled 60 pixels down, this text is 72 pixels from the top of the screen)—see Figure 5. Generally speaking, a browser lays out the page—determines where everything on the page goes—in terms of page coordinates and then rasters the page—draws everything—in terms of screen coordinates.Sort of. What actually happens is that the page is first drawn into a bitmap or GPU texture, then that bitmap/texture is shifted according to the scroll, and the result is rendered to the screen. Chapter 11 will have more on this topic.

Figure 5: The difference between page and screen coordinates.

Our browser will have the same split. Right now load computes both the position of each character and draws it: layout and rendering. Let’s instead have a layout function to compute and store the position of each character, and a separate draw function to then draw each character based on the stored position. This way, layout can operate with page coordinates and only draw needs to think about screen coordinates.

Let’s start with layout. Instead of calling canvas.create_text on each character, let’s add it to a list, together with its position. Since layout doesn’t need to access anything in Browser, it can be a standalone function:

def layout(text):
    display_list = []
    cursor_x, cursor_y = HSTEP, VSTEP
    for c in text:
        display_list.append((cursor_x, cursor_y, c))
        # ...
    return display_list

The resulting list of things to display is called a display list.The term “display list” is standard. Since layout is all about page coordinates, we don’t need to change anything else about it to support scrolling.

Once the display list is computed, draw needs to loop through it and draw each character. Since draw does need access to the canvas, we make it a method on Browser:

class Browser:
    def draw(self):
        for x, y, c in self.display_list:
            self.canvas.create_text(x, y, text=c)

Now load just needs to call layout followed by draw:

class Browser:
    def load(self, url):
        body = url.request()
        text = lex(body)
        self.display_list = layout(text)
        self.draw()

Now we can add scrolling. Let’s add a field for how far you’ve scrolled:

class Browser:
    def __init__(self):
        # ...
        self.scroll = 0

The page coordinate y then has screen coordinate y - self.scroll:

def draw(self):
    for x, y, c in self.display_list:
        self.canvas.create_text(x, y - self.scroll, text=c)

If you change the value of scroll the page will now scroll up and down. But how does the user change scroll?

Most browsers scroll the page when you press the up and down keys, rotate the scroll wheel, drag the scroll bar, or apply a touch gesture to the screen. To keep things simple, let’s just implement the down key.

Tk allows you to bind a function to a key, which instructs Tk to call that function when the key is pressed. For example, to bind to the down arrow key, write:

def __init__(self):
    # ...
    self.window.bind("<Down>", self.scrolldown)

Here, self.scrolldown is an event handler, a function that Tk will call whenever the down arrow key is pressed.scrolldown is passed an event object as an argument by Tk, but since scrolling down doesn’t require any information about the key press besides the fact that it happened, scrolldown ignores that event object. All it needs to do is increment scroll and redraw the canvas:

SCROLL_STEP = 100

def scrolldown(self, e):
    self.scroll += SCROLL_STEP
    self.draw()

If you try this out, you’ll find that scrolling draws all the text a second time. That’s because we didn’t erase the old text before drawing the new text. Call canvas.delete to clear the old text:

def draw(self):
    self.canvas.delete("all")
    # ...

Scrolling should now work!

Storing the display list makes scrolling faster: the browser isn’t doing layout every time you scroll. Modern browsers take this further, retaining much of the display list even when the web page changes due to JavaScript or user interaction.

In general, scrolling is the most common user interaction with web pages. Real browsers have accordingly invested a tremendous amount of time making it fast; we’ll get to some more of the ways they do this later in the book.

Faster Rendering

Applications have to redraw page contents quickly for interactions to feel fluid,On older systems, applications drew directly to the screen, and if they didn’t update, whatever was there last would stay in place, which is why in error conditions you’d often have one window leave “trails” on another. Modern systems use compositing, which avoids trails and also improves performance and isolation. Applications still redraw their window contents, though, to change what is displayed. Chapter 13 discusses compositing in more detail. and must respond quickly to clicks and key presses so the user doesn’t get frustrated. “Feel fluid” can be made more precise. Graphical applications such as browsers typically aim to redraw at the refresh rate, or frame rate, of the screen, which is usually 60 Hz.Most screens today have a refresh rate of 60 Hz, and that is generally considered fast enough to look smooth. However, new hardware is increasingly appearing with higher refresh rates, such as 120 Hz. It’s not yet clear if browsers can be made that fast. Some rendering engines, games in particular, refresh at lower rates on purpose if they know the rendering speed can’t keep up. This means that the browser has to finish all its work in less than 1/60th of a second, or about 16 ms, in order to keep up. For this reason, 16 ms is called the animation frame budget of the application.

But scrolling in our browser is pretty slow.How fast exactly seems to depend a lot on your operating system and default font. Why? It turns out that loading information about the shape of a character, inside create_text, takes a while. To speed up scrolling we need to make sure to do that work only when necessary (while at the same time ensuring the pixels on the screen are always correct).

Real browsers have a lot of quite tricky optimizations for this, but for our browser let’s limit ourselves to a simple improvement: skip drawing characters that are offscreen:

for x, y, c in self.display_list:
    if y > self.scroll + HEIGHT: continue
    if y + VSTEP < self.scroll: continue
    # ...

The first if statement skips characters below the viewing window; the second skips characters above it. In that second if statement, y + VSTEP is the bottom edge of the character, because characters that are halfway inside the viewing window still have to be drawn.

Scrolling should now be pleasantly fast, and hopefully close to the 16 ms animation frame budget.On my computer, it was still about double that budget, so there is work to do—we’ll get to that in future chapters. And because we split layout and draw, we don’t need to change layout at all to implement this optimization.
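
If you’re curious how close your own browser gets, one rough way to check is to time the draw call: a quick sketch using Python’s time module that you’d remove once you’re done measuring:

import time

class Browser:
    def scrolldown(self, e):
        self.scroll += SCROLL_STEP
        start = time.time()
        self.draw()
        print("draw took {:.1f} ms".format((time.time() - start) * 1000))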

You should also keep in mind that not all web page interactions are animations—there are also discrete actions such as mouse clicks. Research has shown that it usually suffices to respond to a discrete action in 100 ms—below that threshold, most humans are not sensitive to discrete action speed. This is very different from interactions such as scroll, where a speed of less than 60 Hz or so is quite noticeable. The difference between the two has to do with the way the human mind processes movement (animation) versus discrete action, and the time it takes for the brain to decide upon such an action, execute it, and understand its result.

Summary

This chapter went from a rudimentary command-line browser to a graphical user interface with text that can be scrolled. The browser now:

  1. talks to the desktop environment through the Tk toolkit;
  2. lays out the page text character by character into a display list;
  3. draws that display list to a window, skipping characters that are offscreen;
  4. scrolls in response to the down arrow key.

Next, we’ll make this browser work on English text, handling complexities like variable-width characters, line layout, and formatting.

Outline

The complete set of functions, classes, and methods in our browser should look something like this:

class URL:
    def __init__(url)
    def request()

def lex(body)

WIDTH, HEIGHT
HSTEP, VSTEP
def layout(text)

SCROLL_STEP

class Browser:
    def __init__()
    def draw()
    def load(url)
    def scrolldown(e)

Exercises

2-1 Line breaks. Change layout to end the current line and start a new one when it sees a newline character. Increment y by more than VSTEP to give the illusion of paragraph breaks. There are poems embedded in Journey to the West; now you’ll be able to make them out.

2-2 Mouse wheel. Add support for scrolling up when you hit the up arrow. Make sure you can’t scroll past the top of the page. Then bind the <MouseWheel> event, which triggers when you scroll with the mouse wheel.It will also trigger with touchpad gestures, if you don’t have a mouse. The associated event object has an event.delta value which tells you how far and in what direction to scroll. Unfortunately, macOS and Windows give event.delta opposite signs and different scales, and on Linux scrolling instead uses the <Button-4> and <Button-5> events.The Tk manual has more information about this. Cross-platform applications are much harder to write than cross-browser ones!

2-3 Resizing. Make the browser resizable. To do so, pass the fill and expand arguments to canvas.pack, and call and bind to the <Configure> event, which happens when the window is resized. The window’s new width and height can be found in the width and height fields on the event object. Remember that when the window is resized, the line breaks must change, so you will need to call layout again.

2-4 Scrollbar. Stop your browser from scrolling down past the last display list entry.This is not quite right in a real browser; the browser needs to account for extra whitespace at the bottom of the screen or the possibility of objects purposefully drawn offscreen. In Chapter 5, we’ll implement this correctly. At the right edge of the screen, draw a blue, rectangular scrollbar. Make sure the size and position of the scrollbar reflects what part of the full document the browser can see, as in Figure 5. Hide the scrollbar if the whole document fits onscreen.

2-5 Emoji. Add support for emoji to your browser 😀. Emoji are characters, and you can call create_text to draw them, but the results aren’t very good. Instead, head to the OpenMoji project, download the emoji for “grinning face” as a PNG file, resize it to 16 × 16 pixels, and save it to the same folder as the browser. Use Tk’s PhotoImage class to load the image and then the create_image method to draw it to the canvas. In fact, download the whole OpenMoji library (look for the “Get OpenMojis” button at the top right)—then your browser can look up whatever emoji is used in the page.

2-6 about:blank. Currently, a malformed URL causes the browser to crash. It would be much better to have error recovery for that, and instead show a blank page, so that the user can fix the error. To do this, add support for the special about:blank URL, which should just render a blank page, and cause malformed URLs to automatically render as if they were about:blank.

2-7 Alternate text direction. Not all languages read and lay out from left to right. Arabic, Persian, and Hebrew are good examples of right-to-left languages. Implement basic support for this with a command-line flag to your browser.Once we get to Chapter 4 you could instead use the dir attribute on the <body> element. English sentences should still lay out left-to-right, but they should grow from the right side of the screen (load this example in your favorite browser to see what I mean).Sentences in an actual right-to-left language should do the opposite. And then there is vertical writing mode for some East Asian languages like Chinese and Japanese.

Formatting Text

In the last chapter, our browser created a graphical window and drew a grid of characters to it. That’s OK for Chinese, but English text features characters of different widths grouped into words that you can’t break across lines.There are lots of languages in the world, and lots of typographic conventions. A real web browser supports every language from Arabic to Zulu, but this book focuses on English. Text is near-infinitely complex, but this book cannot be infinitely long! In this chapter, we’ll add those capabilities. You’ll even be able to read this chapter in your browser!

What is a Font?

So far, we’ve called create_text with a character and two coordinates to write text to the screen. But we never specified its font, size, or style. To talk about those things, we need to create and use font objects.

What is a font, exactly? Well, in the olden days, printers arranged little metal slugs on rails, covered them with ink, and pressed them to a sheet of paper, creating a printed page (see Figure 1). The metal shapes came in boxes, one per letter, so you’d have a (large) box of e’s, a (small) box of x’s, and so on. The boxes came in cases (see Figure 2), one for upper-case and one for lower-case letters. The set of cases was called a font.The word is related to foundry, which would create the little metal shapes. Naturally, if you wanted to print larger text, you needed different (bigger) shapes, so those were a different font; a collection of fonts was called a type, which is why we call it typing. Variations—like bold or italic letters—were called that type’s “faces”.

Figure 1: A drawing of printing press workers. (By Daniel Nikolaus Chodowiecki. Wikipedia, public domain.)
Figure 2: Metal types in letter cases and a composing stick. (By Willi Heidelbach. Wikipedia, CC BY 2.5.)

This nomenclature reflects the world of the printing press: metal shapes in boxes in cases from different foundries. Our modern world instead has dropdown menus, and the old words no longer match it. “Font” can now mean font, typeface, or type,Let alone “font family”, which can refer to larger or smaller collections of types. and we say a font contains several different weights (like “bold” and “normal”),But sometimes other weights as well, like “light”, “semibold”, “black”, and “condensed”. Good fonts tend to come in many weights. several different styles (like “italic” and “roman”, which is what not-italic is called),Sometimes there are other options as well, like maybe there’s a small-caps version; these are sometimes called options as well. And don’t get me started on automatic versus manual italics. and arbitrary sizes.A font looks especially good at certain sizes where hints tell the computer how best to align it to the pixel grid. Welcome to the world of magic ink.This term comes from an essay by Bret Victor that discusses how the graphical possibilities of computers can make for better and easier-to-use applications.

Yet Tk’s font objects correspond to the older meaning of font: a type at a fixed size, style, and weight. For example:You can only create Font objects, or any other kinds of Tk objects, after calling tkinter.Tk(), and you need to import tkinter.font separately.

import tkinter.font
window = tkinter.Tk()
bi_times = tkinter.font.Font(
    family="Times",
    size=16,
    weight="bold",
    slant="italic",
)

Your computer might not have “Times” installed; you can list the available fonts with tkinter.font.families() and pick something else.
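
For example, you can print the list from a Python prompt (remember that the Tk window has to exist first):

import tkinter
import tkinter.font
window = tkinter.Tk()
print(sorted(tkinter.font.families()))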

Font objects can be passed to create_text’s font argument:

canvas.create_text(200, 100, text="Hi!", font=bi_times)

In the olden times, American typesetters kept their boxes of metal shapes arranged in a California job case, which combined lower- and upper-case letters side by side in one case, making typesetting easier. The upper-/lower-case nomenclature dates from centuries earlier.

Measuring Text

Text takes up space vertically and horizontally, and the font object’s metrics and measure methods measure that space:On your computer, you might get different numbers. That’s right—text rendering is OS-dependent, because it is complex enough that everyone uses one of a few libraries to do it, usually libraries that ship with the OS. That’s why macOS fonts tend to be “blurrier” than the same font on Windows: different libraries make different trade-offs.

>>> bi_times.metrics()
{'ascent': 15, 'descent': 4, 'linespace': 19, 'fixed': 0}
>>> bi_times.measure("Hi!")
24

The metrics call yields information about the vertical dimensions of the text (see Figure 3): the linespace is how tall the text is, which includes an ascent which goes “above the line” and a descent that goes “below the line”.The fixed parameter is actually a boolean and tells you whether all letters are the same width, so it doesn’t really fit here. The ascent and descent matter when words in different sizes sit on the same line: they ought to line up “along the line”, not along their tops or bottoms.

Let’s dig deeper. Remember that bi_times is size-16 Times: why does font.metrics report that it is actually 19 pixels tall? Well, first of all, a size of 16 means 16 points, which are defined as 72nds of an inch, not 16 pixels,Actually, the definition of a “point” is a total mess, with many different length units all called “point” around the world. The Wikipedia page has the details, but a traditional American/British point is actually slightly less than 1/72 of an inch. The 1/72 standard comes from PostScript, but some systems predate it; TeX , for example, hews closer to the traditional point, approximating it as 1/72.27 of an inch. which your monitor probably has around 100 of per inch.Tk doesn’t use points anywhere else in its API. It’s supposed to use pixels if you pass it a negative number, but that doesn’t appear to work. Those 16 points measure not the individual letters but the metal blocks the letters were once carved from, so the letters themselves must be less than 16 points. In fact, different size-16 fonts have letters of varying heights:You might even notice that Times has different metrics in this code block than in the earlier one where we specified a bold, italic Times font. The bold, italic Times font is taller, at least on my current macOS system!

>>> tkinter.font.Font(family="Courier", size=16).metrics()
{'fixed': 1, 'ascent': 13, 'descent': 4, 'linespace': 17}
>>> tkinter.font.Font(family="Times", size=16).metrics()
{'fixed': 0, 'ascent': 14, 'descent': 4, 'linespace': 18}
>>> tkinter.font.Font(family="Helvetica", size=16).metrics()
{'fixed': 0, 'ascent': 15, 'descent': 4, 'linespace': 19}

The measure() method is more direct: it tells you how much horizontal space text takes up, in pixels. This depends on the text, of course, since different letters have different widths:Note that the sum of the individual letters’ widths is not the width of the word. Tk uses fractional pixels internally, but rounds up to return whole pixels in the measure call. Plus, some fonts use something called kerning to shift letters a little bit when particular pairs of letters are next to one another, or even shaping to make two letters look like one glyph.

>>> bi_times.measure("Hi!")
24
>>> bi_times.measure("H")
13
>>> bi_times.measure("i")
5
>>> bi_times.measure("!")
7
>>> 13 + 5 + 7
25

You can use this information to lay text out on the page. For example, suppose you want to draw the text “Hello, world!” in two pieces, so that “world!” is italic. Let’s use two fonts:

font1 = tkinter.font.Font(family="Times", size=16)
font2 = tkinter.font.Font(family="Times", size=16, slant='italic')

We can now lay out the text, starting at (200, 200):

x, y = 200, 200
canvas.create_text(x, y, text="Hello, ", font=font1)
x += font1.measure("Hello, ")
canvas.create_text(x, y, text="world!", font=font2)

You should see “Hello,” and “world!”, correctly aligned and with the second word italicized.

Unfortunately, this code has a bug, though one masked by the choice of example text: replace “world!” with “overlapping!” and the two words will overlap. That’s because the coordinates x and y that you pass to create_text tell Tk where to put the center of the text. It only worked for “Hello, world!” because “Hello,” and “world!” are the same length!

Luckily, the meaning of the coordinate you pass in is configurable. We can instruct Tk to treat the coordinate we gave as the top-left corner of the text by setting the anchor argument to "nw", meaning the “northwest” corner of the text:

x, y = 200, 225
canvas.create_text(x, y, text="Hello, ", font=font1, anchor='nw')
x += font1.measure("Hello, ")
canvas.create_text(x, y, text="overlapping!", font=font2, anchor='nw')

Modify the draw function to set anchor to "nw"; we didn’t need to do that in the previous chapter because all Chinese characters are the same width.

If you find font metrics confusing, you’re not the only one! In 2012, the Michigan Supreme Court heard Stand Up for Democracy v. Secretary of State, a case ultimately about a ballot referendum’s validity that centered on the definition of font size. The court decided (correctly) that font size is the size of the metal blocks that letters were carved from and not the size of the letters themselves.

Word by Word

In Chapter 2, the layout function looped over the text character by character and moved to the next line whenever we ran out of space. That’s appropriate in Chinese, where each character more or less is a word. But in English you can’t move to the next line in the middle of a word. Instead, we need to lay out the text one word at a time:This code splits words on whitespace. It’ll thus break on Chinese, since there won’t be whitespace between words. Real browsers use language-dependent rules for laying out text, including for identifying word boundaries.

def layout(text):
    # ...
    for word in text.split():
        # ...
    return display_list

Unlike Chinese characters, words are different sizes, so we need to measure the width of each word:

import tkinter.font

def layout(text):
    font = tkinter.font.Font()
    # ...
    for word in text.split():
        w = font.measure(word)
    # ...

Here I’ve chosen to use Tk’s default font. Now, if we draw the text at cursor_x, its right end would be at cursor_x + w. That might be past the right edge of the page, and in this case we need to make space by wrapping to the next line:

def layout(text):
    for word in text.split():
        # ...
        if cursor_x + w > WIDTH - HSTEP:
            cursor_y += font.metrics("linespace") * 1.25
            cursor_x = HSTEP

Note that this code block only shows the insides of the for loop. The rest of layout should be left alone. Also, I call metrics with an argument; that just returns the named metric directly. Finally, note that I multiply the linespace by 1.25 when incrementing y. Try removing the multiplier: you’ll see that the text is harder to read because the lines are too close together.Designers say the text is too “tight”. Instead, it is common to add “line spacing” or “leading”So named because in metal type days, thin pieces of lead were placed between the lines to space them out. Lead is a softer metal than what the actual letter pieces were made of, so it could compress a little to keep pressure on the other pieces. Pronounce it “led-ing” not “leed-ing”. between lines. The 25% line spacing is a typical amount.

Now cursor_x and cursor_y hold the location of the start of the word, so we add the word to the display list and update cursor_x to point past the end of the word:

def layout(text):
    for word in text.split():
        # ...
        display_list.append((cursor_x, cursor_y, word))
        cursor_x += w + font.measure(" ")

I increment cursor_x by w + font.measure(" ") instead of w because I want to have spaces between the words: the call to split() removed all of the whitespace, and this adds it back. I don’t add the space to w in the if condition, though, because you don’t need a space after the last word on a line.

Breaking lines in the middle of a word is called hyphenation, and can be turned on via the hyphens CSS property. The state of the art is the Knuth–Liang hyphenation algorithm, which uses a dictionary of word fragments to prioritize possible hyphenation points. At first, the CSS specification was incompatible with this algorithm, but the recent text-wrap-style property fixed that.

Styling Text

Right now, all of the text on the page is drawn with one font. But web pages sometimes specify that text should be bold or italic using the <b> and <i> tags. It’d be nice to support that, but right now, the code resists this: the layout function only receives the text of the page as input, and so has no idea where the bold and italics tags are.

Let’s change lex to return a list of tokens, where a token is either a Text object (for a run of characters outside a tag) or a Tag object (for the contents of a tag). You’ll need to write the Text and Tag classes:If you’re familiar with Python, you might want to use the dataclass library, which makes it easier to define these sorts of utility classes.

class Text:
    def __init__(self, text):
        self.text = text

class Tag:
    def __init__(self, tag):
        self.tag = tag

lex must now gather text into Text and Tag objects:If you’ve done some or all of the exercises in prior chapters, your code will look different. Code snippets in the book always assume you haven’t done the exercises, so you’ll need to port your modifications.

def lex(body):
    out = []
    buffer = ""
    in_tag = False
    for c in body:
        if c == "<":
            in_tag = True
            if buffer: out.append(Text(buffer))
            buffer = ""
        elif c == ">":
            in_tag = False
            out.append(Tag(buffer))
            buffer = ""
        else:
            buffer += c
    if not in_tag and buffer:
        out.append(Text(buffer))
    return out

Here I’ve renamed the text variable to buffer, since it now stores either text or tag contents before they can be used. The name also reminds us that, at the end of the loop, we need to check whether there’s buffered text and what we should do with it. Here, lex dumps any accumulated text as a Text object. Otherwise, if you never saw an angle bracket, you’d return an empty list of tokens. But unfinished tags, like in Hi!<hr, are thrown out.This may strike you as an odd decision: why not finish up the tag for the author? I don’t know, but dropping the tag is what browsers do.

Note that Text and Tag are asymmetric: lex avoids empty Text objects, but not empty Tag objects. That’s because an empty Tag object represents the HTML code <>, while an empty Text object represents no content at all.
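
As a quick sanity check, here’s what lex produces on a small input (shown with a little helper loop, since Text and Tag don’t define how to print themselves):

for tok in lex("Hi <b>there</b>!"):
    if isinstance(tok, Text):
        print("Text", repr(tok.text))
    else:
        print("Tag", repr(tok.tag))
# Text 'Hi '
# Tag 'b'
# Text 'there'
# Tag '/b'
# Text '!'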

Since we’ve modified lex, we are now passing layout not just the text of the page, but also the tags in it. So layout must loop over tokens, not text:

def layout(tokens):
    # ...
    for tok in tokens:
        if isinstance(tok, Text):
            for word in tok.text.split():
                # ...
    # ...

layout can also examine tag tokens to change font when directed by the page. Let’s start with support for weights and styles, with two corresponding variables:

weight = "normal"
style = "roman"

Those variables must change when the bold and italics open and close tags are seen:

if isinstance(tok, Text):
    # ...
elif tok.tag == "i":
    style = "italic"
elif tok.tag == "/i":
    style = "roman"
elif tok.tag == "b":
    weight = "bold"
elif tok.tag == "/b":
    weight = "normal"

Note that this code correctly handles not only <b>bold</b> and <i>italic</i> text, but also <b><i>bold italic</i></b> text.It even handles incorrectly nested tags like <b>b<i>bi</b>i</i>, but it does not handle <b><b>twice</b>bolded</b> text. We’ll return to this in Chapter 6.

The style and weight variables are used to select the font:

if isinstance(tok, Text):
    for word in tok.text.split():
        font = tkinter.font.Font(
            size=16,
            weight=weight,
            slant=style,
        )
        # ...

Since the font is computed in layout but used in draw, we’ll need to add the font used to each entry in the display list:

if isinstance(tok, Text):
    for word in tok.text.split():
        # ...
        display_list.append((cursor_x, cursor_y, word, font))

Make sure to update draw to expect and use this extra font field in display list entries.
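
Concretely, draw might now look something like this (a sketch that assumes you kept the offscreen-skipping checks and the "nw" anchor from earlier):

class Browser:
    def draw(self):
        self.canvas.delete("all")
        for x, y, word, font in self.display_list:
            if y > self.scroll + HEIGHT: continue
            if y + VSTEP < self.scroll: continue
            self.canvas.create_text(
                x, y - self.scroll, text=word, font=font, anchor="nw")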

Italic fonts were developed in Italy (hence the name) to mimic a cursive handwriting style called “chancery hand”. Non-italic fonts are called roman because they mimic text on Roman monuments. There is an obscure third option: oblique fonts, which look like roman fonts but are slanted.

A Layout Object

With all of these tags, layout has become quite large, with lots of local variables and some complicated control flow. That is one sign that something deserves to be a class, not a function:

class Layout:
    def __init__(self, tokens):
        self.display_list = []

Every local variable in layout then becomes a field of Layout:

self.cursor_x = HSTEP
self.cursor_y = VSTEP
self.weight = "normal"
self.style = "roman"

The core of the old layout is a loop over tokens, and we can move the body of that loop to a method on Layout:

def __init__(self, tokens):
    # ...
    for tok in tokens:
        self.token(tok)

def token(self, tok):
    if isinstance(tok, Text):
        for word in tok.text.split():
            # ...
    elif tok.tag == "i":
        self.style = "italic"
    # ...

In fact, the body of the isinstance(tok, Text) branch can be moved to its own method:

def word(self, word):
    font = tkinter.font.Font(
        size=16,
        weight=self.weight,
        slant=self.style,
    )
    w = font.measure(word)
    # ...

Now that everything has moved out of Browser’s old layout function, it can be replaced with calls into Layout:

class Browser:
    def load(self, url):
        body = url.request()
        tokens = lex(body)
        self.display_list = Layout(tokens).display_list
        self.draw()

When you do big refactors like this, it’s important to work incrementally. It might seem more efficient to change everything at once, but that efficiency brings with it a risk of failure: trying to do so much that you get confused and have to abandon the whole refactor. So take a moment to test that your browser still works before you move on.

Anyway, this refactor isolated all of the text-handling code into its own method, with the main token function just branching on the tag name. Let’s take advantage of the new, cleaner organization to add more tags. With font weights and styles working, size is the next frontier in typographic sophistication. One simple way to change font size is the <small> tag and its deprecated sister tag <big>.In your web design projects, use the CSS font-size property to change text size instead of <big> and <small>. But since we haven’t yet implemented CSS for our browser (see Chapter 6), we’re stuck using tags here.

Our experience with font styles and weights suggests a simple approach that customizes the size field in Layout. It starts out with:

self.size = 12

That variable is used to create the font object:

font = tkinter.font.Font(
    size=self.size,
    weight=self.weight,
    slant=self.style,
)

And we can change the size in <big> and <small> tags by updating this variable:

def token(self, tok):
    # ...
    elif tok.tag == "small":
        self.size -= 2
    elif tok.tag == "/small":
        self.size += 2
    elif tok.tag == "big":
        self.size += 4
    elif tok.tag == "/big":
        self.size -= 4

Try wrapping a whole paragraph in <small>, like you would a bit of fine print, and enjoy your newfound typographical freedom.

All of <b>, <i>, <big>, and <small> date from an earlier, pre-CSS era of the web. Nowadays, CSS can change how an element appears, so visual tag names like <b> and <small> are out of favor. That said, <b>, <i>, and <small> still have some appearance-independent meanings.

Text of Different Sizes

Start mixing font sizes, like <small>a</small><big>A</big>, and you’ll quickly notice a problem with the font size code: the text is aligned along its top, as if it’s hanging from a clothes line. But you know that English text is typically written with all letters aligned at an invisible baseline instead.

Let’s think through how to fix this. If the bigger text is moved up, it would overlap with the previous line, so the smaller text has to be moved down. That means its vertical position has to be computed later, after the big text passes through token. But since the small text comes through the loop first, we need a two-pass algorithm for lines of text: the first pass identifies what words go in the line and computes their x positions, while the second pass vertically aligns the words and computes their y positions (see Figure 4).

Figure 4: How lines are laid out when multiple fonts are involved. All words are drawn using a shared baseline. The ascent and descent of the whole line is then determined by the maximum ascent and descent of all words in the line, and leading is added before and after the line.

Let’s start with phase one. Since one line contains text from many tags, we need a field on Layout to store the line-to-be. That field, line, will be a list, and word will add words to it instead of to the display list. Entries in line will have x but not y positions, since y positions aren’t computed in the first phase:

class Layout:
    def __init__(self, tokens):
        # ...
        self.line = []
        # ...
    
    def word(self, word):
        # ...
        self.line.append((self.cursor_x, word, font))

The new line field is essentially a buffer, where words are held temporarily before they can be placed. The second phase is that buffer being flushed when we’re finished with a line:

class Layout:
    def word(self, word):
        if self.cursor_x + w > WIDTH - HSTEP:
            self.flush()

As usual with buffers, we also need to make sure the buffer is flushed once all tokens are processed:

class Layout:
    def __init__(self, tokens):
        # ...
        self.flush()

This new flush function has three responsibilities:

  1. it must align the words along the baseline (see Figure 5);
  2. it must add all those words to the display list; and
  3. it must update the cursor_x and cursor_y fields.

Here’s what it looks like, step by step:

Figure 5: Aligning the words on a line.

Since we want words to line up “on the line”, let’s start by computing where that line should be. That depends on the tallest word on the line:

def flush(self):
    if not self.line: return
    metrics = [font.metrics() for x, word, font in self.line]
    max_ascent = max([metric["ascent"] for metric in metrics])

The baseline is then max_ascent below self.cursor_y—or actually a little more to account for the leading:Actually, 25% leading doesn’t add 25% of the ascent above the ascender and 25% of the descent below the descender. Instead, it adds 12.5% of the line height in both places, which is subtly different when fonts are mixed. But let’s skip that subtlety here.

baseline = self.cursor_y + 1.25 * max_ascent

Now that we know where the line is, we can place each word relative to that line and add it to the display list:

for x, word, font in self.line:
    y = baseline - font.metrics("ascent")
    self.display_list.append((x, y, word, font))

Note how y starts at the baseline, and moves up by just enough to accommodate that word’s ascent. Now cursor_y must move far enough down below baseline to account for the deepest descender:

max_descent = max([metric["descent"] for metric in metrics])
self.cursor_y = baseline + 1.25 * max_descent

Finally, flush must update the Layout’s cursor_x and line fields:

self.cursor_x = HSTEP
self.line = []
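
Assembled in one place, the whole flush method now looks something like this (a sketch combining the pieces above):

def flush(self):
    if not self.line: return
    metrics = [font.metrics() for x, word, font in self.line]
    max_ascent = max([metric["ascent"] for metric in metrics])
    baseline = self.cursor_y + 1.25 * max_ascent
    for x, word, font in self.line:
        y = baseline - font.metrics("ascent")
        self.display_list.append((x, y, word, font))
    max_descent = max([metric["descent"] for metric in metrics])
    self.cursor_y = baseline + 1.25 * max_descent
    self.cursor_x = HSTEP
    self.line = []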

Now all the text is aligned along the line, even when text sizes are mixed. Plus, this new flush function is convenient for other line-breaking jobs. For example, in HTML the <br> tagWhich is a self-closing tag, so there’s no </br>. Many tags that are content, instead of annotating it, are like this. Some people like adding a final slash to self-closing tags, as in <br/>, but this is not required in HTML. ends the current line and starts a new one:

def token(self, tok):
    # ...
    elif tok.tag == "br":
        self.flush()

Likewise, paragraphs are defined by the <p> and </p> tags, so </p> also ends the current line:

def token(self, tok):
    # ...
    elif tok.tag == "/p":
        self.flush()
        self.cursor_y += VSTEP

I add a bit extra to cursor_y here to create a little gap between paragraphs.

By this point you should be able to load up your browser and display an example page, which should look something like Figure 6.

Figure 6: Screenshot of a web page demonstrating different text sizes.

Actually, browsers support not only horizontal but also vertical writing systems, like some traditional East Asian writing styles. A particular challenge is Mongolian script, which is written in lines running top to bottom, left to right. Many Mongolian government websites use the script.

Font Caching

Now that you’ve implemented styled text, you’ve probably noticed—unless you’re on macOSWhile we can’t confirm this in the documentation, it seems that the macOS “Core Text” APIs cache fonts more aggressively than Linux and Windows. The optimization described in this section won’t hurt any on macOS, but also won’t improve speed as much as on Windows and Linux.—that on a large web page like this chapter our browser has slowed significantly from the previous chapter. That’s because text layout, and specifically the part where you measure each word, is quite slow.You can profile Python programs by replacing your python3 command with python3 -m cProfile. Look for the lines corresponding to the measure and metrics calls to see how much time is spent measuring text.
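
For example, assuming your browser code lives in a file called browser.py (a hypothetical name) that takes a URL as its argument, a profiling run might look like this:

python3 -m cProfile -s cumtime browser.py https://browser.engineering/text.html

The -s cumtime flag sorts functions by cumulative time, which makes the slow text-measurement calls easy to spot.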

Unfortunately, it’s hard to make text measurement much faster. With proportional fonts and complex font features like hinting and kerning, measuring text can require pretty complex computations. But on a large web page, some words likely appear a lot—for example, this chapter includes the word “the” over 200 times. Instead of measuring these words over and over again, we could measure them once, and then cache the results. On normal English text, this usually results in a substantial speedup.

Caching is such a good idea that most text libraries already implement it, typically caching text measurements in each Font object. But since our word method creates a new Font object for each word, the caching is ineffective. To make caching work, we need to reuse Font objects when possible instead of making new ones.

We’ll store our cache in a global FONTS dictionary:

FONTS = {}

The keys to this dictionary will be size/weight/style triples, and the values will be Font objects.Actually, the values are a font object and a tkinter.Label object. This dramatically improves the performance of metrics for some reason, and is recommended by the Python documentation. We can put the caching logic itself in a new get_font function:

def get_font(size, weight, style):
    key = (size, weight, style)
    if key not in FONTS:
        font = tkinter.font.Font(size=size, weight=weight,
            slant=style)
        label = tkinter.Label(font=font)
        FONTS[key] = (font, label)
    return FONTS[key][0]

Then the word method can call get_font instead of creating a Font object directly:

class Layout:
    def word(self, word):
        font = get_font(self.size, self.weight, self.style)
        # ...

Now identical words will use identical fonts and text measurements will hit the cache.
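
If you want to check that the cache is doing its job, a quick test (hypothetical, not part of the browser) is that two identical calls return the very same Font object. Run something like this after the Tk window has been created:

assert get_font(16, "bold", "roman") is get_font(16, "bold", "roman")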

Fonts for scripts like Chinese can be megabytes in size, so they are generally stored on disk and only loaded into memory on demand. That makes font loading slow and caching even more important. Browsers also have extensive caches for measuring, shaping, and rendering text. Because web pages have a lot of text, these caches turn out to be one of the most important parts of speeding up rendering.

Summary

The previous chapter introduced a browser that laid out characters in a grid. Now it does standard English text layout: lines wrap at word boundaries, <b> and <i> toggle the font’s weight and style, <big> and <small> change its size, words of different sizes share a common baseline, and a font cache keeps text measurement fast.

You can now use our browser to read an essay, a blog post, or even a book!

Outline

The complete set of functions, classes, and methods in our browser should look something like this:

class URL:
    def __init__(url)
    def request()
class Text:
    def __init__(text)
class Tag:
    def __init__(tag)
def lex(body)
FONTS
def get_font(size, weight, style)
WIDTH, HEIGHT
HSTEP, VSTEP
class Layout:
    def __init__(tokens)
    def token(tok)
    def flush()
    def word(word)
SCROLL_STEP
class Browser:
    def __init__()
    def draw()
    def load(url)
    def scrolldown(e)

Exercises

3-1 Centered text. The page titles on this book’s website are centered; make your browser do the same for text between <h1 class="title"> and </h1>. Each line has to be centered individually, because different lines will have different lengths.In early HTML there was a <center> tag that did exactly this, but nowadays centering is typically done in CSS, through the text-align property. The approach in this exercise is of course non-standard, and just for learning purposes.

3-2 Superscripts. Add support for the <sup> tag. Text in this tag should be smaller (perhaps half the normal text size) and be placed so that the top of a superscript lines up with the top of a normal letter.

3-3 Soft hyphens. The soft hyphen character, written \N{soft hyphen} in Python, represents a place where the text renderer can, but doesn’t have to, insert a hyphen and break the word across lines. Add support for it.If you’ve done Exercise 1-4 on HTML entities, you might also want to add support for the &shy; entity, which expands to a soft hyphen. If a word doesn’t fit at the end of a line, check if it has soft hyphens, and if so break the word across lines. Remember that a word can have multiple soft hyphens in it, and make sure to draw a hyphen when you break a word. The word “super­cali­fragi­listic­expi­ali­docious” is a good test case.

3-4 Small caps. Make the <abbr> element render text in small caps. Inside an <abbr> tag, lower-case letters should be small, capitalized, and bold, while all other characters (upper case, numbers, etc.) should be drawn in the normal font.

3-5 Preformatted text. Add support for the <pre> tag. Unlike normal paragraphs, text inside <pre> tags doesn’t automatically break lines, and whitespace like spaces and newlines are preserved. Use a fixed-width font like Courier New or SFMono as well. Make sure tags work normally inside <pre> tags: it should be possible to bold some text inside a <pre>. The results will look best if you also do Exercise 1-4.

Constructing an HTML Tree

So far, our browser sees web pages as a stream of open tags, close tags, and text. But HTML is actually a tree, and though the tree structure hasn’t been important yet, it will be central to later features like CSS, JavaScript, and visual effects. So this chapter adds a proper HTML parser and converts the layout engine to use it.

A Tree of Nodes

The HTML treeThis is the tree that is usually called the DOM tree, for Document Object Model. I’ll keep calling it the HTML tree for now. has one node for each open and close tag pair and a node for each span of text.In reality there are other types of nodes too, like comments, doctypes, CDATA sections, and processing instructions. There are even some deprecated types! A simple HTML document showing the structure is shown in Figure 1.

Figure 1: An HTML document, showing tags, text, and the nesting structure.

For our browser to use a tree, tokens need to evolve into nodes. That means adding a list of children and a parent pointer to each one. Here’s the new Text class, representing text at the leaf of the tree:

class Text:
    def __init__(self, text, parent):
        self.text = text
        self.children = []
        self.parent = parent

Since it takes two tags (the open and the close tag) to make a node, let’s rename the Tag class to Element, and make it look like this:

class Element:
    def __init__(self, tag, parent):
        self.tag = tag
        self.children = []
        self.parent = parent

I added a children field to both Text and Element, even though text nodes never have children, for consistency.

Constructing a tree of nodes from source code is called parsing. A parser builds a tree one element or text node at a time. But that means the parser needs to store an incomplete tree as it goes. For example, suppose the parser has so far read this bit of HTML:

<html><video></video><section><h1>This is my webpage

The parser has seen five tags (and one text node). The rest of the HTML will contain more open tags, close tags, and text, but no matter which tokens it sees, no new nodes will be added to the <video> tag, which has already been closed. So that node is “finished”. But the other nodes are unfinished: more children can be added to the <html>, <section>, and <h1> nodes, depending on what HTML comes next—see Figure 2.

Since the parser reads the HTML file from beginning to end, these unfinished tags are always in a certain part of the tree. The unfinished tags have always been opened but not yet closed; they are always later in the source than the finished nodes; and they are always children of other unfinished tags. To leverage these facts, let’s represent an incomplete tree by storing a list of unfinished tags, ordered with parents before children. The first node in the list is the root of the HTML tree; the last node in the list is the most recent unfinished tag.In Python, and most other languages, it’s faster to add and remove from the end of a list, instead of the beginning.
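
For the fragment above, the incomplete tree would be represented roughly like this (an illustration, not literal code from our parser):

# unfinished = [<html>, <section>, <h1>]   (parents before children)
# <video> is finished: it already sits in <html>'s children list.
# The text "This is my webpage" belongs to <h1>, the last unfinished node.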

Parsing is a little more complex than lex, so we’re going to want to break it into several functions, organized in a new HTMLParser class. That class can also store the source code it’s analyzing and the incomplete tree:

class HTMLParser:
    def __init__(self, body):
        self.body = body
        self.unfinished = []

Before the parser starts, it hasn’t seen any tags at all, so the unfinished list storing the tree starts empty. But as the parser reads tokens, that list fills up. Let’s start that by aspirationally renaming the lex function we have now to parse:

class HTMLParser:
    def parse(self):
        # ...

We’ll need to do a bit of surgery on parse. Right now parse creates Tag and Text objects and appends them to the out array. We need it to create Element and Text objects and add them to the unfinished tree. Since a tree is a bit more complex than a list, I’ll move the adding-to-a-tree logic to two new methods, add_text and add_tag.

def parse(self):
    text = ""
    in_tag = False
    for c in self.body:
        if c == "<":
            in_tag = True
            if text: self.add_text(text)
            text = ""
        elif c == ">":
            in_tag = False
            self.add_tag(text)
            text = ""
        else:
            text += c
    if not in_tag and text:
        self.add_text(text)
    return self.finish()

The out variable is gone, and note that I’ve also moved the return value to a new finish method, which converts the incomplete tree to the final, complete tree. So: how do we add things to the tree?

HTML derives from a long line of document processing systems. Its predecessor, SGML, traces back to RUNOFF and is a sibling to troff, now used for Linux manual pages. The committee that standardized SGML now works on the .odf, .docx, and .epub formats.

Constructing the Tree

Let’s talk about adding nodes to a tree. To add a text node we add it as a child of the last unfinished node:

class HTMLParser:
    def add_text(self, text):
        parent = self.unfinished[-1]
        node = Text(text, parent)
        parent.children.append(node)

On the other hand, tags are a little more complex since they might be an open or a close tag:

class HTMLParser:
    def add_tag(self, tag):
        if tag.startswith("/"):
            # ...
        else:
            # ...

An open tag adds an unfinished node to the end of the list:

def add_tag(self, tag):
    # ...
    else:
        parent = self.unfinished[-1]
        node = Element(tag, parent)
        self.unfinished.append(node)

A close tag instead finishes the last unfinished node by adding it to the previous unfinished node in the list:

def add_tag(self, tag):
    if tag.startswith("/"):
        node = self.unfinished.pop()
        parent = self.unfinished[-1]
        parent.children.append(node)
    # ...

Once the parser is done, it turns our incomplete tree into a complete tree by just finishing any unfinished nodes:

class HTMLParser:
    def finish(self):
        while len(self.unfinished) > 1:
            node = self.unfinished.pop()
            parent = self.unfinished[-1]
            parent.children.append(node)
        return self.unfinished.pop()

This is almost a complete parser, but it doesn’t quite work at the beginning and end of the document. The very first open tag is an edge case without a parent:

def add_tag(self, tag):
    # ...
    else:
        parent = self.unfinished[-1] if self.unfinished else None
        # ...

The very last tag is also an edge case, because there’s no unfinished node to add it to:

def add_tag(self, tag):
    if tag.startswith("/"):
        if len(self.unfinished) == 1: return
        # ...

Ok, that’s all done. Let’s test our parser out and see how well it works!

The ill-considered JavaScript document.write method allows JavaScript to modify the HTML source code while it’s being parsed! This is actually a bad idea. An implementation of document.write must have the HTML parser stop to execute JavaScript, but that slows down requests for images, CSS, and JavaScript used later in the page. To solve this, modern browsers use speculative parsing to start loading additional resources even before parsing is done.

Debugging a Parser

How do we know our parser does the right thing—that it builds the right tree? Well the place to start is seeing the tree it produces. We can do that with a quick, recursive pretty-printer:

def print_tree(node, indent=0):
    print(" " * indent, node)
    for child in node.children:
        print_tree(child, indent + 2)

Here we’re printing each node in the tree, and using indentation to show the tree structure. Since we need to print each node, it’s worth taking the time to give them a nice printed form, which in Python means defining the __repr__ function:

class Text:
    def __repr__(self):
        return repr(self.text)

class Element:
    def __repr__(self):
        return "<" + self.tag + ">"

In general it’s a good idea to define __repr__ methods for any data objects, and to have those __repr__ methods print all the relevant fields.

Try this out on the web page corresponding to this chapter, parsing the HTML source code and then calling print_tree to visualize it:

body = URL(sys.argv[1]).request()
nodes = HTMLParser(body).parse()
print_tree(nodes)

You’ll see something like this at the beginning:

 <!doctype html>
   '\n'
   <html lang="en-US" xml:lang="en-US">
     '\n'
     <head>
       '\n  '
       <meta charset="utf-8" />

Immediately a couple of things stand out. Let’s start at the top, with the <!doctype html> tag.

This special tag, called a doctype, is always the very first thing in an HTML document. But it’s not really an element at all, nor is it supposed to have a close tag. Our browser won’t be using the doctype for anything, so it’s best to throw it away:Real browsers use doctypes to switch between standards-compliant and legacy parsing and layout modes.

def add_tag(self, tag):
    if tag.startswith("!"): return
    # ...

This ignores all tags that start with an exclamation mark, which not only throws out doctype declarations but also comments, which in HTML are written <!-- comment text -->.

Just throwing out doctypes isn’t quite enough though—if you run your parser now, it will crash. That’s because after the doctype comes a newline, which our parser treats as text and tries to insert into the tree. Except there isn’t a tree, since the parser hasn’t seen any open tags. For simplicity, let’s just have our browser skip whitespace-only text nodes to side-step the problem:Real browsers retain whitespace to correctly render make<span></span>up as one word and make<span> </span>up as two. Our browser won’t. Plus, ignoring whitespace simplifies later chapters by avoiding a special case for whitespace-only text tags.

def add_text(self, text):
    if text.isspace(): return
    # ...

The first part of the parsed HTML tree for the browser.engineering home page now looks something like this:

 <html lang="en-US" xml:lang="en-US">
   <head>
     <meta charset="utf-8" /="">
       <link rel="prefetch" ...>
         <link rel="prefetch" ...>

Our next problem: why’s everything so deeply indented? Why aren’t these open elements ever closed?

In SGML, document type declarations contained a URL which defined the valid tags, and in older versions of HTML that was also recommended. Browsers do use the absence of a document type declaration to identify very old, pre-SGML versions of HTML,There’s also this crazy thing called “almost standards” or “limited quirks” mode, due to a backward-incompatible change in table cell vertical layout. Yes. I don’t need to make these up! but don’t use the URL, so <!doctype html> is the best document type declaration for modern HTML.

Self-closing Tags

Elements like <meta> and <link> are what are called self-closing: these tags don’t surround content, so you don’t ever write </meta> or </link>. Our parser needs special support for them. In HTML, there’s a specific list of these self-closing tags (the specification calls them “void” tags):A lot of these tags are obscure. Browsers also support some additional, obsolete self-closing tags not listed here, like keygen.

SELF_CLOSING_TAGS = [
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
]

Our parser needs to auto-close tags from this list:

def add_tag(self, tag):
    # ...
    elif tag in self.SELF_CLOSING_TAGS:
        parent = self.unfinished[-1]
        node = Element(tag, parent)
        parent.children.append(node)

This code looks right, but it doesn’t quite work. Why not? Because our parser is looking for a tag named meta, but it’s finding a tag named "meta name=...". The self-closing code isn’t triggered because the <meta> tag has attributes.

HTML attributes add information about an element; open tags can have any number of attributes. Attribute values can be quoted, unquoted, or omitted entirely. Let’s focus on basic attribute support, ignoring values that contain whitespace, which are a little complicated.

Since we’re not handling whitespace in values, we can split on whitespace to get the tag name and the attribute–value pairs:

class HTMLParser:
    def get_attributes(self, text):
        parts = text.split()
        tag = parts[0].casefold()
        attributes = {}
        for attrpair in parts[1:]:
            # ...
        return tag, attributes

HTML tag names are case insensitive, as by the way are attribute names, so I case-fold them.Lower-casing text is the wrong way to do case-insensitive comparisons in languages like Cherokee. In HTML specifically, tag names only use the ASCII characters so lower-casing them would be sufficient, but I’m using Python’s casefold function because it’s a good habit to get into. Then, inside the loop, I split each attribute–value pair into a name and a value. The easiest case is an unquoted attribute, where an equal sign separates the two:

def get_attributes(self, text):
    # ...
    for attrpair in parts[1:]:
        if "=" in attrpair:
            key, value = attrpair.split("=", 1)
            attributes[key.casefold()] = value
    # ...

The value can also be omitted, like in <input disabled>, in which case the attribute value defaults to the empty string:

for attrpair in parts[1:]:
    # ...
    else:
        attributes[attrpair.casefold()] = ""

Finally, the value can be quoted, in which case the quotes have to be stripped out:Quoted attributes allow whitespace between the quotes. Parsing that properly requires something like a finite state machine instead of just splitting on whitespace.

if "=" in attrpair:
    # ...
    if len(value) > 2 and value[0] in ["'", "\""]:
        value = value[1:-1]
    # ...

We’ll store these attributes inside Elements:

class Element:
    def __init__(self, tag, attributes, parent):
        self.tag = tag
        self.attributes = attributes
        # ...

That means we’ll need to call get_attributes at the top of add_tag to get the attributes we need to construct an Element.

def add_tag(self, tag):
    tag, attributes = self.get_attributes(tag)
    # ...
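
Putting the pieces together, the whole method now looks something like this sketch (assembled from the snippets above; yours may differ slightly in the order of the checks). Note that both Element calls now pass attributes:

def add_tag(self, tag):
    tag, attributes = self.get_attributes(tag)
    if tag.startswith("!"): return
    if tag.startswith("/"):
        # A close tag finishes the last unfinished node.
        if len(self.unfinished) == 1: return
        node = self.unfinished.pop()
        parent = self.unfinished[-1]
        parent.children.append(node)
    elif tag in self.SELF_CLOSING_TAGS:
        # Self-closing tags are finished immediately.
        parent = self.unfinished[-1]
        node = Element(tag, attributes, parent)
        parent.children.append(node)
    else:
        # An open tag starts a new unfinished node.
        parent = self.unfinished[-1] if self.unfinished else None
        node = Element(tag, attributes, parent)
        self.unfinished.append(node)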

Remember to use tag and attributes instead of text in add_tag, and try your parser again:

 <html>
    <head>
      <meta>
      <link>
      <link>
      <link>
      <link>
      <link>
      <meta>

It’s close! Yes, if you print the attributes, you’ll see that attributes with whitespace (like author on one of the meta tags) are mis-parsed as multiple attributes, and the final slash on the self-closing tags is incorrectly treated as an extra attribute. A better parser would fix these issues. But let’s instead leave our parser as is—these issues aren’t going to be a problem for the browser we’re building—and move on to integrating it with our browser.
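
If you do want to see those mis-parsed attributes for yourself, one easy way is to include them in Element’s printed form, a small optional tweak to __repr__ (the exact format is up to you):

class Element:
    def __repr__(self):
        attrs = ["{}=\"{}\"".format(k, v)
                 for k, v in self.attributes.items()]
        return "<" + " ".join([self.tag] + attrs) + ">"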

Putting a slash at the end of self-closing tags, like <br/>, became fashionable when XHTML looked like it might replace HTML, and old-timers like me never broke the habit. But unlike in XML, in HTML self-closing tags are identified by name, not by some special syntax, so the slash is optional.

Using the Node Tree

Right now, the Layout class works token by token; we now want it to go node by node instead. So let’s separate the old token method into two parts: all the cases for open tags will go into a new open_tag method and all the cases for close tags will go into a new close_tag method:The case for text tokens is no longer needed because the new recurse method calls the existing word method directly.

class Layout:
    def open_tag(self, tag):
        if tag == "i":
            self.style = "italic"
        # ...

    def close_tag(self, tag):
        if tag == "i":
            self.style = "roman"
        # ...

Now we need the Layout object to walk the node tree, calling open_tag, close_tag, and text in the right order:

def recurse(self, tree):
    if isinstance(tree, Text):
        for word in tree.text.split():
            self.word(word)
    else:
        self.open_tag(tree.tag)
        for child in tree.children:
            self.recurse(child)
        self.close_tag(tree.tag)

The Layout constructor can now call recurse instead of looping through the list of tokens. We’ll also need the browser to construct the node tree, like this:

class Browser:
    def load(self, url):
        body = url.request()
        self.nodes = HTMLParser(body).parse()
        self.display_list = Layout(self.nodes).display_list
        self.draw()

Run it—the browser should now use the parsed HTML tree.

The doctype syntax is a form of versioning—declaring which version of HTML the web page is using. But in fact, the html value for doctype signals not just a particular version of HTML, but more generally the HTML living standard.It is not expected that any new doctype version for HTML will ever be added again. It’s called a “living standard” because it changes all the time as features are added. The mechanism for these changes is simply browsers shipping new features, not any change to the “version” of HTML. In general, the web is an unversioned platform—new features are often added as enhancements, but only so long as they don’t break existing ones.Features can be removed, but only if they stop being used by the vast majority of sites. This makes it very hard to remove web features compared with other platforms.

Handling Author Errors

The parser now handles HTML pages correctly—at least when the HTML is written by the sorts of goody-two-shoes programmers who remember the <head> tag, close every open tag, and make their bed in the morning. Mere mortals lack such discipline and so browsers also have to handle broken, confusing, headless HTML. In fact, modern HTML parsers are capable of transforming any string of characters into an HTML tree, no matter how confusing the markup.Yes, it’s crazy, and for a few years in the early 2000s the W3C tried to do away with it. They failed.

The full algorithm is, as you might expect, complicated beyond belief, with dozens of ever-more-special cases forming a taxonomy of human error, but one of its nicer features is implicit tags. Normally, an HTML document starts with a familiar boilerplate:

<!doctype html>
<html>
  <head>
  </head>
  <body>
  </body>
</html>

In reality, all six of these tags (everything except the doctype) are optional: browsers insert them automatically when the web page omits them. Let’s insert implicit tags in our browser via a new implicit_tags function. We’ll want to call it in both add_text and add_tag:

class HTMLParser:
    def add_text(self, text):
        if text.isspace(): return
        self.implicit_tags(None)
        # ...

    def add_tag(self, tag):
        tag, attributes = self.get_attributes(tag)
        if tag.startswith("!"): return
        self.implicit_tags(tag)
        # ...

Note that implicit_tags isn’t called for the ignored whitespace and doctypes. Let’s also call it in finish, to make sure that an <html> and <body> tag are created even for empty strings:

class HTMLParser:
    def finish(self):
        if not self.unfinished:
            self.implicit_tags(None)
        # ...

The argument to implicit_tags is the tag name (or None for text nodes), which we’ll compare to the list of unfinished tags to determine what’s been omitted:

class HTMLParser:
    def implicit_tags(self, tag):
        while True:
            open_tags = [node.tag for node in self.unfinished]
            # ...

implicit_tags has a loop because more than one tag could have been omitted in a row; every iteration around the loop will add just one. To determine which implicit tag to add, if any, requires examining the open tags and the tag being inserted.

Let’s start with the easiest case, the implicit <html> tag. An implicit <html> tag is necessary if the first tag in the document is something other than <html>:

while True:
    # ...
    if open_tags == [] and tag != "html":
        self.add_tag("html")

Both <head> and <body> can also be omitted, but to figure out which it is we need to look at which tag is being added:

while True:
    # ...
    elif open_tags == ["html"] \
         and tag not in ["head", "body", "/html"]:
        if tag in self.HEAD_TAGS:
            self.add_tag("head")
        else:
            self.add_tag("body")

Here, HEAD_TAGS lists the tags that you’re supposed to put into the <head> element:The <script> tag can go in either the head or the body section, but it goes into the head by default.

class HTMLParser:
    HEAD_TAGS = [
        "base", "basefont", "bgsound", "noscript",
        "link", "meta", "title", "style", "script",
    ]

Note that if both the <html> and <head> tags are omitted, implicit_tags is going to insert both of them by going around the loop twice. In the first iteration open_tags is [], so the code adds an <html> tag; then, in the second iteration, open_tags is ["html"], so it adds a <head> tag.These add_tag methods themselves call implicit_tags, which means you can get into an infinite loop if you forget a case. I’ve been careful to make sure that every tag added by implicit_tags doesn’t itself trigger more implicit tags.

Finally, the </head> tag can also be implicit if the parser is inside the <head> and sees an element that’s supposed to go in the <body>:

while True:
    # ...
    elif open_tags == ["html", "head"] and \
         tag not in ["/head"] + self.HEAD_TAGS:
        self.add_tag("/head")

Technically, the </body> and </html> tags can also be implicit. But since our finish function already closes any unfinished tags, that doesn’t need any extra code. So all that’s left for implicit_tags is to exit out of the loop:

while True:
    # ...
    else:
        break
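
To convince yourself this works, try parsing a document with no explicit structure at all. Using print_tree from earlier, a quick check (not part of the browser itself) looks like this:

print_tree(HTMLParser("just some text").parse())

The output should be roughly:

 <html>
   <body>
     'just some text'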

Of course, there are more rules for handling malformed HTML: formatting tags, nested paragraphs, embedded Scalable Vector Graphics (SVG) and MathML, and all sorts of other complexity. Each has complicated rules abounding with edge cases. But let’s end our discussion of handling author errors here.

The rules for malformed HTML may seem arbitrary, and they are: they evolved over years of trying to guess what people “meant” when they wrote that HTML, and are now codified in the HTML parsing standard. Of course, sometimes these rules “guess” wrong—but as so often happens on the web, it’s more important that every browser does the same thing, rather than each trying to guess what the right thing is.

And now for the payoff! Figure 3 shows a screenshot of this book’s website, loaded in our own browser.To be fair, it actually looks about the same with the Chapter 3 browser.

Figure 3: Screenshot of http://browser.engineering/ viewed in this chapter’s version of the browser.

Thanks to implicit tags, you can mostly skip the <html>, <body>, and <head> elements, and they’ll be implicitly added back for you. In fact, the HTML parser’s many states guarantee something stricter than that: every HTML document has exactly one <head> and one <body>, in the expected order.At least, per document. An HTML file that uses frames or templates can have more than one <head> and <body>, but they correspond to different documents.

Summary

This chapter taught our browser that HTML is a tree, not just a flat list of tokens. We added a parser that turns tokens into a tree of Text and Element nodes, support for attributes and self-closing tags, a pretty-printer for debugging that tree, implicit tags to repair sloppy HTML, and a recurse method so layout walks the node tree.

The tree structure of HTML is essential to display visually complex web pages, as we will see in the next chapter.

Outline

The complete set of functions, classes, and methods in our browser should look something like this:

class URL:
    def __init__(url)
    def request()
class Text:
    def __init__(text, parent)
    def __repr__()
class Element:
    def __init__(tag, attributes, parent)
    def __repr__()
def print_tree(node, indent)
class HTMLParser:
    SELF_CLOSING_TAGS
    HEAD_TAGS
    def __init__(body)
    def parse()
    def get_attributes(text)
    def add_text(text)
    def add_tag(tag)
    def implicit_tags(tag)
    def finish()
FONTS
def get_font(size, weight, style)
WIDTH, HEIGHT
HSTEP, VSTEP
class Layout:
    def __init__(tree)
    def recurse(tree)
    def open_tag(tag)
    def close_tag(tag)
    def flush()
    def word(word)
SCROLL_STEP
class Browser:
    def __init__()
    def draw()
    def load(url)
    def scrolldown(e)

Exercises

4-1 Comments. Update the HTML lexer to support comments. Comments in HTML begin with <!-- and end with -->. However, comments aren’t the same as tags: they can contain any text, including left and right angle brackets. The lexer should skip comments, not generating any token at all. Check: is <!--> a comment, or does it just start one?

4-2 Paragraphs. It’s not clear what it would mean for one paragraph to contain another. Change the parser so that a document like <p>hello<p>world</p> results in two sibling paragraphs instead of one paragraph inside another; real browsers do this too. Do the same for <li> elements, but make sure nested lists are still possible.

4-3 Scripts. JavaScript code embedded in a <script> tag uses the left angle bracket to mean “less than”. Modify your lexer so that the contents of <script> tags are treated specially: no tags are allowed inside <script>, except the </script> close tag.Technically it’s just </script followed by a space, tab, \v, \r, slash, or greater than sign. If you need to talk about </script> tags inside JavaScript code, you have to split it into multiple strings.

4-4 Quoted attributes. Quoted attributes can contain spaces and right angle brackets. Fix the lexer so that this is supported properly. Hint: the current lexer is a finite state machine, with two states (determined by in_tag). You’ll need more states.

4-5 Syntax highlighting. Implement the view-source protocol as in Exercise 1-5, but make it syntax-highlight the source code of HTML pages. Keep source code for HTML tags in a normal font, but make text contents bold. If you’ve implemented it, wrap text in <pre> tags as well to preserve line breaks. Hint: subclass the HTML parser and use it to implement your syntax highlighter.

4-6 Mis-nested formatting tags. Extend your HTML parser to support markup like <b>Bold <i>both</b> italic</i>. This requires keeping track of the set of open text formatting elements and inserting implicit open and close tags when text formatting elements are closed in the wrong order. The bold/italic example, for example, should insert an implicit </i> before the </b> and an implicit <i> after it.

Laying Out Pages

So far, layout has been a linear process that handles open tags and close tags independently. But web pages are trees, and look like them: borders and backgrounds visually nest inside one another. To support that, this chapter switches to tree-based layout, where the tree of elements is transformed into a tree of layout objects before drawing. In the process, we’ll make web pages more colorful with backgrounds.

The Layout Tree

Right now, our browser lays out an element’s open and close tags separately. Both tags modify global state, like the cursor_x and cursor_y variables, but they aren’t otherwise connected, and information about the element as a whole, like its width and height, is never computed. That makes it pretty hard to draw a background behind an element, let alone more complicated visual effects. So web browsers structure layout differently.

In a browser, layout is about producing a layout tree, whose nodes are layout objects, each associated with an HTML elementElements like <script> don’t generate layout objects, and some elements generate multiple layout objects (<li> elements have an extra one for the bullet point!), but mostly it’s one layout object each. and each with a size and a position. The browser walks the HTML tree to produce the layout tree, then computes the size and position for each layout object, and finally draws each layout object to the screen.

Let’s start by looking at how the existing Layout class is used:

class Browser:
    def load(self, url):
        # ...
        self.display_list = Layout(self.nodes).display_list
        #...

Here, a Layout object is created briefly and then thrown away. Let’s instead make it the beginning of our layout tree by storing it in a Browser field:

class Browser:
    def load(self, url):
        # ...
        self.document = Layout(self.nodes)
        self.document.layout()
        #...

Note that I’ve renamed the Layout constructor to a layout method, so that constructing a layout object and actually laying it out can be different steps. The constructor now just stores the node it was passed:

class Layout:
    def __init__(self, node):
        self.node = node

So far, we still don’t have a tree—we just have a single Layout object. To make it into a tree, we’ll need to add child and parent pointers. I’m also going to add a pointer to the previous sibling, because that’ll be useful for computing sizes and positions later:

class Layout:
    def __init__(self, node, parent, previous):
        self.node = node
        self.parent = parent
        self.previous = previous
        self.children = []

That said, requiring a parent and previous object now makes it tricky to construct a Layout object in Browser, since the root of the layout tree obviously can’t have a parent. To rectify that, let me add a second kind of layout object to serve as the root of the layout tree.I don’t want to just pass None for the parent, because the root layout object also computes its size and position differently, as we’ll see later in this chapter. I think of that root as the document itself, so let’s call it DocumentLayout:

class DocumentLayout:
    def __init__(self, node):
        self.node = node
        self.parent = None
        self.children = []

    def layout(self):
        child = Layout(self.node, self, None)
        self.children.append(child)
        child.layout()

Note an interesting thing about this new layout method: its role is to create the child layout objects and then recursively call their layout methods. This is a common pattern for constructing trees; we’ll be seeing it a lot throughout this book.

Now when we construct a DocumentLayout object inside load, we’ll be building a tree; a very short tree, more of a stump (just the “document” and the HTML element below it), but a tree nonetheless!

By the way, since we now have DocumentLayout, let’s rename Layout so it’s less ambiguous. I like BlockLayout as a name, because we ultimately want it to represent a block of text, like a paragraph or a heading:

class BlockLayout:
    # ...

Make sure to rename the Layout constructor call in DocumentLayout as well. As always, test your browser and make sure that after all of these refactors, everything still works.

The layout tree isn’t accessible to web developers, so it hasn’t been standardized, and its structure differs between browsers. Even the names don’t match! Chrome calls it a layout tree, Safari a render tree, and Firefox a frame tree.

Block Layout

So far, we’ve focused on text layout—and text is laid out horizontally in lines.In European languages, at least! But web pages are really constructed out of larger blocks, like headings, paragraphs, and menus, that stack vertically one after another. We need to add support for this kind of layout to our browser, and the way we’re going to do that involves expanding on the layout tree we’ve already built.

The core idea is that we’ll have a whole tree of BlockLayout objects (with a DocumentLayout at the root). Some will represent leaf blocks that contain text, and they’ll lay out their contents the way we’ve already implemented. But there will also be new, intermediate BlockLayouts with BlockLayout children, and they will stack their children vertically. (An example is shown in Figure 1.)

Figure 1: An example of an HTML tree and the corresponding layout tree.

To create these intermediate BlockLayout children, we can use a loop like this:

class BlockLayout:
    def layout_intermediate(self):
        previous = None
        for child in self.node.children:
            next = BlockLayout(child, self, previous)
            self.children.append(next)
            previous = next

I’ve called this method layout_intermediate, but only so you can add it to the code right away and then compare it with the existing recurse method.

This code is tricky, so read it carefully. It involves two trees: the HTML tree, which node and child point to; and the layout tree, which self, previous, and next point to. The two trees have similar structure, so it’s easy to get confused. But remember that this code constructs the layout tree from the HTML tree, so it reads from node.children (in the HTML tree) and writes to self.children (in the layout tree).

So we have two ways to lay out an element: either calling recurse and flush, or this layout_intermediate function. To determine which one a layout object should use, we’ll need to know what kind of content its HTML node contains: text and text-related tags like <b>, or blocks like <p> and <h1>. That function looks something like this:

class BlockLayout:
    def layout_mode(self):
        if isinstance(self.node, Text):
            return "inline"
        elif any([isinstance(child, Element) and \
                  child.tag in BLOCK_ELEMENTS
                  for child in self.node.children]):
            return "block"
        elif self.node.children:
            return "inline"
        else:
            return "block"

Here, the list of BLOCK_ELEMENTS is basically what you expect, a list of all the tags that describe blocks and containers:Taken from the HTML living standard.

BLOCK_ELEMENTS = [
    "html", "body", "article", "section", "nav", "aside",
    "h1", "h2", "h3", "h4", "h5", "h6", "hgroup", "header",
    "footer", "address", "p", "hr", "pre", "blockquote",
    "ol", "ul", "menu", "li", "dl", "dt", "dd", "figure",
    "figcaption", "main", "div", "table", "form", "fieldset",
    "legend", "details", "summary"
]

Our layout_mode method has to handle one tricky case, where a node contains both block children like a <p> element and also text children like a text node or a <b> element. It’s probably best to think of this as a kind of error on the part of the web developer. And just like with implicit tags in Chapter 4, we need a repair mechanism to make sense of the situation; I’ve chosen to use block mode in this case.In real browsers, that repair mechanism is called “anonymous block boxes” and is more complex than what’s described here; see Exercise 5-5.

So now BlockLayout can determine what kind of layout to do based on the layout_mode of its HTML node:

class BlockLayout:
    def layout(self):
        mode = self.layout_mode()
        if mode == "block":
            previous = None
            for child in self.node.children:
                next = BlockLayout(child, self, previous)
                self.children.append(next)
                previous = next
        else:
            self.cursor_x = 0
            self.cursor_y = 0
            self.weight = "normal"
            self.style = "roman"
            self.size = 12

            self.line = []
            self.recurse(self.node)
            self.flush()

Finally, since BlockLayouts can now have children, the layout method next needs to recursively call layout so those children can construct their children, and so on recursively:

class BlockLayout:
    def layout(self):
        # ...
        for child in self.children:
            child.layout()

Our browser is now constructing a whole tree of BlockLayout objects; you can use print_tree to see this tree in the Browser’s load method. You’ll see that large web pages like this chapter produce large and complex layout trees! Now we need each of these BlockLayout objects to have a size and position somewhere on the page.
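
Before moving on to sizes and positions, here’s one way to peek at that tree: a temporary debugging line in load (a sketch; print_tree works on any object with a children field, though the output reads better if you also give the layout classes simple __repr__ methods):

class Browser:
    def load(self, url):
        # ...
        self.document = DocumentLayout(self.nodes)
        self.document.layout()
        print_tree(self.document)  # temporary: inspect the layout tree
        # ...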

In CSS, the layout mode is set by the display property. The oldest CSS layout modes, like inline and block, are set on the children instead of the parent, which leads to hiccups like anonymous block boxes. Newer properties like inline-block, flex, and grid are set on the parent, which avoids this kind of error.

Size and Position

In the previous chapter, the Layout object was responsible for the whole web page, so it just laid out its content starting at the top of the page. Now that we have multiple BlockLayout objects each containing a different paragraph of text, we’re going to need to do things a little differently, computing a size and position for each layout object independently.

Let’s add x, y, width, and height fields for each layout object type:

class BlockLayout:
    def __init__(self, node, parent, previous):
        # ...
        self.x = None
        self.y = None
        self.width = None
        self.height = None

Do the same for DocumentLayout. Now we need to update the layout method to use these fields.

Let’s start with cursor_x and cursor_y. Instead of having them denote absolute positions on the page, let’s make them relative to the BlockLayout’s x and y. So they now need to start from 0 instead of HSTEP and VSTEP, in both layout and flush:

class BlockLayout:
    def layout(self):
        # ...
        else:
            self.cursor_x = 0
            self.cursor_y = 0

    def flush(self):
        # ...
        self.cursor_x = 0
        # ...

Since these fields are now relative, we’ll need to add the block’s x and y position in flush when computing the display list:

class BlockLayout:
    def flush(self):
        # ...
        for rel_x, word, font in self.line:
            x = self.x + rel_x
            y = self.y + baseline - font.metrics("ascent")
            self.display_list.append((x, y, word, font))
        # ...

Similarly, to wrap lines, we can’t compare cursor_x to WIDTH, because cursor_x is a relative position while WIDTH is an absolute position; instead, we’ll wrap lines when cursor_x reaches the block’s width:

class BlockLayout:
    def word(self, word):
        # ...
        if self.cursor_x + w > self.width:
            # ...
        # ...

So now that leaves us with the problem of computing these x, y, and width fields. Let’s recall that BlockLayouts represent blocks of text like paragraphs or headings, and are stacked vertically one atop another. That means each one starts at its parent’s left edge and goes all the way across its parent:In the next chapter, we’ll add support for author-defined styles, which in real browsers modify these layout rules by setting custom widths or changing how x and y positions are computed.

class BlockLayout:
    def layout(self):
        self.x = self.parent.x
        self.width = self.parent.width
        # ...

A layout object’s vertical position depends on whether there’s a previous sibling. If there is one, the layout object starts right after it; otherwise, it starts at its parent’s top edge:

class BlockLayout:
    def layout(self):
        if self.previous:
            self.y = self.previous.y + self.previous.height
        else:
            self.y = self.parent.y
        # ...

Finally, height is a little tricky. A BlockLayout that contains other blocks should be tall enough to contain all of its children, so its height should be the sum of its children’s heights:

class BlockLayout:
    def layout(self):
        # ...
        if mode == "block":
            self.height = sum([
                child.height for child in self.children])

However, a BlockLayout that contains text doesn’t have children; instead, it needs to be tall enough to contain all its text, which we can conveniently read off from cursor_y:Since the height is just equal to cursor_y, why not rename cursor_y to height instead? You could, it would work fine, but I would rather not. As you can see from, say, the y computation, the height field is a public field, read by other layout objects to compute their positions. As such, I’d rather make sure it always has the right value, whereas cursor_y changes as we lay out a paragraph of text and therefore sometimes has the “wrong” value. Keeping these two fields separate avoids a whole class of nasty bugs where the height field is read “too soon” and therefore gets the wrong value.

class BlockLayout:
    def layout(self):
        # ...
        else:
            self.height = self.cursor_y

These rules seem simple enough, but there’s a subtlety here I have to explain. Consider the x position. To compute a block’s x position, the x position of its parent block must already have been computed, so a block’s x must be computed before its children’s. That means the x computation has to go before the recursive layout call.

On the other hand, an element’s height field depends on its children’s heights. So while x must be computed before the recursive call, height has to be computed after. Similarly, since the y position of a block depends on its previous sibling’s y position, the recursive layout calls have to start at the first sibling and iterate through the list forward.

That is, the layout method should perform its steps in this order (see Figure 2):

  1. it must compute the width, x, and y fields from the parent and previous sibling, before recursing;
  2. it must then create and recursively lay out its child layout objects, in order, first child first; and
  3. it must compute the height field from its children, after recursing.

Figure 2: A flowchart showing how widths are computed top-down, from parent to child, while heights are computed bottom-up, from child to parent.
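
Putting the pieces together in that order, the whole layout method might look something like this sketch (assembled from the code above; yours may differ in details):

class BlockLayout:
    def layout(self):
        # Step 1: width, x, and y come from the parent and previous
        # sibling, so compute them before recursing.
        self.x = self.parent.x
        self.width = self.parent.width
        if self.previous:
            self.y = self.previous.y + self.previous.height
        else:
            self.y = self.parent.y

        # Step 2: create child layout objects (block mode) or lay out
        # text (inline mode), then recursively lay out the children.
        mode = self.layout_mode()
        if mode == "block":
            previous = None
            for child in self.node.children:
                next = BlockLayout(child, self, previous)
                self.children.append(next)
                previous = next
        else:
            self.cursor_x = 0
            self.cursor_y = 0
            self.weight = "normal"
            self.style = "roman"
            self.size = 12
            self.line = []
            self.recurse(self.node)
            self.flush()
        for child in self.children:
            child.layout()

        # Step 3: height depends on the children, so compute it last.
        if mode == "block":
            self.height = sum([
                child.height for child in self.children])
        else:
            self.height = self.cursor_y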

This kind of dependency reasoning is crucial to layout and more broadly to any kind of computation on trees. If you get the order of operations wrong, some layout object will try to read a value that hasn’t been computed yet, and the browser will have a bug. We’ll come back to this issue of dependencies in Chapter 16, where it will become even more important.

DocumentLayout needs some layout code too, though since the document always starts in the same place it’s pretty simple:

class DocumentLayout:
    def layout(self):
        # ...
        self.width = WIDTH - 2*HSTEP
        self.x = HSTEP
        self.y = VSTEP
        child.layout()
        self.height = child.height

Note that there’s some padding around the contents—HSTEP on the left and right, and VSTEP above and below. That’s so the text won’t run into the very edge of the window and get cut off.

Anyway, with all of the sizes and positions now computed correctly, our browser should display all of the text on the page in the right places.

Formally, computations on a tree like this can be described by an attribute grammar. Attribute grammar engines analyze dependencies between different attributes to determine the right order to traverse the tree and calculate each attribute.

Recursive Painting

Our layout method is now doing quite a bit of work: computing sizes and positions; creating child layout objects; recursively laying out those child layout objects; and aggregating the display lists so the text can be drawn to the screen. This is a bit messy, so let’s take a moment to extract just one part of this, the display list part. Along the way, we can stop copying the display list contents over and over again as we go up the layout tree.

I think it’s most convenient to do that by adding a paint function to each layout object, whose return value is the display list entries for that object. Then there is a separate function, paint_tree, that recursively calls paint on all layout objects:

def paint_tree(layout_object, display_list):
    display_list.extend(layout_object.paint())

    for child in layout_object.children:
        paint_tree(child, display_list)

For DocumentLayout, there is nothing to paint:

class DocumentLayout:
    def paint(self):
        return []

You can now delete the line that computes a DocumentLayout’s display_list field.

For a BlockLayout object, we need to copy over the display_list field that it computes during recurse and flush:And again, delete the line that computes a BlockLayout’s display_list field by copying from child layout objects.

class BlockLayout:
    def paint(self):
        return self.display_list

Now the browser can use paint_tree to collect its own display_list variable:

class Browser:
    def load(self, url):
        # ...
        self.display_list = []
        paint_tree(self.document, self.display_list)
        self.draw()

Check it out: our browser is now using fancy tree-based layout! I recommend pausing to test and debug. Tree-based layout is powerful but complex, and we’re about to add more features. Stable foundations make for comfortable houses.

Layout trees are common in graphical user interface (GUI) frameworks, but there are other ways to structure layout, such as constraint-based layout. TeX’s boxes and glue and iOS’s auto-layout are two examples of this alternative paradigm.

Backgrounds

Browsers use the layout tree a lot,For example, in Chapter 7, we’ll use the size and position of each link to figure out which one the user clicked on. and one simple and visually compelling use case is drawing backgrounds.

Backgrounds are rectangles, so our first task is putting rectangles in the display list. Right now, the display list is a list of words to draw to the screen, but we can conceptualize it instead as a list of commands, of which there is currently only one type. We now want two types of commands:

class DrawText:
    def __init__(self, x1, y1, text, font):
        self.top = y1
        self.left = x1
        self.text = text
        self.font = font
    
class DrawRect:
    def __init__(self, x1, y1, x2, y2, color):
        self.top = y1
        self.left = x1
        self.bottom = y2
        self.right = x2
        self.color = color

Now BlockLayout must add DrawText objects for each word it wants to draw, but only in inline mode:Why not change the display_list field inside a BlockLayout to contain DrawText commands directly? I suppose you could, but I think it’s cleaner to create all of the draw commands in paint.

class BlockLayout:
    def paint(self):
        cmds = []
        if self.layout_mode() == "inline":
            for x, y, word, font in self.display_list:
                cmds.append(DrawText(x, y, word, font))
        return cmds

But it can also add a DrawRect command to draw a background. Let’s add a gray background to pre tags (which are used for code examples):

class BlockLayout:
    def paint(self):
        # ...
        if isinstance(self.node, Element) and self.node.tag == "pre":
            x2, y2 = self.x + self.width, self.y + self.height
            rect = DrawRect(self.x, self.y, x2, y2, "gray")
            cmds.append(rect)
        # ...

Make sure this code comes before the loop that adds DrawText objects: the background has to be drawn below that text. Note also that paint_tree calls paint before recursing into the subtree, so the subtree also paints on top of this background, as desired.
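
Putting those two pieces together in the right order, the whole paint method might look something like this:

class BlockLayout:
    def paint(self):
        cmds = []
        # Backgrounds go in first, so the text is drawn on top of them.
        if isinstance(self.node, Element) and self.node.tag == "pre":
            x2, y2 = self.x + self.width, self.y + self.height
            cmds.append(DrawRect(self.x, self.y, x2, y2, "gray"))
        # Then the text, which only inline-mode blocks have.
        if self.layout_mode() == "inline":
            for x, y, word, font in self.display_list:
                cmds.append(DrawText(x, y, word, font))
        return cmds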

With the display list filled out, we need to draw each graphics command. Let’s add an execute method for this. On DrawText it calls create_text:

class DrawText:
    def execute(self, scroll, canvas):
        canvas.create_text(
            self.left, self.top - scroll,
            text=self.text,
            font=self.font,
            anchor='nw')

Note that execute takes the scroll amount as a parameter; this way, each graphics command does the relevant coordinate conversion itself. DrawRect does the same with create_rectangle:

class DrawRect:
    def execute(self, scroll, canvas):
        canvas.create_rectangle(
            self.left, self.top - scroll,
            self.right, self.bottom - scroll,
            width=0,
            fill=self.color)

By default, create_rectangle draws a one-pixel black border, which we don’t want for backgrounds, so make sure to pass width=0.

We still want to skip offscreen graphics commands, so let’s add a bottom field to DrawText so we know when to skip those:

def __init__(self, x1, y1, text, font):
    # ...
    self.bottom = y1 + font.metrics("linespace")

The browser’s draw method now just uses top and bottom to decide which commands to execute:

class Browser:
    def draw(self):
        self.canvas.delete("all")
        for cmd in self.display_list:
            if cmd.top > self.scroll + HEIGHT: continue
            if cmd.bottom < self.scroll: continue
            cmd.execute(self.scroll, self.canvas)

Try your browser on a page—maybe this chapter’s—with code snippets on it. You should see each code snippet set off with a gray background.

Here’s one more cute benefit of tree-based layout: we now record the height of the whole page. The browser can use that to avoid scrolling past the bottom:

def scrolldown(self, e):
    max_y = max(self.document.height + 2*VSTEP - HEIGHT, 0)
    self.scroll = min(self.scroll + SCROLL_STEP, max_y)
    self.draw()

Note the 2*VSTEP, to account for a VSTEP of whitespace at the top and bottom of the page. With tree-based layout in place, the browser.engineering homepage now looks a bit better—see Figure 3.

So those are the basics of tree-based layout! In fact, as we’ll see in the next two chapters, this is just one part of the layout tree’s central role in the browser. But before we get to that, we need to add some styling capabilities to our browser.

Figure 3: https://browser.engineering/ viewed in this chapter’s version of the browser.

The draft CSS Painting API allows pages to extend the display list with new types of commands, implemented in JavaScript. This makes it possible to keep styling in CSS while a JavaScript library supplies the visually complex parts.

Summary

This chapter was a dramatic rewrite of our browser’s layout engine: layout now produces a tree of layout objects, with separate block and inline layout modes; each layout object computes its own size and position; and the display list now contains generic commands, like DrawText and DrawRect, which let blocks paint backgrounds behind their text.

Tree-based layout makes it possible to dramatically expand our browser’s styling capabilities. We’ll work on that in the next chapter.

Outline

The complete set of functions, classes, and methods in our browser should look something like this:

class URL:
    def __init__(url)
    def request()

class Text:
    def __init__(text, parent)
    def __repr__()

class Element:
    def __init__(tag, attributes, parent)
    def __repr__()

def print_tree(node, indent)

class HTMLParser:
    SELF_CLOSING_TAGS
    HEAD_TAGS
    def __init__(body)
    def parse()
    def get_attributes(text)
    def add_text(text)
    def add_tag(tag)
    def implicit_tags(tag)
    def finish()

FONTS
def get_font(size, weight, style)

WIDTH, HEIGHT
HSTEP, VSTEP
BLOCK_ELEMENTS

class DocumentLayout:
    def __init__(node)
    def layout()
    def paint()

class BlockLayout:
    def __init__(node, parent, previous)
    def layout_mode()
    def layout()
    def recurse(tree)
    def open_tag(tag)
    def close_tag(tag)
    def flush()
    def word(word)
    def paint()

class DrawText:
    def __init__(x1, y1, text, font)
    def execute(scroll, canvas)

class DrawRect:
    def __init__(x1, y1, x2, y2, color)
    def execute(scroll, canvas)

def paint_tree(layout_object, display_list)

SCROLL_STEP

class Browser:
    def __init__()
    def draw()
    def load(url)
    def scrolldown(e)

Exercises

5-1 Links bar. At the top and bottom of the web version of each chapter of this book there is a gray bar naming the chapter and offering back and forward links. It is enclosed in a <nav class="links"> tag. Have your browser give this links bar the light gray background a real browser would.

5-2 Hidden head. There’s a good chance your browser is still showing scripts, styles, and page titles at the top of every page you visit. Make it so that the <head> element and its contents are never displayed. Those elements should still be in the HTML tree, but not in the layout tree.

5-3 Bullets. Add bullets to list items, which in HTML are <li> tags. You can make them little squares, located to the left of the list item itself. Also indent <li> elements so the text inside the element is to the right of the bullet point.

5-4 Table of contents. The web version of this book has a table of contents at the top of each chapter, enclosed in a <nav id="toc"> tag, which contains a list of links. Add the text “Table of Contents”, with a gray background, above that list. Don’t modify the lexer or parser.

5-5 Anonymous block boxes. Sometimes, an element has a mix of text-like and container-like children. For example, in this HTML,

<div><i>Hello, </i><b>world!</b><p>So it began...</p></div>

the <div> element has three children: the <i>, <b>, and <p> elements. The first two are text-like; the last is container-like. This is supposed to look like two paragraphs, one for the <i> and <b> and the second for the <p>. Make your browser do that. Specifically, modify BlockLayout so it can be passed a sequence of sibling nodes, instead of a single node. Then, modify the algorithm that constructs the layout tree so that any sequence of text-like elements gets made into a single BlockLayout.

5-6 Run-ins. A “run-in heading” is a heading that is drawn as part of the next paragraph’s text.The exercise names in this section could be considered run-in headings. But since browser support for the display: run-in property is poor, this book actually doesn’t use it; the headings are actually embedded in the next paragraph. Modify your browser to render <h6> elements as run-in headings. You’ll need to implement the previous exercise on anonymous block boxes, and then add a special case for <h6> elements.

Applying Author Styles

In the previous chapter we gave each pre element a gray background. It looks OK, and it is good to have defaults, but sites want a say in how they look. Websites do that with Cascading Style Sheets (CSS), which allow web authors (and, as we’ll see, browser developers) to define how a web page ought to look.

Parsing with Functions

One way a web page can change its appearance is with the style attribute. For example, this changes an element’s background color:

<div style="background-color:lightblue"></div>

More generally, a style attribute contains property–value pairs separated by semicolons. The browser looks at those CSS property–value pairs to determine how an element looks, for example to determine its background color.

To add this to our browser, we’ll need to start by parsing these property–value pairs. I’ll use recursive parsing functions, which are a good way to build a complex parser step by step. The idea is that each parsing function advances through the text being parsed and returns the data it parsed. We’ll have different functions for different types of data, and organize them in a CSSParser class that stores the text being parsed and the parser’s current position in it:

class CSSParser:
    def __init__(self, s):
        self.s = s
        self.i = 0

Let’s start small and build up. A parsing function for whitespace increments the index i past every whitespace character:

def whitespace(self):
    while self.i < len(self.s) and self.s[self.i].isspace():
        self.i += 1

Whitespace is meaningless, so there’s no parsed data to return. But when we parse property names, we’ll want to return them:

def word(self):
    start = self.i
    while self.i < len(self.s):
        if self.s[self.i].isalnum() or self.s[self.i] in "#-.%":
            self.i += 1
        else:
            break
    if not (self.i > start):
        raise Exception("Parsing error")
    return self.s[start:self.i]

This function increments i through any word characters,I’ve chosen the set of word characters here to cover property names (which use letters and the dash), numbers (which use the minus sign, numbers, periods), units (the percent sign), and colors (which use the hash sign). Real CSS values have a more complex syntax but this is enough for our browser. much like whitespace. But to return the parsed data, it stores where it started and extracts the substring it moved through.

Parsing functions can fail. The word function we just wrote raises an exception if i hasn’t advanced through at least one character—otherwise it didn’t point at a word to begin with.You can add error text to the exception-raising code, too; I recommend doing that to help you debug problems. Likewise, to check for a literal colon (or some other punctuation character) you’d do this:

def literal(self, literal):
    if not (self.i < len(self.s) and self.s[self.i] == literal):
        raise Exception("Parsing error")
    self.i += 1

The great thing about parsing functions is that they can build on one another. For example, property–value pairs are a property, a colon, and a value,In reality, properties and values have different syntaxes, so using word for both isn’t quite right, but for our browser’s limited CSS implementation this simplification will do. with whitespace in between:

def pair(self):
    prop = self.word()
    self.whitespace()
    self.literal(":")
    self.whitespace()
    val = self.word()
    return prop.casefold(), val

We can parse sequences by calling parsing functions in a loop. For example, style attributes are a sequence of property–value pairs:

def body(self):
    pairs = {}
    while self.i < len(self.s):
        prop, val = self.pair()
        pairs[prop.casefold()] = val
        self.whitespace()
        self.literal(";")
        self.whitespace()
    return pairs
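
As a quick sanity check, you might try the parser on the style attribute value from earlier (the trailing semicolon matters here; once we add error handling below, it becomes optional):

pairs = CSSParser("background-color:lightblue;").body()
# pairs == {"background-color": "lightblue"}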

Now, in a browser, we always have to think about handling errors. Sometimes a web page author makes a mistake; sometimes our browser doesn’t support a feature some other browser does. So we should skip property–value pairs that don’t parse, but keep the ones that do.

We can skip things with this little function; it stops at any one of a set of characters and returns that character (or None if it was stopped by the end of the file):

def ignore_until(self, chars):
    while self.i < len(self.s):
        if self.s[self.i] in chars:
            return self.s[self.i]
        else:
            self.i += 1
    return None

When we fail to parse a property–value pair, we skip either to the next semicolon or to the end of the string:

def body(self):
    # ...
    while self.i < len(self.s):
        try:
            # ...
        except Exception:
            why = self.ignore_until([";"])
            if why == ";":
                self.literal(";")
                self.whitespace()
            else:
                break
    # ...

Skipping parse errors is a double-edged sword. It hides error messages, making it harder for authors to debug their style sheets; it also makes it harder to debug your parser.I suggest removing the try block when debugging. So in most programming situations this “catch-all” error handling is a code smell.

But “catch-all” error handling has an unusual benefit on the web. The web is an ecosystem of many browsers,And an ecosystem of many browser versions, some of which haven’t been written yet—but need to be supported as best we can. which (for example) support different kinds of property values.Our browser does not support parentheses in property values, for example, which real browsers use for things like the calc and url functions. CSS that parses in one browser might not parse in another. With silent parse errors, browsers just ignore stuff they don’t understand, and web pages mostly work in all of them. The principle (variously called “Postel’s Law”,After a line in the specification of TCP, written by Jon Postel. the “Digital Principle”,After a similar idea in circuit design, where transistors must be non-linear to reduce analog noise. or the “Robustness Principle”) is: produce maximally conformant output but accept even minimally conformant input.

This parsing method is formally called recursive descent parsing for an LL(1) language. Parsers that use this method can be really, really fast, at least if you put a lot of work into it. In a browser, faster parsing means pages load faster.

The style Attribute

Now that the style attribute is parsed, we can use that parsed information in the rest of the browser. Let’s do that inside a style function, which saves the parsed style attribute in the node’s style field:

def style(node):
    node.style = {}
    if isinstance(node, Element) and "style" in node.attributes:
        pairs = CSSParser(node.attributes["style"]).body()
        for property, value in pairs.items():
            node.style[property] = value

The method can recurse through the HTML tree to make sure each element gets a style:

def style(node):
    # ...
    for child in node.children:
        style(child)

Call style in the browser’s load method, after parsing the HTML but before doing layout. With the style information stored on each element, the browser can consult it for styling information during paint:

class BlockLayout:
    def paint(self):
        # ...
        bgcolor = self.node.style.get("background-color",
                                      "transparent")
        if bgcolor != "transparent":
            x2, y2 = self.x + self.width, self.y + self.height
            rect = DrawRect(self.x, self.y, x2, y2, bgcolor)
            cmds.append(rect)
        # ...

I’ve removed the default gray background from pre elements for now, but we’ll put it back soon.
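
For reference, here’s roughly where the style call lands in load at this point. This is just a sketch, assuming your load still requests, parses, and lays out the page as in the previous chapters:

class Browser:
    def load(self, url):
        body = url.request()
        self.nodes = HTMLParser(body).parse()
        style(self.nodes)
        self.document = DocumentLayout(self.nodes)
        self.document.layout()
        self.display_list = []
        paint_tree(self.document, self.display_list)
        self.draw()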

Open the web version of this chapter up in your browser to test your code: the code block at the start of the chapter should now have a light blue background.

So this is one way web pages can change their appearance. And in the early days of the web,I’m talking Netscape 3. The late 1990s. something like this was the only way. But honestly, it’s a pain—you need to set a style attribute on each element, and if you redesign the page, that’s a lot of attributes to edit. CSS was invented to improve on this state of affairs.

To do that, CSS extends the style attribute with two related ideas: selectors and cascading. Selectors describe which HTML elements a list of property–value pairs applies to.CSS rules can also be guarded by “media queries”, which say that a rule should apply only in certain browsing environments (like only on mobile or only in landscape mode). Media queries are super-important for building sites that work across many devices, like reading this book on a phone. We’ll meet them in Chapter 14. The combination of the two is called a rule, as shown in Figure 1.

Figure 1: An annotated CSS rule.

Let’s add support for CSS to our browser. We’ll need to parse CSS files into selectors and property–value pairs, figure out which elements on the page match each selector, and copy those property values to the elements’ style fields.

Actually, before CSS, you’d style pages with custom presentational tags like font and center (not to mention the <b> and <i> tags that we’ve already seen). This was easy to implement but made it hard to keep pages consistent. There were also properties on <body> like text and vlink that could consistently set text colors, mainly for links.

Selectors

Selectors come in lots of types, but in our browser we’ll support two: tag selectors (p selects all <p> elements, ul selects all <ul> elements) and descendant selectors (article div selects all div elements with an article ancestor).The descendant selector associates to the left; in other words, a b c means a <c> that descends from a <b> that descends from an <a>, which maybe you’d write (a b) c if CSS had parentheses.

We’ll have a class for each type of selector to store the selector’s contents, like the tag name for a tag selector:

class TagSelector:
    def __init__(self, tag):
        self.tag = tag

Each selector class will also test whether the selector matches an element:

class TagSelector:
    def matches(self, node):
        return isinstance(node, Element) and self.tag == node.tag

A descendant selector works similarly. It has two parts, which are both themselves selectors:

class DescendantSelector:
    def __init__(self, ancestor, descendant):
        self.ancestor = ancestor
        self.descendant = descendant

Then the matches method is recursive:

class DescendantSelector:
    def matches(self, node):
        if not self.descendant.matches(node): return False
        while node.parent:
            if self.ancestor.matches(node.parent): return True
            node = node.parent
        return False

Now, to create these selector objects, we need a parser. In this case, that’s just another parsing function:Once again, using word here for tag names is actually not quite right, but it’s close enough. One side effect of using word is that a class name selector (like .main) or an identifier selector (like #signup) is mis-parsed as a tag name selector. But, luckily, that won’t cause any harm since there aren’t any elements with those tags.

class CSSParser:
    def selector(self):
        out = TagSelector(self.word().casefold())
        self.whitespace()
        while self.i < len(self.s) and self.s[self.i] != "{":
            tag = self.word()
            descendant = TagSelector(tag.casefold())
            out = DescendantSelector(out, descendant)
            self.whitespace()
        return out

A CSS file is just a sequence of selectors and blocks:

def parse(self):
    rules = []
    while self.i < len(self.s):
        self.whitespace()
        selector = self.selector()
        self.literal("{")
        self.whitespace()
        body = self.body()
        self.literal("}")
        rules.append((selector, body))
    return rules

Once again, let’s pause to think about error handling. First, when we call body while parsing CSS, we need it to stop when it reaches a closing brace:

def body(self):
    # ...
    while self.i < len(self.s) and self.s[self.i] != "}":
        try:
            # ...
        except Exception:
            why = self.ignore_until([";", "}"])
            if why == ";":
                self.literal(";")
                self.whitespace()
            else:
                break
    # ...

Second, there might also be a parse error while parsing a selector. In that case, we want to skip the whole rule:

def parse(self):
    # ...
    while self.i < len(self.s):
        try:
            # ...
        except Exception:
            why = self.ignore_until(["}"])
            if why == "}":
                self.literal("}")
                self.whitespace()
            else:
                break
    # ...

Error handling is hard to get right, so make sure to test your parser, just like the HTML parser in Chapter 4. Typos in property names, missing semicolons, and selector or value syntax our parser doesn’t support are all common sources of parse errors.

You can also add a print statement to the start and endIf you print an open parenthesis at the start of the function and a close parenthesis at the end, you can use your editor’s “jump to other parenthesis” feature to skip through output quickly. of each parsing function with the name of the parsing function,If you also add the right number of spaces to each line it’ll be a lot easier to read. Don’t neglect debugging niceties like this! the index i,It can be especially helpful to print, say, the 20 characters around index i from the string. and the parsed data. It’s a lot of output, but it’s a sure-fire way to find really complicated bugs.

A parser receives arbitrary bytes as input, so parser bugs are usually easy for bad actors to exploit. Parser correctness is thus crucial to browser security, as many parser bugs have demonstrated. Nowadays browser developers use fuzzing to try to find and fix such bugs.

Applying Style Sheets

With the parser debugged, the next step is applying the parsed style sheet to the web page. Since each CSS rule can style many elements on the page, this will require looping over all elements and all rules. When a rule applies, its property–value pairs are copied to the element’s style information:

def style(node, rules):
    # ...
    for selector, body in rules:
        if not selector.matches(node): continue
        for property, value in body.items():
            node.style[property] = value
    # ...

Make sure to put this loop before the one that parses the style attribute: the style attribute should override style sheet values.

To try this out, we’ll need a style sheet. Every browser ships with a browser style sheet,Technically called a “user agent” style sheet. User agent, like the Memex. which defines its default styling for the various HTML elements. For our browser, it might look like this:

pre { background-color: gray; }

Let’s store that in a new file, browser.css, and have our browser read it when it starts:

DEFAULT_STYLE_SHEET = CSSParser(open("browser.css").read()).parse()

Now, when the browser loads a web page, it can apply that default style sheet to set up its default styling for each element:

def load(self, url):
    # ...
    rules = DEFAULT_STYLE_SHEET.copy()
    style(self.nodes, rules)
    # ...

The browser style sheet is the default for the whole web. But each web site can also use CSS to set a consistent style for the whole site by referencing CSS files using link elements:

<link rel="stylesheet" href="/main.css">

The mandatory rel attribute identifies this link as a style sheetFor browsers, stylesheet is the most important kind of link, but there’s also preload for loading assets that a page will use later and icon for identifying favicons. Search engines also use these links; for example, rel=canonical names the “true name” of a page and search engines use it to track pages that appear at multiple URLs. and the href attribute has the style sheet URL. We need to find all these links, download their style sheets, and apply them, as in Figure 2.

Since we’ll be doing similar tasks in the next few chapters, let’s generalize a bit and write a recursive function that turns a tree into a list of nodes:

def tree_to_list(tree, list):
    list.append(tree)
    for child in tree.children:
        tree_to_list(child, list)
    return list

I’ve written this helper to work on both HTML and layout trees, for later. We can use tree_to_list with a Python list comprehension to grab the URL of each linked style sheet:It’s kind of crazy, honestly, that Python lets you write things like this—crazy, but very convenient!

def load(self, url):
    # ...
    links = [node.attributes["href"]
             for node in tree_to_list(self.nodes, [])
             if isinstance(node, Element)
             and node.tag == "link"
             and node.attributes.get("rel") == "stylesheet"
             and "href" in node.attributes]
    # ...

Now, these style sheet URLs are usually not full URLs; they are something called relative URLs, which can be a normal URL (with a scheme, host, and path), a host-relative URL that starts with a slash and reuses the current page’s scheme and host, or a path-relative URL that doesn’t start with a slash and is resolved relative to the current page’s directory.There are other flavors, including query-relative, that I’m skipping.

To download the style sheets, we’ll need to convert each relative URL into a full URL:

class URL:
    def resolve(self, url):
        if "://" in url: return URL(url)
        if not url.startswith("/"):
            dir, _ = self.path.rsplit("/", 1)
            url = dir + "/" + url
        if url.startswith("//"):
            return URL(self.scheme + ":" + url)
        else:
            return URL(self.scheme + "://" + self.host + \
                       ":" + str(self.port) + url)

Also, because of the early web architecture, browsers are responsible for resolving parent directories (..) in relative URLs:

class URL:
    def resolve(self, url):
        if not url.startswith("/"):
            dir, _ = self.path.rsplit("/", 1)
            while url.startswith("../"):
                _, url = url.split("/", 1)
                if "/" in dir:
                    dir, _ = dir.rsplit("/", 1)
            url = dir + "/" + url
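
For example, here’s how resolution plays out for a hypothetical page URL (example.org and the paths here are just placeholders, and path is the field our URL class from Chapter 1 already stores):

base = URL("http://example.org/a/b/page.html")
css_url = base.resolve("../style.css")
# Walks up one directory: css_url.path is "/a/style.css"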

Now the browser can request each linked style sheet and add its rules to the rules list:

def load(self, url):
    # ...
    for link in links:
        style_url = url.resolve(link)
        try:
            body = style_url.request()
        except:
            continue
        rules.extend(CSSParser(body).parse())

The try/except ignores style sheets that fail to download, but it can also hide bugs in your code, so if something’s not right try removing it temporarily.

Each browser engine has its own browser style sheet (Chromium, WebKit, Gecko). Reset style sheets are often used to overcome any differences. This works because web page style sheets take precedence over the browser style sheet, just like in our browser, though real browsers fiddle with priorities to make that happen.Our browser style sheet only has tag selectors in it, so just putting them first works well enough. But if the browser style sheet had any descendant selectors, we’d encounter bugs.

Cascading

A web page can now have any number of style sheets applied to it. And since two rules can apply to the same element, rule order matters: it determines which rules take priority, and when one rule overrides another.

In CSS, the correct order is called cascade order, and it is based on the rule’s selector, with file order as a tie breaker. This system allows more specific rules to override more general ones, so that you can have a browser style sheet, a site-wide style sheet, and maybe a special style sheet for a specific web page, all co-existing.

Since our browser’s selectors are built entirely out of tag selectors, cascade order can just count how many tags each selector mentions:

class TagSelector:
    def __init__(self, tag):
        # ...
        self.priority = 1

class DescendantSelector:
    def __init__(self, ancestor, descendant):
        # ...
        self.priority = ancestor.priority + descendant.priority

Then cascade order for rules is just those priorities:

def cascade_priority(rule):
    selector, body = rule
    return selector.priority

Now when we call style, we need to sort the rules, like this:

def load(self, url):
    # ...
    style(self.nodes, sorted(rules, key=cascade_priority))
    # ...

Note that before sorting, the rules list is in file order. Python’s sorted function keeps the relative order of elements with equal priority, so file order acts as a tie breaker, as it should.

That’s it: we’ve added CSS to our web browser! I mean—for background colors. But there’s more to web design than that. For example, if you’re changing background colors you might want to change foreground colors as well—the CSS color property. But there’s a catch: color affects text, and there’s no way to select a text node. How can that work?

Web pages can also supply alternative style sheets, and some browsers provide (obscure) methods to switch from the default to an alternate style sheet. The CSS standard also allows for user styles that set custom style sheets for websites, with a priority between browser and website-provided style sheets.

Inherited styles

The way text styles work in CSS is called inheritance. Inheritance means that if some node doesn’t have a value for a certain property, it uses its parent’s value instead. That includes text nodes. Some properties are inherited and some aren’t; it depends on the property. Background color isn’t inherited, but text color and other font properties are.

Let’s implement inheritance for four font properties: font-size, font-style (for italic), font-weight (for bold), and color:

INHERITED_PROPERTIES = {
    "font-size": "16px",
    "font-style": "normal",
    "font-weight": "normal",
    "color": "black",
}

The values in this dictionary are each property’s defaults. We’ll then add the actual inheritance code to the style function. It has to come before the other loops, since explicit rules should override inheritance:

def style(node, rules):
    # ...
    for property, default_value in INHERITED_PROPERTIES.items():
        if node.parent:
            node.style[property] = node.parent.style[property]
        else:
            node.style[property] = default_value
    # ...

Inheriting font size comes with a twist. Web pages can use percentages as font sizes: h1 { font-size: 150% } makes headings 50% bigger than the surrounding text. But what if you had, say, a code element inside an h1 tag—would that inherit the 150% value for font-size? Surely it shouldn’t be another 50% bigger than the rest of the heading text?

In fact, browsers resolve font size percentages to absolute pixel units before those values are inherited; it’s called a “computed style”.The full CSS standard is a bit more confusing: there are specified, computed, used, and actual values, and they affect lots of CSS properties besides font-size. But we’re not implementing those other properties in this book.

def style(node, rules):
    # ...
    if node.style["font-size"].endswith("%"):
        # ...

    for child in node.children:
        style(child, rules)

Resolving percentage sizes has just one tricky edge case: percentage sizes for the root html element. In that case the percentage is relative to the default font size:This code has to parse and unparse font sizes because our style field stores strings; in a real browser the computed style is stored parsed so this doesn’t have to happen.

def style(node, rules):
    # ...
    if node.style["font-size"].endswith("%"):
        if node.parent:
            parent_font_size = node.parent.style["font-size"]
        else:
            parent_font_size = INHERITED_PROPERTIES["font-size"]
        node_pct = float(node.style["font-size"][:-1]) / 100
        parent_px = float(parent_font_size[:-2])
        node.style["font-size"] = str(node_pct * parent_px) + "px"

Note that this happens after all of the different sources of style values are handled (so we are working with the final font-size value) but before we recurse (so any children can assume that their parent’s font-size has been resolved to a pixel value).
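
Putting all of the pieces together, the final style function reads something like this:

def style(node, rules):
    node.style = {}
    # Inherited defaults first, so every node has a value for
    # each inherited property.
    for property, default_value in INHERITED_PROPERTIES.items():
        if node.parent:
            node.style[property] = node.parent.style[property]
        else:
            node.style[property] = default_value
    # Then style sheet rules, in cascade order.
    for selector, body in rules:
        if not selector.matches(node): continue
        for property, value in body.items():
            node.style[property] = value
    # Then the style attribute, which overrides everything else.
    if isinstance(node, Element) and "style" in node.attributes:
        pairs = CSSParser(node.attributes["style"]).body()
        for property, value in pairs.items():
            node.style[property] = value
    # Resolve percentage font sizes to pixels before recursing.
    if node.style["font-size"].endswith("%"):
        if node.parent:
            parent_font_size = node.parent.style["font-size"]
        else:
            parent_font_size = INHERITED_PROPERTIES["font-size"]
        node_pct = float(node.style["font-size"][:-1]) / 100
        parent_px = float(parent_font_size[:-2])
        node.style["font-size"] = str(node_pct * parent_px) + "px"
    # Finally, recurse into the children.
    for child in node.children:
        style(child, rules)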

Styling a page can be slow, so real browsers apply tricks like bloom filters for descendant selectors, indices for simple selectors, and various forms of sharing and parallelism. Some types of sharing are also important to reduce memory usage—computed style sheets can be huge!

Font Properties

So now with all these font properties implemented, let’s change layout to use them! That will let us move our default text styles to the browser style sheet:

a { color: blue; }
i { font-style: italic; }
b { font-weight: bold; }
small { font-size: 90%; }
big { font-size: 110%; }

The browser looks up font information in BlockLayout’s word method; we’ll need to change it to use the node’s style field, and for that, we’ll need to pass in the node itself:

class BlockLayout:
    def recurse(self, node):
        if isinstance(node, Text):
            for word in node.text.split():
                self.word(node, word)
        else:
            # ...

    def word(self, node, word):
        weight = node.style["font-weight"]
        style = node.style["font-style"]
        if style == "normal": style = "roman"
        size = int(float(node.style["font-size"][:-2]) * .75)
        font = get_font(size, weight, style)
        # ...

Note that for font-style we need to translate CSS “normal” to Tk “roman”, and for font-size we need to convert CSS pixels to Tk points (hence the factor of 0.75: a point is 1/72 of an inch, while CSS defines an inch as 96 pixels, and 72/96 = 0.75).

Text color requires a bit more plumbing. First, we have to read the color and store it in the current line:

def word(self, node, word):
    color = node.style["color"]
    # ...
    self.line.append((self.cursor_x, word, font, color))
    # ...

The flush method then copies it from line to display_list:

def flush(self):
    # ...
    metrics = [font.metrics() for x, word, font, color in self.line]
    # ...
    for rel_x, word, font, color in self.line:
        # ...
        self.display_list.append((x, y, word, font, color))
    # ...

That display_list is converted to drawing commands in paint:

def paint(self):
    # ...
    if self.layout_mode() == "inline":
        for x, y, word, font, color in self.display_list:
            cmds.append(DrawText(x, y, word, font, color))

DrawText now needs a color argument, and needs to pass it to create_text’s fill parameter:

class DrawText:
    def __init__(self, x1, y1, text, font, color):
        # ...
        self.color = color

    def execute(self, scroll, canvas):
        canvas.create_text(
            # ...
            fill=self.color)

Phew! That was a lot of coordinated changes, so test everything and make sure it works. You should now see links on the web version of this chapter appear in blue—and you might also notice that the rest of the text has become slightly lighter.The main body text on the web is colored #333, or roughly 97% black after gamma correction. Also, now that we’re explicitly setting the text color, we should explicitly set the background color as well:My Linux machine sets the default background color to a light gray, while my macOS laptop has a “Dark Mode” where the default background color becomes a dark gray. Setting the background color explicitly avoids the browser looking strange in these situations.

class Browser:
    def __init__(self):
        # ...
        self.canvas = tkinter.Canvas(
            # ...
            bg="white",
        )
        # ...

These changes obsolete all the code in BlockLayout that handles specific tags, like the weight, style, and size fields and the open_tag and close_tag methods. Let’s refactor a bit to get rid of them:

class BlockLayout:
    def recurse(self, node):
        if isinstance(node, Text):
            for word in node.text.split():
                self.word(node, word)
        else:
            if node.tag == "br":
                self.flush()
            for child in node.children:
                self.recurse(child)

Styling not only lets web page authors style their own web pages; it also moves browser code to a simple style sheet. And that’s a big improvement: the style sheet is simpler and easier to edit. Sometimes converting code to data like this means maintaining a new format, but browsers get to reuse a format, CSS, they need to support anyway.

But of course styling also has the nice benefit of nicely rendering this book’s homepage (Figure 3). Notice how the background is no longer gray, and the links have colors.

Figure 3: https://browser.engineering/ viewed in this chapter’s version of the browser.

Usually a point is 1/72 of an inch while pixel size depends on the screen, but CSS instead defines an inch as 96 pixels, because that was once a common screen resolution. And these CSS pixels need not be physical pixels! Seem weird? This complexity is the result of changes in browsers (zooming) and hardware (high-DPIDots per inch. screens) plus the need to be compatible with older web pages meant for the time when all screens had 96 pixels per inch.

Summary

This chapter implemented a rudimentary but complete styling engine, including downloading, parsing, matching, sorting, and applying CSS files. That means we parsed CSS with recursive parsing functions, implemented tag and descendant selectors, applied linked style sheets in cascade order, added inheritance for font properties, and moved the browser’s default text styles into a browser style sheet.

Our styling engine is also relatively easy to extend with properties and selectors.

Outline

The complete set of functions, classes, and methods in our browser should now look something like this:

class URL:
    def __init__(url)
    def request()
    def resolve(url)

class Text:
    def __init__(text, parent)
    def __repr__()

class Element:
    def __init__(tag, attributes, parent)
    def __repr__()

def print_tree(node, indent)
def tree_to_list(tree, list)

class HTMLParser:
    SELF_CLOSING_TAGS
    HEAD_TAGS
    def __init__(body)
    def parse()
    def get_attributes(text)
    def add_text(text)
    def add_tag(tag)
    def implicit_tags(tag)
    def finish()

class CSSParser:
    def __init__(s)
    def whitespace()
    def literal(literal)
    def word()
    def ignore_until(chars)
    def pair()
    def selector()
    def body()
    def parse()

class TagSelector:
    def __init__(tag)
    def matches(node)

class DescendantSelector:
    def __init__(ancestor, descendant)
    def matches(node)

FONTS
def get_font(size, weight, style)

DEFAULT_STYLE_SHEET
INHERITED_PROPERTIES
def style(node, rules)
def cascade_priority(rule)

WIDTH, HEIGHT
HSTEP, VSTEP
BLOCK_ELEMENTS

class DocumentLayout:
    def __init__(node)
    def layout()
    def paint()

class BlockLayout:
    def __init__(node, parent, previous)
    def layout_mode()
    def layout()
    def recurse(node)
    def flush()
    def word(node, word)
    def paint()

class DrawText:
    def __init__(x1, y1, text, font, color)
    def execute(scroll, canvas)

class DrawRect:
    def __init__(x1, y1, x2, y2, color)
    def execute(scroll, canvas)

def paint_tree(layout_object, display_list)

SCROLL_STEP

class Browser:
    def __init__()
    def draw()
    def load(url)
    def scrolldown(e)

Exercises

6-1 Fonts. Implement the font-family property, an inheritable property that names which font should be used in an element. Make text inside <code> elements use a nice monospaced font like Courier. Beware the font cache.

6-2 Width/height. Add support for the width and height properties to block layout. These can either be a pixel value, which directly sets the width or height of the layout object, or the word auto, in which case the existing layout algorithm is used.

6-3 Class selectors. Any HTML element can have a class attribute, whose value is a space-separated list of that element’s classes. A CSS class selector, like .main, affects all elements with the main class. Implement class selectors; they should take precedence over tag selectors. If you’ve implemented them correctly, you should see syntax highlighting for the code blocks in this book.

6-4 display. Right now, the layout_mode function relies on a hard-coded list of block elements. In a real browser, the display property controls this. Implement display with a default value of inline, and move the list of block elements to the browser style sheet.

6-5 Shorthand properties CSS “shorthand properties” set multiple related CSS properties at the same time; for example, font: italic bold 100% Times sets the font-style, font-weight, font-size, and font-family properties all at once. Add shorthand properties to your parser. (If you haven’t done Exercise 6-1, just ignore the font-family.)

6-6 Inline style sheets. The <link rel=stylesheet> syntax allows importing an external style sheet (meaning one loaded via its own HTTP request). There is also a way to provide a style sheet inline, as part of the HTML, via the <style> tag—everything up to the following </style> tag is interpreted as a style sheet.Both inline and external stylesheet apply in the order of their appearance in the HTML, though it might be easier to first implement inline style sheets applying after external ones. Inline style sheets are useful for creating self-contained example web pages, but more importantly are a way that websites can load faster by reducing the number of round-trip network requests to the server. Since style sheets typically don’t contain left angle brackets, you can implement this feature without modifying the HTML parser.

6-7 Fast descendant selectors. Right now, matching a selector like div div div div div can take a long time—it’s O(nd) in the worst case, where n is the length of the selector and d is the depth of the layout tree. Modify the descendant-selector matching code to run in O(n + d) time. It may help to have DescendantSelector store a list of base selectors instead of just two.

6-8 Selector sequences. Sometimes you want to select an element by tag and class. You do this by concatenating the selectors without anything in between.Not even whitespace! For example, span.announce selects elements that match both span and .announce. Implement a new SelectorSequence class to represent these and modify the parser to parse them. Sum priorities.Priorities for SelectorSequences are supposed to compare the number of ID, class, and tag selectors in lexicographic order, but summing the priorities of the selectors in the sequence will work fine as long as no one strings more than ten selectors together.

6-9 !important. A CSS property–value pair can be marked “important” using the !important syntax, like this:

#banner a { color: black !important; }

This gives that property–value pair (but not other pairs in the same block!) a higher priority than any other selector (except for other !important properties). Parse and implement !important, giving any property–value pairs marked this way a priority 10 000 higher than normal property–value pairs.

6-10 :has selectors. The :has selector is the inverse of a descendant selector—it styles an ancestor according to the presence of a descendant. Implement :has selectors. Analyze the asymptotic speed of your implementation. There is a clever implementation that is O(1) amortized per element—can you find it?In fact, browsers have to do something even more complex to implement :has efficiently.

Handling Buttons and Links

Our browser is still missing the key insight of hypertext: documents linked together by hyperlinks. It lets us watch the waves, but not surf the web. So in this chapter, we’ll implement hyperlinks, an address bar, and the rest of the browser interface—the part of the browser that decides which page we are looking at.

Where Are the Links?

The core of the web is the link, so the most important part of the browser interface is clicking on links. But before we can quite get to clicking on links, we first need to answer a more fundamental question: where on the screen are the links? Though paragraphs and headings have their sizes and positions recorded in the layout tree, formatted text (like links) does not. We need to fix that.

The big idea is to introduce two new types of layout objects: LineLayout and TextLayout. A BlockLayout will now have a LineLayout child for each line of text, which itself will contain a TextLayout for each word in that line. These new classes can make the layout tree look different from the HTML tree. So to avoid surprises, let’s look at a simple example:

<html>
  <body>
    Here is some text that is
    <br>
    spread across multiple lines
  </body>
</html>

The text in the body element wraps across two lines (because of the br element), so the layout tree will have this structure:

DocumentLayout
  BlockLayout[block] (html element)
    BlockLayout[inline] (body element)
      LineLayout (first line of text)
        TextLayout ("Here")
        TextLayout ("is")
        TextLayout ("some")
        TextLayout ("text")
        TextLayout ("that")
        TextLayout ("is")
      LineLayout (second line of text)
        TextLayout ("spread")
        TextLayout ("across")
        TextLayout ("multiple")
        TextLayout ("lines")

Note how one body element corresponds to a BlockLayout with two LineLayouts inside, and how two text nodes turn into a total of ten TextLayouts!

Let’s get started. Defining LineLayout is straightforward:

class LineLayout:
    def __init__(self, node, parent, previous):
        self.node = node
        self.parent = parent
        self.previous = previous
        self.children = []

TextLayout is only a little more tricky. A single TextLayout refers not to a whole HTML node but to a specific word. That means TextLayout needs an extra argument to know which word that is:

class TextLayout:
    def __init__(self, node, word, parent, previous):
        self.node = node
        self.word = word
        self.children = []
        self.parent = parent
        self.previous = previous

Like the other layout modes, LineLayout and TextLayout will need their own layout and paint methods, but before we get to those we need to think about how the LineLayout and TextLayout objects will be created. That has to happen during word wrapping.

Recall how word wrapping (see Chapter 3) inside BlockLayout’s word method works. That method updates a line field, which stores all the words in the current line:

self.line.append((self.cursor_x, word, font, color))

When it’s time to go to the next line, word calls flush, which computes the location of the line and each word in it, and adds all the words to a display_list field, which stores all the words in the whole inline element. With TextLayout and LineLayout, a lot of this complexity goes away. The LineLayout can compute its own location in its layout method, and instead of a display_list field, each TextLayout can just paint itself like normal. So let’s get started on this refactor.

Let’s start with adding a word to a line. Instead of a line field, we want to create TextLayout objects and add them to LineLayout objects. The LineLayouts are children of the BlockLayout, so the current line can be found at the end of the children array:

class BlockLayout:
    def word(self, node, word):
        line = self.children[-1]
        previous_word = line.children[-1] if line.children else None
        text = TextLayout(node, word, line, previous_word)
        line.children.append(text)

Now let’s think about what happens when we reach the end of the line. The current code calls flush, which does stuff like positioning text and clearing the line field. We don’t want to do all that—we just want to create a new LineLayout object. So let’s use a different method for that:

class BlockLayout:
    def word(self, node, word):
        # ...
        if self.cursor_x + w > self.width:
            self.new_line()

This new_line method just creates a new line and resets some fields:

class BlockLayout:
    def new_line(self):
        self.cursor_x = 0
        last_line = self.children[-1] if self.children else None
        new_line = LineLayout(self.node, self, last_line)
        self.children.append(new_line)
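
Assembled, and keeping the font lookup and cursor_x bookkeeping from before, the new word method might look something like this:

class BlockLayout:
    def word(self, node, word):
        # Look up the font from the node's computed style.
        weight = node.style["font-weight"]
        style = node.style["font-style"]
        if style == "normal": style = "roman"
        size = int(float(node.style["font-size"][:-2]) * .75)
        font = get_font(size, weight, style)
        # Wrap to a new line if this word doesn't fit.
        w = font.measure(word)
        if self.cursor_x + w > self.width:
            self.new_line()
        # Add a TextLayout for this word to the current line.
        line = self.children[-1]
        previous_word = line.children[-1] if line.children else None
        text = TextLayout(node, word, line, previous_word)
        line.children.append(text)
        self.cursor_x += w + font.measure(" ")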

Now there are a lot of fields we’re not using. Let’s clean them up. In the core layout method, we don’t need to initialize the display_list, cursor_y or line fields, since we won’t be using any of those any more. Instead, we just need to call new_line and recurse:

class BlockLayout:
    def layout(self):
        # ...
        else:
            self.new_line()
            self.recurse(self.node)

The layout method already recurses into its children to lay them out, so that part doesn’t need any change. And moreover, we can now compute the height of a paragraph of text by summing the height of its lines, so this part of the code no longer needs to be different depending on the layout mode:

class BlockLayout:
    def layout(self):
        # ...
        self.height = sum([child.height for child in self.children])

You might also be tempted to delete the flush method, since it’s no longer called from anywhere. But keep it around for just a moment—we’ll need it to write the layout method for line and text objects.

The layout objects generated by a text node need not even be consecutive. English containing a Farsi quotation, for example, can flip from left-to-right to right-to-left in the middle of a line, so the text layout objects end up in a surprising order. And then there are languages laid out vertically.

Line Layout, Redux

We’re now creating line and text objects, but we still need to lay them out. Let’s start with lines. Lines stack vertically and take up their parent’s full width, so computing x and y and width looks the same as for our other boxes:You could reduce the duplication with some helper methods (or even something more elaborate, like mixin classes), but in a real browser different layout modes support different kinds of extra features (like text direction or margins) and the code looks quite different.

class LineLayout:
    def layout(self):
        self.width = self.parent.width
        self.x = self.parent.x

        if self.previous:
            self.y = self.previous.y + self.previous.height
        else:
            self.y = self.parent.y
        
        # ...

Computing height, though, is different—this is where computing maximum ascents, maximum descents, and so on comes in. Before we do that, let’s look at laying out TextLayouts.

To lay out text we need font metrics, so let’s start by getting the relevant font using the same font-construction code as BlockLayout:

class TextLayout:
    def layout(self):
        weight = self.node.style["font-weight"]
        style = self.node.style["font-style"]
        if style == "normal": style = "roman"
        size = int(float(self.node.style["font-size"][:-2]) * .75)
        self.font = get_font(size, weight, style)

Next, we need to compute the word’s size and x position. We use the font metrics to compute size, and stack words left to right to compute position.

class TextLayout:
    def layout(self):
        # ...

        self.width = self.font.measure(self.word)

        if self.previous:
            space = self.previous.font.measure(" ")
            self.x = self.previous.x + space + self.previous.width
        else:
            self.x = self.parent.x

        self.height = self.font.metrics("linespace")

There’s no code here to compute the y position, however. The vertical position of one word depends on the other words in the same line, so we’ll compute that y position inside LineLayout’s layout method.The y position could have been computed in TextLayout’s layout method—but then that layout method would have to come after the baseline computation, not before. Yet font must be computed before the baseline computation. A real browser might resolve this paradox with multi-phase layout. There are many considerations and optimizations of this kind that are needed to make text layout super fast.

That method will pilfer code from the old flush method. First, let’s lay out each word:

class LineLayout:
    def layout(self):
        # ...
        for word in self.children:
            word.layout()

Next, we need to compute the line’s baseline based on the maximum ascent and descent, using basically the same code as the old flush method:

# ...
max_ascent = max([word.font.metrics("ascent")
                  for word in self.children])
baseline = self.y + 1.25 * max_ascent
for word in self.children:
    word.y = baseline - word.font.metrics("ascent")
max_descent = max([word.font.metrics("descent")
            for word in self.children])

Note that this code reads from a font field on each word and writes to each word’s y field. That means that inside TextLayout’s layout method we need to compute x, width, height, and font, but not y, which is exactly what we did.

Finally, since each line is now a standalone layout object, it needs to have a height. We compute it from the maximum ascent and descent:

# ...
self.height = 1.25 * (max_ascent + max_descent)
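
For reference, the whole layout method for lines might end up looking like this (the guard for an empty line is a small defensive addition, not shown above, in case a line ends up with no words in it):

class LineLayout:
    def layout(self):
        self.width = self.parent.width
        self.x = self.parent.x
        if self.previous:
            self.y = self.previous.y + self.previous.height
        else:
            self.y = self.parent.y
        for word in self.children:
            word.layout()
        if not self.children:
            # Defensive: a line with no words has no height.
            self.height = 0
            return
        max_ascent = max([word.font.metrics("ascent")
                          for word in self.children])
        baseline = self.y + 1.25 * max_ascent
        for word in self.children:
            word.y = baseline - word.font.metrics("ascent")
        max_descent = max([word.font.metrics("descent")
                           for word in self.children])
        self.height = 1.25 * (max_ascent + max_descent)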

So that’s layout for LineLayout and TextLayout. All that’s left is painting. For LineLayout there is nothing to paint:

class LineLayout:
    def paint(self):
        return []

And each TextLayout creates a single DrawText call:

class TextLayout:
    def paint(self):
        color = self.node.style["color"]
        return [DrawText(self.x, self.y, self.word, self.font, color)]

Now we don’t need a display_list field in BlockLayout at all, and we can remove the part of BlockLayout’s paint that handled it; paint_tree already recurses into each line and word, so the text still gets painted. So by adding LineLayout and TextLayout we made BlockLayout quite a bit simpler and shared more code between block and inline layout modes.
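
If you’ve followed along, the slimmed-down BlockLayout paint method should now contain just the background logic, something like this:

class BlockLayout:
    def paint(self):
        cmds = []
        bgcolor = self.node.style.get("background-color",
                                      "transparent")
        if bgcolor != "transparent":
            x2, y2 = self.x + self.width, self.y + self.height
            cmds.append(DrawRect(self.x, self.y, x2, y2, bgcolor))
        return cmds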

So, oof, well, this was quite a bit of refactoring. Take a moment to test everything—it should look exactly identical to how it did before we started this refactor. But while you can’t see it, there’s a crucial difference: each blue link on the page now has an associated layout object and its own size and position.

Actually, text rendering is way more complex than this. Letters can transform and overlap, and the user might want to color certain letters—or parts of letters—a different color. All of this is possible in HTML, and real browsers do implement support for it.

Click Handling

Now that we know where the links are, we can work on clicking them. In Tk, click handling works just like key press handling: you bind an event handler to a certain event. For click handling that event is <Button-1>, button number 1 being the left button on the mouse.Button 2 is the middle button; button 3 is the right-hand button.

class Browser:
    def __init__(self):
        # ...
        self.window.bind("<Button-1>", self.click)

Inside click, we want to figure out what link the user has clicked on. Luckily, the event handler is passed an event object, whose x and y fields refer to where the click happened:

class Browser:
    def click(self, e):
        x, y = e.x, e.y

Now, here, we have to be careful with coordinate systems. Those x and y coordinates are relative to the browser window. Since the canvas is in the top-left corner of the window, those are also the x and y coordinates relative to the canvas. We want the coordinates relative to the web page, so we need to account for scrolling:

class Browser:
    def click(self, e):
        # ...
        y += self.scroll

More generally, handling events like clicks involves reversing the usual rendering pipeline. Normally, rendering goes from elements to layout objects to page coordinates to screen coordinates; click handling goes backward, starting with screen coordinates, then converting to page coordinates, and so on. The correspondence isn’t perfectly reversed in practiceThough see some exercises in this chapter and future ones on making it a closer match. but it’s a worthwhile analogy.

So the next step is to go from page coordinates to a layout object:You could try to first find the paint command clicked on, and go from that to layout object, but in real browsers there are all sorts of reasons this won’t work, starting with invisible objects that can nonetheless be clicked on. See Exercise 7-11.

# ...
objs = [obj for obj in tree_to_list(self.document, [])
        if obj.x <= x < obj.x + obj.width
        and obj.y <= y < obj.y + obj.height]

In principle there might be more than one layout object in this list.In real browsers there are all sorts of ways this could happen, like negative margins. But remember that click handling is the reverse of painting. When we paint, we paint the tree from front to back, so when hit testing we should start at the last element:Real browsers use the z-index property to control which sibling is on top. So real browsers have to compute stacking contexts to resolve what you actually clicked on.

# ...
if not objs: return
elt = objs[-1].node

This elt node is the most specific node that was clicked. With a link, that’s usually going to be a text node. But since we want to know the actual URL the user clicked on, we need to climb back up the HTML tree to find the link element:I wrote this in a kind of curious way so it’s easy to add other types of clickable things—like text boxes and buttons—in Chapter 8.

# ...
while elt:
    if isinstance(elt, Text):
        pass
    elif elt.tag == "a" and "href" in elt.attributes:
        # ...
    elt = elt.parent

Once we find the link element itself, we need to extract the URL and load it:

# ...
elif elt.tag == "a" and "href" in elt.attributes:
    url = self.url.resolve(elt.attributes["href"])
    return self.load(url)

Note that this resolve call requires storing the current page’s URL:

class Browser:
    def __init__(self):
        # ...
        self.url = None

    def load(self, url):
        self.url = url
        # ...

Try it out! You should now be able to click on links and navigate to new web pages.

On mobile devices, a “click” happens over an area, not just at a single point. This is because mobile “taps” are often pretty inaccurate, so hit testing should use area, not point, information. The same can be true even for a normal mouse click when the click is on a rotated or scaled element.

Multiple Pages

If you’re anything like me, the next thing you tried after clicking on links is middle-clicking them to open in a new tab. Every browser now has tabbed browsing, and honestly it’s a little embarrassing that our browser doesn’t.Back in the day, browser tabs were the feature that would convince friends and relatives to switch from IE 6 to Firefox.

Fundamentally, implementing tabbed browsing requires us to distinguish between the browser itself and the tabs that show individual web pages. The canvas the browser draws to, for example, is shared by all web pages, but the layout tree and display list are specific to one page. We need to tease tabs and browsers apart.

Here’s the plan: the Browser class will own the window and canvas and all related methods, such as event handling. And it’ll also contain a list of Tab objects and the browser chrome. But the web page itself and its associated methods will live in a new Tab class.

To start, rename your existing Browser class to be just Tab, since until now we’ve only handled a single web page:

class Tab:
    # ...

Then we’ll need a new Browser class. It has to store a list of tabs and also which one is active:

class Browser:
    def __init__(self):
        self.tabs = []
        self.active_tab = None

It also owns the window and handles all events:

class Browser:
    def __init__(self):
        self.window = tkinter.Tk()
        self.canvas = tkinter.Canvas(
            # ...
        )
        self.canvas.pack()
        self.window.bind("<Down>", self.handle_down)
        self.window.bind("<Button-1>", self.handle_click)

Remove these lines from Tab’s constructor.

The handle_down and handle_click methods need page-specific information, so these handler methods just forward the event to the active tab:

class Browser:
    def handle_down(self, e):
        self.active_tab.scrolldown()
        self.draw()

    def handle_click(self, e):
        self.active_tab.click(e.x, e.y)
        self.draw()

You’ll need to tweak Tab’s scrolldown and click methods to match: since the Tab no longer receives Tk events, scrolldown takes no arguments and click takes plain x and y coordinates.
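
Concretely, the new signatures might look something like this (a sketch, with the existing method bodies elided):

class Tab:
    def scrolldown(self):
        # ... same scrolling logic as before ...

    def click(self, x, y):
        y += self.scroll
        # ... same hit testing as before ...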

Finally, the Browser’s draw method also calls into the active tab:

class Browser:
    def draw(self):
        self.canvas.delete("all")
        self.active_tab.draw(self.canvas)

Note that clearing the screen is the Browser’s job, not the Tab’s. After that, we only draw the active tab, which is how tabs are supposed to work. Tab’s draw method needs to take the canvas in as an argument:

class Tab:
    def draw(self, canvas):
        # ...

Since the Browser controls the canvas and handles events, it decides when rendering happens and which tab does the drawing. So let’s also remove the draw calls from the load and scrolldown methods. More generally, the Browser is “active” and the Tab is “passive”: all user interactions start at the Browser, which then calls into the tabs as appropriate.

We’re basically done splitting Tab from Browser, and after a refactor like this we need to test things. To do that, we’ll need to create at least one tab, like this:

class Browser:
    def new_tab(self, url):
        new_tab = Tab()
        new_tab.load(url)
        self.active_tab = new_tab
        self.tabs.append(new_tab)
        self.draw()

On startup, you should now create a Browser with one tab:

if __name__ == "__main__":
    import sys
    Browser().new_tab(URL(sys.argv[1]))
    tkinter.mainloop()

Of course, we need a way for the user to switch tabs, create new ones, and so on. Let’s turn to that next.

Browser tabs first appeared in SimulBrowse, which was a kind of custom UI for the Internet Explorer engine.Some people instead attribute tabbed browsing to Booklink’s InternetWorks browser, a browser obscure enough that it doesn’t have a Wikipedia page, though you can see some screenshots on Twitter. However, its tabs were slightly different from the modern conception, more like bookmarks than tabs. SimulBrowse instead used the modern notion of tabs. SimulBrowse (later renamed to NetCaptor) also had ad blocking and a private browsing mode. The old advertisements are a great read!

Browser Chrome

Real web browsers don’t just show web page contents—they’ve got labels and icons and buttons.Oh my! This is called the browser “chrome”;Yep, that predates and inspired the name of Google’s Chrome browser. all of this stuff is drawn by the browser to the same window as the page contents, and it requires information about the browser as a whole (like the list of all tabs), so it has to happen at the browser level, not per tab.

However, a browser’s UI is quite complicated, so let’s put that code in a new Chrome helper class:

class Chrome:
    def __init__(self, browser):
        self.browser = browser

class Browser:
    def __init__(self):
        # ...
        self.chrome = Chrome(self)

Let’s design the browser chrome. Ultimately, I think it should have two rows (see Figure 1): a tab bar on top, containing a new-tab button and one entry per tab, and a URL bar below it, containing a back button and an address bar.

Figure 1: The intended appearance of the browser chrome.

A lot of this design involves text, so let’s start by picking a font:

class Chrome:
    def __init__(self, browser):
        # ...
        self.font = get_font(20, "normal", "roman")
        self.font_height = self.font.metrics("linespace")

Because different operating systems draw fonts differently, we’ll need to adjust the exact design of the browser chrome based on font metrics. So we’ll need the font_height later.I chose 20px as the font size, but that might be too large on your device. Feel free to adjust.

Using that font height, we can now determine where the tab bar starts and ends:

class Chrome:
    def __init__(self, browser):
        # ...
        self.padding = 5
        self.tabbar_top = 0
        self.tabbar_bottom = self.font_height + 2*self.padding

Note that I’ve added some padding so that text doesn’t run into the edge of the window.

We will store rectangles representing the size and position of various elements in the browser chrome. For that, a new Rect class will be convenient:

class Rect:
    def __init__(self, left, top, right, bottom):
        self.left = left
        self.top = top
        self.right = right
        self.bottom = bottom

Now, this tab row needs to contain a new-tab button and the tab names themselves.

I’ll add padding around the new-tab button:

class Chrome:
    def __init__(self, browser):
        # ...
        plus_width = self.font.measure("+") + 2*self.padding
        self.newtab_rect = Rect(
           self.padding, self.padding,
           self.padding + plus_width,
           self.padding + self.font_height)

The tabs then start padding pixels past the right edge of the new-tab button. Because the number of tabs can change, I’m not going to store the location of each tab. Instead I’ll just compute their bounds on the fly:

class Chrome:
    def tab_rect(self, i):
        tabs_start = self.newtab_rect.right + self.padding
        tab_width = self.font.measure("Tab X") + 2*self.padding
        return Rect(
            tabs_start + tab_width * i, self.tabbar_top,
            tabs_start + tab_width * (i + 1), self.tabbar_bottom)

Note that I measure the text “Tab X” and use that for all of the tab widths. This is not quite right—in many fonts, numbers like 8 are wider than numbers like 1—but it is close enough, and anyway, the letter X is typically as wide as the widest number.

To actually draw the UI, we’ll first have the browser chrome paint a display list, which the Browser will then draw to the screen:

class Chrome:
    def paint(self):
        cmds = []
        # ...
        return cmds

Let’s start by first painting the new-tab button:

class Chrome:
    def paint(self):
        # ...
        cmds.append(DrawOutline(self.newtab_rect, "black", 1))
        cmds.append(DrawText(
            self.newtab_rect.left + self.padding,
            self.newtab_rect.top,
            "+", self.font, "black"))
        # ...

The DrawOutline command draws a rectangular border:

class DrawOutline:
    def __init__(self, rect, color, thickness):
        self.rect = rect
        self.color = color
        self.thickness = thickness

    def execute(self, scroll, canvas):
        canvas.create_rectangle(
            self.rect.left, self.rect.top - scroll,
            self.rect.right, self.rect.bottom - scroll,
            width=self.thickness,
            outline=self.color)

Next up is drawing the tabs. Python’s enumerate function lets you iterate over both the indices and the contents of a list at the same time. For each tab, we need to create a border on the left and right and then draw the tab name:

class Chrome:
    def paint(self):
        # ...
        for i, tab in enumerate(self.browser.tabs):
            bounds = self.tab_rect(i)
            cmds.append(DrawLine(
                bounds.left, 0, bounds.left, bounds.bottom,
                "black", 1))
            cmds.append(DrawLine(
                bounds.right, 0, bounds.right, bounds.bottom,
                "black", 1))
            cmds.append(DrawText(
                bounds.left + self.padding, bounds.top + self.padding,
                "Tab {}".format(i), self.font, "black"))
        # ...

Finally, to identify which tab is the active tab, we’ve got to make that file folder shape with the current tab sticking up:

class Chrome:
    def paint(self):
        for i, tab in enumerate(self.browser.tabs):
            # ...
            if tab == self.browser.active_tab:
                cmds.append(DrawLine(
                    0, bounds.bottom, bounds.left, bounds.bottom,
                    "black", 1))
                cmds.append(DrawLine(
                    bounds.right, bounds.bottom, WIDTH, bounds.bottom,
                    "black", 1))

The DrawLine command draws a line of a given color and thickness. It’s defined like so:

class DrawLine:
    def __init__(self, x1, y1, x2, y2, color, thickness):
        self.rect = Rect(x1, y1, x2, y2)
        self.color = color
        self.thickness = thickness

    def execute(self, scroll, canvas):
        canvas.create_line(
            self.rect.left, self.rect.top - scroll,
            self.rect.right, self.rect.bottom - scroll,
            fill=self.color, width=self.thickness)

One final thing: we want to make sure that the browser chrome is always drawn on top of the page contents. To guarantee that, we can draw a white rectangle behind the chrome:

class Chrome:
    def __init__(self, browser):
        # ...
        self.bottom = self.tabbar_bottom

    def paint(self):
        # ...
        cmds.append(DrawRect(
            Rect(0, 0, WIDTH, self.bottom),
            "white"))
        cmds.append(DrawLine(
            0, self.bottom, WIDTH,
            self.bottom, "black", 1))
        # ...

Make sure the background is drawn before any other part of the chrome. I also added a line at the bottom of the chrome to separate it from the page. Note how I also changed DrawRect to pass a Rect instead of the four corners; this requires a change to BlockLayout:

class BlockLayout:
    def self_rect(self):
        return Rect(self.x, self.y,
            self.x + self.width, self.y + self.height)

    def paint(self):
        # ...
        if bgcolor != "transparent":
            rect = DrawRect(self.self_rect(), bgcolor)
            cmds.append(rect)
        return cmds

Add a rect field to DrawText and DrawLine too.
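
Concretely, as a sketch in case it helps, the updated DrawRect and DrawText constructors might look like this; the execute methods can then read their coordinates off self.rect:

class DrawRect:
    def __init__(self, rect, color):
        self.rect = rect
        self.color = color

    def execute(self, scroll, canvas):
        canvas.create_rectangle(
            self.rect.left, self.rect.top - scroll,
            self.rect.right, self.rect.bottom - scroll,
            width=0, fill=self.color)

class DrawText:
    def __init__(self, x1, y1, text, font, color):
        # Compute a bounding rect from the text's measured size.
        self.rect = Rect(x1, y1,
            x1 + font.measure(text),
            y1 + font.metrics("linespace"))
        self.text = text
        self.font = font
        self.color = color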

Drawing this chrome display list is now straightforward:

class Browser:
    def draw(self):
        # ...
        for cmd in self.chrome.paint():
            cmd.execute(0, self.canvas)

Note that this display list is always drawn at the top of the window, unlike the tab contents (which scroll). Make sure to draw the chrome after the main tab contents, so that the chrome is drawn over it.

However, we also have to make some adjustments to tab drawing to account for the fact that the browser chrome takes up some vertical space. Let’s add a tab_height parameter to the Tab constructor:

class Tab:
    def __init__(self, tab_height):
        # ...
        self.tab_height = tab_height

We can pass it to new_tab:

class Browser:
    def new_tab(self, url):
        new_tab = Tab(HEIGHT - self.chrome.bottom)
        # ...

We can then adjust scrolldown to account for the visible page area now being tab_height tall instead of the full window height:

class Tab:
    def scrolldown(self):
        max_y = max(
            self.document.height + 2*VSTEP - self.tab_height, 0)
        self.scroll = min(self.scroll + SCROLL_STEP, max_y)

Finally, in Tab’s draw method we need to shift the drawing commands down by the chrome height. I’ll pass the chrome height in as an offset parameter:

class Tab:
    def draw(self, canvas, offset):
        for cmd in self.display_list:
            if cmd.rect.top > self.scroll + self.tab_height:
                continue
            if cmd.rect.bottom < self.scroll: continue
            cmd.execute(self.scroll - offset, canvas)

The Browser’s final draw method now looks like this:

class Browser:
    def draw(self):
        self.canvas.delete("all")
        self.active_tab.draw(self.canvas, self.chrome.bottom)
        for cmd in self.chrome.paint():
            cmd.execute(0, self.canvas)

One more thing: clicking on tabs to switch between them. The Browser handles the click and now needs to delegate clicks on the browser chrome to the Chrome object:

class Browser:
    def handle_click(self, e):
        if e.y < self.chrome.bottom:
            self.chrome.click(e.x, e.y)
        else:
            tab_y = e.y - self.chrome.bottom
            self.active_tab.click(e.x, tab_y)
        self.draw()

Note that we need to subtract out the chrome size when clicking on tab contents. As for clicks on the browser chrome, inside Chrome we need to figure out what the user clicked on. To make that easier, let’s add a quick method to test whether a point is contained in a Rect:

class Rect:
    def containsPoint(self, x, y):
        return x >= self.left and x < self.right \
            and y >= self.top and y < self.bottom

Inside Chrome’s click method, we use containsPoint to decide whether the user clicked the new-tab button or one of the open tabs:

class Chrome:
    def click(self, x, y):
        if self.newtab_rect.containsPoint(x, y):
            self.browser.new_tab(URL("https://browser.engineering/"))
        else:
            for i, tab in enumerate(self.browser.tabs):
                if self.tab_rect(i).containsPoint(x, y):
                    self.browser.active_tab = tab
                    break

That’s an appropriate “new tab” page, don’t you think? Anyway, you should now be able to load multiple tabs, scroll and click around them independently, and switch tabs by clicking on them.

Google Chrome 1.0 was accompanied by a comic book to pitch its features. There’s a whole chapter about its design ideas and user interface features, many of which stuck around. Even this book’s browser has tabs on top, for example.

Navigation History

Now that we are navigating between pages all the time, it’s easy to get a little lost and forget what web page you’re looking at. An address bar that shows the current URL would help a lot. Let’s make room for it in the chrome:

class Chrome:
    def __init__(self, browser):
        # ...
        self.urlbar_top = self.tabbar_bottom
        self.urlbar_bottom = self.urlbar_top + \
            self.font_height + 2*self.padding
        self.bottom = self.urlbar_bottom

This “URL bar” will contain the back button and the address bar:

class Chrome:
    def __init__(self, browser):
        # ...
        back_width = self.font.measure("<") + 2*self.padding
        self.back_rect = Rect(
            self.padding,
            self.urlbar_top + self.padding,
            self.padding + back_width,
            self.urlbar_bottom - self.padding)

        self.address_rect = Rect(
            self.back_rect.right + self.padding,
            self.urlbar_top + self.padding,
            WIDTH - self.padding,
            self.urlbar_bottom - self.padding)

Painting the back button is straightforward:

class Chrome:
    def paint(self):
        # ...
        cmds.append(DrawOutline(self.back_rect, "black", 1))
        cmds.append(DrawText(
            self.back_rect.left + self.padding,
            self.back_rect.top,
            "<", self.font, "black"))

The address bar needs to get the current tab’s URL from the browser:

class Chrome:
    def paint(self):
        # ...
        cmds.append(DrawOutline(self.address_rect, "black", 1))
        url = str(self.browser.active_tab.url)
        cmds.append(DrawText(
            self.address_rect.left + self.padding,
            self.address_rect.top,
            url, self.font, "black"))

Here, str is a built-in Python function that converts objects to strings; defining the __str__ method controls what it returns for URL objects:

class URL:
    def __str__(self):
        port_part = ":" + str(self.port)
        if self.scheme == "https" and self.port == 443:
            port_part = ""
        if self.scheme == "http" and self.port == 80:
            port_part = ""
        return self.scheme + "://" + self.host + port_part + self.path
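
As a quick sanity check (these URLs are just made-up examples), printing a URL now goes through __str__:

print(URL("http://example.org/index.html"))
#   http://example.org/index.html  (the default port 80 is hidden)
print(URL("http://example.org:8080/index.html"))
#   http://example.org:8080/index.html  (a custom port is kept)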

I think the extra logic to hide port numbers is worth it to make the URLs more tidy.

What should happen when the back button is clicked? Well, that tab should go back. Other tabs are not affected. So the Browser has to invoke some method on the current tab to go back:

class Chrome:
    def click(self, x, y):
        # ...
        elif self.back_rect.containsPoint(x, y):
            self.browser.active_tab.go_back()

For the active tab to “go back”, it needs to store a “history” of which pages it’s visited before:

class Tab:
    def __init__(self, tab_height):
        # ...
        self.history = []

The history grows every time we go to a new page:

class Tab:
    def load(self, url):
        self.history.append(url)
        # ...

Going back uses that history. You might think to write this:

class Tab:
    def go_back(self):
        if len(self.history) > 1:
            self.load(self.history[-2])

That’s almost correct, but it doesn’t work if you click the back button twice, because load adds to the history. Instead, we need to do something more like this:

class Tab:
    def go_back(self):
        if len(self.history) > 1:
            self.history.pop()
            back = self.history.pop()
            self.load(back)

Now, going back shrinks the history and clicking on links grows it, as it should.
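
To see why the double pop is needed, here’s a hypothetical trace of self.history as the user browses pages A, B, and C and then goes back twice:

# load(A)          history == [A]
# click link to B  history == [A, B]
# click link to C  history == [A, B, C]
# go_back()        pops C and B, then load(B) appends: history == [A, B]
# go_back()        pops B and A, then load(A) appends: history == [A]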

So we’ve now got a pretty good web browser for reading this very book: you can click links, browse around, and even have multiple chapters open simultaneously for cross-referencing things. But it’s a little hard to visit a website not linked to from the current one.

A browser’s navigation history can contain sensitive information about which websites a user likes visiting, so keeping it secure is important. Surprisingly, this is pretty hard, because CSS features like the :visited selector can be used to check whether a URL has been visited before. For this reason, there are efforts to restrict :visited.

Editing the URL

One way to go to another page is by clicking on a link. But most browsers also allow you to type into the address bar to visit a new URL, if you happen to know the URL.

Take a moment to notice the complex ritual of typing in an address (see Figure 2): you click the address bar, its old contents are cleared or selected, a cursor appears, the text you type shows up in the bar, and pressing Enter finally navigates to the new page.

Figure 2: Screenshots of editing in the address bar in Apple Safari 16.6.

These steps suggest that the browser stores the contents of the address bar separately from the url field, and also that there’s some state to say whether you’re currently typing into the address bar. Let’s call the contents address_bar and the state focus:

class Chrome:
    def __init__(self, browser):
        # ...
        self.focus = None
        self.address_bar = ""

Clicking on the address bar should set focus and clicking outside it should clear focus:

class Chrome:
    def click(self, x, y):
        self.focus = None
        # ...
        elif self.address_rect.containsPoint(x, y):
            self.focus = "address bar"
            self.address_bar = ""

Note that clicking on the address bar also clears the address bar contents. That’s not quite what a real browser does, but it’s pretty close, and it lets us skip adding text selection.

Now, when we draw the address bar, we need to check whether to draw the current URL or the currently typed text:

class Chrome:
    def paint(self):
        # ...
        if self.focus == "address bar":
            cmds.append(DrawText(
                self.address_rect.left + self.padding,
                self.address_rect.top,
                self.address_bar, self.font, "black"))
        else:
            url = str(self.browser.active_tab.url)
            cmds.append(DrawText(
                self.address_rect.left + self.padding,
                self.address_rect.top,
                url, self.font, "black"))

When the user is typing in the address bar, let’s also draw a cursor. Making states (like focus) visible on the screen (like with the cursor) makes software easier to use:

class Chrome:
    def paint(self):
        # ...
        if self.focus == "address bar":
            # ...
            w = self.font.measure(self.address_bar)
            cmds.append(DrawLine(
                self.address_rect.left + self.padding + w,
                self.address_rect.top,
                self.address_rect.left + self.padding + w,
                self.address_rect.bottom,
                "red", 1))

Next, when the address bar is focused, we need to support typing in a URL. In Tk, you can bind to <Key> to capture all key presses. The event object’s char field contains the character the user typed:

class Browser:
    def __init__(self):
        # ...
        self.window.bind("<Key>", self.handle_key)

    def handle_key(self, e):
        if len(e.char) == 0: return
        if not (0x20 <= ord(e.char) < 0x7f): return
        self.chrome.keypress(e.char)
        self.draw()

This handle_key handler starts with some conditions: <Key> fires for every key press, not just regular letters, so we want to ignore cases where no character is typed (a modifier key is pressed) or the character is outside the ASCII range (which can represent the arrow keys or function keys). For now let’s have the Browser send all key presses to Chrome and then call draw() so that the new letters actually show up.

Then Chrome can check focus and add on to address_bar:

class Chrome:
    def keypress(self, char):
        if self.focus == "address bar":
            self.address_bar += char

Finally, once the new URL is entered, we need to handle the “Enter” key, which Tk calls <Return>, and actually send the browser to the new address:

class Chrome:
    def enter(self):
        if self.focus == "address bar":
            self.browser.active_tab.load(URL(self.address_bar))
            self.focus = None

class Browser:
    def __init__(self):
        # ...
        self.window.bind("<Return>", self.handle_enter)

    def handle_enter(self, e):
        self.chrome.enter()
        self.draw()

So there—after a long chapter, you can now unwind a bit by surfing the web.

Text editing is surprisingly complex, and can be pretty tricky to implement well, especially for languages other than English. And nowadays URLs can be written in any language, though modern browsers restrict this somewhat for security reasons.

Summary

It’s been a lot of work just to handle links! We had to give each word its own layout object with a size and position, convert click locations back into page coordinates and then into elements, resolve relative URLs, split the browser into a Browser and individual Tabs, draw browser chrome with a tab bar and a back button, keep a per-tab history, and accept keyboard input in an address bar.

Now just imagine all the features you can add to your browser!

And here’s the lab 7 browser. Try using the browser chrome—it works! Our browser is starting to look like a real one:

Outline

The complete set of functions, classes, and methods in our browser should now look something like this:

class URL: def __init__(url) def request() def resolve(url) def __str__()
class Text: def __init__(text, parent) def __repr__()
class Element: def __init__(tag, attributes, parent) def __repr__()
def print_tree(node, indent)
def tree_to_list(tree, list)
class HTMLParser: SELF_CLOSING_TAGS HEAD_TAGS def __init__(body) def parse() def get_attributes(text) def add_text(text) def add_tag(tag) def implicit_tags(tag) def finish()
class CSSParser: def __init__(s) def whitespace() def literal(literal) def word() def ignore_until(chars) def pair() def selector() def body() def parse()
class TagSelector: def __init__(tag) def matches(node)
class DescendantSelector: def __init__(ancestor, descendant) def matches(node)
FONTS
def get_font(size, weight, style)
DEFAULT_STYLE_SHEET
INHERITED_PROPERTIES
def style(node, rules)
def cascade_priority(rule)
WIDTH, HEIGHT
HSTEP, VSTEP
class Rect: def __init__(left, top, right, bottom) def containsPoint(x, y)
BLOCK_ELEMENTS
class DocumentLayout: def __init__(node) def layout() def paint()
class BlockLayout: def __init__(node, parent, previous) def layout_mode() def layout() def recurse(node) def new_line() def word(node, word) def self_rect() def paint()
class LineLayout: def __init__(node, parent, previous) def layout() def paint()
class TextLayout: def __init__(node, word, parent, previous) def layout() def paint()
class DrawText: def __init__(x1, y1, text, font, color) def execute(scroll, canvas)
class DrawRect: def __init__(rect, color) def execute(scroll, canvas)
class DrawLine: def __init__(x1, y1, x2, y2, color, thickness) def execute(scroll, canvas)
class DrawOutline: def __init__(rect, color, thickness) def execute(scroll, canvas)
def paint_tree(layout_object, display_list)
SCROLL_STEP
class Tab: def __init__(tab_height) def load(url) def draw(canvas, offset) def scrolldown() def click(x, y) def go_back()
class Chrome: def __init__(browser) def tab_rect(i) def paint() def click(x, y) def keypress(char) def enter()
class Browser: def __init__() def draw() def new_tab(url) def handle_down(e) def handle_click(e) def handle_key(e) def handle_enter(e)

Exercises

7-1 Backspace. Add support for the backspace key when typing in the address bar. Honestly, do this exercise just for your sanity.

7-2 Middle-click. Add support for middle-clicking on a link (Button-2) to open it in a new tab. You might want to use a mouse when testing.

7-3 Window title. Browsers set their window title to the contents of the current tab’s <title> element. Make your browser do the same. (You can call the title method of Browser.window to change the window title.)

7-4 Forward. Add a forward button, which should undo the back button. If the most recent navigation action wasn’t a back button, the forward button shouldn’t do anything.To accomplish this, you’ll need to keep around history items when clicking the back button, and store an index into it for the current page, instead of removing them entirely from the array. Draw it in gray in that case, so the user isn’t stuck wondering why it doesn’t work. Also draw the back button in gray if there’s nowhere to go back to.

7-5 Fragments. URLs can contain a fragment, which comes at the end of a URL and is separated from the path by a hash sign #. When the browser navigates to a URL with a fragment, it should scroll the page so that the element with that identifier is at the top of the screen. Also, implement fragment links: relative URLs that begin with a # don’t load a new page, but instead scroll the element with that identifier to the top of the screen. The table of contents on the web version of this chapter uses fragment links.

7-6 Search. If the user types something that’s not a URL into the address bar, make your browser automatically search for it with a search engine. This usually means going to a special URL. For example, you can search Google by going to https://google.com/search?q=QUERY, where QUERY is the search query with every space replaced by a + sign.Actually, you need to escape lots of punctuation characters in these “query strings”, but that’s kind of orthogonal to this address bar search feature.

7-7 Visited links. In real browsers, links you’ve visited before are usually purple. Implement that feature. You’ll need to store the set of visited URLs, annotate the corresponding HTML elements, and check those annotations when drawing the text.Real browsers support special pseudo-class selectors that select all visited links, which you could implement if you want.

7-8 Bookmarks. Implement basic bookmarks. Add a button to the browser chrome; clicking it should bookmark the page. When you’re looking at a bookmarked page, that bookmark button should look different (maybe yellow?) to remind the user that the page is bookmarked, and clicking it should un-bookmark it. Add a special web page, about:bookmarks, for viewing the list of bookmarks.

7-9 Cursor. Make the left and right arrow keys move the text cursor around the address bar when it is focused. Pressing the backspace key should delete the character before the cursor, and typing other keys should add characters at the cursor. (Remember that the cursor can be before the first character or after the last!)

7-10 Multiple windows. Add support for multiple browser windows in addition to tabs. This will require keeping track of multiple Tk windows and canvases and grouping tabs by their containing window. You’ll also need some way to create a new window, perhaps with a keypress such as Ctrl+N.

7-11 Clicks via the display list. At the moment, our browser converts a click location to page coordinates and then finds the layout object at those coordinates. But you could instead first look up the draw command at that location, and then go from the draw command to the layout object that generated it. Implement this. You’ll need draw commands to know which layout object generated them.Real browsers don’t currently do this, but it’s an attractive possibility: display lists are pure data structures so access to them is easier to optimize or parallelize than the more complicated layout tree.

Sending Information to Servers

So far, our browser has seen the web as read-only—but when you post on Facebook, fill out a survey, or search Google, you’re sending information to servers as well as receiving information from them. In this chapter, we’ll start to transform our browser into a platform for web applications by building out support for HTML forms, the simplest way for a browser to send information to a server.

How Forms Work

HTML forms have a couple of moving parts.

First, in HTML there is a form element, which contains input elements,There are other elements similar to input, such as select and textarea. They work similarly enough; they just represent different kinds of user controls, like dropdowns and multi-line inputs. which in turn can be edited by the user. So a form might be written like this (see results in Figure 1):

<form action="/submit" method="post">
    <p>Name: <input name=name value=1></p>
    <p>Comment: <input name=comment value=2></p>
    <p><button>Submit!</button></p>
</form>
Figure 1: The example form in our browser.

This form contains two text entry boxes called name and comment. When the user goes to this page, they can click on those boxes to edit their values. Then, when they click the button at the end of the form, the browser collects all of the name–value pairs and bundles them into an HTTP POST request (as indicated by the method attribute), sent to the URL given by the form element’s action attribute, with the usual rules of relative URLs—so in this case, /submit. The POST request looks like this:

POST /submit HTTP/1.0
Host: example.org
Content-Length: 16

name=1&comment=2

In other words, it’s a lot like the regular GET requests we’ve already seen, except that it has a body—you’ve already seen HTTP responses with bodies, but requests can have them too. Note the Content-Length header; it’s mandatory for POST requests. The server responds to this request with a web page, just like normal, and the browser then does everything it normally does.

Implementing forms requires extending many parts of the browser, from implementing HTTP POST through new layout objects that draw input elements to handling button clicks. That makes it a great starting point for transforming our browser into an application platform, our goal for the next few chapters. Let’s get started implementing it all!

HTML forms were first standardized in HTML+, which also proposed tables, mathematical equations, and text that wraps around images. Amazingly, all three of these technologies survive, but in totally different standards: tables in RFC 1942, equations in MathML, and floating images in CSS 1.0.

Rendering Widgets

First, let’s draw the input areas that the user will type into.Most applications use OS libraries to draw input areas, so that those input areas look like other applications on that OS. But browsers need a lot of control over application styling, so they often draw their own input areas. Input areas are inline content, laid out in lines next to text. So to support inputs we’ll need a new kind of layout object, which I’ll call InputLayout. We can copy TextLayout and use it as a template, though we’ll need to make some quick edits.

First, there’s no word argument to InputLayout’s constructor:

class InputLayout:
    def __init__(self, node, parent, previous):
        self.node = node
        self.children = []
        self.parent = parent
        self.previous = previous

Second, input elements usually have a fixed width:

INPUT_WIDTH_PX = 200

class InputLayout:
    def layout(self):
        # ...
        self.width = INPUT_WIDTH_PX
        # ...
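
If you’re copying TextLayout from Chapter 7, the full layout method ends up looking roughly like this (a sketch; the only real change from TextLayout is the fixed width):

class InputLayout:
    def layout(self):
        # Compute the font from the node's style, just like TextLayout.
        weight = self.node.style["font-weight"]
        style = self.node.style["font-style"]
        if style == "normal": style = "roman"
        size = int(float(self.node.style["font-size"][:-2]) * .75)
        self.font = get_font(size, weight, style)

        # Inputs have a fixed width instead of a measured word width.
        self.width = INPUT_WIDTH_PX

        # Stack horizontally after the previous word on the line.
        if self.previous:
            space = self.previous.font.measure(" ")
            self.x = self.previous.x + space + self.previous.width
        else:
            self.x = self.parent.x

        self.height = self.font.metrics("linespace")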

The input and button elements need to be visually distinct so the user can find them easily. Our browser’s styling capabilities are limited, so let’s use background color to do that:

input {
    font-size: 16px; font-weight: normal; font-style: normal;
    background-color: lightblue;
}
button {
    font-size: 16px; font-weight: normal; font-style: normal;
    background-color: orange;
}

When the browser paints an InputLayout it needs to draw the background:

class InputLayout:
    def paint(self):
        cmds = []
        bgcolor = self.node.style.get("background-color",
                                      "transparent")
        if bgcolor != "transparent":
            rect = DrawRect(self.self_rect(), bgcolor)
            cmds.append(rect)
        return cmds

It then needs to get the input element’s text contents:

class InputLayout:
    def paint(self):
        # ...
        if self.node.tag == "input":
            text = self.node.attributes.get("value", "")
        elif self.node.tag == "button":
            if len(self.node.children) == 1 and \
               isinstance(self.node.children[0], Text):
                text = self.node.children[0].text
            else:
                print("Ignoring HTML contents inside button")
                text = ""
        # ...

Note that <button> elements can in principle contain complex HTML, not just a text node. That’s too complicated for this chapter, so I’m having the browser print a warning and skip the text in that case.See Exercise 8-8. Finally, we draw that text:

class InputLayout:
    def paint(self):
        # ...
        color = self.node.style["color"]
        cmds.append(
            DrawText(self.x, self.y, text, self.font, color))
        return cmds

By this point in the book, you’ve seen many layout objects, so I’m glossing over these changes. The point is that new layout objects are one common way to extend the browser.

We now need to create some InputLayouts, which we can do in BlockLayout:

class BlockLayout:
    def recurse(self, node):
        if isinstance(node, Text):
            # ...
        else:
            if node.tag == "br":
                self.new_line()
            elif node.tag == "input" or node.tag == "button":
                self.input(node)
            else:
                for child in node.children:
                    self.recurse(child)

Finally, this new input method is similar to the text method, creating a new layout object and adding it to the current line:It’s so similar in fact that they only differ in how they compute w. I’ll resist the temptation to refactor this code until we get to Chapter 15.

class BlockLayout:
    def input(self, node):
        w = INPUT_WIDTH_PX
        if self.cursor_x + w > self.width:
            self.new_line()
        line = self.children[-1]
        previous_word = line.children[-1] if line.children else None
        input = InputLayout(node, line, previous_word)
        line.children.append(input)

        weight = node.style["font-weight"]
        style = node.style["font-style"]
        if style == "normal": style = "roman"
        size = int(float(node.style["font-size"][:-2]) * .75)
        font = get_font(size, weight, style)

        self.cursor_x += w + font.measure(" ")

But actually, there are a couple more complications due to the way we decided to resolve the block-mixed-with-inline-siblings problem (see Chapter 5). One is that if there are no children for a node, we assume it’s a block element. But <input> elements don’t have children, yet must have inline layout or else they won’t draw correctly. Likewise, a <button> does have children, but they are treated specially.This situation is specific to these elements in our browser, but only because they are the only elements with special painting behavior within an inline context. These are also two examples of atomic inlines.

We can fix that with this change to layout_mode to add a second condition for returning “inline”:

class BlockLayout:
    def layout_mode(self):
        # ...
        elif self.node.children or self.node.tag == "input":
            return "inline"
        # ...

The second problem is that, again due to having block siblings, sometimes an <input> or <button> element will create a BlockLayout (which will then create an InputLayout inside). In this case we don’t want to paint the background twice, so let’s add some simple logic to skip painting it in BlockLayout in this case, via a new should_paint method:Recall (see Chapter 5) that we only get into this situation due to the presence of anonymous block boxes. Also, it’s worth noting that there are various other ways that our browser does not fully implement all the complexities of inline painting—one example is that it does not correctly paint nested inlines with different background colors. Inline layout and paint are very complicated in real browsers.

class BlockLayout:
    # ...
    def should_paint(self):
        return isinstance(self.node, Text) or \
            (self.node.tag != "input" and self.node.tag != "button")

Add a trivial should_paint method that just returns True to all of the other layout object types. Now we can skip painting objects based on should_paint:

def paint_tree(layout_object, display_list):
    if layout_object.should_paint():
        display_list.extend(layout_object.paint())
    # ...
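
The trivial version, shown here for DocumentLayout as a sketch, is just a one-liner; LineLayout, TextLayout, and InputLayout get the same thing:

class DocumentLayout:
    def should_paint(self):
        # Every layout object paints unless it opts out.
        return True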

With these changes the browser should now draw input and button elements as blue and orange rectangles.

The reason buttons surround their contents but input areas don’t is that a button can contain images, styled text, or other content. In a real browser, that relies on the inline-block display mode: a way of putting a block element into a line of text. There’s also an older <input type=button> syntax more similar to text inputs.

Interacting with Widgets

We’ve got input elements rendering, but you can’t edit their contents yet. But of course that’s the whole point! So let’s make input elements work like the address bar does—clicking on one will clear it and let you type into it.

Clearing is easy, another case inside Tab’s click method:

class Tab:
    def click(self, x, y):
        while elt:
            # ...
            elif elt.tag == "input":
                elt.attributes["value"] = ""
            # ...

However, if you try this, you’ll notice that clicking does not actually clear the input element. That’s because the code above updates the HTML tree—but we need to update the layout tree and then the display list for the change to appear on the screen.

Right now, the layout tree and display list are computed in load, but we don’t want to reload the whole page; we just want to redo the styling, layout, paint and draw phases. Together these are called rendering. So let’s extract these phases into a new Tab method, render:

class Tab:
    def load(self, url, payload=None):
        # ...
        self.render()

    def render(self):
        style(self.nodes, sorted(self.rules, key=cascade_priority))
        self.document = DocumentLayout(self.nodes)
        self.document.layout()
        self.display_list = []
        paint_tree(self.document, self.display_list)

For this code to work, you’ll also need to change nodes and rules from local variables in the load method to new fields on a Tab. Note that styling moved from load to render, but downloading the style sheets didn’t—we don’t re-download the style sheetsActually, some changes to the web page could delete existing link nodes or create new ones. Real browsers respond to this correctly, either removing the rules corresponding to deleted link nodes or downloading new style sheets when new link nodes are created. This is tricky to get right, and typing into an input area definitely can’t make such changes, so let’s skip this in our browser. every time you type!
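
For reference, here’s roughly what load looks like after this change (a sketch, with the request and style-sheet code from earlier chapters elided):

class Tab:
    def load(self, url, payload=None):
        # ...
        self.nodes = HTMLParser(body).parse()
        # ...
        self.rules = DEFAULT_STYLE_SHEET.copy()
        # ... append rules parsed from linked style sheets ...
        self.render()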

Now when we click an input element and clear its contents, we can call render to redraw the page with the input cleared:

class Tab:
    def click(self, x, y):
        while elt:
            elif elt.tag == "input":
                elt.attributes["value"] = ""
                return self.render()

So that’s clicking in an input area. But typing is harder. Think back to how we implemented the address bar in Chapter 7: we added a focus field that remembered what we clicked on so we could later send it our key presses. We need something like that focus field for input areas, but it’s going to be more complex because the input areas live inside a Tab, not inside the Browser.

Naturally, we will need a focus field on each Tab, to remember which text entry (if any) we’ve recently clicked on:

class Tab:
    def __init__(self):
        # ...
        self.focus = None

Now when we click on an input element, we need to set focus (and clear focus if nothing was found to focus on):

class Tab:
    def click(self, x, y):
        self.focus = None
        # ...
        while elt:
            elif elt.tag == "input":
                self.focus = elt
                # ...

But remember that keyboard input isn’t handled by the Tab—it’s handled by the Browser. So how does the Browser even know when keyboard events should be sent to the Tab? The Browser has to remember that in its own focus field!

In other words, when you click on the web page, the Browser updates its focus field to remember that the user is interacting with the page, not the browser chrome. And if so, it should unfocus (“blur”) the browser chrome:

class Chrome:
    def blur(self):
        self.focus = None
class Browser:
    def handle_click(self, e):
        if e.y < self.chrome.bottom:
            self.focus = None
            # ...
        else:
            self.focus = "content"
            self.chrome.blur()
            # ...
        self.draw()

The if branch that corresponds to clicks in the browser chrome unsets focus, meaning focus is no longer on the page contents, and key presses will thus be sent to the Chrome.

When a key press happens, the Browser either sends it to the address bar or calls the active tab’s keypress method (or neither, if nothing is focused):

class Browser:
    def handle_key(self, e):
        # ...
        if self.chrome.keypress(e.char):
            self.draw()
        elif self.focus == "content":
            self.active_tab.keypress(e.char)
            self.draw()

Here I’ve changed keypress to return true if the browser chrome consumed the key:

class Chrome:
    def keypress(self, char):
        if self.focus == "address bar":
            self.address_bar += char
            return True
        return False

That keypress method then uses the tab’s focus field to put the character in the right text entry:

class Tab:
    def keypress(self, char):
        if self.focus:
            self.focus.attributes["value"] += char
            self.render()

Note that here we call render instead of draw, because we’ve modified the web page and thus need to regenerate the display list instead of just redrawing it to the screen.

Hierarchical focus handling is an important pattern for combining graphical widgets; in a real browser, where web pages can be embedded into one another with iframes,The iframe element allows you to embed one web page into another as a little window. We’ll talk about this more in Chapter 15. the focus tree can be arbitrarily deep.

So now we have user input working with input elements. Before we move on, there is one last tweak that we need to make: drawing the text cursor as part of the Tab’s rendering. This turns out to be harder than expected: the cursor should be drawn by the InputLayout of the focused node, which means each node has to know whether or not it’s focused:

class Element:
    def __init__(self, tag, attributes, parent):
        # ...
        self.is_focused = False

Add the same field to Text nodes; they’ll never be focused and never draw cursors, but it’s more convenient if Text and Element have the same fields. We’ll set this when we move focus to an input element:

class Tab:
    def click(self, x, y):
        while elt:
            elif elt.tag == "input":
                elt.attributes["value"] = ""
                if self.focus:
                    self.focus.is_focused = False
                self.focus = elt
                elt.is_focused = True
                return self.render()

Note that we have to un-focus the currently focused element, lest it keep drawing its cursor. Anyway, now we can draw a cursor if an input element is focused:

class InputLayout:
    def paint(self):
        # ...
        if self.node.is_focused:
            cx = self.x + self.font.measure(text)
            cmds.append(DrawLine(
                cx, self.y, cx, self.y + self.height, "black", 1))
        # ...

Now you can click on a text entry, type into it, and modify its value. The next step is submitting the now-filled-out form.

This approach to drawing the text cursor—having the InputLayout draw it—allows visual effects to apply to the cursor, as we’ll see in Chapter 11. But not every browser does it this way. Chrome, for example, keeps track of a global focused element to make sure the cursor can be globally styled.

Submitting Forms

You submit a form by clicking on a button. So let’s add another condition to the big while loop in click:

class Tab:
    def click(self, x, y):
        while elt:
            # ...
            elif elt.tag == "button":
                # ...
            # ...

Once we’ve found the button, we need to find the form that it’s in by walking up the HTML tree:

elif elt.tag == "button":
    while elt:
        if elt.tag == "form" and "action" in elt.attributes:
            return self.submit_form(elt)
        elt = elt.parent

The submit_form method is then in charge of finding all of the input elements, encoding them in the right way, and sending the POST request. First, we look through all the descendants of the form to find input elements:

class Tab:
    def submit_form(self, elt):
        inputs = [node for node in tree_to_list(elt, [])
                  if isinstance(node, Element)
                  and node.tag == "input"
                  and "name" in node.attributes]

For each of those input elements, we need to extract the name attribute and the value attribute, and form encode both of them. Form encoding is how the name–value pairs are formatted in the HTTP POST request. Basically, it is: name, then equal sign, then value; and name–value pairs are separated by ampersands:

class Tab:
    def submit_form(self, elt):
        # ...
        body = ""
        for input in inputs:
            name = input.attributes["name"]
            value = input.attributes.get("value", "")
            body += "&" + name + "=" + value
        body = body[1:]

Here, body initially has an extra & tacked on to the front, which is removed on the last line.
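
To make that concrete, here’s how body evolves for the example form from the start of this chapter:

# Inputs: name=1 and comment=2
# after the first loop iteration:   body == "&name=1"
# after the second loop iteration:  body == "&name=1&comment=2"
# after body = body[1:]:            body == "name=1&comment=2"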

Now, any time you see special syntax like this, you’ve got to ask: what if the name or the value has an equal sign or an ampersand in it? Form encoding handles this with “percent encoding”, which replaces special characters with a percent sign followed by those characters’ hex codes. For example, a space becomes %20 and a period becomes %2e. Python provides a percent-encoding function as quote in the urllib.parse module:You can write your own percent_encode function using Python’s ord and hex functions if you like. I’m using the standard function for expediency. In Chapter 1, using these library functions would have obscured key concepts, but by this point percent encoding is necessary but not conceptually interesting.

for input in inputs:
    # ...
    name = urllib.parse.quote(name)
    value = urllib.parse.quote(value)
    # ...
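
If you’re curious what quote actually does, here are a couple of purely illustrative examples; spaces, equals signs, and ampersands all become hex escapes:

import urllib.parse

urllib.parse.quote("hi there")  # 'hi%20there'
urllib.parse.quote("a=b&c")     # 'a%3Db%26c'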

Now that submit_form has built a request body, it needs to make a POST request. I’m going to defer that responsibility to the load function, which handles making requests:

def submit_form(self, elt):
    # ...
    url = self.url.resolve(elt.attributes["action"])
    self.load(url, body)

The new payload argument to load is then passed through to request:

def load(self, url, payload=None):
    # ...
    body = url.request(payload)
    # ...

In request, this new argument is used to decide between a GET and a POST request:

class URL:
    def request(self, payload=None):
        # ...
        method = "POST" if payload else "GET"
        # ...
        request = "{} {} HTTP/1.0\r\n".format(method, self.path)
        # ...

If it’s a POST request, the Content-Length header is mandatory:

class URL:
    def request(self, payload=None):
        # ...
        if payload:
            length = len(payload.encode("utf8"))
            request += "Content-Length: {}\r\n".format(length)
        # ...

Note that the Content-Length is the length of the payload in bytes, which might not be equal to its length in letters.Because characters from many languages take up multiple bytes. Finally, after the headers, we send the payload itself:

class URL:
    def request(self, payload=None):
        # ...
        if payload: request += payload
        s.send(request.encode("utf8"))
        # ...

So that’s how the POST request gets sent. Then the server responds with an HTML page and the browser will render it in the totally normal way.Actually, because browsers treat going “back” to a POST-requested page specially (see Exercise 8-5), it’s common to respond to a POST request with a redirect. That’s basically it for forms!

While most form submissions use the form encoding described here, forms with file uploads (using <input type=file>) use a different encoding that includes metadata for each key–value pair (like the file name or file type). There’s also an obscure text/plain encoding option, which uses no escaping and which even the standard warns against using.

How Web Apps Work

So … how do web applications (web apps) use forms? When you use an application from your browser—whether you are registering to vote, looking at pictures of your baby cousin, or checking your email—there are typicallyHere I’m talking in general terms. There are some browser applications without a server, and others where the client code is exceptionally simple and almost all the code is on the server. two programs involved: client code that runs in the browser, and server code that runs on the server. When you click on things or take actions in the application, that runs client code, which then sends data to the server via HTTP requests.

For example, imagine a simple message board application. The server stores the state of the message board—who has posted what—and has logic for updating that state. But all the actual interaction with the page—drawing the posts, letting the user enter new ones—happens in the browser. Both components are necessary.

The browser and the server interact over HTTP. The browser first makes a GET request to the server to load the current message board. The user interacts with the browser to type a new post, and submits it to the server (say, via a form). That causes the browser to make a POST request to the server, which instructs the server to update the message board state. The server then needs the browser to update what the user sees; with forms, the server sends a new HTML page in its response to the POST request. This process is shown in Figure 2.

Figure 2: The cycle of request and response for a multi-page application.

Forms are a simple, minimal version of this cycle of request and response and make a good introduction to how browser applications work. They’re also implemented in every browser and have been around for decades. These days many web applications use form elements, but replace synchronous POST requests with asynchronous ones driven by JavaScript,In the early 2000s, the adoption of asynchronous HTTP requests sparked the wave of innovative new web applications called Web 2.0. which makes applications snappier by hiding the time it takes to make the HTTP request. In return for that snappiness, that JavaScript code must now handle errors, validate inputs, and indicate loading progress. In any case, both synchronous and asynchronous uses of forms are based on the same principles of client and server code.

There are request types besides GET and POST, like PUT (create if non-existent) and DELETE, or the more obscure CONNECT and TRACE. In 2010 the PATCH method was standardized in RFC 5789. New methods were intended as a standard extension mechanism for HTTP, and some protocols were built this way (like WebDav’s PROPFIND, MOVE, and LOCK methods), but this did not become an enduring way to extend the web itself, and HTTP 2.0 and 3.0 did not add any new methods.

Receiving POST Requests

To better understand the request/response cycle, let’s write a simple web server. It’ll implement an online guest book,They were very hip in the 1990s—comment threads from before there was anything to comment on. kind of like an open, anonymous comment thread. Now, this is a book on web browser engineering, so I won’t discuss web server implementation that thoroughly. But I want you to see how the server side of an application works.

A web server is a separate program from the web browser, so let’s start a new file. The server will need to open a socket and listen for connections, read and parse the HTTP requests it receives, decide which page is being asked for, and send back a response.

Let’s start by opening a socket. Like for the browser, we need to create an internet streaming socket using TCP:

import socket
s = socket.socket(
    family=socket.AF_INET,
    type=socket.SOCK_STREAM,
    proto=socket.IPPROTO_TCP)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

The setsockopt call is optional. Normally, when a program has a socket open and it crashes, your OS prevents that port from being reusedWhen your process crashes, the computer on the end of the connection won’t be informed immediately; if some other process opens the same port, it could receive data meant for the old, now-dead process. for a short period. That’s annoying when developing a server; calling setsockopt with the SO_REUSEADDR option allows the OS to immediately reuse the port.

Now, with this socket, instead of calling connect (to connect to some other server), we’ll call bind, which waits for other computers to connect:

s.bind(('', 8000))
s.listen()

Let’s look at the bind call first. Its first argument says who should be allowed to make connections to the server; the empty string means that anyone can connect. The second argument is the port others must use to talk to our server; I’ve chosen 8000. I can’t use 80, because ports below 1024 require administrator privileges, but you can pick something other than 8000 if, for whatever reason, port 8000 is taken on your machine.

Finally, after the bind call, the listen call tells the OS that we’re ready to accept connections.

To actually accept those connections, we enter a loop that runs once per connection. At the top of the loop we call s.accept to wait for a new connection:

while True:
    conx, addr = s.accept()
    handle_connection(conx)

That connection object is, confusingly, also a socket: it is the socket corresponding to that one connection. We know what to do with those: we read the contents and parse the HTTP message. But it’s a little trickier in the server than in the browser, because the server can’t just read from the socket until the connection closes—the browser is waiting for the server and won’t close the connection.

So, we’ve got to read from the socket line by line. First, we read the request line:

def handle_connection(conx):
    req = conx.makefile("b")
    reqline = req.readline().decode('utf8')
    method, url, version = reqline.split(" ", 2)
    assert method in ["GET", "POST"]

Then we read the headers until we get to a blank line, accumulating the headers in a dictionary:

def handle_connection(conx):
    # ...
    headers = {}
    while True:
        line = req.readline().decode('utf8')
        if line == '\r\n': break
        header, value = line.split(":", 1)
        headers[header.casefold()] = value.strip()

Finally we read the body, but only when the Content-Length header tells us how much of it to read (that’s why that header is mandatory on POST requests):

def handle_connection(conx):
    # ...
    if 'content-length' in headers:
        length = int(headers['content-length'])
        body = req.read(length).decode('utf8')
    else:
        body = None

Now the server needs to generate a web page in response. We’ll get to that later; for now, just abstract that away behind a do_request call:

def handle_connection(conx):
    # ...
    status, body = do_request(method, url, headers, body)

The server then sends this page back to the browser:

def handle_connection(conx):
    # ...
    response = "HTTP/1.0 {}\r\n".format(status)
    response += "Content-Length: {}\r\n".format(
        len(body.encode("utf8")))
    response += "\r\n" + body
    conx.send(response.encode('utf8'))
    conx.close()

The architecture is summarized in Figure 3. Our implementation is all pretty bare-bones: our server doesn’t check that the browser is using HTTP 1.0 to talk to it, it doesn’t send back any headers at all except Content-Length, it doesn’t support TLS, and so on. Again: this is a web browser book—it’ll do.

Figure 3: The architecture of the simple web server in this chapter.

Ilya Grigorik’s High Performance Browser Networking is an excellent deep dive into networking and how to optimize for it in a web application. There are things the client can do (make fewer requests, avoid polling, reuse connections) and things the server can do (compression, protocol support, sharing domains).

Generating Web Pages

So far, all of this server code is “boilerplate”—any web application will have similar code. What makes our server a guest book, on the other hand, depends on what happens inside do_request. It needs to store the guest book state, generate HTML pages, and respond to POST requests.

Let’s store guest book entries in a Python list. Usually web applications use persistent state, like a database, so that the server can be restarted without losing state, but our guest book need not be that resilient.

ENTRIES = [ 'Pavel was here' ]

Next, do_request has to output HTML that shows those entries:

def do_request(method, url, headers, body):
    out = "<!doctype html>"
    for entry in ENTRIES:
        out += "<p>" + entry + "</p>"
    return "200 OK", out

This is definitely “minimal” HTML, so it’s a good thing our browser will insert implicit tags and has some default styles! You can test it out by running this minimal web server and, while it’s running, directing your browser to http://localhost:8000/, where localhost is what your computer calls itself and 8000 is the port we chose earlier. You should see one guest book entry.
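
If you’d rather sanity-check the server from a script instead of a browser, you can also fetch the page with Python’s built-in urllib. This is just a testing convenience, not something the rest of the chapter relies on, and it assumes the server is already running in another terminal:

import urllib.request

html = urllib.request.urlopen("http://localhost:8000/").read()
print(html.decode("utf8"))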

By the way, while you’re debugging this web server, it’s probably better to use a real web browser, instead of this book’s browser, to interact with it. That way you don’t have to worry about browser bugs while you work on server bugs. But this server does support both real and toy browsers.

We’ll use forms to let visitors write in the guest book:

def do_request(method, url, headers, body):
    # ...
    out += "<form action=add method=post>"
    out +=   "<p><input name=guest></p>"
    out +=   "<p><button>Sign the book!</button></p>"
    out += "</form>"
    # ...

When this form is submitted, the browser will send a POST request to http://localhost:8000/add. So the server needs to react to these submissions. That means do_request will field two kinds of requests: regular browsing and form submissions. Let’s separate the two kinds of requests into different functions.

First rename the current do_request to show_comments:

def show_comments():
    # ...
    return out

This then frees up the do_request function to figure out which function to call for which request:

def do_request(method, url, headers, body):
    if method == "GET" and url == "/":
        return "200 OK", show_comments()
    elif method == "POST" and url == "/add":
        params = form_decode(body)
        return "200 OK", add_entry(params)
    else:
        return "404 Not Found", not_found(url, method)

When a POST request to /add comes in, the first step is to decode the request body:

import urllib.parse

def form_decode(body):
    params = {}
    for field in body.split("&"):
        name, value = field.split("=", 1)
        name = urllib.parse.unquote_plus(name)
        value = urllib.parse.unquote_plus(value)
        params[name] = value
    return params
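
To make the decoding concrete, here’s what form_decode produces for a hypothetical submission of the guest form; the plus sign and the percent escape are the browser’s form encoding at work:

print(form_decode("guest=hi+there%21"))
# prints {'guest': 'hi there!'}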

Note that I use unquote_plus instead of unquote, because browsers may also use a plus sign to encode a space. The add_entry function then looks up the guest parameter and adds its content as a new guest book entry:

def add_entry(params):
    if 'guest' in params:
        ENTRIES.append(params['guest'])
    return show_comments()

I’ve also added a “404” response. Fitting the austere stylings of our guest book, here’s the 404 page:

def not_found(url, method):
    out = "<!doctype html>"
    out += "<h1>{} {} not found!</h1>".format(method, url)
    return out

Try it! You should be able to restart the server, open it in your browser, and update the guest book a few times. You should also be able to use the guest book from a real web browser.

Typically, connection handling and request routing are handled by a web framework; this book’s website, for example, uses bottle.py. Frameworks parse requests into convenient data structures, route requests to the right handler, and can also provide tools like HTML templates, session handling, database access, input validation, and API generation.
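
Just for flavor, here’s a hypothetical sketch of the guest book’s front page as a bottle.py application; the route decorator takes over the connection handling and routing we wrote by hand above. Treat it as an illustration only; the rest of this book sticks with our hand-rolled server.

from bottle import route, run

ENTRIES = [ 'Pavel was here' ]

@route('/')
def index():
    out = "<!doctype html>"
    for entry in ENTRIES:
        out += "<p>" + entry + "</p>"
    return out

run(host='localhost', port=8000)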

Summary

With this chapter we’re starting to transform our browser into an application platform. We’ve added: support for text inputs and buttons, laid out with a new InputLayout class; clicking to focus an input and typing to edit its value; and form submission, which gathers up input values and sends them to the server in a POST request.

Plus, our browser now has a little web server friend. That’s going to be handy as we add more interactive features to the browser.

Since this chapter introduces a server, I’ve also added support for that in the browser widget below, by cross-compiling this chapter’s server code to JavaScript. Try submitting a comment through the form; it should work!

Outline

The complete set of functions, classes, and methods in our browser should now look something like this:

class URL: def __init__(url) def request(payload) def resolve(url) def __str__() class Text: def __init__(text, parent) def __repr__() class Element: def __init__(tag, attributes, parent) def __repr__() def print_tree(node, indent) def tree_to_list(tree, list) class HTMLParser: SELF_CLOSING_TAGS HEAD_TAGS def __init__(body) def parse() def get_attributes(text) def add_text(text) def add_tag(tag) def implicit_tags(tag) def finish() class CSSParser: def __init__(s) def whitespace() def literal(literal) def word() def ignore_until(chars) def pair() def selector() def body() def parse() class TagSelector: def __init__(tag) def matches(node) class DescendantSelector: def __init__(ancestor, descendant) def matches(node) FONTS def get_font(size, weight, style) DEFAULT_STYLE_SHEET INHERITED_PROPERTIES def style(node, rules) def cascade_priority(rule) WIDTH, HEIGHT HSTEP, VSTEP class Rect: def __init__(left, top, right, bottom) def containsPoint(x, y) INPUT_WIDTH_PX BLOCK_ELEMENTS class DocumentLayout: def __init__(node) def layout() def should_paint() def paint() class BlockLayout: def __init__(node, parent, previous) def layout_mode() def layout() def recurse(node) def new_line() def word(node, word) def input(node) def self_rect() def should_paint() def paint() class LineLayout: def __init__(node, parent, previous) def layout() def should_paint() def paint() class TextLayout: def __init__(node, word, parent, previous) def layout() def should_paint() def paint() class InputLayout: def __init__(node, parent, previous) def layout() def should_paint() def paint() def self_rect() class DrawText: def __init__(x1, y1, text, font, color) def execute(scroll, canvas) class DrawRect: def __init__(rect, color) def execute(scroll, canvas) class DrawLine: def __init__(x1, y1, x2, y2, color, thickness) def execute(scroll, canvas) class DrawOutline: def __init__(rect, color, thickness) def execute(scroll, canvas) def paint_tree(layout_object, display_list) SCROLL_STEP class Tab: def __init__(tab_height) def load(url, payload) def render() def draw(canvas, offset) def scrolldown() def click(x, y) def go_back() def submit_form(elt) def keypress(char) class Chrome: def __init__(browser) def tab_rect(i) def paint() def click(x, y) def keypress(char) def enter() def blur() class Browser: def __init__() def draw() def new_tab(url) def handle_down(e) def handle_click(e) def handle_key(e) def handle_enter(e)

There’s also a server now, but it’s much simpler:

def handle_connection(conx) def do_request(method, url, headers, body) def form_decode(body) ENTRIES def show_comments() def not_found(url, method) def add_entry(params)

Exercises

8-1 Enter key. In most browsers, if you hit the “Enter” or “Return” key while inside a text entry, that submits the form that the text entry was in. Add this feature to your browser.

8-2 GET forms. Forms can be submitted via GET requests as well as POST requests. In GET requests, the form-encoded data is pasted onto the end of the URL, separated from the path by a question mark, like /search?q=hi; GET form submissions have no body. Implement GET form submissions.

8-3 Blurring. Right now, if you click inside a text entry, and then inside the address bar, two cursors will appear on the screen. To fix this, add a blur method to each Tab which unfocuses anything that is focused, and call it before changing focus.

8-4 Check boxes. In HTML, input elements have a type attribute. When set to checkbox, the input element looks like a checkbox; it’s checked if the checked attribute is set, and unchecked otherwise.Technically, the checked attribute only affects the state of the checkbox when the page loads; checking and unchecking a checkbox does not affect this attribute but instead manipulates internal state. When the form is submitted, a checkbox’s name=value pair is included only if the checkbox is checked. (If the checkbox has no value attribute, the default is the string on.)

8-5 Resubmit requests. One reason to separate GET and POST requests is that GET requests are supposed to be idempotent (read-only, basically) while POST requests are assumed to change the web server state. That means that going “back” to a GET request (making the request again) is safe, while going “back” to a POST request is a bad idea. Change the browser history to record what method was used to access each URL, and the POST body if one was used. When you go back to a POST-ed URL, ask the user if they want to resubmit the form. Don’t go back if they say no; if they say yes, submit a POST request with the same body as before.

8-6 Message board. Right now our web server is a simple guest book. Extend it into a simple message board by adding support for topics. Each topic should have its own URL and its own list of messages. So, for example, /cooking should be a page of posts (about cooking) and comments submitted through the form on that page should only show up when you go to /cooking, not when you go to /cars. Make the home page, at /, list the available topics with a link to each topic’s page. Make it possible for users to add new topics.

8-7 Persistence. Back the server’s list of guest book entries with a file, so that when the server is restarted it doesn’t lose data.

8-8 Rich buttons. Make it possible for a button to contain arbitrary elements as children, and render them correctly. The children should be contained inside the button instead of spilling out—this can make a button really tall. Think about edge cases, like a button that contains another button, an input area, or a link, and test real browsers to see what they do.

8-9 HTML chrome. Browser chrome is quite complicated in real browsers, with tricky details such as font sizes, padding, outlines, shadows, icons and so on. This makes it tempting to try to reuse our layout engine for it. Implement this, using <button> elements for the new tab and back buttons, an <input> element for the address bar, and <a> elements for the tab names. It won’t look exactly the same as the current chrome—outline will have to wait for Chapter 14, for example—but if you adjust the default CSS you should be able to make it look passable.Real browsers have in fact gone down this implementation path multiple times, building layout engines for the browser chrome that are heavily inspired by or reuse pieces of the main web layout engine. Firefox had one, and Chrome has one. However, because it’s so important for the browser chrome to be very fast and responsive to draw, such approaches have had mixed success.

Running Interactive Scripts

The first web applications were like the previous chapter’s guest book, with the server generating new web pages for every user action. But in the early 2000s, JavaScript-enhanced web applications, which can update pages dynamically and respond immediately to user actions, took their place. Let’s add support for this key web technology to our browser.

Installing DukPy

Actually writing a JavaScript interpreter is beyond the scope of this book,But check out a book on programming language implementation if it sounds interesting! so this chapter uses the dukpy library for executing JavaScript.

DukPy wraps a JavaScript interpreter called Duktape. The most famous JavaScript interpreters are those used in browsers: SpiderMonkey (Firefox), JavaScriptCore (Safari), and V8 (Chrome). Unlike those implementations, which are extremely fast but also extremely complex, Duktape aims to be simple and extensible, and is usually embedded inside a larger C or C++ project.For example, in a video game the high-speed graphics code is usually written in C or C++, but the actual plot of the game is usually written in a higher-level language like JavaScript.

Like other JavaScript engines, DukPy not only executes JavaScript code, but also allows it to call exported Python functions. We’ll be using this feature to allow JavaScript code to modify the web page it’s running on.

The first step to using DukPy is installing it. On most machines, including on Windows, macOS, and Linux systems, you should be able to do this with:

python3 -m pip install dukpy

If you have a really old version of Python, you might need to install the pip package first, possibly using a command-line tool like easy_install. If you do your Python programming through an integrated development environment (IDE), you may need to use your IDE’s package installer. If nothing else works, you can build from source.

If you’re following along in something other than Python, you might need to skip this chapter, though you could try binding directly to the duktape library that dukpy uses.

To test whether you installed DukPy correctly, execute this:

import dukpy
dukpy.evaljs("2 + 2")

If you get an error on the first line, you probably failed to install DukPy.Or, on my Linux machine, I sometimes get errors due to file ownership. You may have to do some sleuthing. If you get an error, or a segfault, on the second line, there’s a chance that Duktape failed to compile, or maybe doesn’t support your system, and you might need to debug further.

Note to JavaScript experts: DukPy does not implement newer syntax like let and const or arrow functions. In keeping with this book’s aesthetics, you’ll need to use old-school JavaScript from the turn of the century.

Running JavaScript Code

The test above shows how you run JavaScript code in DukPy: you just call evaljs! Let’s put this newfound knowledge to work in our browser.

On the web, JavaScript is found in <script> tags. Normally, a <script> tag has a src attribute with a relative URL that points to a JavaScript file, much like with CSS files. A <script> tag could also contain JavaScript source code between the start and end tag, but we won’t implement that.It’s a challenge for parsing, since it’s hard to avoid less-than and greater-than signs in JavaScript code. See Exercise 4-3.

Finding and downloading those scripts is similar to what we did for CSS. First, we need to find all of the scripts:

class Tab:
    def load(self, url, payload=None):
        # ...
        scripts = [node.attributes["src"] for node
                   in tree_to_list(self.nodes, [])
                   if isinstance(node, Element)
                   and node.tag == "script"
                   and "src" in node.attributes]
        # ...

Next, we run all of the scripts:

def load(self, url, payload=None):
    # ...
    for script in scripts:
        script_url = url.resolve(script)
        try:
            body = script_url.request()
        except:
            continue
        print("Script returned: ", dukpy.evaljs(body))
    # ...

This should run before styling and layout. To try it out, create a simple web page with a script tag:

<script src=test.js></script>

Then write a super simple script to test.js, maybe this:

var x = 2
x + x

Point your browser at that page, and you should see:

Script returned: 4

That’s your browser running its first bit of JavaScript!

Actually, real browsers run JavaScript code as soon as the browser parses the <script> tag, not after the whole page is parsed. Or, at least, that is the default; there are many options. What our browser does is what a real browser does when the defer attribute is set. The default behavior is much trickier to implement efficiently.

Exporting Functions

Right now, our browser just prints the last expression in a script; but in a real browser scripts must call the console.log function to print. To support that, we will need to export a function from Python into JavaScript. We’ll be exporting a lot of functions, so to avoid polluting the Tab object with many new methods, let’s put this code in a new JSContext class:

class JSContext:
    def __init__(self):
        self.interp = dukpy.JSInterpreter()

    def run(self, code):
        return self.interp.evaljs(code)

DukPy’s JSInterpreter object stores the values of all the JavaScript variables, and lets us run multiple JavaScript snippets and share variable values and other state between them.
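
You can see that sharing outside the browser, too. Here’s a standalone experiment, separate from our browser code, where a variable defined in one evaljs call is still visible in the next:

import dukpy

interp = dukpy.JSInterpreter()
interp.evaljs("var x = 2")     # x is stored in the interpreter's state
print(interp.evaljs("x + x"))  # prints 4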

We create this new JSContext object while loading the page:

class Tab:
    def load(self, url, payload=None):
        # ...
        self.js = JSContext()
        for script in scripts:
            # ...
            self.js.run(body)

As a side benefit of using one JSContext for all scripts, it is now possible to run two scripts and have one of them define a variable that the other uses, say on a page like this:

<script src=a.js></script>
<script src=b.js></script>

Suppose a.js is “var x = 2;” and b.js is “console.log(x + x)”; the variable x is set in a.js but used in b.js. In real web browsers, that’s common, since one script might define library functions that another script wants to call.

Now, to allow JavaScript to interact with the outside world, DukPy allows us to “export” functions to it. For example, we can export Python’s print function like so:

class JSContext:
    def __init__(self):
        # ...
        self.interp.export_function("log", print)

We can call an exported function from JavaScript using DukPy’s call_python function. For example:

call_python("log", "Hi from JS")

When this JavaScript code runs, DukPy converts the JavaScript string "Hi from JS" into a Python string,This conversion works for numbers, strings, and booleans, plus arrays and dictionaries thereof, but not with fancy objects. and then passes that Python string to the print function we exported. Then print prints that string.

Since we ultimately want a console.log function, not a call_python function, we need to define a console object and then give it a log property. We can do that in JavaScript:

console = { log: function(x) { call_python("log", x); } }

In case you’re not too familiar with JavaScript,Now’s a good time to brush up! this defines a variable called console, whose value is an object literal with the property log, whose value is a function that calls call_python. The interaction between the browser and JavaScript is shown in Figure 1.

We can call that JavaScript code our “JavaScript runtime”; since it has to run before any user code, let’s stick it in a runtime.js file and execute it when the JSContext is created:

RUNTIME_JS = open("runtime.js").read()

class JSContext:
    def __init__(self):
        # ...
        self.interp.evaljs(RUNTIME_JS)

Now you should be able to put console.log("Hi from JS!") into a JavaScript file, run it from your browser, and see output in your terminal. You should also be able to call console.log multiple times.

Taking a step back, when we run JavaScript in our browser, we’re mixing C code, which implements the JavaScript interpreter; Python code, which implements certain JavaScript functions; a JavaScript runtime, which wraps the Python API to look more like the JavaScript one; and of course some user code in JavaScript. There’s a lot of complexity here!

If a script runs for a long time, or has an infinite loop, our browser locks up and becomes completely unresponsive to the user. This is a consequence of JavaScript’s single-threaded semantics and its task-based, run-to-completion scheduling. Some APIs like Web Workers allow limited multithreading, but those threads don’t have access to the DOM.

Handling Crashes

Crashes in JavaScript code are frustrating to debug. You can cause a crash by writing bad code, or by explicitly raising an exception, like so:

throw Error("bad");

When a web page runs some JavaScript that crashes, the browser should ignore the crash. Web pages shouldn’t be able to crash our browser! You can implement that like this:

class JSContext:
    def run(self, script, code):
        try:
            return self.interp.evaljs(code)
        except dukpy.JSRuntimeError as e:
            print("Script", script, "crashed", e)

But as you go through this chapter, you’ll also run into another type of crash: crashes in our own JavaScript runtime. We can’t ignore those, because that’s our code. Debugging these crashes is a bear: by default DukPy won’t show a backtrace, and if the runtime code calls into an exported function that crashes it gets even more confusing.

Here are a few tips to help with these crashes. First, if you get a crash inside some JavaScript function, wrap the body of the function like this:

function foo() {
    try {
        // ...
    } catch(e) {
        console.log("Crash in function foo()", e.stack);
        throw e;
    }
}

This code catches all exceptions and prints a stack trace before re-raising them. If you instead are getting crashes inside an exported function you will need to wrap that function, on the Python side:

class JSContext:
    def foo(self, arg):
        try:
            # ...
        except:
            import traceback
            traceback.print_exc()
            raise

Debugging these issues is not easy, because all these calls between Python and JavaScript get pretty complicated. Because these bugs are hard, it’s worth approaching debugging systematically and gathering a lot of information before attempting a fix.

Returning Handles

So far, JavaScript evaluation is fun but useless, because JavaScript can’t make any kinds of modifications to the page itself. (Why even run JavaScript if it can’t do anything besides print? Who looks at a browser’s console output?) We need to allow JavaScript to modify the page.

JavaScript manipulates a web page by calling any of a large set of methods collectively called the DOM API. The DOM API is big, and it keeps getting bigger, so we won’t be implementing all, or even most, of it. But a few core functions show key elements of the full API: querySelectorAll, which finds the page elements matching a selector; getAttribute, which reads an element’s HTML attributes; and innerHTML, which replaces an element’s contents with newly parsed HTML.

We’ll implement simplified versions of these APIs.The simplifications will be minor. querySelectorAll will return an array, not this thing called a NodeList; innerHTML will only write the HTML contents of an element, and won’t allow reading those contents. This suffices to demonstrate JavaScript–browser interaction.

Let’s start with querySelectorAll. First, export a function:

class JSContext:
    def __init__(self):
        # ...
        self.interp.export_function("querySelectorAll",
            self.querySelectorAll)
        # ...

In JavaScript, querySelectorAll is a method on the document object, which we need to define in the JavaScript runtime:

document = { querySelectorAll: function(s) {
    return call_python("querySelectorAll", s);
}}

On the Python side, querySelectorAll first has to parse the selector and then find and return the matching elements. To parse the selector, I’ll call into the CSSParser’s selector method:If you pass querySelectorAll an invalid selector, the selector call will throw an error, and DukPy will convert that Python-side exception into a JavaScript-side exception in the web script we are running, which can catch it.

class JSContext:
    def querySelectorAll(self, selector_text):
        selector = CSSParser(selector_text).selector()

Next we need to find and return all matching elements. To do that, we need the JSContext to have access to the Tab, specifically to its nodes field. So let’s pass in the Tab when creating a JSContext:

class JSContext:
    def __init__(self, tab):
        self.tab = tab
        # ...

class Tab:
    def load(self, url, payload=None):
        # ...
        self.js = JSContext(self)
        # ...

Now querySelectorAll will find all nodes matching the selector:

def querySelectorAll(self, selector_text):
    # ...
    nodes = [node for node
             in tree_to_list(self.tab.nodes, [])
             if selector.matches(node)]

Finally, we need to return those nodes back to JavaScript. You might try something like this:

def querySelectorAll(self, selector_text):
    # ...
    return nodes

However, this throws an error:Yes, that’s a confusing error message. Is it a JSRuntimeError, an EvalError, or a TypeError? The confusion is a consequence of the complex interaction of Python, JS, and C code. (JSON, or JavaScript Object Notation, is a language-independent data format.)

_dukpy.JSRuntimeError: EvalError:
Error while calling Python Function:
TypeError('Object of type Element is not JSON serializable')

What DukPy is trying to tell you is that it has no idea what to do with the Element objects that querySelectorAll returns. After all, the Element class only exists in Python, not JavaScript!

Python objects need to stay on the Python side of the browser, so JavaScript code will need to refer to them via some kind of indirection. I’ll use a simple numeric identifier, which I’ll call a handle (see Figure 2).Note the similarity to file descriptors, which give user-level applications access to kernel data structures.

Figure 2: The relationship between Node objects in JavaScript and Element/Text objects in the browser is maintained through handles.

We’ll need to keep track of the handle to node mapping. Let’s create a node_to_handle data structure to map nodes to handles, and a handle_to_node map that goes the other way:

class JSContext:
    def __init__(self, tab):
        # ...
        self.node_to_handle = {}
        self.handle_to_node = {}
        # ...

Now the querySelectorAll handler can allocate handles for each node and return those handles instead:

def querySelectorAll(self, selector_text):
    # ...
    return [self.get_handle(node) for node in nodes]

The get_handle function should create a new handle if one doesn’t exist yet:

class JSContext:
    def get_handle(self, elt):
        if elt not in self.node_to_handle:
            handle = len(self.node_to_handle)
            self.node_to_handle[elt] = handle
            self.handle_to_node[handle] = elt
        else:
            handle = self.node_to_handle[elt]
        return handle

So now the querySelectorAll handler returns something like [1, 3, 4, 7], with each number being a handle for an element; a list of numbers is something DukPy can convert to JavaScript without issue. Of course, on the JavaScript side, querySelectorAll shouldn’t return a bunch of numbers: it should return a list of Node objects.In a real browser, querySelectorAll actually returns a NodeList object, for kind of abstruse reasons that aren’t relevant here. So let’s define a Node object in our runtime that wraps a handle:If your JavaScript is rusty, you might want to read up on the crazy way you define classes in JavaScript. Modern JavaScript also provides the class syntax, which is more sensible, but it’s not supported in DukPy.

function Node(handle) { this.handle = handle; }

We create these Node objects in querySelectorAll’s wrapper:This code creates new Node objects every time you call querySelectorAll, even if there’s already a Node for that handle. That means you can’t use equality to compare Node objects. I’ll ignore that but a real browser wouldn’t.

document = { querySelectorAll: function(s) {
    var handles = call_python("querySelectorAll", s);
    return handles.map(function(h) { return new Node(h) });
}}

Wrapping Handles

Now that we’ve got some Nodes, what can we do with them?

One simple DOM method is getAttribute, a method on Node objects that lets you get the value of HTML attributes. Implementing getAttribute means solving the opposite problem to querySelectorAll: taking Node objects on the JavaScript side, and shipping them over to Python.

The solution is similar to querySelectorAll: instead of shipping the Node object itself, we send over its handle:

Node.prototype.getAttribute = function(attr) {
    return call_python("getAttribute", this.handle, attr);
}

On the Python side, the getAttribute function takes two arguments, a handle and an attribute:

class JSContext:
    def getAttribute(self, handle, attr):
        elt = self.handle_to_node[handle]
        attr = elt.attributes.get(attr, None)
        return attr if attr else ""

Note that if the attribute is not assigned, the get method returns None; the code above replaces that with an empty string, since otherwise DukPy would translate None into JavaScript’s null. Don’t forget to export this function as getAttribute.

We finally have enough of the DOM API to implement a little character count function for text areas:

inputs = document.querySelectorAll('input')
for (var i = 0; i < inputs.length; i++) {
    var name = inputs[i].getAttribute("name");
    var value = inputs[i].getAttribute("value");
    if (value.length > 100) {
        console.log("Input " + name + " has too much text.")
    }
}

Ideally, though, we’d update the character count every time the user types into an input box. That requires running JavaScript on every key press. Let’s implement that next.

Node objects in the DOM correspond to Element nodes in the browser. They thus have JavaScript object properties as well as HTML attributes. They’re easy to confuse, and to make matters worse, many DOM object properties reflect attribute values automatically. For example, the id property on Node objects gives read-write access to the id attribute of the underlying Element. This is very convenient, and avoids calling setAttribute and getAttribute all over the place. But this reflection only applies to certain fields; setting made-up JavaScript properties won’t create corresponding HTML attributes, nor vice versa.

Event Handling

The browser executes JavaScript code as soon as it loads the web page, but that code often wants to change the page in response to user actions.

Here’s how that works. Any time the user interacts with the page, the browser generates events. Each event has a type, like change, click, or submit, and happens at a target element. The addEventListener method allows JavaScript to react to those events: node.addEventListener('click', func) sets func to run every time the element corresponding to node generates a click event. It’s basically Tk’s bind, but in the browser—see Figure 3. Let’s implement it.

Figure 3: The browser calls into JavaScript when events happen.

Let’s start with generating events. I’ll create a dispatch_event method and call it whenever an event is generated. That includes, first of all, any time we click in the page:

class Tab:
    def click(self, x, y):
        # ...
        elif elt.tag == "a" and "href" in elt.attributes:
            self.js.dispatch_event("click", elt)
            # ...
        elif elt.tag == "input":
            self.js.dispatch_event("click", elt)
            # ...
        elif elt.tag == "button":
            self.js.dispatch_event("click", elt)
            # ...
        # ...

Second, before updating input area values:

class Tab:
    def keypress(self, char):
        if self.focus:
            self.js.dispatch_event("keydown", self.focus)
            # ...

And finally, when submitting forms but before actually sending the request to the server:

def submit_form(self, elt):
    self.js.dispatch_event("submit", elt)
    # ...

So far so good—but what should the dispatch_event method do? Well, it needs to run listeners passed to addEventListener, so those need to be stored somewhere. Since those listeners are JavaScript functions, we need to keep that data on the JavaScript side, in a variable in the runtime. I’ll call that variable LISTENERS; we’ll use it to look up handles and event types, so let’s make it map handles to a dictionary that maps event types to a list of listeners:

LISTENERS = {}

Node.prototype.addEventListener = function(type, listener) {
    if (!LISTENERS[this.handle]) LISTENERS[this.handle] = {};
    var dict = LISTENERS[this.handle];
    if (!dict[type]) dict[type] = [];
    var list = dict[type];
    list.push(listener);
}

To dispatch an event, we need to look up the type and handle in the LISTENERS object, like this:

Node.prototype.dispatchEvent = function(type) {
    var handle = this.handle;
    var list = (LISTENERS[handle] && LISTENERS[handle][type]) || [];
    for (var i = 0; i < list.length; i++) {
        list[i].call(this);
    }
}

Note that dispatchEvent uses the call method on functions, which sets the value of this inside that function. As is standard in JavaScript, I’m setting it to the node that the event was generated on.

When an event occurs, the browser calls dispatchEvent from Python:

class JSContext:
    def dispatch_event(self, type, elt):
        handle = self.node_to_handle.get(elt, -1)
        self.interp.evaljs(
            EVENT_DISPATCH_JS, type=type, handle=handle)

Here, the EVENT_DISPATCH_JS constant is a string of JavaScript code that dispatches a new event:

EVENT_DISPATCH_JS = \
    "new Node(dukpy.handle).dispatchEvent(dukpy.type)"

So when dispatch_event is called on the Python side, that runs dispatchEvent on the JavaScript side, and that in turn runs all of the event listeners. The dukpy JavaScript object in this code snippet stores the named type and handle arguments to evaljs.
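
If the dukpy object seems mysterious, here’s a standalone experiment, separate from our browser code, showing how named arguments to evaljs appear on the JavaScript side:

import dukpy

interp = dukpy.JSInterpreter()
# Keyword arguments to evaljs become properties of the dukpy object:
print(interp.evaljs("dukpy.a + dukpy.b", a=2, b=3))  # prints 5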

With all this event-handling machinery in place, we can update the character count every time an input area changes:

function lengthCheck() {
    var name = this.getAttribute("name");
    var value = this.getAttribute("value");
    if (value.length > 100) {
        console.log("Input " + name + " has too much text.")
    }
}
var inputs = document.querySelectorAll("input");
for (var i = 0; i < inputs.length; i++) {
    inputs[i].addEventListener("keydown", lengthCheck);
}

Note that lengthCheck uses this to reference the input element that actually changed, as set up by dispatchEvent.

So far so good—but ideally the length check wouldn’t print to the console; it would add a warning to the web page itself. To do that, we’ll need to not only read from the page but also modify it.

JavaScript first appeared in 1995, as part of Netscape Navigator. Its name was chosen to indicate a similarity to the Java language, and the syntax is Java-esque for that reason. However, under the surface JavaScript is a much more dynamic language than Java, as is appropriate given its role as a progressive enhancement mechanism for the web. For example, any method or property on any object (including built-in ones like Element) can be dynamically overridden at any time. This makes it possible to polyfill differences between browsers, adding features that look built-in to other JavaScript code.

Modifying the DOM

So far we’ve implemented read-only DOM methods; now we need methods that change the page. The full DOM API provides a lot of such methods, but for simplicity I’m going to implement only innerHTML, which is used like this:

node.innerHTML = "This is my <b>new</b> bit of content!";

In other words, innerHTML is a property of node objects, with a setter that is run when the field is modified. That setter takes the new value, which must be a string, parses it as HTML, and makes the new, parsed HTML nodes children of the original node.

Let’s implement this, starting on the JavaScript side. JavaScript has the obscure Object.defineProperty function to define setters, which DukPy supports:

Object.defineProperty(Node.prototype, 'innerHTML', {
    set: function(s) {
        call_python("innerHTML_set", this.handle, s.toString());
    }
});

In innerHTML_set, we’ll need to parse the HTML string. That turns out to be trickier than you’d think, because our browser’s HTML parser is intended to parse whole HTML documents, not these document fragments. As an expedient, close-enough hack,Real browsers follow the standardized parsing algorithm for HTML fragments. I’ll just wrap the HTML in an html and body element:

def innerHTML_set(self, handle, s):
    doc = HTMLParser("<html><body>" + s + "</body></html>").parse()
    new_nodes = doc.children[0].children

Don’t forget to export the innerHTML_set function. Note that we extract all children of the body element, because an innerHTML_set call can create multiple nodes at a time. These new nodes must now be made children of the element innerHTML_set was called on:

def innerHTML_set(self, handle, s):
    # ...
    elt = self.handle_to_node[handle]
    elt.children = new_nodes
    for child in elt.children:
        child.parent = elt

We update the parent pointers of those parsed child nodes because otherwise they would point to the dummy body element that we added to aid parsing.

It might look like we’re done—but try this out and you’ll realize that nothing happens when a script calls innerHTML_set. That’s because, while we have changed the HTML tree, we haven’t regenerated the layout tree or the display list, so the browser is still showing the old page.

Whenever the page changes, we need to update its rendering by calling render:Redoing layout for the whole page is often wasteful; Chapter 16 explores a more complicated algorithm that speeds this up.

class JSContext:
    def innerHTML_set(self, handle, s):
        # ...
        self.tab.render()

JavaScript can now modify the web page!Note that while rendering will update to account for the new HTML, any added scripts or style sheets will not properly load, and removed style sheets will (incorrectly) still apply. I’ve left fixing that as Exercise 9-7.

Let’s try this out in our guest book. Say we want a 100-character limit on guest book entries to prevent long, incoherent rants from making it in.

First, switch to the server codebase and add a <strong> after the guest book form. Initially this element will be empty, but we’ll write an error message into it if the comment gets too long.

def show_comments():
    # ...
    out += "<strong></strong>"
    # ...

Also add a script to the page:

def show_comments():
    # ...
    out += "<script src=/comment.js></script>"
    # ...

Now the browser will request comment.js, so our server needs to serve that JavaScript file:

def do_request(method, url, headers, body):
    # ...
    elif method == "GET" and url == "/comment.js":
        with open("comment.js") as f:
            return "200 OK", f.read()
    # ...

We can then put our little input length checker into comment.js, with the lengthCheck function modified to use innerHTML:

var strong = document.querySelectorAll("strong")[0];

function lengthCheck() {
    var value = this.getAttribute("value");
    if (value.length > 100) {
        strong.innerHTML = "Comment too long!";
    }
}

var inputs = document.querySelectorAll("input");
for (var i = 0; i < inputs.length; i++) {
    inputs[i].addEventListener("keydown", lengthCheck);
}

Try it out: write a long comment and you should see the page warning you when it grows too long. By the way, we might want to make it stand out more, so let’s go ahead and add another URL to our web server, /comment.css, with the contents:

strong { font-weight: bold; color: red; }

Add a link to the guest book page so that this style sheet is loaded.
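
Here’s a sketch of both pieces, mirroring how comment.js is handled above; it assumes the rule above is saved in a comment.css file next to the server code:

def show_comments():
    # ...
    out += "<link rel=stylesheet href=/comment.css>"
    # ...

def do_request(method, url, headers, body):
    # ...
    elif method == "GET" and url == "/comment.css":
        with open("comment.css") as f:
            return "200 OK", f.read()
    # ...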

But even though we tell the user that their comment is too long the user can submit the guest book entry anyway. Oops! Let’s fix that.

This code has a subtle memory leak: if you access an HTML element from JavaScript (thereby creating a handle for it) and then remove the element from the page (using innerHTML), Python won’t be able to garbage-collect the Element object because it is still stored in the node_to_handle map. And that’s good, if JavaScript can still access that Element via its handle, but bad otherwise. Solving this is quite tricky, because it requires the Python and JavaScript garbage collectors to cooperate.

Event Defaults

So far, when an event is generated, the browser will run the listeners, and then also do whatever it normally does for that event—the default action. I’d now like JavaScript code to be able to cancel that default action.

There are a few steps involved. First of all, event listeners should receive an event object as an argument. That object should have a preventDefault method. When that method is called, the default action shouldn’t occur.

First of all, we’ll need event objects. Back to our JavaScript runtime:

function Event(type) {
    this.type = type
    this.do_default = true;
}

Event.prototype.preventDefault = function() {
    this.do_default = false;
}

Note the do_default field, to record whether preventDefault has been called. We’ll now be passing an Event object to dispatchEvent, instead of just the event type:

Node.prototype.dispatchEvent = function(evt) {
    var type = evt.type;
    // ...
    for (var i = 0; i < list.length; i++) {
        list[i].call(this, evt);
    }
    // ...
    return evt.do_default;
}

In Python, we now need to create an Event to pass to dispatchEvent:

EVENT_DISPATCH_JS = \
    "new Node(dukpy.handle).dispatchEvent(new Event(dukpy.type))"

Also note that dispatchEvent returns evt.do_default, which is not only standard in JavaScript but also helpful when dispatching events from Python, because Python’s dispatch_event can return that boolean to its handler:

class JSContext:
    def dispatch_event(self, type, elt):
        # ...
        do_default = self.interp.evaljs(
            EVENT_DISPATCH_JS, type=type, handle=handle)
        return not do_default

This way, every time an event happens, the browser can check the return value of dispatch_event and stop if it is True. We have three such places in the click method:

class Tab:
    def click(self, x, y):
        while elt:
            # ...
            elif elt.tag == "a" and "href" in elt.attributes:
                if self.js.dispatch_event("click", elt): return
                # ...
            elif elt.tag == "input":
                if self.js.dispatch_event("click", elt): return
                # ...
            elif elt.tag == "button":
                if self.js.dispatch_event("click", elt): return
                # ...
            # ...
         # ...

And one in submit_form:

class Tab:
    def submit_form(self, elt):
        if self.js.dispatch_event("submit", elt): return

And one in keypress:

class Tab:
    def keypress(self, char):
        if self.focus:
            if self.js.dispatch_event("keydown", self.focus): return

Now our character count code can prevent the user from submitting a form: it can use a global variable to track whether or not submission is allowed, and then when submission is attempted it can check that variable and cancel that submission if necessary:

var allow_submit = true;

function lengthCheck() {
    // ...
    allow_submit = value.length <= 100;
    if (!allow_submit) {
        // ...
    }
}

var form = document.querySelectorAll("form")[0];
form.addEventListener("submit", function(e) {
    if (!allow_submit) e.preventDefault();
});

This way it’s impossible to submit the form when the comment is too long!

Well … impossible in this browser. But since there are browsers that don’t run JavaScript (like ours, one chapter back), we should check the length on the server side too:

def add_entry(params):
    if 'guest' in params and len(params['guest']) <= 100:
        ENTRIES.append(params['guest'])
    return show_comments()

Note that we shouldn’t—can’t—rely on JavaScript being executed by the browser, because the browser is the user’s agent, not ours. Ideally, web pages should be written so that they work correctly without JavaScript, but work better with it. This is called progressive enhancement, and it means we’re not replicating in JavaScript what the browser can already do.

A closing thought: while our guest book now has a little bit of JavaScript code, it’s still mostly HTML, CSS, form elements, and other standard web features. In this way JavaScript extends the web instead of replacing it. This is in contrast to the recently departed Adobe Flash, and before that Java applets, which were self-contained plug-ins that handled input and rendering on their own.

Search engines are constantly crawling the web and indexing all of the web pages they can find. In the early days, indexing was just a matter of loading the HTML, parsing it and extracting the information. But these days, a lot of single-page app sites use JavaScript to “hydrate”This process is called “hydration” by analogy with how water is added to dehydrated food to make it edible again. their site into its full contents. On such sites, before hydration happens, the information in the site is hidden inside of JavaScript data structures. For this reason, search engines need to not just parse HTML, but also run JavaScript (and load style sheets) during indexing. In other words, the indexing systems use browsers (such as, for example, headless Chrome)—one more place browsers appear in the web ecosystem.

Summary

Our browser now runs JavaScript applications on behalf of websites. Granted, it supports just four methods from the vast DOM API, but even those demonstrate: running scripts in an embedded JavaScript interpreter; exporting Python functions for scripts to call; referring to page elements through handles; reading and modifying the page from a script; and generating events so scripts can react to, or cancel, user actions.

A web page can now add functionality via a clever script, instead of waiting for a browser developer to add it into the browser itself. And as a side benefit, a web page can now earn the lofty title of “web application”.

Starting with this chapter, I won’t be able to inline the chapter’s browser into an iframe, due to security restrictions related to the way I’m communicating with scripts within the web page. But you can still load it in a new browser tab by clicking here.

Outline

The complete set of functions, classes, and methods in our browser should now look something like this:

class URL: def __init__(url) def request(payload) def resolve(url) def __str__() class Text: def __init__(text, parent) def __repr__() class Element: def __init__(tag, attributes, parent) def __repr__() def print_tree(node, indent) def tree_to_list(tree, list) class HTMLParser: SELF_CLOSING_TAGS HEAD_TAGS def __init__(body) def parse() def get_attributes(text) def add_text(text) def add_tag(tag) def implicit_tags(tag) def finish() class CSSParser: def __init__(s) def whitespace() def literal(literal) def word() def ignore_until(chars) def pair() def selector() def body() def parse() class TagSelector: def __init__(tag) def matches(node) class DescendantSelector: def __init__(ancestor, descendant) def matches(node) FONTS def get_font(size, weight, style) DEFAULT_STYLE_SHEET INHERITED_PROPERTIES def style(node, rules) def cascade_priority(rule) WIDTH, HEIGHT HSTEP, VSTEP class Rect: def __init__(left, top, right, bottom) def containsPoint(x, y) INPUT_WIDTH_PX BLOCK_ELEMENTS class DocumentLayout: def __init__(node) def layout() def should_paint() def paint() class BlockLayout: def __init__(node, parent, previous) def layout_mode() def layout() def recurse(node) def new_line() def word(node, word) def input(node) def self_rect() def should_paint() def paint() class LineLayout: def __init__(node, parent, previous) def layout() def should_paint() def paint() class TextLayout: def __init__(node, word, parent, previous) def layout() def should_paint() def paint() class InputLayout: def __init__(node, parent, previous) def layout() def should_paint() def paint() def self_rect() class DrawText: def __init__(x1, y1, text, font, color) def execute(scroll, canvas) class DrawRect: def __init__(rect, color) def execute(scroll, canvas) class DrawLine: def __init__(x1, y1, x2, y2, color, thickness) def execute(scroll, canvas) class DrawOutline: def __init__(rect, color, thickness) def execute(scroll, canvas) def paint_tree(layout_object, display_list) EVENT_DISPATCH_JS RUNTIME_JS class JSContext: def __init__(tab) def run(script, code) def dispatch_event(type, elt) def get_handle(elt) def querySelectorAll(selector_text) def getAttribute(handle, attr) def innerHTML_set(handle, s) SCROLL_STEP class Tab: def __init__(tab_height) def load(url, payload) def render() def draw(canvas, offset) def scrolldown() def click(x, y) def go_back() def submit_form(elt) def keypress(char) class Chrome: def __init__(browser) def tab_rect(i) def paint() def click(x, y) def keypress(char) def enter() def blur() class Browser: def __init__() def draw() def new_tab(url) def handle_down(e) def handle_click(e) def handle_key(e) def handle_enter(e)

Exercises

9-1 Node.children. Add support for the children property on JavaScript Nodes. Node.children returns the immediate Element children of a node, as an array. Text children are not included.The DOM method childNodes gives access to both elements and text nodes.

9-2 createElement. The document.createElement method creates a new element, which can be attached to the document with the appendChild and insertBefore methods on Nodes; unlike innerHTML, there’s no parsing involved. Implement all three methods.

9-3 removeChild. The removeChild method on Nodes detaches the provided child and returns it, bringing that child—and its subtree—back into a detached state. (It can then be re-attached elsewhere, with appendChild and insertBefore, or deleted.) Implement this method. It’s more challenging to implement this one, because you’ll need to also remove the subtree from the Python side.

9-4 IDs. When an HTML element has an id attribute, a JavaScript variable pointing to that element is predefined. So, if a page has a <div id="foo"></div>, then there’s a variable foo referring to that node.This is standard behavior. Implement this in your browser. Make sure to handle the case of nodes being added and removed (such as with innerHTML).

9-5 Event bubbling. Right now, you can attach a click handler to <a> (anchor) elements, but not to anything else. Fix this. One challenge you’ll face is that when you click on an element, you also click on all its ancestors. On the web, this sort of quirk is handled by event bubbling: when an event is generated on an element, listeners are run not just on that element but also on its ancestors. Implement event bubbling, and make sure listeners can call stopPropagation on the event object to stop bubbling the event up the tree. Double-check that clicking on links still works, and make sure preventDefault still successfully prevents clicks on a link from actually following the link.

9-6 Serializing HTML. Reading from innerHTML should return a string containing HTML source code. That source code should reflect the current attributes of the element; for example:

element.innerHTML = '<span id=foo>Chris was here</span>';
element.id = 'bar';
console.log(element.innerHTML);
// Prints "<span id=bar>Chris was here</span>":

Implement this behavior for innerHTML as a getter. Also implement outerHTML, which differs from innerHTML in that it contains the element itself, not just its children.

9-7 Script-added scripts and style sheets. The innerHTML API could cause <script> or <link> elements to be added to the document, but currently our browser does not load them when this happens. Fix this. Likewise, when a <link> element is removed from the document, its style sheet should be removed from the global list; implement that as well.Note that, unlike a style sheet, a removed <script>’s evaluated code still exists for the lifetime of the web page. Can you see why it has to be that way?

Keeping Data Private

Our browser has grown up and now runs (small) web applications. With one final step—user identity via cookies—it will be able to run all sorts of personalized online services. But capability demands responsibility: our browser must now secure cookies against adversaries interested in stealing them. Luckily, browsers have sophisticated systems for controlling access to cookies and preventing their misuse.

Web security is a vast topic, covering browser, network, and application security. It also involves educating the user, so that attackers can’t mislead them into revealing their own secure data. This chapter can’t cover all of that: if you’re writing web applications or other security-sensitive code, this book is not enough.

Cookies

With what we’ve implemented so far, there’s no way for a web server to tell whether two HTTP requests come from the same user or from two different ones; our browser is effectively anonymous.I don’t mean anonymous against malicious attackers, who might use browser fingerprinting or similar techniques to tell users apart. I mean anonymous in the good-faith sense. That means it can’t “log in” anywhere, since a logged-in user’s requests would be indistinguishable from those of not-logged-in users.

The web fixes this problem with cookies. A cookie—the name is meaningless, ignore it—is a little bit of information stored by your browser on behalf of a web server. The cookie distinguishes your browser from any other, and is sent with each web request so the server can distinguish which requests come from whom. In effect, a cookie is a decentralized, server-granted identity for your browser.

Here are the technical details. An HTTP response can contain a Set-Cookie header. This header contains a key–value pair; for example, the following header sets the value of the foo cookie to bar:

Set-Cookie: foo=bar

The browser remembers this key–value pair, and the next time it makes a request to the same server (cookies are site-specific), the browser echoes it back in the Cookie header:

Cookie: foo=bar

Servers can set multiple cookies, and also set parameters like expiration dates, but this Set-Cookie / Cookie transaction as shown in Figure 1 is the core principle.

Figure 1: The server assigns cookies to the browser with the Set-Cookie header, and the browser thereafter identifies itself with the Cookie header.

Let’s use cookies to write a login system for our guest book. Each user will be identified by a long random number stored in the token cookie.This random.random call returns a decimal number with 53 bits of randomness. That’s not great; 256 bits is typically the goal. And random.random is not a secure random number generator: by observing enough tokens you can predict future values and use those to hijack accounts. A real web application must use a cryptographically secure random number generator for tokens. The server will either extract a token from the Cookie header, or generate a new one for new visitors:

import random

def handle_connection(conx):
    # ...
    if "cookie" in headers:
        token = headers["cookie"][len("token="):]
    else:
        token = str(random.random())[2:]
    # ...

Of course, new visitors need to be told to remember their newly generated token:

def handle_connection(conx):
    # ...
    if 'cookie' not in headers:
        template = "Set-Cookie: token={}\r\n"
        response += template.format(token)
    # ...

The first code block runs after all the request headers are parsed, before handling the request in do_request, while the second code block runs after do_request returns, when the server is assembling the HTTP response.
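
Putting the two pieces in context, handle_connection ends up looking something like this (just a sketch; the request parsing and response assembly are elided, and your server from the previous chapters may differ in its details):

import random

def handle_connection(conx):
    # ... read the request line and parse method, url, headers, and body ...
    if "cookie" in headers:
        token = headers["cookie"][len("token="):]
    else:
        token = str(random.random())[2:]
    status, body = do_request(method, url, headers, body)
    response = "HTTP/1.0 {}\r\n".format(status)
    if "cookie" not in headers:
        response += "Set-Cookie: token={}\r\n".format(token)
    # ... add the Content-Length header, append the body, and send it ...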

With these two code changes, each visitor to the guest book now has a unique identity. We can now use that identity to store information about each user. Let’s do that in a server side SESSIONS variable:Browsers and servers both limit header lengths, so it’s best to store minimal data in cookies. Plus, cookies are sent back and forth on every request, so long cookies mean a lot of useless traffic. It’s therefore wise to store user data on the server, and only store a pointer to that data in the cookie. And, since cookies are stored by the browser, they can be changed arbitrarily by the user, so it would be insecure to trust the cookie data.

SESSIONS = {}

def handle_connection(conx):
    # ...
    session = SESSIONS.setdefault(token, {})
    status, body = do_request(session, method, url, headers, body)
    # ...

SESSIONS maps tokens to session data dictionaries. The setdefault method both gets a key from a dictionary and also sets a default value if the key isn’t present. I’m passing that session data via do_request to individual pages like show_comments and add_entry:

def do_request(session, method, url, headers, body):
    if method == "GET" and url == "/":
        return "200 OK", show_comments(session)
    # ...
    elif method == "POST" and url == "/add":
        params = form_decode(body)
        add_entry(session, params)
        return "200 OK", show_comments(session)
    # ...

You’ll need to modify the argument lists for add_entry and show_comments to accept this new argument. We now have the foundation upon which to build a login system.

The original specification for cookies says there is “no compelling reason” for calling them “cookies”, but in fact using this term for opaque identifiers exchanged between programs seems to date way back; Wikipedia traces it back to at least 1979, and cookies were used in X11 for authentication before they were used on the web.

A Login System

I want users to log in before posting to the guest book. Minimally, that means the server needs a login form where users enter a username and password, it needs to check that password and remember who is logged in, and it should accept new guest book entries (labeled with their author) only from logged-in users.

Let’s start coding. We’ll hard-code two user/password pairs:

LOGINS = {
    "crashoverride": "0cool",
    "cerealkiller": "emmanuel"
}

Users will log in by going to /login:

def do_request(session, method, url, headers, body):
    # ...
    elif method == "GET" and url == "/login":
        return "200 OK", login_form(session)
    # ...

This page shows a form with a username and a password field:I’ve given the password input area the type password, which in a real browser will draw stars or dots instead of showing what you’ve entered, though our browser doesn’t do that; see Exercise 10-1. Also, do note that this is not particularly accessible HTML, lacking for example <label> elements around the form labels. Not that our browser supports that!

def login_form(session):
    body = "<!doctype html>"
    body += "<form action=/ method=post>"
    body += "<p>Username: <input name=username></p>"
    body += "<p>Password: <input name=password type=password></p>"
    body += "<p><button>Log in</button></p>"
    body += "</form>"
    return body 

Note that the form POSTs its data to the / URL. We’ll want to handle these POST requests in a new function that checks passwords and does logins:

def do_request(session, method, url, headers, body):
    # ...
    elif method == "POST" and url == "/":
        params = form_decode(body)
        return do_login(session, params)
    # ...

This do_login function checks passwords and logs people in by storing their user name in the session data:Actually, using == to compare passwords like this is a bad idea: Python’s equality function for strings scans the string from left to right, and exits as soon as it finds a difference. Therefore, you get a clue about the password from how long it takes to check a password guess; this is called a timing side channel. This book is about the browser, not the server, but a real web application has to do a constant-time string comparison!

def do_login(session, params):
    username = params.get("username")
    password = params.get("password")
    if username in LOGINS and LOGINS[username] == password:
        session["user"] = username
        return "200 OK", show_comments(session)
    else:
        out = "<!doctype html>"
        out += "<h1>Invalid password for {}</h1>".format(username)
        return "401 Unauthorized", out

Note that the session data (including the user key) is stored on the server, so users can’t modify it directly. That’s good, because we only want to set the user key in the session data if users supply the right password in the login form.

So now we can check if a user is logged in by checking the session data. Let’s only show the comment form to logged in users:

def show_comments(session):
    # ...
    if "user" in session:
        out += "<h1>Hello, " + session["user"] + "</h1>"
        out += "<form action=add method=post>"
        out +=   "<p><input name=guest></p>"
        out +=   "<p><button>Sign the book!</button></p>"
        out += "</form>"
    else:
        out += "<a href=/login>Sign in to write in the guest book</a>"
    # ...

Likewise, add_entry must check that the user is logged in before posting comments:

def add_entry(session, params):
    if "user" not in session: return
    if 'guest' in params and len(params['guest']) <= 100:
        ENTRIES.append((params['guest'], session["user"]))

Note that the username from the session is stored into ENTRIES:The pre-loaded comments reference 1995’s Hackers. Hack the Planet!

ENTRIES = [
    ("No names. We are nameless!", "cerealkiller"),
    ("HACK THE PLANET!!!", "crashoverride"),
]

When we print the guest book entries, we’ll show who authored them:

def show_comments(session):
    # ...
    for entry, who in ENTRIES:
        out += "<p>" + entry + "\n"
        out += "<i>by " + who + "</i></p>"
    # ...

Try it out in a normal web browser. You should be able to go to the main guest book page, click the link to log in, log in with one of the username/password pairs above, and then be able to post entries.The login flow slows down debugging. You might want to add the empty string as a username/password pair. Of course, this login system has a whole slew of insecurities.The insecurities include not hashing passwords, not using bcrypt, not allowing password changes, not having a “forget your password” flow, not forcing TLS, not sandboxing the server, and many many others. But the focus of this book is the browser, not the server, so once you’re sure it’s all working, let’s switch back to our web browser and implement cookies.

A more obscure browser authentication system is TLS client certificates. The user downloads a public/private key pair from the server, and the browser then uses them to prove who it is upon later requests to that server. Also, if you’ve ever seen a URL with username:password@ before the hostname, that’s HTTP authentication. Please don’t use either method in new websites (without a good reason).

Implementing Cookies

To start, we need a place in the browser that stores cookies; that data structure is traditionally called a cookie jar:Because once you have one silly name it’s important to stay on-brand.

COOKIE_JAR = {}

Since cookies are site-specific, our cookie jar will map sites to cookies. Note that the cookie jar is global, not limited to a particular tab. That means that if you’re logged in to a website and you open a second tab, you’re logged in on that tab as well.Moreover, since request can be called multiple times on one page—to load CSS and JavaScript—later requests transmit cookies set by previous responses. For example, our guest book sets a cookie when the browser first requests the page and then receives that cookie when our browser later requests the page’s CSS file.

When the browser visits a page, it needs to send the cookie for that site:

class URL:
    def request(self, payload=None):
        # ...
        if self.host in COOKIE_JAR:
            cookie = COOKIE_JAR[self.host]
            request += "Cookie: {}\r\n".format(cookie)
        # ...

Symmetrically, the browser has to update the cookie jar when it sees a Set-Cookie header:A server can actually send multiple Set-Cookie headers to set multiple cookies in one request, though our browser won’t handle that correctly.

class URL:
    def request(self, payload=None):
        # ...
        if "set-cookie" in response_headers:
            cookie = response_headers["set-cookie"]
            COOKIE_JAR[self.host] = cookie
        # ...

You should now be able to use your browser to log in to the guest book and post to it. Moreover, you should be able to open the guest book in two browsers simultaneously—maybe your browser and a real browser as well—and log in and post as two different users.

Now that our browser supports cookies and uses them for logins, we need to make sure cookie data is safe from malicious actors. After all, the cookie is the browser’s identity, so if someone stole it, the server would think they are you. We need to prevent that.

At one point, an attempt was made to “clean up” the cookie specification in RFC 2965, including human-readable cookie descriptions and cookies restricted to certain ports. This required introducing the Cookie2 and Set-Cookie2 headers; the new headers were not popular. They are now obsolete.

Cross-site Requests

Cookies are site-specific, so one server shouldn’t be sent another server’s cookies.Well… Our connection isn’t encrypted, so an attacker could read it from an open Wi-Fi connection. But another server couldn’t. Or how about this attack: another server could hijack our DNS and redirect our hostname to a different IP address, and then steal our cookies. Some internet service providers support DNSSEC, which prevents this, but not all. Or consider this attack: a state-level attacker could announce fraudulent BGP (Border Gateway Protocol) routes, which would send even a correctly retrieved IP address to the wrong physical computer. (Security is very hard.) But if an attacker is clever, they might be able to get the server or the browser to help them steal cookie values.

The easiest way for an attacker to steal your private data is to ask for it. Of course, there’s no API in the browser for a website to ask for another website’s cookies. But there is an API to make requests to another website. It’s called XMLHttpRequest.It’s a weird name! Why is XML capitalized but not Http? And it’s not restricted to XML! Ultimately, the naming is historical, dating back to Microsoft’s “Outlook Web Access” feature for Exchange Server 2000.

XMLHttpRequest sends asynchronous HTTP requests from JavaScript. Since I’m using XMLHttpRequest just to illustrate security issues, I’ll implement a minimal version here. Specifically, I’ll support only synchronous requests.Synchronous XMLHttpRequests are slowly moving through deprecation and obsolescence, but I’m using them here because they are easier to implement. We’ll implement the asynchronous variant in Chapter 12. Using this minimal XMLHttpRequest looks like this:

x = new XMLHttpRequest();
x.open("GET", url, false);
x.send();
// use x.responseText

We’ll define the XMLHttpRequest objects and methods in JavaScript. The open method will just save the method and URL:XMLHttpRequest has more options not implemented here, like support for usernames and passwords. This code is also missing some error checking, like making sure the method is a valid HTTP method supported by our browser.

function XMLHttpRequest() {}

XMLHttpRequest.prototype.open = function(method, url, is_async) {
    if (is_async) throw Error("Asynchronous XHR is not supported");
    this.method = method;
    this.url = url;
}

The send method calls an exported function:As above, this implementation skips important XMLHttpRequest features, like setting request headers (and reading response headers), changing the response type, or triggering various events and callbacks during the request.

XMLHttpRequest.prototype.send = function(body) {
    this.responseText = call_python("XMLHttpRequest_send",
        this.method, this.url, body);
}

The XMLHttpRequest_send function just calls request:Note that the method argument is ignored, because our request function chooses the method on its own based on whether a payload is passed. This doesn’t match the standard (which allows POST requests with no payload), and I’m only doing it here for convenience.

class JSContext:
    def XMLHttpRequest_send(self, method, url, body):
        full_url = self.tab.url.resolve(url)
        headers, out = full_url.request(body)
        return out

With XMLHttpRequest, a web page can make HTTP requests in response to user actions, making websites more interactive (see Figure 2). This API, and newer analogs like fetch, are how websites allow you to like a post, see hover previews, or submit a form without reloading.

XMLHttpRequest objects have setRequestHeader and getResponseHeader methods to control HTTP headers. However, this could allow a script to interfere with the cookie mechanism or with other security measures, so some request and response headers are not accessible from JavaScript.

Same-origin Policy

However, new capabilities lead to new responsibilities. HTTP requests sent with XMLHttpRequest include cookies. This is by design: when you “like” something, the server needs to associate the “like” to your account. But it also means that XMLHttpRequest can access private data, and thus there is a need to protect it.

Let’s imagine an attacker wants to know your username on our guest book server. When you’re logged in, the guest book includes your username on the page (where it says “Hello, so and so”), so reading the guest book with your cookies is enough to determine your username.

With XMLHttpRequest, an attacker’s websiteWhy is the user on the attacker’s site? Perhaps it has funny memes, or it’s been hacked and is being used for the attack against its will, or perhaps the evildoer paid for ads on sketchy websites where users have low standards for security anyway. could request the guest book page:

x = new XMLHttpRequest();
x.open("GET", "http://localhost:8000/", false);
x.send();
user = x.responseText.split(" ")[2].split("<")[0];

The issue here is that one server’s web page content is being sent to a script running on a website delivered by another server. Since the content is derived from cookies, this leaks private data.

To prevent issues like this, browsers have a same-origin policy, which says that requests like XMLHttpRequestSome kinds of request are not subject to the same-origin policy (most prominently CSS and JavaScript files linked from a web page); conversely, the same-origin policy also governs JavaScript interactions with iframes, images, localStorage and many other browser features. can only go to web pages on the same “origin”—scheme, hostname, and port.You may have noticed that this is not the same definition of “website” as cookies use: cookies don’t care about scheme or port! This seems to be an oversight or incongruity left over from the messy early web. This way, a website’s private data has to stay on that website, and cannot be leaked to an attacker on another server.

Let’s implement the same-origin policy for our browser. We’ll need to compare the URL of the request to the URL of the page we are on:

class JSContext:
    def XMLHttpRequest_send(self, method, url, body):
        # ...
        if full_url.origin() != self.tab.url.origin():
            raise Exception("Cross-origin XHR request not allowed")
        # ...

The origin function can just strip off the path from a URL:

class URL:
    def origin(self):
        return self.scheme + "://" + self.host + ":" + str(self.port)

Now an attacker can’t read the guest book web page. But can they write to it? Actually…

One interesting form of the same-origin policy involves images and the HTML <canvas> element. The drawImage method allows drawing an image to a canvas, even if that image was loaded from another origin. But to prevent that image from being read back with getImageData or related methods, writing cross-origin data to a canvas taints it, blocking read methods.

Cross-site Request Forgery

The same-origin policy prevents cross-origin XMLHttpRequest calls. But the same-origin policy doesn’t apply to normal browser actions like clicking a link or filling out a form. This enables an exploit called cross-site request forgery, often shortened to CSRF.

In cross-site request forgery, instead of using XMLHttpRequest, the attacker uses a form that submits to the guest book:

<form action="http://localhost:8000/add" method=post>
  <p><input name=guest></p>
  <p><button>Sign the book!</button></p>
</form>

Even though this form is on the evildoer’s website, when you submit the form, the browser will make an HTTP request to the guest book. And that means it will send its guest book cookies, so it will be logged in, so the guest book code will allow a post. But the user has no way of knowing which server a form submits to—the attacker’s web page could have misrepresented that—so they may have posted something they didn’t mean to.Even worse, the form submission could be triggered by JavaScript, with the user not involved at all. And this kind of attack can be further disguised by hiding the entry widget, pre-filling the post, and styling the button to look like a normal link.

Of course, the attacker can’t read the response, so this doesn’t leak private data to the attacker. But it can allow the attacker to act as the user! Posting a comment this way is not too scary (though shady advertisers will pay for it!) but posting a bank transaction is. And if the website has a change-of-password form, there could even be a way to take control of the account.

Unfortunately, we can’t just apply the same-origin policy to form submissions.For example, many search forms on websites submit to Google, because those websites don’t have their own search engines. So how do we defend against this attack?

To start with, there are things the server can do. The usual advice is to give a unique identity to every form the server serves, and make sure that every POST request comes from one of them. The way to do that is to embed a secret value, called a nonce, into the form, and to reject form submissions that don’t come with the right secret value.Note the similarity to cookies, except that instead of granting identity to browsers, we grant one to forms. Like a cookie, a nonce can be stolen with cross-site scripting. You can only get a nonce from the server, and the nonce is tied to the user session,It’s important that nonces are associated with the particular user. Otherwise, the attacker can generate a nonce for themselves and insert it into a form meant for the user. so the attacker could not embed it in their form.

To implement this fix, generate a nonce and save it in the user session when a form is requested:Usually <input type=hidden> is invisible, though our browser doesn’t support this.

def show_comments(session):
    # ...
    if "user" in session:
        nonce = str(random.random())[2:]
        session["nonce"] = nonce
        # ...
        out +=   "<input name=nonce type=hidden value=" + nonce + ">"

When a form is submitted, the server checks that the right nonce is submitted with it:In real websites it’s usually best to allow one user to have multiple active nonces, so that a user can open two forms in two tabs without that overwriting the valid nonce. To prevent the nonce set from growing over time, you’d have nonces expire after a while. I’m skipping this here, because it’s not the focus of this chapter.

def add_entry(session, params):
    if "nonce" not in session or "nonce" not in params: return
    if session["nonce"] != params["nonce"]: return
    # ...

Now this form can’t be submitted except from our website. Repeat this nonce fix for each form in the application, and it’ll be secure from CSRF attacks. But server-side solutions are fragile (what if you forget a form?) and relying on every website out there to do it right is a pipe dream. It’d be better for the browser to provide a fail-safe backup.
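
Before we get to that backup, here is roughly what repeating the nonce fix looks like for the login form (just a sketch, and it shares the one-nonce-per-session limitation mentioned above):

def login_form(session):
    nonce = str(random.random())[2:]
    session["nonce"] = nonce
    body = "<!doctype html>"
    body += "<form action=/ method=post>"
    body += "<p>Username: <input name=username></p>"
    body += "<p>Password: <input name=password type=password></p>"
    body += "<input name=nonce type=hidden value=" + nonce + ">"
    body += "<p><button>Log in</button></p>"
    body += "</form>"
    return body

def do_login(session, params):
    # Reject login attempts that don't carry the nonce we handed out.
    if "nonce" not in session or session["nonce"] != params.get("nonce"):
        return "401 Unauthorized", "<!doctype html><h1>Bad nonce</h1>"
    # ... then check the username and password as before ...

The same pattern works for any other form the server might grow.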

One unusual attack, similar in spirit to cross-site request forgery, is click-jacking. In this attack, an external site in a transparent iframe is positioned over the attacker’s site. The user thinks they are clicking around one site, but they actually take actions on a different one. Nowadays, sites can prevent this with the frame-ancestors directive to Content-Security-Policy or the older X-Frame-Options header.

SameSite Cookies

For form submissions, that fail-safe solution is SameSite cookies. The idea is that if a server marks its cookies SameSite, the browser will not send them in cross-site form submissions.At the time of writing the SameSite cookie standard is still in a draft stage, and not all browsers implement that draft fully. So it’s possible that this section may become out of date, though some kind of SameSite cookies will probably be ratified. The MDN page is helpful for checking the current status of SameSite cookies.

A cookie is marked SameSite in the Set-Cookie header like this:

Set-Cookie: foo=bar; SameSite=Lax

The SameSite attribute can take the value Lax, Strict, or None, and as I write, browsers have and plan different defaults. Our browser will implement only Lax and None, and default to None. When SameSite is set to Lax, the cookie is not sent on cross-site POST requests, but is sent on same-site POST or cross-site GET requests.Cross-site GET requests are also known as “clicking a link”, which is why those are allowed in Lax mode. The Strict version of SameSite blocks these too, but you need to design your web application carefully for this to work.

First, let’s modify COOKIE_JAR to store cookie/parameter pairs, and then parse those parameters out of Set-Cookie headers:

def request(self, payload=None):
    if "set-cookie" in response_headers:
        cookie = response_headers["set-cookie"]
        params = {}
        if ";" in cookie:
            cookie, rest = cookie.split(";", 1)
            for param in rest.split(";"):
                if '=' in param:
                    param, value = param.split("=", 1)
                else:
                    value = "true"
                params[param.strip().casefold()] = value.casefold()
        COOKIE_JAR[self.host] = (cookie, params)

When sending a cookie in an HTTP request, the browser only sends the cookie value, not the parameters:

def request(self, payload=None):
    if self.host in COOKIE_JAR:
        cookie, params = COOKIE_JAR[self.host]
        request += "Cookie: {}\r\n".format(cookie)

This stores the SameSite parameter of a cookie. But to actually use it, we need to know which site an HTTP request is being made from. Let’s add a new referrer parameter to request to track that:The “referrer” is the web page that “referred” our browser to make the current request. SameSite cookies are actually supposed to use the “top-level site”, not the referrer, to determine if the cookies should be sent, but the differences are subtle and I’m skipping them for simplicity.

class URL:
    def request(self, referrer, payload=None):
        # ...

Our browser calls request in three places, and we need to send the top-level URL in each case. At the top of load, it makes the initial request to a page. Modify it like so:

class Tab:
    def load(self, url, payload=None):
        headers, body = url.request(self.url, payload)
        # ...

Here, url is the new URL to visit, but self.url is the URL of the page the request comes from. Make sure this line comes at the top of load, before self.url is changed!

Later, the browser loads styles and scripts with more request calls:

class Tab:
    def load(self, url, payload=None):
        # ...
        for script in scripts:
            # ...
            try:
                header, body = script_url.request(url)
            except:
                continue
            # ...
        # ...
        for link in links:
            # ...
            try:
                header, body = style_url.request(url)
            except:
                continue
            # ...
        # ...

For these requests the top-level URL is the new URL being loaded. That’s because it is the new page that made us request these particular styles and scripts, so it defines which of those resources are on the same site.

Similarly, XMLHttpRequest-triggered requests use the tab URL as their top-level URL:

class JSContext:
    def XMLHttpRequest_send(self, method, url, body):
        # ...
        headers, out = full_url.request(self.tab.url, body)
        # ...

The request function can now check the referrer argument before sending SameSite cookies. Remember that SameSite cookies are only sent for GET requests or if the new URL and the top-level URL have the same host name:As I write this, some browsers also check that the new URL and the top-level URL have the same scheme and some browsers ignore subdomains, so that www.foo.com and login.foo.com are considered the “same site”. If cookies were invented today, they’d probably be specific to URL origins (in fact, there is an effort to do just that), much like content security policies, but alas historical contingencies and backward compatibility force rules that are more complex but easier to deploy.

def request(self, referrer, payload=None):
    if self.host in COOKIE_JAR:
        # ...
        cookie, params = COOKIE_JAR[self.host]
        allow_cookie = True
        if referrer and params.get("samesite", "none") == "lax":
            if method != "GET":
                allow_cookie = self.host == referrer.host
        if allow_cookie:
            request += "Cookie: {}\r\n".format(cookie)
        # ...

Note that we check whether the referrer is set—it won’t be when we’re loading the first web page in a new tab.

Our guest book can now mark its cookies SameSite:

def handle_connection(conx):
    if 'cookie' not in headers:
        template = "Set-Cookie: token={}; SameSite=Lax\r\n"
        response += template.format(token)

SameSite provides a kind of “defense in depth”, a fail-safe that makes sure that even if we forgot a nonce somewhere, we’re still secure against CSRF attacks. But don’t remove the nonces we added earlier! They’re important for older browsers and are more flexible in cases like multiple domains.

The web was not initially designed around security, which has led to some awkward patches after the fact. These patches may be ugly, but a dedication to backward compatibility is a strength of the web, and at least newer APIs can be designed around more consistent policies.

To this end, while there is a full specification for SameSite, it is still the case that real browsers support different subsets of the feature or different defaults. For example, Chrome defaults to Lax, but Firefox and Safari do not. Likewise, Chrome uses the scheme (https or http) as part of the definition of a “site”,This is called “schemeful same-site”. but other browsers may not. The main reason for this situation is the need to maintain backward compatibility with existing websites.

Cross-site Scripting

Now other websites can’t misuse our browser’s cookies to read or write private data. This seems secure! But what about our own website? With cookies accessible from JavaScript, any scripts run on our browser could, in principle, read the cookie value. This might seem benign—doesn’t our browser only run comment.js? But in fact…

A web service needs to defend itself from being misused. Consider the code in our guest book that outputs guest book entries:

out += "<p>" + entry + "\n"
out += "<i>by " + who + "</i></p>"

Note that entry can be anything, including anything the user might stick into our comment form. That includes HTML tags, like a custom <script> tag! So, a malicious user could post this comment:

Hi! <script src="http://my-server/evil.js"></script>

The server would then output this HTML:

<p>Hi! <script src="http://my-server/evil.js"></script>
<i>by crashoverride</i></p>

Every user’s browser would then download and run the evil.js script, which can sendA site’s cookies and cookie parameters are available to scripts running on that site through the document.cookie API. See Exercise 10-5 for more details on how web servers can opt in to allowing cross-origin requests. To steal cookies, it’s the attacker’s server that would need to opt in to receiving stolen cookies. Or, in a real browser, evil.js could add images or scripts to the page to trigger additional requests. In our limited browser the attack has to be a little clunkier, but the evil script can still, for example, replace the whole page with a link that goes to their site and includes the token value in the URL. You’ve seen “please click to continue” screens and have clicked through unthinkingly; your users will too. the cookies to the attacker. The attacker could then impersonate other users, posting as them or misusing any other capabilities those users had.

The core problem here is that user comments are supposed to be data, but the browser is interpreting them as code. In web applications, this kind of exploit is usually called cross-site scripting (often written “XSS”), though misinterpreting data as code is a common security issue in all kinds of programs.

The standard fix is to encode the data so that it can’t be interpreted as code. For example, in HTML, you can write &lt; to display a less-than sign.You may have implemented this in Exercise 1-4. Python has an html module for this kind of encoding:

import html

def show_comments(session):
    # ...
    out += "<p>" + html.escape(entry) + "\n"
    out += "<i>by " + html.escape(who) + "</i></p>"
    # ...

This is a good fix, and every application should be careful to do this escaping. But if you forget to encode any text anywhere—that’s a security bug. So browsers provide additional layers of defense.

Since the CSS parser we implemented in Chapter 6 is very permissive, some HTML pages also parse as valid CSS. This leads to an attack: include an external HTML page as a style sheet and observe the styling it applies. A similar attack involves including external JSON files as scripts. Setting a Content-Type header can prevent this sort of attack thanks to browsers’ Cross-Origin Read Blocking policy.

Content Security Policy

One such layer is the Content-Security-Policy header. The full specification for this header is quite complex, but in the simplest case, the header is set to the keyword default-src followed by a space-separated list of servers:

Content-Security-Policy: default-src http://example.org

This header asks the browser not to load any resources (including CSS, JavaScript, images, and so on) except from the listed origins. If our guest book used Content-Security-Policy, even if an attacker managed to get a <script> added to the page, the browser would refuse to load and run that script.

Let’s implement support for this header. First, we’ll need request to return the response headers:

class URL:
    def request(self, referrer, payload=None):
        # ...
        return response_headers, content

Make sure to update all existing uses of request to ignore the headers.

Next, we’ll need to extract and parse the Content-Security-Policy header when loading a page:In real browsers Content-Security-Policy can also list scheme-generic URLs and other sources like self. And there are keywords other than default-src, to restrict styles, scripts, and XMLHttpRequests each to their own set of URLs.

class Tab:
    def load(self, url, payload=None):
        # ...
        self.allowed_origins = None
        if "content-security-policy" in headers:
            csp = headers["content-security-policy"].split()
            if len(csp) > 0 and csp[0] == "default-src":
                self.allowed_origins = []
                for origin in csp[1:]:
                    self.allowed_origins.append(URL(origin).origin())
        # ...

This parsing needs to happen before we request any JavaScript or CSS, because we now need to check whether those requests are allowed:

class Tab:
    def load(self, url, payload=None):
        # ...
        for script in scripts:
            script_url = url.resolve(script)
            if not self.allowed_request(script_url):
                print("Blocked script", script, "due to CSP")
                continue
            # ...

Note that we need to first resolve relative URLs to know if they’re allowed. Add a similar test to the CSS-loading code.
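
The style sheet check mirrors the script one (a sketch of the link-loading loop; your variable names may differ):

class Tab:
    def load(self, url, payload=None):
        # ...
        for link in links:
            style_url = url.resolve(link)
            if not self.allowed_request(style_url):
                print("Blocked style", link, "due to CSP")
                continue
            # ...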

XMLHttpRequest URLs also need to be checked:Note that when loading styles and scripts, our browser merely ignores blocked resources, while for blocked XMLHttpRequests it throws an exception. That’s because exceptions in XMLHttpRequest calls can be caught and handled in JavaScript.

class JSContext:
    def XMLHttpRequest_send(self, method, url, body):
        full_url = self.tab.url.resolve(url)
        if not self.tab.allowed_request(full_url):
            raise Exception("Cross-origin XHR blocked by CSP")
        # ...

The allowed_request check needs to handle both the case where there is no Content-Security-Policy and the case where there is one:

class Tab:
    def allowed_request(self, url):
        return self.allowed_origins == None or \
            url.origin() in self.allowed_origins

The guest book can now send a Content-Security-Policy header:

def handle_connection(conx):
    # ...
    csp = "default-src http://localhost:8000"
    response += "Content-Security-Policy: {}\r\n".format(csp)
    # ...

To check that our implementation works, let’s have the guest book request a script from outside the list of allowed servers:

def show_comments(session):
    # ...
    out += "<script src=https://example.com/evil.js></script>"
    # ...

If you’ve got everything implemented correctly, the browser should block the evil scriptNeedless to say, example.com does not actually host an evil.js file, and any request to it returns “404 Not Found”. and report so in the console.

So are we done? Is the guest book totally secure? Uh … no. There’s more—much, much more—to web application security than what’s in this book. And just like the rest of this book, there are many other browser mechanisms that touch on security and privacy. Let’s settle for this fact: the guest book is more secure than before.

On a complicated site, deploying Content-Security-Policy can accidentally break something. For this reason, browsers can automatically report Content-Security-Policy violations to the server, using the report-to directive. The Content-Security-Policy-Report-Only header asks the browser to report violations of the content security policy without actually blocking the requests.

Summary

We’ve added user data, in the form of cookies, to our browser, and immediately had to bear the heavy burden of securing that data and ensuring it was not misused. That involved sending cookies only to the site that set them, enforcing the same-origin policy on XMLHttpRequest, mitigating cross-site request forgery with nonces and SameSite cookies, escaping user data to avoid cross-site scripting, and adding a Content-Security-Policy backstop.

We’ve also seen the more general lesson that every increase in the capabilities of a web browser also leads to an increase in its responsibility to safeguard user data. Security is an ever-present consideration throughout the design of a web browser.

The purpose of this book is to teach the internals of web browsers, not to teach web application security. There’s much more you’d want to do to make this guest book truly secure, let alone what we’d need to do to avoid denial of service attacks or to handle spam and malicious use. Please consult other sources before working on security-critical code.

Click here to try this chapter’s browser.

Outline

The complete set of functions, classes, and methods in our browser should now look something like this:

COOKIE_JAR
class URL: def __init__(url) def request(referrer, payload) def resolve(url) def origin() def __str__()
class Text: def __init__(text, parent) def __repr__()
class Element: def __init__(tag, attributes, parent) def __repr__()
def print_tree(node, indent)
def tree_to_list(tree, list)
class HTMLParser: SELF_CLOSING_TAGS HEAD_TAGS def __init__(body) def parse() def get_attributes(text) def add_text(text) def add_tag(tag) def implicit_tags(tag) def finish()
class CSSParser: def __init__(s) def whitespace() def literal(literal) def word() def ignore_until(chars) def pair() def selector() def body() def parse()
class TagSelector: def __init__(tag) def matches(node)
class DescendantSelector: def __init__(ancestor, descendant) def matches(node)
FONTS
def get_font(size, weight, style)
DEFAULT_STYLE_SHEET
INHERITED_PROPERTIES
def style(node, rules)
def cascade_priority(rule)
WIDTH, HEIGHT
HSTEP, VSTEP
class Rect: def __init__(left, top, right, bottom) def containsPoint(x, y)
INPUT_WIDTH_PX
BLOCK_ELEMENTS
class DocumentLayout: def __init__(node) def layout() def should_paint() def paint()
class BlockLayout: def __init__(node, parent, previous) def layout_mode() def layout() def recurse(node) def new_line() def word(node, word) def input(node) def self_rect() def should_paint() def paint()
class LineLayout: def __init__(node, parent, previous) def layout() def should_paint() def paint()
class TextLayout: def __init__(node, word, parent, previous) def layout() def should_paint() def paint()
class InputLayout: def __init__(node, parent, previous) def layout() def should_paint() def paint() def self_rect()
class DrawText: def __init__(x1, y1, text, font, color) def execute(scroll, canvas)
class DrawRect: def __init__(rect, color) def execute(scroll, canvas)
class DrawLine: def __init__(x1, y1, x2, y2, color, thickness) def execute(scroll, canvas)
class DrawOutline: def __init__(rect, color, thickness) def execute(scroll, canvas)
def paint_tree(layout_object, display_list)
EVENT_DISPATCH_JS
RUNTIME_JS
class JSContext: def __init__(tab) def run(script, code) def dispatch_event(type, elt) def get_handle(elt) def querySelectorAll(selector_text) def getAttribute(handle, attr) def innerHTML_set(handle, s) def XMLHttpRequest_send(...)
SCROLL_STEP
class Tab: def __init__(tab_height) def load(url, payload) def render() def draw(canvas, offset) def allowed_request(url) def scrolldown() def click(x, y) def go_back() def submit_form(elt) def keypress(char)
class Chrome: def __init__(browser) def tab_rect(i) def paint() def click(x, y) def keypress(char) def enter() def blur()
class Browser: def __init__() def draw() def new_tab(url) def handle_down(e) def handle_click(e) def handle_key(e) def handle_enter(e)

The server has also grown since the previous chapter:

SESSIONS
def handle_connection(conx)
ENTRIES
LOGINS
def do_request(session, method, url, headers, body)
def form_decode(body)
def show_comments(session)
def login_form(session)
def do_login(session, params)
def not_found(url, method)
def add_entry(session, params)

Exercises

10-1 New inputs. Add support for hidden and password input elements. Hidden inputs shouldn’t show up or take up space, while password input elements should show their contents as stars instead of characters.

10-2 Certificate errors. When accessing an HTTPS page, the web server can send an invalid certificate (badssl.com hosts various invalid certificates you can use for testing). In this case, the wrap_socket function will raise a certificate error; catch these errors and show a warning message to the user. For all other HTTPS pages draw a padlock (spelled \N{lock}) in the address bar.

10-3 Script access. Implement the document.cookie JavaScript API. Reading this field should return a string containing the cookie value and parameters, formatted similarly to the Cookie header. Writing to this field updates the cookie value and parameters, just like receiving a Set-Cookie header does. Also implement the HttpOnly cookie parameter; cookies with this parameter cannot be read or written from JavaScript.

10-4 Cookie expiration. Add support for cookie expiration. Cookie expiration dates are set in the Set-Cookie header, and can be overwritten if the same cookie is set again with a later date. On the server side, save the expiration date in the SESSIONS variable and use it to delete old sessions to save memory.

10-5 Cross-origin resource sharing (CORS). Web servers can opt in to allowing cross-origin XMLHttpRequests. The way it works is that on cross-origin HTTP requests, the browser makes the request and includes an Origin header with the origin of the requesting site; this request includes cookies for the target origin. To satisfy the same-origin policy, the browser then throws away the response. But the server can send the Access-Control-Allow-Origin header, and if its value is either the requesting origin or the special * value, the browser returns the response to the script instead. All requests made by your browser will be what the CORS standard calls “simple requests”.

10-6 Referer. When your browser visits a web page, or when it loads a CSS or JavaScript file, it sends a Referer headerYep, spelled that way. containing the URL it is coming from. Sites often use this for analytics. Implement this in your browser. However, some URLs contain personal data that they don’t want revealed to other websites, so browsers support a Referrer-Policy header,Yep, spelled that way. which can contain values like no-referrerYep, spelled that way. (never send the Referer header when leaving this page) or same-origin (only do so if navigating to another page on the same origin). Implement those two values for Referrer-Policy.

Adding Visual Effects

Right now our browser can only draw colored rectangles and text—pretty boring! Real browsers support all kinds of visual effects that change how pixels and colors blend together. To implement those effects, and also make our browser faster, we’ll need control over surfaces, the key low-level feature behind fast scrolling, visual effects, animations, and many other browser capabilities. To get that control, we’ll also switch to using the Skia graphics library and even take a peek under its hood.

Installing Skia and SDL

While Tkinter is great for basic shapes and input handling, it doesn’t give us control over surfacesThat’s because Tk, the graphics library that Tkinter uses, dates from the early 1990s, before high-performance graphics cards and GPUs became widespread. and lacks implementations of most visual effects. Implementing them ourselves would be fun, but it’s outside the scope of this book, so we need a new graphics library. Let’s use Skia, the library that Chromium uses. Unlike Tkinter, Skia doesn’t handle inputs or create graphical windows, so we’ll pair it with the SDL GUI library. Beyond new capabilities, switching to Skia will allow us to control graphics and rasterization at a lower level.

Start by installing Skia and SDL:

python3 -m pip install 'skia-python==87.*' pysdl2 pysdl2-dll

As elsewhere in this book, you may need to install the pip package first, or use your IDE’s package installer. If you’re on Linux, you’ll need to install additional dependencies, like OpenGL and fontconfig. Also, you may not be able to install pysdl2-dll; if so, you’ll need to find SDL in your system package manager instead. Consult the skia-python and pysdl2 web pages for more details.

Note that I’m explicitly installing Skia version 87. Skia makes regular releases that change APIs or break compatibility; version 87 is fairly old and should work reliably on most systems. In your own projects, or before filing bug reports in Skia, please do use more recent Skia releases. It’s also possible that future Python versions will no longer support Skia 87; our porting notes explain how to use recent Skia releases for the code in this book.

Once installed, remove the tkinter imports from browser and replace them with these:

import ctypes
import sdl2
import skia

The ctypes module is a standard part of Python; we’ll use it to convert between Python and C types. If any of these imports fail, check that Skia and SDL were installed correctly.

The <canvas> HTML element provides a JavaScript API that is similar to Skia and Tkinter. Combined with WebGL, it’s possible to implement basically all of SDL and Skia in JavaScript. Alternatively, one can compile Skia to WebAssembly to do the same.

SDL Creates the Window

The first big task is to switch to using SDL to create the window and handle events. The main loop of the browser first needs some boilerplate to get SDL started:

if __name__ == "__main__":
    sdl2.SDL_Init(sdl2.SDL_INIT_EVENTS)
    browser = Browser()
    browser.new_tab(URL(sys.argv[1]))
    # ...

Next, we need to create an SDL window, instead of a Tkinter window, inside the Browser. Here’s the SDL incantation:

class Browser:
    def __init__(self):
        self.sdl_window = sdl2.SDL_CreateWindow(b"Browser",
            sdl2.SDL_WINDOWPOS_CENTERED, sdl2.SDL_WINDOWPOS_CENTERED,
            WIDTH, HEIGHT, sdl2.SDL_WINDOW_SHOWN)

Now that we’ve created a window, we need to handle events sent to it. SDL doesn’t have a mainloop or bind method; we have to implement it ourselves:

def mainloop(browser):
    event = sdl2.SDL_Event()
    while True:
        while sdl2.SDL_PollEvent(ctypes.byref(event)) != 0:
            if event.type == sdl2.SDL_QUIT:
                browser.handle_quit()
                sdl2.SDL_Quit()
                sys.exit()
            # ...

The details of ctypes and PollEvent aren’t too important here, but note that SDL_QUIT is an event, sent when the user closes the last open window. The handle_quit method it calls just cleans up the window object:

class Browser:
    def handle_quit(self):
        sdl2.SDL_DestroyWindow(self.sdl_window)

Call mainloop in place of tkinter.mainloop:

if __name__ == "__main__":
    # ...
    mainloop(browser)

In place of all the bind calls in the Browser constructor, we can just directly call methods for various types of events, like clicks, typing, and so on. The SDL syntax looks like this:

def mainloop(browser):
    while True:
        while sdl2.SDL_PollEvent(ctypes.byref(event)) != 0:
            # ...
            elif event.type == sdl2.SDL_MOUSEBUTTONUP:
                browser.handle_click(event.button)
            elif event.type == sdl2.SDL_KEYDOWN:
                if event.key.keysym.sym == sdl2.SDLK_RETURN:
                    browser.handle_enter()
                elif event.key.keysym.sym == sdl2.SDLK_DOWN:
                    browser.handle_down()
            elif event.type == sdl2.SDL_TEXTINPUT:
                browser.handle_key(event.text.text.decode('utf8'))

I’ve changed the signatures of the various event handler methods. For example, the handle_click method is now passed a MouseButtonEvent object, which thankfully contains x and y coordinates, while the handle_enter and handle_down methods aren’t passed any argument at all, because we don’t use that argument anyway. You’ll need to change the Browser methods’ signatures to match.
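
Concretely, the new signatures look something like this (just a sketch; the method bodies stay as they were):

class Browser:
    def handle_click(self, e):
        # e is now an SDL MouseButtonEvent; e.x and e.y still give
        # the click position in window coordinates.
        # ...

    def handle_enter(self):
        # ...

    def handle_down(self):
        # ...

    def handle_key(self, char):
        # char is already a string, decoded from the SDL text event.
        # ...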

SDL is most popular for making games. Their site lists a selection of books about game programming in SDL.

Surfaces and Pixels

Let’s peek under the hood of these SDL calls. When we create an SDL window, we’re asking SDL to allocate a surface, a chunk of memory representing the pixels on the screen.A surface may or may not be bound to the physical pixels on the screen via a window, and there can be many surfaces. A canvas is an API interface that allows you to draw into a surface with higher-level commands such as for rectangles or text. Our browser uses separate Skia and SDL surfaces for simplicity, but in a highly optimized browser, minimizing the number of surfaces is important for good performance. Creating and managing surfaces is going to be the big focus of this chapter. On today’s large screens, surfaces take up a lot of memory, so handling surfaces well is essential to good browser performance.

A surface is a representation of a graphics buffer into which you can draw pixels (bits representing colors). We implicitly created an SDL surface when we created an SDL window; let’s also create a surface for Skia to draw to:

class Browser:
    def __init__(self):
        self.root_surface = skia.Surface.MakeRaster(
            skia.ImageInfo.Make(
                WIDTH, HEIGHT,
                ct=skia.kRGBA_8888_ColorType,
                at=skia.kUnpremul_AlphaType))

Each pixel has a color. Note the ct argument, meaning “color type”, which indicates that each pixel of this surface should be represented as red, green, blue, and alpha values, each of which should take up eight bits. In other words, pixels are basically defined like so:

class Pixel:
    def __init__(self, r, g, b, a):
        self.r = r
        self.g = g
        self.b = b
        self.a = a

This Pixel definition is an illustrative example, not actual code in our browser. It’s standing in for somewhat more complex code within SDL and Skia themselves.Skia actually represents colors as 32-bit integers, with the most significant byte representing the alpha value (255 meaning opaque and 0 meaning transparent) and the next three bytes representing the red, green, and blue color channels.
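
To make that packing concrete, here is the arithmetic written out in plain Python (illustrative only; Skia does this internally):

# Pack an opaque orange pixel (alpha 255, red 255, green 165, blue 0)
# into a 32-bit integer with alpha in the most significant byte.
a, r, g, b = 255, 255, 165, 0
packed = (a << 24) | (r << 16) | (g << 8) | b
assert packed == 0xffffa500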

Defining colors via red, green, and blue components is fairly standardIt’s formally known as the sRGB color space, and it dates back to CRT (cathode-ray tube) displays, which had a pretty limited gamut of expressible colors. New technologies like LCD, LED, and OLED can display more colors, so CSS now includes syntax for expressing these new colors. Still, all color spaces have a limited gamut of expressible colors. and corresponds to how computer screens work.Actually, some screens contain lights besides red, green, and blue, including white, cyan, or yellow. Moreover, different screens can use slightly different reds, greens, or blues; professional color designers typically have to calibrate their screen to display colors accurately. For the rest of us, the software still communicates with the display in terms of standard red, green, and blue colors, and the display hardware converts them to whatever pixels it uses. For example, in CSS, we refer to arbitrary colors with a hash character and six hex digits, like #ffd700, with two digits each for red, green, and blue:Alpha is implicitly 255, meaning opaque, in this case.

def parse_color(color):
    if color.startswith("#") and len(color) == 7:
        r = int(color[1:3], 16)
        g = int(color[3:5], 16)
        b = int(color[5:7], 16)
        return skia.Color(r, g, b)

The colors we’ve seen so far can just be specified in terms of this syntax:

NAMED_COLORS = {
    "black": "#000000",
    "white": "#ffffff",
    "red":   "#ff0000",
    # ...
}

def parse_color(color):
    # ...
    elif color in NAMED_COLORS:
        return parse_color(NAMED_COLORS[color])
    else:
        return skia.ColorBLACK

You can add more named colors from the list as you come across them; the demos in this book use blue, green, lightblue, lightgreen, orange, orangered, and gray. Note that unsupported colors are interpreted as black, so that at least something is drawn to the screen.This is not the standards-required behavior—the invalid value should just not participate in styling, so an element styled with an unknown color might inherit a color other than black—but I’m doing it as a convenience.
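
For reference, here are the hex values for the named colors this book’s demos use, taken from the standard CSS named-colors list:

NAMED_COLORS = {
    "black": "#000000",
    "white": "#ffffff",
    "red":   "#ff0000",
    "blue":  "#0000ff",
    "green": "#008000",
    "lightblue":  "#add8e6",
    "lightgreen": "#90ee90",
    "orange":     "#ffa500",
    "orangered":  "#ff4500",
    "gray":       "#808080",
}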

Let’s now use our understanding of surfaces and colors to copy from the Skia surface, where we will draw the chrome and page content, to the SDL surface, which actually appears on the screen. This is a little hairy, because we are moving data between two low-level libraries, but really we’re just copying pixels from one place to another. First, get the sequence of bytes representing the Skia surface:

class Browser:
    def draw(self):
        # ...
        skia_image = self.root_surface.makeImageSnapshot()
        skia_bytes = skia_image.tobytes()

Next, we need to copy the data to an SDL surface. This requires telling SDL what order the pixels are stored in and your computer’s endianness:

class Browser:
    def __init__(self):
        if sdl2.SDL_BYTEORDER == sdl2.SDL_BIG_ENDIAN:
            self.RED_MASK = 0xff000000
            self.GREEN_MASK = 0x00ff0000
            self.BLUE_MASK = 0x0000ff00
            self.ALPHA_MASK = 0x000000ff
        else:
            self.RED_MASK = 0x000000ff
            self.GREEN_MASK = 0x0000ff00
            self.BLUE_MASK = 0x00ff0000
            self.ALPHA_MASK = 0xff000000

The CreateRGBSurfaceFrom method then wraps the data in an SDL surface (without copying the bytes):

class Browser:
    def draw(self):
        # ...
        depth = 32 # Bits per pixel
        pitch = 4 * WIDTH # Bytes per row
        sdl_surface = sdl2.SDL_CreateRGBSurfaceFrom(
            skia_bytes, WIDTH, HEIGHT, depth, pitch,
            self.RED_MASK, self.GREEN_MASK,
            self.BLUE_MASK, self.ALPHA_MASK)

Finally, we draw all this pixel data on the window itself by blitting (copying) it from sdl_surface to sdl_window’s surface:Note that since Skia and SDL are C++ libraries, they are not always consistent with Python’s garbage collection system. So the link between the output of tobytes and sdl_surface is not guaranteed to be kept consistent when skia_bytes is garbage-collected. The SDL surface could be left pointing at a bogus piece of memory, leading to memory corruption or a crash. The code here is correct because all of these are local variables that are garbage-collected together, but if not, you need to be careful to keep all of them alive at the same time.

class Browser:
    def draw(self):
        # ...
        rect = sdl2.SDL_Rect(0, 0, WIDTH, HEIGHT)
        window_surface = sdl2.SDL_GetWindowSurface(self.sdl_window)
        # SDL_BlitSurface is what actually does the copy.
        sdl2.SDL_BlitSurface(sdl_surface, rect, window_surface, rect)
        sdl2.SDL_UpdateWindowSurface(self.sdl_window)

So now we can copy from the Skia surface to the SDL window. One last step: we have to draw the browser to the Skia surface.

We take it for granted, but color standards like CIELAB derive from attempts to reverse-engineer human vision. Screens use red, green, and blue color channels to match the three types of cone cells in a human eye. These cone cells vary between people: some have more and some fewer (typically an inherited condition carried on the X chromosome). Moreover, different people have different ratios of cone types and those cone types use different protein structures that vary in the exact frequency of green, red, and blue that they respond to. The study of color thus combines software, hardware, chemistry, biology, and psychology.

Rasterizing with Skia

We want to draw text, rectangles, and so on to the Skia surface. This step—coloring in the pixels of a surface to draw shapes on it—is called “rasterization” and is one important task of a graphics library. In Skia, rasterization happens via a canvas API. A canvas is just an object that draws to a particular surface:

class Browser:
    def draw(self):
        # ...
        canvas = self.root_surface.getCanvas()
        # ...

Our browser’s drawing commands will need to invoke Skia methods on this canvas. To draw a line, you use Skia’s Path object:Consult the Skia and skia-python documentation for more on the Skia API.

class DrawLine:
    def execute(self, canvas, scroll):
        path = skia.Path().moveTo(self.x1 - scroll, self.y1) \
                          .lineTo(self.x2 - scroll, self.y2)
        paint = skia.Paint(
            Color=parse_color(self.color),
            StrokeWidth=self.thickness,
            Style=skia.Paint.kStroke_Style,
        )
        canvas.drawPath(path, paint)

Note the steps involved here. We first create a Path object, and then call drawPath to actually draw this path to the canvas. This drawPath call takes a second argument, paint, which defines how to actually perform this drawing. We specify the color, but we also need to specify that we want to draw a line along the path, instead of filling in the interior of the path, which is the default. To do that we set the style to “stroke”, a standard term referring to drawing along the border of some shape.The opposite is “fill”, meaning filling in the interior of the shape.

We do something similar to draw text using drawString:

class DrawText:
    def execute(self, canvas, scroll):
        paint = skia.Paint(
            AntiAlias=True,
            Color=parse_color(self.color),
        )
        baseline = self.top - scroll - self.font.getMetrics().fAscent
        canvas.drawString(self.text, float(self.left), baseline,
            self.font, paint)

Note again that we create a Paint object identifying the color and asking for anti-aliased text.“Anti-alias”ing just means drawing some semi-transparent pixels to better approximate the shape of the text. This is important when drawing shapes with fine details, like text, but is less important when drawing large shapes like rectangles and lines. We don’t specify the “style” because we want to fill the interior of the text, the default.

Finally, for drawing rectangles you use drawRect:

class DrawRect:
    def execute(self, canvas, scroll):
        paint = skia.Paint(
            Color=parse_color(self.color),
        )
        canvas.drawRect(self.rect.makeOffset(0, -scroll), paint)

Here, the rect field needs to become a Skia Rect object. Get rid of the old Rect class that was introduced in Chapter 7 in favor of skia.Rect. Everywhere that a Rect was constructed, instead put skia.Rect.MakeLTRB (for “make left-top-right-bottom”) or MakeXYWH (for “make x-y-width-height”). Everywhere that the sides of the rectangle (e.g., left) were checked, replace them with the corresponding function on a Skia Rect (e.g., left()). Also replace calls to containsPoint with Skia’s contains.
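
For instance, here’s a hedged before-and-after sketch of the conversion, with made-up coordinates and assuming the old Rect took left, top, right, and bottom arguments:

# Constructing rectangles (was Rect(10, 20, 110, 60)):
rect = skia.Rect.MakeLTRB(10, 20, 110, 60)
same = skia.Rect.MakeXYWH(10, 20, 100, 40)  # the same rectangle, by size

# Reading its sides (was rect.left, rect.right, and so on):
width = rect.right() - rect.left()

# Hit-testing a point (was rect.containsPoint(40, 30)):
assert rect.contains(40, 30)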

While we’re here, let’s also add a rect field to the other drawing commands, replacing its top, left, bottom, and right fields:

class DrawText:
    def __init__(self, x1, y1, text, font, color):
        # ...
        self.rect = \
            skia.Rect.MakeLTRB(x1, y1, self.right, self.bottom)

class DrawLine:
    def __init__(self, x1, y1, x2, y2, color, thickness):
        # ...
        self.rect = skia.Rect.MakeLTRB(x1, y1, x2, y2)

To create an outline, draw a rectangle but set the Style parameter of the Paint to Stroke_Style:

class DrawOutline:
    def execute(self, canvas, scroll):
        paint = skia.Paint(
            Color=parse_color(self.color),
            StrokeWidth=self.thickness,
            Style=skia.Paint.kStroke_Style,
        )
        canvas.drawRect(self.rect.makeOffset(0, -scroll), paint)

Since we’re replacing Tkinter with Skia, we are also replacing tkinter.font. In Skia, a font object has two pieces: a Typeface, which is a type family with a certain weight, style, and width; and a Font, which is a Typeface at a particular size. It’s the Typeface that contains data and caches, so that’s what we need to cache:

def get_font(size, weight, style):
    key = (weight, style)
    if key not in FONTS:
        if weight == "bold":
            skia_weight = skia.FontStyle.kBold_Weight
        else:
            skia_weight = skia.FontStyle.kNormal_Weight
        if style == "italic":
            skia_style = skia.FontStyle.kItalic_Slant
        else:
            skia_style = skia.FontStyle.kUpright_Slant
        skia_width = skia.FontStyle.kNormal_Width
        style_info = \
            skia.FontStyle(skia_weight, skia_width, skia_style)
        font = skia.Typeface('Arial', style_info)
        FONTS[key] = font
    return skia.Font(FONTS[key], size)

Our browser also needs font metrics and measurements. In Skia, these are provided by the measureText and getMetrics methods. Let’s start with measureText, which replaces every call to measure. For example, in the paint method in InputLayout, we now write:

class InputLayout:
    def paint(self):
        # ...
        if self.node.is_focused:
            cx = self.x + self.font.measureText(text)
            # ...

There are measure calls in several other layout objects (both in paint and layout), in DrawText, in the draw method on Chrome, in the text method in BlockLayout, and in the layout method in TextLayout. Update all of them to use measureText.
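
As one example, here’s a sketch of the updated call in TextLayout’s layout method, assuming the self.word field from earlier chapters:

class TextLayout:
    def layout(self):
        # ...
        self.width = self.font.measureText(self.word)
        # ...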

Also, in the layout method of LineLayout and in DrawText we make calls to the metrics method on fonts. In Skia, this method is called getMetrics, and to get the ascent and descent we need the fAscent and fDescent fields on its result.

Importantly, the ascent needs to be negated: in Skia, ascent and descent are positive if they go downward and negative if they go upward, so ascents will normally be negative, the opposite of Tkinter. There’s no analog for the linespace field that Tkinter provides, but descent minus ascent works instead:

def linespace(font):
    metrics = font.getMetrics()
    return metrics.fDescent - metrics.fAscent
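
For example, in LineLayout’s layout method, the baseline math ends up looking something like this. This is a sketch that assumes the fields and the 1.25 leading factor from earlier chapters; the key point is negating fAscent:

class LineLayout:
    def layout(self):
        # ...
        max_ascent = max([-word.font.getMetrics().fAscent
                          for word in self.children])
        baseline = self.y + 1.25 * max_ascent
        for word in self.children:
            word.y = baseline + word.font.getMetrics().fAscent
        max_descent = max([word.font.getMetrics().fDescent
                           for word in self.children])
        self.height = 1.25 * (max_ascent + max_descent)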

You should now be able to run the browser again. It should look and behave just as it did in previous chapters, and it might feel faster on complex pages, because Skia and SDL are generally faster than Tkinter. If the transition felt easy, well, that’s one of the benefits of abstracting over the drawing backend with a display list!

Finally, Skia also provides some new features. For example, Skia has native support for rounded rectangles via RRect objects. We can expose that through a new DrawRRect command, modeled on DrawRect:

class DrawRRect:
    def __init__(self, rect, radius, color):
        self.rect = rect
        self.rrect = skia.RRect.MakeRectXY(rect, radius, radius)
        self.color = color

    def execute(self, canvas, scroll):
        paint = skia.Paint(
            Color=parse_color(self.color),
        )
        canvas.drawRRect(self.rrect.makeOffset(0, -scroll), paint)

Then we can draw these rounded rectangles for backgrounds:

class BlockLayout:
    def paint(self):
        # ...
        if bgcolor != "transparent":
            radius = float(
                self.node.style.get(
                    "border-radius", "0px")[:-2])
            cmds.append(DrawRRect(
                self.self_rect(), radius, bgcolor))

With that, this example:Note that the example listed here, in common with other examples present in the book, accesses a local resource (a CSS file in this case) that is also present on browser.engineering.

<link rel=stylesheet href="example11-longword.css">
<div>
Background is rounded
</div>

will round the corners of its background (see Figure 1).

Figure 1: Example of a rounded background.

Similar changes should be made to InputLayout; a sketch follows below. Support for new shapes like rounded rectangles is one way that Skia is a more advanced rasterization library than Tk. More broadly, since Skia is also used by Chromium, we know it has fast, built-in support for all of the shapes we might need in a browser.
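
Here’s a hedged sketch of what that change to InputLayout’s paint method might look like, assuming the cmds list, bgcolor lookup, and self_rect helper used in the BlockLayout code above:

class InputLayout:
    def paint(self):
        cmds = []
        bgcolor = self.node.style.get(
            "background-color", "transparent")
        if bgcolor != "transparent":
            radius = float(
                self.node.style.get("border-radius", "0px")[:-2])
            cmds.append(DrawRRect(
                self.self_rect(), radius, bgcolor))
        # ...
        return cmds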

Font rasterization is surprisingly deep, with techniques such as subpixel rendering and hinting used to make fonts look better on lower-resolution screens. These techniques are much less necessary on high-pixel-density screens, though. It’s likely that all screens will eventually be high-density enough to retire these techniques.

Browser Compositing

Skia and SDL have just made our browser more complex, but the low-level control offered by these libraries is important because it allows us to optimize common interactions like scrolling.

So far, any time the user scrolled a web page, we had to clear the canvas and re-raster everything on it from scratch. This is inefficient: we’re drawing the same pixels, just in a different place. When the page content is complex or the screen is large, rastering too often produces a visible slowdown and drains laptop and mobile batteries. Real browsers optimize scrolling using a technique I’ll call browser compositing: drawing the whole web page to a hidden surface, and only copying the relevant pixels to the window itself.

To implement this, we’ll need two new Skia surfaces: a surface for browser chrome and a surface for the current Tab’s contents. We’ll only need to re-raster the Tab surface if page contents change, but not when (say) the user types into the address bar. And we can scroll the Tab without any raster at all—we just copy a different part of the current Tab surface to the screen. Let’s call those surfaces chrome_surface and tab_surface:We could even use a different surface for each Tab, but real browsers don’t do this, since each surface uses up a lot of memory, and typically users don’t notice the small raster delay when switching tabs.

class Browser:
    def __init__(self):
        # ...
        self.chrome_surface = skia.Surface(
            WIDTH, math.ceil(self.chrome.bottom))
        self.tab_surface = None

I’m not explicitly creating tab_surface right away, because we need to lay out the page contents to know how tall the surface needs to be.

We’ll also need to split the browser’s draw method into three parts: raster_tab, which rasters the page into the tab surface; raster_chrome, which rasters the browser chrome into its surface; and draw, which copies from both surfaces to the root surface and then to the window.

Let’s start by doing the split:

class Browser:
    def raster_tab(self):
        canvas = self.tab_surface.getCanvas()
        canvas.clear(skia.ColorWHITE)
        # ...

    def raster_chrome(self):
        canvas = self.chrome_surface.getCanvas()
        canvas.clear(skia.ColorWHITE)
        # ...

    def draw(self):
        canvas = self.root_surface.getCanvas()
        canvas.clear(skia.ColorWHITE)
        # ...

Since we didn’t create the tab_surface on startup, we need to create it at the top of raster_tab:For a very big web page, tab_surface can be much larger than the size of the SDL window, and therefore take up a very large amount of memory. We’ll ignore that, but a real browser would only paint and raster surface content up to a certain distance from the visible region, and re-paint/raster as the user scrolls.

import math

class Browser:
    def raster_tab(self):
        tab_height = math.ceil(
            self.active_tab.document.height + 2*VSTEP)

        if not self.tab_surface or \
                tab_height != self.tab_surface.height():
            self.tab_surface = skia.Surface(WIDTH, tab_height)

        # ...

Note that we need to recreate the tab surface if the page’s height changes. The way we compute the page bounds here, based on the layout tree’s height, would be incorrect if page elements could stick out below (or to the right) of their parents—but our browser doesn’t support any features like that.

Next, draw should copy from the chrome and tab surfaces to the root surface. Moreover, we need to translate the tab_surface down by chrome_bottom and up by scroll, and clip it to just the area of the window that doesn’t overlap the browser chrome:

class Browser:
    def draw(self):
        # ...
        
        tab_rect = skia.Rect.MakeLTRB(
            0, self.chrome.bottom, WIDTH, HEIGHT)
        tab_offset = self.chrome.bottom - self.active_tab.scroll
        canvas.save()
        canvas.clipRect(tab_rect)
        canvas.translate(0, tab_offset)
        self.tab_surface.draw(canvas, 0, 0)
        canvas.restore()

        chrome_rect = skia.Rect.MakeLTRB(
            0, 0, WIDTH, self.chrome.bottom)
        canvas.save()
        canvas.clipRect(chrome_rect)
        self.chrome_surface.draw(canvas, 0, 0)
        canvas.restore()

        # ...

Note the draw calls: these copy the tab_surface and chrome_surface to the canvas, which is bound to root_surface. The clipRect and translate calls make sure we copy the right parts.

Finally, everywhere in Browser that we call draw, we now need to call either raster_tab or raster_chrome first. For example, in handle_click, we do this:

class Browser:
    def handle_click(self, e):
        if e.y < self.chrome.bottom:
            # ...
            self.raster_chrome()
        else:
            # ...
            self.raster_tab()
        self.draw()

Notice how we don’t redraw the chrome when only the tab changes, and vice versa. Likewise, in handle_down, we don’t need to call raster_tab at all, since scrolling doesn’t change the page.
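
A sketch of handle_down under this scheme, assuming the scrolldown method on Tab from earlier chapters:

class Browser:
    def handle_down(self):
        self.active_tab.scrolldown()
        self.draw()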

However, clicking on a web page can cause it to navigate to a new one, so we do need to detect that and raster the browser chrome if the URL changed:

class Browser:
    def handle_click(self, e):
        if e.y < self.chrome.bottom:
            # ...
        else:
            # ...
            url = self.active_tab.url
            tab_y = e.y - self.chrome.bottom
            self.active_tab.click(e.x, tab_y)
            if self.active_tab.url != url:
                self.raster_chrome()
            self.raster_tab()

We also have some related changes in Tab. Let’s rename Tab’s draw method to raster. In it, we no longer need to pass around the scroll offset to the execute methods, or account for chrome_bottom, because we always draw the whole tab to the tab surface:

class Tab:
    def raster(self, canvas):
        for cmd in self.display_list:
            cmd.execute(canvas)

Likewise, we can remove the scroll parameter from each drawing command’s execute method:

class DrawRect:
    def execute(self, canvas):
        paint = skia.Paint(
            Color=parse_color(self.color),
        )
        canvas.drawRect(self.rect, paint)

Our browser now uses composited scrolling: scrolling is faster and smoother because intermediate surfaces store already-rastered content, so nothing needs to be re-rastered unless the content has actually changed.

Real browsers allocate new surfaces for various different situations, such as implementing accelerated overflow scrolling and animations of certain CSS properties such as transform and opacity that can be done without raster. They also allow scrolling arbitrary HTML elements via overflow: scroll in CSS. Basic scrolling for DOM elements is very similar to what we’ve just implemented. But implementing it in its full generality, and with excellent performance, is extremely challenging. Scrolling may well be the single most complicated feature in a browser rendering engine. The corner cases and subtleties involved are almost endless.

Transparency

Drawing shapes quickly is already a challenge, but with multiple shapes there’s an additional question: what color should the pixel be when two shapes overlap? So far, our browser has only handled opaque shapes,It also hasn’t considered subpixel geometry or anti-aliasing, which also rely on color mixing. and the answer has been simple: take the color of the top shape. But now we need more nuance.

Consider partially transparent colors in CSS. These use a hex color with eight hex digits, with the last two indicating the level of transparency. For example, the color #00000080 is 50% transparent black. Over a white background, that looks gray, but over an orange background it looks like Figure 2.
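
The markup behind Figure 2 looks something like this (a hypothetical reconstruction, not the exact example file from browser.engineering):

<div style="background-color:orange">
    <div style="color:#00000080">Test</div>
</div>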


Figure 2: Example of black semi-transparent text blending into an orange background.

Note that the text is a kind of dark orange, because its color is a mix of 50% black and 50% orange. Many objects in the real world are partially transparent: frosted glass, clouds, or colored paper, for example. Looking through one, you see multiple colors blended together. That’s also why computer screens work: the red, green, and blue lights blend together and appear to our eyes as another color. Designers use this effectMostly. Some more advanced blending modes on the web are difficult, or perhaps impossible, in real-world physics. in overlays, shadows, and tooltips, so our browser needs to support color mixing.

Skia supports this kind of transparency by setting the “alpha” field on the parsed color:

def parse_color(color):
    # ...
    elif color.startswith("#") and len(color) == 9:
        r = int(color[1:3], 16)
        g = int(color[3:5], 16)
        b = int(color[5:7], 16)
        a = int(color[7:9], 16)
        return skia.Color(r, g, b, a)
    # ...

Check that your browser renders dark-orange text for the example above. That shows that it’s actually mixing the black color with the existing orange color from the background.

However, there’s another, subtly different way to create transparency with CSS. Here, 50% transparency is applied to the whole element using the opacity property, as in Figure 3.
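
Figure 3 comes from markup along these lines (again a hypothetical reconstruction):

<div style="background-color:orange;opacity:0.5">Test</div>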


Figure 3: Example of black text on an orange background, then blended semi-transparently into its ancestor.

Now the opacity applies to both the background and the text, so the background is now a little lighter. But note that the text is now gray, not dark orange. The black and orange pixels are no longer blended together!

That’s because opacity introduces what CSS calls a stacking context. Most of the details aren’t important right now, but the order of operations is. In the first example, the black pixels were first made transparent, then blended with the background. Thus, 50% transparent black pixels were blending with orange pixels, resulting in a dark-orange color. In the second example, the black pixels were first blended with the background, then the result was made transparent. Thus, fully black pixels replaced fully orange ones, resulting in just black pixels, which were later made 50% transparent.
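
You can check this order-of-operations argument with a little arithmetic. The sketch below is an illustration rather than browser code: it treats colors as RGB floats between zero and one, assumes orange is roughly (1.0, 0.65, 0.0), and uses the special case of source-over blending onto an opaque backdrop:

orange = (1.0, 0.65, 0.0)
black = (0.0, 0.0, 0.0)
white = (1.0, 1.0, 1.0)

def over(src, dst, alpha):
    # Source-over blend of src, at the given alpha, onto an opaque dst.
    return tuple(s * alpha + d * (1 - alpha) for s, d in zip(src, dst))

# First example: 50% transparent black blended directly into orange.
print(over(black, orange, 0.5))   # (0.5, 0.325, 0.0), a dark orange

# Second example: opaque black replaces orange, and the result is then
# blended at 50% into a white backdrop.
print(over(over(black, orange, 1.0), white, 0.5))   # (0.5, 0.5, 0.5), gray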

Applying blending in the proper order, as is necessary to implement effects like opacity, requires more careful handling of surfaces.

Mostly, elements form a stacking context because of CSS properties that have something to do with layering (like z-index) or visual effects (like mix-blend-mode). On the other hand, the overflow property, which can make an element scrollable, does not induce a stacking context, which I think was a mistake.While we’re at it, perhaps scrollable elements should also be a containing block for descendants. Otherwise, a scrollable element can have non-scrolling children via properties like position. This situation is very complicated to handle in real browsers. The reason is that inside a modern browser, scrolling is done on the GPU by offsetting two surfaces. Without a stacking context the browser might (depending on the web page structure) have to move around multiple independent surfaces with complex paint orders, in lockstep, to achieve scrolling. Fixed- and sticky-positioned elements also form stacking contexts because of their interaction with scrolling.

Blending and Stacking

To handle the order of operations properly, browsers apply blending not to individual shapes but to a tree of surfaces (see Figure 4). Conceptually, each shape is drawn to its own surface, and then blended into its parent surface. Different structures of intermediate surfaces create different visual effects.You can see a more detailed discussion of how the tree structure affects the final image, and how that impacted the CSS specifications, on David Baron’s blog. Rastering a web page requires a bottom-up traversal of this conceptual tree: to raster a surface you first need to raster its contents, including its child surfaces, and then the contents need to be blended together into the parent.This tree of surfaces is an implementation strategy and not something required by any specific web API. However, the concept of a stacking context is related. A stacking context is technically a mechanism to define groups and ordering during paint, and stacking contexts need not correspond to a surface (e.g. ones created via z-index do not). However, for ease of implementation, all visual effects in CSS that generally require surfaces to implement are specified to go hand-in-hand with a stacking context, so the tree of stacking contexts is very related to the tree of surfaces.

Figure 4: A rendered web page is actually the result of stacking and blending a series of different surfaces.

To match this use pattern, in Skia, surfaces form a stack. You can push a new surface on the stack, raster things to it, and then pop it off, which blends it with the surface below. When rastering, you push a new surface onto the stack every time you need to apply some visual effect, and pop-and-blend once you’re done rastering all the elements that that effect will be applied to, like this:

# draw parent
canvas.saveLayer(None, skia.Paint(Alphaf=0.5))
# draw children
canvas.restore()

Here, the saveLayer call asks SkiaIt’s called saveLayer instead of createSurface because Skia doesn’t actually promise to create a new surface if it can optimize that away. So what you’re really doing with saveLayer is telling Skia that there is a new conceptual layer (“piece of paper”) on the stack. Skia’s terminology distinguishes between a layer and a surface for this reason as well, but for our purposes it makes sense to assume that each new layer comes with a surface. to draw all the children to a separate surface before blending them into the parent once restore is called. The second parameter to saveLayer specifies the type of blending, here with the Alphaf parameter requesting 50% opacity.

saveLayer and restore are like a pair of parentheses enclosing child drawing operations. This means our display list is no longer just a linear sequence of drawing operations, but a tree. So in our display list, let’s handle opacity with an Opacity command that takes a sequence of other drawing commands as an argument:

class Opacity:
    def __init__(self, opacity, children):
        self.opacity = opacity
        self.children = children
        self.rect = skia.Rect.MakeEmpty()
        for cmd in self.children:
            self.rect.join(cmd.rect)

    def execute(self, canvas):
        paint = skia.Paint(
            Alphaf=self.opacity
        )
        canvas.saveLayer(None, paint)
        for cmd in self.children:
            cmd.execute(canvas)
        canvas.restore()

We can now wrap the drawing commands painted by an element with Opacity to add transparency to the whole element. I’m going to do this by adding a new paint_effects method to layout objects, which should be passed a list of drawing commands to wrap:

class BlockLayout:
    def paint_effects(self, cmds):
        cmds = paint_visual_effects(
            self.node, cmds, self.self_rect())
        return cmds

I put the actual construction of the Opacity command in a new global paint_visual_effects function (because other object types will also need it):

def paint_visual_effects(node, cmds, rect):
    opacity = float(node.style.get("opacity", "1.0"))

    return [
        Opacity(opacity, cmds)
    ]

A change is now needed in paint_tree to call paint_effects, but only after recursing into children, and only if should_paint is true. That’s because these visual effects apply to the entire subtree’s display list, not just the current object, and don’t apply to “anonymous” objects (see Chapter 8).

def paint_tree(layout_object, display_list):
    cmds = []
    if layout_object.should_paint():
        cmds = layout_object.paint()
    for child in layout_object.children:
        paint_tree(child, cmds)

    if layout_object.should_paint():
        cmds = layout_object.paint_effects(cmds)
    display_list.extend(cmds)

Note that paint_visual_effects receives a list of commands and returns another list of commands. It’s just that the output list is always a single Opacity command that wraps the original content—which makes sense, because first we need to draw the commands to a surface, and then apply transparency to it when blending into the parent.

I highly recommend a blog post by Bartosz Ciechanowski that gives a really nice visual overview of many of the concepts explored in this chapter, plus much more about how a library such as Skia might implement features like raster sampling of vector graphics for lines and text, and interpolation of surfaces when their pixel arrays don’t match in resolution or orientation.

Compositing Pixels

Now let’s pause and explore how opacity actually works under the hood. Skia, SDL, and many other color libraries account for opacity with a fourth alpha value for each pixel.The difference between opacity and alpha can be confusing. Think of opacity as a visual effect applied to content, and alpha as part of the content itself: an implementation technique for representing opacity. An alpha of 0 means the pixel is fully transparent (meaning, no matter what the colors are, you can’t see them anyway), and an alpha of 1 means fully opaque.

When a pixel with alpha overlaps another pixel, the final color is a mix of their two colors. How exactly the colors are mixed is defined by Skia’s Paint objects. Of course, Skia is pretty complex, but we can sketch these paint operations in Python as methods on the conceptual Pixel class I introduced earlier.
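
If you don’t remember the details of that Pixel class, a minimal stand-in is enough to follow the sketches below. Here I assume all four channels are stored as floats between zero and one:

class Pixel:
    def __init__(self, r, g, b, a):
        self.r = r
        self.g = g
        self.b = b
        self.a = a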

When we apply a Paint with an Alphaf parameter, the first thing Skia does is apply the requested opacity to each pixel:

class Pixel:
    def alphaf(self, opacity):
        self.a = self.a * opacity

I want to emphasize that this code is not a part of our browser—I’m simply using Python code to illustrate what Skia is doing internally.

That Alphaf parameter applies to pixels in one surface. But with saveLayer we will end up with two surfaces, with all of their pixels aligned, and therefore we will need to combine, or blend, corresponding pairs of pixels.

Here, the terminology can get confusing: we imagine that the pixels “on top” are blending into the pixels “below”, so we call the top surface the source surface, with source pixels, and the bottom surface the destination surface, with destination pixels. When we combine them, there are lots of ways we could do it, but the default on the web is called “simple alpha compositing” or source-over compositing. In Python, the code to implement it looks like this:The formula for this code can be found here. Note that that page refers to premultiplied alpha colors, but Skia’s API generally does not use premultiplied representations, and this code doesn’t either. (Skia does represent colors internally in a premultiplied form, however.)

class Pixel:
    def source_over(self, source):
        new_a = source.a + self.a * (1 - source.a)
        if new_a == 0: return self
        self.r = \
            (self.r * (1 - source.a) * self.a + \
                source.r * source.a) / new_a
        self.g = \
            (self.g * (1 - source.a) * self.a + \
                source.g * source.a) / new_a
        self.b = \
            (self.b * (1 - source.a) * self.a + \
                source.b * source.a) / new_a
        self.a = new_a

Here, the destination pixel self is modified to blend in the source pixel source. The mathematical expressions for the red, green, and blue color channels are identical, and basically average the source and destination colors, weighted by alpha.For example, if the alpha of the source pixel is 1, the result is just the source pixel color, and if it is 0 the result is the backdrop pixel color. You might imagine the overall operation of saveLayer with an Alphaf parameter as something like this:In reality, reading individual pixels into memory to manipulate them like this is slow, so libraries such as Skia don’t make it convenient to do so. (Skia canvases do have peekPixels and readPixels methods that are sometimes used, but not for this.)

for (x, y) in destination.coordinates():
    source[x, y].alphaf(opacity)
    destination[x, y].source_over(source[x, y])

Source-over compositing is one way to combine two pixel values. But it’s not the only method—you could write literally any computation that combines two pixel values if you wanted. Two computations that produce interesting effects are traditionally called “multiply” and “difference” and use simple mathematical operations.

“Multiply” multiplies the color values:

class Pixel:
    def multiply(self, source):
        self.r = self.r * source.r
        self.g = self.g * source.g
        self.b = self.b * source.b

And “difference” computes their absolute differences:

class Pixel:
    def difference(self, source):
        self.r = abs(self.r - source.r)
        self.g = abs(self.g - source.g)
        self.b = abs(self.b - source.b)

CSS supports these and many other blending modesMany of these blending modes are common to other graphics editing programs like Photoshop and GIMP. Some, like “dodge” and “burn”, go back to analog photography, where photographers would expose some parts of the image more than others to manipulate their brightness. via the mix-blend-mode property, like this:

<div style="background-color:orange">
    Parent
    <div style="background-color:blue;mix-blend-mode:difference">
        Child
    </div>
    Parent
</div>

This HTML will look like Figure 5.
