Pavel Panchekha

By

Share under CC-BY-SA.

How Stylo Matches CSS Selectors

I have long been interested in CSS selector matching algorithms. In short: existing selector matching algorithms are \(O(d^2)\) at worst, where \(d\) is the tree depth, and there are better algorithms that are \(O(d)\). But of course the constants matter a lot. I previously described both algorithms, and even showed that the bad cases can occur in practice.

Recently I dove into the Stylo CSS engine, used by Firefox, to better understand exactly how it works and better understand the constraints on a better algorithm. This blog post is the notes I took; all mistakes my own.

Also let me set some ground rules. I was mostly interested in selector matching. I also ruled two things out of scope; they are complex and deserve a whole separate look:

Notes on Stylo

Stylo supports two browsers: Gecko and Servo. A lot of stuff is virtualized over both browsers.

The main implementation starts at compute_style, which creates StyleResolverForElement objects, whose resolve_style handles styling and whose match_primary method initiates matching specifically. The main outcome of matching is a "rule tree" node. Rule trees are described in the associated blog post, but basically they the a trie for of the matched rules for each node, sorted with low-priority ones at the front.

To match a node, we construct a MatchingContext and a per-thread LocalMatchingContext, which in turn stores a StyleBloom. This is the famous bloom filter optimization, more on that later.

Stylo is multi-threaded. Threads can work-steal, so a thread might suddenly be working on an element without having touched its parents before. This makes the bloom filter kind of tricky (see push_internal in StyleBloom.)

When visiting an element, we update the bloom filter (more later), then check the style sharing cache, then proceed to match all rules on that element.

The rules are split by class name / tag name / etc (first most specific rightmost simple selector) and we match all rules from the class / tag / etc names that the element actually has. This is done by the Stylist which constructs a RuleCollector which constructs a SelectorMap.

For each selectors, we first test it against the bloom filter for fast rejection. Fast rejection is key because we expect to reject most selectors.

If not, we test the local match and then call next_element_for_combinator to walk up the tree testing parents.

Bloom filter

One bloom filter exists per thread. It's pretty big, about 4kB.

It is a counted bloom filter: each hash bucket is 0-256, with any value greater than 0 being a "hit".

Each rule has up to four hashes. They are precomputed and packed into 48 bytes. "Which is copied from Gecko, who copied it from WebKit." Hashes come from tag names, namespaces, IDs, Classes, attributes, and :is and :where selectors with one subselector inside. If there are more than 4 we just ignore the rest. The fourth hash is packed in a funky way, so it's better not to use it; three is best. This limit is pretty harsh but it is critical for fast-reject to actually be fast.

Counted bloom filters can be deleted from, as long as you only delete elements you once inserted. Inserting and then deleting leads to no change, unless one of the buckets "saturates" (hits 256), which case the filter ends up permanently larger.

The bloom filter stores all tag names, namespaces, IDs, class names, and attribute names (except class, id, and style) from all ancestors of the current element. As far as I can tell, these are allowed to collide.

The counting comes in when you go "up" in the tree: you remove the element you're leaving from the bloom filter. Then you go "down" into another element and add it. This happens in insert_parents_recovering.

Style Sharing Cache

The style sharing cache allows two similar elements to immediately get the same rule tree node.

  • For A and B to share styles, they must have the same local information (tag name, class names, style attributes, pseudo-elements, etc)
  • Their parents must share styles
  • They must match the same set of rules, this is done by matching a subset of "revalidation selectors" against both

I should learn more about "revalidation selectors".

Conclusion

A big realization for me is that there's one bloom filter per thread, and it's updated using this "counting" setup. This took me a while to understand.

Revalidation selectors seem kinda like what I want to implement for matching.