How Stylo Matches CSS Selectors
I have long been interested in CSS selector matching algorithms. In short: existing selector matching algorithms are \(O(d^2)\) at worst, where \(d\) is the tree depth, and there are better algorithms that are \(O(d)\). But of course the constants matter a lot. I previously described both algorithms, and even showed that the bad cases can occur in practice.
Recently I dove into the Stylo CSS engine, used by Firefox, to better understand exactly how it works and better understand the constraints on a better algorithm. This blog post is the notes I took; all mistakes my own.
Also let me set some ground rules. I was mostly interested in selector matching. I also ruled two things out of scope; they are complex and deserve a whole separate look:
- I ignored invalidation for now
- I also ignored "relative selectors" like
:has
and:host
Notes on Stylo
Stylo supports two browsers: Gecko and Servo. A lot of stuff is virtualized over both browsers.
The main implementation starts at compute_style
, which creates
StyleResolverForElement
objects, whose resolve_style
handles styling
and whose match_primary
method initiates matching specifically. The
main outcome of matching is a "rule tree" node. Rule trees are
described in the associated blog post, but basically they the a trie
for of the matched rules for each node, sorted with low-priority ones
at the front.
To match a node, we construct a MatchingContext
and a per-thread
LocalMatchingContext
, which in turn stores a StyleBloom
. This is the
famous bloom filter optimization, more on that later.
Stylo is multi-threaded. Threads can work-steal, so a thread might
suddenly be working on an element without having touched its parents
before. This makes the bloom filter kind of tricky (see push_internal
in StyleBloom
.)
When visiting an element, we update the bloom filter (more later), then check the style sharing cache, then proceed to match all rules on that element.
The rules are split by class name / tag name / etc (first most
specific rightmost simple selector) and we match all rules from the
class / tag / etc names that the element actually has. This is done by
the Stylist
which constructs a RuleCollector
which constructs a
SelectorMap
.
For each selectors, we first test it against the bloom filter for fast rejection. Fast rejection is key because we expect to reject most selectors.
If not, we test the local match and then call
next_element_for_combinator
to walk up the tree testing parents.
Bloom filter
One bloom filter exists per thread. It's pretty big, about 4kB.
It is a counted bloom filter: each hash bucket is 0-256, with any value greater than 0 being a "hit".
Each rule has up to four hashes. They are precomputed and packed into
48 bytes. "Which is copied from Gecko, who copied it from WebKit."
Hashes come from tag names, namespaces, IDs, Classes, attributes, and
:is
and :where
selectors with one subselector inside. If there are
more than 4 we just ignore the rest. The fourth hash is packed in a
funky way, so it's better not to use it; three is best. This limit is
pretty harsh but it is critical for fast-reject to actually be fast.
Counted bloom filters can be deleted from, as long as you only delete elements you once inserted. Inserting and then deleting leads to no change, unless one of the buckets "saturates" (hits 256), which case the filter ends up permanently larger.
The bloom filter stores all tag names, namespaces, IDs, class names,
and attribute names (except class
, id
, and style
) from all ancestors
of the current element. As far as I can tell, these are allowed to collide.
The counting comes in when you go "up" in the tree: you remove the
element you're leaving from the bloom filter. Then you go "down" into
another element and add it. This happens in insert_parents_recovering
.
Style Sharing Cache
The style sharing cache allows two similar elements to immediately get the same rule tree node.
- For A and B to share styles, they must have the same local
information (tag name, class names,
style
attributes, pseudo-elements, etc) - Their parents must share styles
- They must match the same set of rules, this is done by matching a subset of "revalidation selectors" against both
I should learn more about "revalidation selectors".
Conclusion
A big realization for me is that there's one bloom filter per thread, and it's updated using this "counting" setup. This took me a while to understand.
Revalidation selectors seem kinda like what I want to implement for matching.