Implementing a basic css selector engine

Thu, Oct 2 2025

I recently found myself forced to write a CSS selector engine. Figuring it out was extremely rewarding, so it thought it might be useful to write down the intuition about how you might do this.

By "CSS engine", I am referring to code that takes a css selector like p > [data-selected="yes"] and uses it to find matching HTML nodes. This assumes that you have a tree of HTML nodes to search over. The specific implementation does not matter, but your node API needs to have a way to get previous, preceding, parent and ancestor elements given a starting node.

The goal is to have an API that allows you to do this:

for node in start.select("p > li[href]") {
  // do something
}

I'll use the following selector as an example: body > span p + li.
Think of it this way: The selector imposes certain constraints that nodes must satisfy in order to match:

The node must be a "li" element.
At least one of the nodes that matched the preivous step must have an immediate preceding sibling that is a "p" element.
At least one of the nodes that matched the previous step must have an ancestor that is a "span" element.
At least one of the nodes that matched the previous step must have a direct parent that is a "body" element.

We break down to the selector into four "checks":

[body ">"] [span " "] [p "+"] [li]

Each check specifies what to look for and where to look for it. The "what to look for" part is what the CSS specification calls a "compound selector"; Things like "div", "#container" and "[data-state|='selected']". The "where to look for it" part is what the CSS specification calls "combinators"; Things like ">", "~", and "+". To check if an element A matches a selector, we need to pass A "through" this stack of checks. Every check performs the same steps:

Takes a node as input
Applies the combinator⁰ to the node to transform it into the list of nodes to consider.
Tests the compound selector against each node we are considering. If a node matches, pass it on to the next check.

⁰ But what combinator do we apply to the last li selector? For my algorithm, I decided to tack on a synthetic "identity" combinator that just passes on the node it was given. So if you get a li, you are going to be using that same li in the next step (as opposed to its ancestors or previous sibling for example).

An element matches if at least one of these paths makes it through the entire stack of checks.

Here is my my version of this. It is surprisingly not that much code at all once you figure out the how to do it. Pretty neat!