Context
For the taxonomies used to describe the archaeological sites of the thesis
project, I have hierarchical relationships for nested types, useful for time
periods, for example. This involves having a parent_id field and ancestor/descendant
relationships.
Problem
The ORM query I had to check for descendants was something like this:
while (queue.length > 0) {
const currentId = queue.shift(); // Pop one parent
const children = await this.db.select()... // <- QUERY EXECUTED HERE (N times!)
queue.push(...children.map(child => child.id)); // Add children to queue
}The issue is that this would be executed for all the descendants of the parent, hence it’s called a N+1 problem.
Solutions
Using the following CTE (Common Table Expression), the query executes once for all:
WITH RECURSIVE descendants AS (
SELECT ... FROM taxonomies WHERE id = 'A' -- Base: start with A
UNION ALL
SELECT ... FROM taxonomies t
INNER JOIN descendants d ON t.parentId = d.id -- Recursive: find children
)
SELECT * FROM descendants WHERE id != 'A'CTE - a named temporary result set.
It is created using a WITH clause which we can reference it later with
SELECT, INSERT etc.
In this case, we created descendants from which we selected the entries where
the ID is different than the parent’s one.
Drizzle does not provide clear abstractions for CTEs, so this was acceptable to be
run using db.execute().
| Aspect | Original | Fixed |
|---|---|---|
| Queries | N (one per level) | 1 (single recursive query) |
| Network latency | Multiplied N times | Single latency |
| Performance | Slow for deep trees | Fast for any depth |
| 5-level hierarchy | 5 queries | 1 query |
| 20-level hierarchy | 20 queries | 1 query |