Caching as a Semantic Problem: Redis, Pre-Aggregation, and Mixed Granularity Data
Caching is usually treated as a performance hack.
Add Redis.
Cache query results.
Invalidate aggressively.
Hope nothing breaks.
That approach works, until you introduce semantics.
Once users can ask the same question in multiple ways, simple query caching starts to fall apart.
Why Query Caching Breaks Down
Consider these two questions:
“Sales by category by month”
“Monthly sales for product categories”
Syntactically different.
Semantically identical.
A traditional cache sees:
- Two queries
- Two cache keys
- Two entries
A semantic system should see:
- One intent
- One reusable result
This is where caching stops being technical and starts being conceptual.
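To make the failure concrete, here is a minimal Python sketch (the SQL strings are hypothetical) showing how a hash-keyed cache treats the two phrasings as unrelated entries:

```python
import hashlib

def query_cache_key(sql: str) -> str:
    # Traditional approach: key the cache on a hash of the SQL text.
    return "cache:query:" + hashlib.sha256(sql.encode()).hexdigest()[:12]

# Two syntactically different queries with the same intent.
q1 = "SELECT category, month, SUM(sales) FROM facts GROUP BY category, month"
q2 = "SELECT month, category, SUM(sales) FROM facts GROUP BY month, category"

print(query_cache_key(q1))  # one key...
print(query_cache_key(q2))  # ...a different key: cache miss, duplicate work
```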
The Shift: Cache Meaning, Not Queries
Instead of caching:
- SQL strings
- Serialized result sets
The idea is to cache:
- Semantic tuples
- At defined grains
- With known aggregation rules
For example:
- (Time:Month, Product:Category, Measure:Sales)
Once that tuple exists, many questions can reuse it.
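A minimal sketch of canonicalizing to that tuple; the key encoding is an assumption, not a fixed format:

```python
def semantic_cache_key(time_grain: str, product_grain: str, measure: str) -> str:
    # One canonical key per intent: any phrasing that resolves to the tuple
    # (Time:Month, Product:Category, Measure:Sales) lands here.
    return f"cache:sem:time={time_grain}:product={product_grain}:measure={measure}"

# Both example questions resolve to the same key.
print(semantic_cache_key("month", "category", "sales"))
# cache:sem:time=month:product=category:measure=sales
```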
Mixed Granularity Is the Hard Part
Real data is messy.
Some dimensions arrive at SKU level.
Others at category.
Some only at brand.
If you insist on leaf-level purity, you either:
- Explode storage
- Recompute constantly
The alternative is semantic awareness of granularity.
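As a sketch of what granularity awareness buys you: SKU-level data can answer a category-level question if the system knows the rollup path. The mapping and the additivity of the measure are assumptions here:

```python
from collections import defaultdict

# Assumed rollup path: SKU -> category. Sales is additive, so a
# category-level answer can be derived from SKU-level entries.
sku_to_category = {"sku-1": "shoes", "sku-2": "shoes", "sku-3": "hats"}
sku_sales = {"sku-1": 120.0, "sku-2": 80.0, "sku-3": 40.0}

def roll_up(values: dict, mapping: dict) -> dict:
    totals = defaultdict(float)
    for sku, amount in values.items():
        totals[mapping[sku]] += amount
    return dict(totals)

print(roll_up(sku_sales, sku_to_category))  # {'shoes': 200.0, 'hats': 40.0}
```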
Redis as a Granularity-Aware Cache
Redis works well here because:
- Keys are cheap
- Structures are flexible
- Access is fast enough to experiment
Instead of: cache:query:{hash}
You start thinking in grain-aware keys (see the sketch after the list below).
The cache knows:
- What level the data represents
- What it can roll up to
- What it can safely combine with
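A minimal redis-py sketch of the idea. The key scheme and metadata fields are illustrative assumptions, and it expects a reachable Redis instance:

```python
import json
import redis

r = redis.Redis()  # assumes a local Redis instance

key = "cache:sem:time=month:product=category:measure=sales"
entry = {
    "grain": {"time": "month", "product": "category"},  # what level this is
    "rolls_up_to": ["time=quarter", "product=all"],     # safe coarser grains
    "additive": True,                                   # safe to sum/combine
    "data": {"2024-01|shoes": 200.0, "2024-01|hats": 40.0},
}
r.set(key, json.dumps(entry), ex=3600)  # expire after an hour
```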
Pre-Aggregation Without Over-Commitment
The goal isn’t to pre-aggregate everything.
It’s to pre-aggregate:
- Common levels
- Stable dimensions
- High-fan-out queries
Then allow:
- On-the-fly composition
- Partial reuse
- Fallback to source data
Redis becomes a semantic accelerator, not a source of truth.
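One way the lookup chain might read, continuing the key scheme above. Here compute_from_source is a placeholder for the real warehouse query:

```python
import json

def compute_from_source(time_grain: str, product_grain: str, measure: str) -> dict:
    # Placeholder for the real query against source data.
    return {"2024-01|shoes": 200.0, "2024-01|hats": 40.0}

def get_aggregate(r, time_grain: str, product_grain: str, measure: str):
    key = f"cache:sem:time={time_grain}:product={product_grain}:measure={measure}"

    # 1. Exact hit at the requested grain.
    cached = r.get(key)
    if cached:
        return json.loads(cached), "exact cache hit"

    # 2. Partial reuse would go here: find a finer cached grain and
    #    roll it up, as in the earlier SKU-to-category sketch.

    # 3. Fallback: compute from source, then cache at the requested grain.
    result = compute_from_source(time_grain, product_grain, measure)
    r.set(key, json.dumps(result), ex=3600)
    return result, "computed from source"
```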
Why This Feels Different
This approach:
- Reduces duplicate computation
- Improves cache hit rates organically
- Aligns performance with meaning
Most importantly:
It allows the system to explain why a result was fast or slow.
That’s rare and valuable.
A Quiet Benefit: Explainability
Because cached data is semantically labelled, you can say:
- “This result used cached monthly category aggregates”
- “This part was computed at runtime due to missing grain”
Performance stops being magical.
It becomes understandable.
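The get_aggregate sketch above already returns a provenance label alongside each result, so surfacing the explanation costs one line:

```python
data, why = get_aggregate(r, "month", "category", "sales")
print(f"Served via: {why}")
# First call:  "Served via: computed from source"  (missing grain)
# Second call: "Served via: exact cache hit"       (cached aggregates)
```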
Closing Thought
Caching isn’t about speed.
It’s about reusing understanding.
Once you treat it that way, tools like Redis start to look very different.
