Caching as a Semantic Problem: Redis, Pre-Aggregation, and Mixed Granularity Data

Caching is usually treated as a performance hack.

Add Redis.
Cache query results.
Invalidate aggressively.
Hope nothing breaks.

That approach works, until you introduce semantics.

Once users can ask the same question in multiple ways, simple query caching starts to fall apart.


Why Query Caching Breaks Down

Consider these two questions:

“Sales by category by month”

“Monthly sales for product categories”

Syntactically different.
Semantically identical.

A traditional cache sees:

  • Two queries

  • Two cache keys

  • Two entries

A semantic system should see:

  • One intent

  • One reusable result
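Concretely, here is the mismatch in miniature (a minimal Python sketch; the SQL strings and hashing scheme are illustrative assumptions, not any particular engine's):

import hashlib

def query_cache_key(sql: str) -> str:
    # Traditional caching: the key is just a hash of the query text.
    return "cache:query:" + hashlib.sha256(sql.encode()).hexdigest()[:12]

# Two phrasings of the same intent compile to slightly different SQL...
q1 = "SELECT category, month, SUM(sales) FROM sales GROUP BY category, month"
q2 = "SELECT month, category, SUM(sales) FROM sales GROUP BY month, category"

# ...so the cache stores two entries for one question.
print(query_cache_key(q1) == query_cache_key(q2))  # False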

This is where caching stops being technical and starts being conceptual.


The Shift: Cache Meaning, Not Queries

Instead of caching:

  • SQL strings

  • Serialized result sets

The idea is to cache:

  • Semantic tuples

  • At defined grains

  • With known aggregation rules

For example:

  • (Time:Month, Product:Category, Measure:Sales)

Once that tuple exists, many questions can reuse it.
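A minimal sketch of that idea, assuming the tuple shape above (the class and key format are illustrative, not a specific library's API):

from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticTuple:
    time_grain: str     # e.g. "Month"
    product_grain: str  # e.g. "Category"
    measure: str        # e.g. "Sales"

    def cache_key(self) -> str:
        # Canonical key: every phrasing that resolves to this intent
        # lands on the same cache entry.
        return (f"cache:tuple:Time:{self.time_grain}"
                f"|Product:{self.product_grain}"
                f"|Measure:{self.measure}")

# Both questions from earlier resolve to one tuple, hence one entry.
print(SemanticTuple("Month", "Category", "Sales").cache_key())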


Mixed Granularity Is the Hard Part

Real data is messy.

Some dimensions arrive at SKU level.
Others at category level.
Or at brand level.

If you insist on leaf-level purity, you either:

  • Explode storage

  • Or recompute constantly

The alternative is semantic awareness of granularity.
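At minimum, that means the system knows the ordering of grains and refuses unsafe rollups. A sketch, assuming a simple linear product hierarchy (real dimensions may branch; brand often sits on a parallel path):

# Hypothetical product hierarchy, finest grain first.
PRODUCT_GRAINS = ["SKU", "Category"]

def can_serve(cached: str, requested: str) -> bool:
    # Data cached at a finer (or equal) grain can be rolled up
    # to answer a coarser request; the reverse is impossible.
    return PRODUCT_GRAINS.index(cached) <= PRODUCT_GRAINS.index(requested)

print(can_serve("SKU", "Category"))  # True: SKUs aggregate up to categories
print(can_serve("Category", "SKU"))  # False: you can't disaggregate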


Redis as a Granularity-Aware Cache

Redis works well here because:

  • Keys are cheap

  • Structures are flexible

  • Access is fast enough to experiment

Instead of:

cache:query:{hash}

You start thinking in:

cache:tuple:Time:2025-03|Product:Category:Bike|Measure:Sales

The cache knows:

  • What level the data represents

  • What it can roll up to

  • What it can safely combine with
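With redis-py, one way to carry that knowledge is a hash per tuple, storing the value alongside its grain metadata (a sketch, assuming a local Redis, additive measures, and made-up field names):

import json
import redis

r = redis.Redis(decode_responses=True)  # assumes Redis on localhost:6379

key = "cache:tuple:Time:2025-03|Product:Category:Bike|Measure:Sales"

# Store the value together with what the cache needs to know about it.
r.hset(key, mapping={
    "value": "12450.00",
    "time_grain": "Month",
    "product_grain": "Category",
    "rolls_up_to": json.dumps(["Quarter", "Year"]),  # safe for additive measures
})
r.expire(key, 3600)  # aggregates expire; the warehouse stays the source of truth

entry = r.hgetall(key)
print(entry["product_grain"], json.loads(entry["rolls_up_to"]))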


Pre-Aggregation Without Over-Commitment

The goal isn’t to pre-aggregate everything.

It’s to pre-aggregate:

  • Common levels

  • Stable dimensions

  • High-fan-out queries

Then allow:

  • On-the-fly composition

  • Partial reuse

  • Fallback to source data

Redis becomes a semantic accelerator, not a source of truth.
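A sketch of that lookup order, using an in-memory dict as a stand-in for the Redis tuple cache (the SKU-to-category mapping and values are invented):

# Stand-in for the tuple cache, keyed by (time grain, product grain).
cache = {("Month", "SKU"): {"A1": 100.0, "B2": 50.0}}  # only SKU level is pre-aggregated

SKU_TO_CATEGORY = {"A1": "Bike", "B2": "Bike"}  # hypothetical mapping

def monthly_sales(product_grain: str):
    # 1. Exact-grain hit: reuse the pre-aggregated tuple directly.
    if ("Month", product_grain) in cache:
        return cache[("Month", product_grain)], "cache (exact grain)"
    # 2. Partial reuse: compose a coarser answer from a finer cached grain.
    if product_grain == "Category" and ("Month", "SKU") in cache:
        rolled = {}
        for sku, value in cache[("Month", "SKU")].items():
            cat = SKU_TO_CATEGORY[sku]
            rolled[cat] = rolled.get(cat, 0.0) + value
        return rolled, "cache (rolled up from SKU)"
    # 3. Fallback: go back to the source of truth (stubbed out here).
    return {}, "source query"

print(monthly_sales("SKU"))       # exact hit
print(monthly_sales("Category"))  # composed on the fly from finer grain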


Why This Feels Different

This approach:

  • Reduces duplicate computation

  • Improves cache hit rates organically

  • Aligns performance with meaning

Most importantly:

It allows the system to explain why a result was fast or slow.

That’s rare and valuable.


A Quiet Benefit: Explainability

Because cached data is semantically labelled, you can say:

  • “This result used cached monthly category aggregates”

  • “This part was computed at runtime due to missing grain”

Performance stops being magical.

It becomes understandable.
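Continuing the sketch above: the provenance label returned alongside each result is that explanation, more or less for free (the wording is illustrative):

result, provenance = monthly_sales("Category")
print(f"Served via {provenance}")  # Served via cache (rolled up from SKU)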


Closing Thought

Caching isn’t about speed.

It’s about reusing understanding.

Once you treat it that way, tools like Redis start to look very different.
