Why I Didn’t Start with Spark, Pinot, or ClickHouse


This question comes up a lot, so it’s worth answering directly:

If you’re doing analytics at scale, why not Spark, Pinot, or ClickHouse?

They’re powerful tools.

They’re also optimized for a different problem.


The Kind of Scale That Matters

There are two kinds of scale:

  1. Data volume

  2. Semantic complexity

Most modern analytics stacks optimize for the first.

This work is mostly about the second: keeping measures, hierarchies, and definitions consistent as they multiply and evolve.


Spark: Great for Pipelines, Not Reasoning

Spark excels at:

  • Batch processing

  • Large transformations

  • Schema-on-read workloads

But semantic analytics needs:

  • Low latency

  • Fine-grained validation

  • Interactive feedback

You can build that on Spark — but you’ll spend most of your time:

  • Managing jobs

  • Handling latency

  • Debugging execution graphs

It’s a mismatch for question-driven systems.


Pinot and ClickHouse: Fast, But Opinionated

Pinot and ClickHouse are impressive.

They shine when:

  • Queries are known in advance

  • Dimensions are stable

  • Aggregation paths are predictable

Semantic systems break those assumptions:

  • Users define models dynamically

  • Hierarchies evolve

  • New measures appear

You end up fighting the engine:

  • Rebuilding segments

  • Re-indexing constantly

  • Encoding semantics indirectly

Speed doesn’t help if the answer is conceptually wrong.


Semantics First, Engines Later

My bias has been:

  1. Make meaning explicit (sketched below)

  2. Make queries explainable

  3. Make performance incremental

Once semantics are stable, then:

  • Specialized engines make sense

  • Pre-computation becomes safe

  • Cost optimization becomes meaningful

Starting with a heavy engine too early locks you in.
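
To make step 1 concrete, here is a minimal sketch of a measure whose meaning lives in data rather than in engine configuration. The Measure type and explain helper are names I’m assuming for illustration, not part of any particular library:

```python
from dataclasses import dataclass

# Hypothetical semantic-layer type; the names are illustrative.
@dataclass(frozen=True)
class Measure:
    name: str
    expression: str         # the SQL aggregate that defines the measure
    grain: tuple[str, ...]  # the dimensions the measure is valid at

# "Make meaning explicit": the definition is plain data, so it can be
# validated and explained before any engine executes it.
revenue = Measure(
    name="net_revenue",
    expression="SUM(amount) - SUM(refunds)",
    grain=("region", "month"),
)

def explain(measure: Measure) -> str:
    """Render a human-readable account of what a query will compute."""
    return f"{measure.name} = {measure.expression} by {', '.join(measure.grain)}"

print(explain(revenue))  # net_revenue = SUM(amount) - SUM(refunds) by region, month
```

Because the definition is data, “make queries explainable” falls out almost for free, and nothing about it commits you to a particular engine.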


PostgreSQL + Redis as a Control Plane

This combination works well as:

  • A semantic control plane

  • A correctness baseline

  • A performance sandbox

It lets you ask “What should this system do?” before asking “How fast can it go?”

That ordering matters.
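
To make the split concrete, here is a minimal sketch of the control-plane pattern, assuming a measures table in PostgreSQL and version-scoped cache keys in Redis. The schema, connection details, and function names are all assumptions for illustration, not details from this post:

```python
import json

import psycopg2  # PostgreSQL driver
import redis     # Redis client

# PostgreSQL holds the semantic definitions (the correctness baseline);
# Redis holds computed answers keyed by definition version, so editing
# a definition naturally invalidates its cached results.
pg = psycopg2.connect("dbname=semantics")  # assumed connection details
cache = redis.Redis()

def answer(measure_name: str):
    # Look up the current definition; the table and columns are assumed.
    with pg.cursor() as cur:
        cur.execute(
            "SELECT version, sql_text FROM measures WHERE name = %s",
            (measure_name,),
        )
        version, sql_text = cur.fetchone()

    # The cache key embeds the definition version: change the meaning,
    # and the old answers simply stop being served.
    key = f"measure:{measure_name}:v{version}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # Postgres doubles as the baseline executor while semantics settle.
    with pg.cursor() as cur:
        cur.execute(sql_text)
        rows = cur.fetchall()

    cache.setex(key, 300, json.dumps(rows))  # keep the answer for 5 minutes
    return rows
```

The useful property is the ordering: PostgreSQL stays the single source of truth for what a measure means, and Redis only ever accelerates answers the baseline could produce on its own.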


This Is Not an Anti-Big-Data Argument

At some point:

  • Pinot might be perfect

  • ClickHouse might slot in cleanly

  • Spark might power offline simulation

The key is timing.

You don’t want your infrastructure to decide your semantics for you.


Final Thought

Fast answers to the wrong question aren’t useful.

Semantic systems live or die on:

  • Trust

  • Clarity

  • Explainability

For that, simpler stacks often win — at least at the beginning.

If you’ve made different trade-offs and they worked, I’d genuinely love to hear about them.

