Why I Didn’t Start with Spark, Pinot, or ClickHouse
This question comes up a lot, so it’s worth answering directly:

If you’re doing analytics at scale, why not Spark, Pinot, or ClickHouse?

They’re powerful tools. They’re also optimized for a different problem.

The Kind of Scale That Matters

There are two kinds of scale:

- Data volume
- Semantic complexity

Most modern analytics stacks optimize for the first. This work is mostly about the second.

Spark: Great for Pipelines, Not Reasoning

Spark excels at:

- Batch processing
- Large transformations
- Schema-on-read workloads

But semantic analytics needs:

- Low latency
- Fine-grained validation
- Interactive feedback

You can build that on Spark, but you’ll spend most of your time:

- Managing jobs
- Handling latency
- Debugging execution graphs

It’s a mismatch for question-driven systems.

Pinot and ClickHouse: Fast, But Opinionated

Pinot and ClickHouse are impressive. They shine when:

- Queries are known in advance
- Dimensions are stable
- Aggrega...