I started having some good side discussions about Druid, and the most common question was “When should I use Druid?” The good news is that the Druid documentation, under Latest Design, answers this question directly:
Druid is likely a good choice if your use case fits a few of the following descriptors:
- Insert rates are very high, but updates are less common.
- Most of your queries are aggregation and reporting queries (“group by” queries). You may also have searching and scanning queries.
- You are targeting query latencies of 100ms to a few seconds.
- Your data has a time component (Druid includes optimizations and design choices specifically related to time).
- You may have more than one table, but each query hits just one big distributed table. Queries may potentially hit more than one smaller “lookup” table.
- You have high cardinality data columns (e.g. URLs, user IDs) and need fast counting and ranking over them.
- You want to load data from Kafka, HDFS, flat files, or object storage like Amazon S3.
Event-based data obviously works very well with Druid, which is why I believe orders are a really good match. Each order ties three critical pieces together (SKU, customer data, and shipping), so it becomes very easy to run all kinds of queries that tie these data points together.
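To make that a bit more concrete, here is a minimal sketch of what such a query could look like when sent to Druid's SQL API over HTTP. The datasource name `orders` and the columns `sku`, `customer_region`, `shipping_method`, and `order_total` are hypothetical names I made up for illustration, and the base URL assumes a local router on port 8888, so adjust all of these to your own setup.

```python
import json
import urllib.request

# Hypothetical datasource and column names; only __time is a built-in Druid column.
ORDERS_BY_SKU_AND_SHIPPING = """
SELECT
  sku,
  customer_region,
  shipping_method,
  COUNT(*) AS order_count,
  SUM(order_total) AS revenue
FROM orders
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
GROUP BY sku, customer_region, shipping_method
ORDER BY revenue DESC
LIMIT 10
"""


def build_sql_request(sql, base_url="http://localhost:8888"):
    """Build (but do not send) a POST request for Druid's SQL endpoint."""
    body = json.dumps({"query": sql.strip()}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url="http://localhost:8888"):
    """Send the query and return the decoded JSON rows (needs a running Druid)."""
    with urllib.request.urlopen(build_sql_request(sql, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_query(ORDERS_BY_SKU_AND_SHIPPING)` against a cluster that actually has such a datasource would return the top SKU, region, and shipping combinations by revenue for the last seven days. This is the “group by” style of query over a single large, time-based table that the list above describes.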
While I am somewhat stuck on eCommerce, here is a list of other companies that also use Druid for very different use cases (link). Here are a few of my favorites:
Airbnb – Druid powers slice and dice analytics on both historical and real-time metrics. It significantly reduces latency of analytic queries and helps people get insights more interactively.
eBay – eBay uses Druid to aggregate multiple data streams for real-time user behavior analytics by ingesting at a very high rate (over 100,000 events/sec), with the ability to query or aggregate data by any random combination of dimensions, and support over 100 concurrent queries without impacting ingest rate and query latencies.
Hulu – At Hulu, we use Druid to power our analytics platform that enables us to interactively deep dive into the behaviors of our users and applications in real-time.
Monetate – Druid is a critical component in Monetate’s personalization platform, where it acts as the serving layer of a lambda architecture. As such, Druid powers numerous real-time dashboards that provide marketers valuable insights into campaign performance and customer behavior.
Nielsen – Nielsen Marketing Cloud uses Druid as its core real-time analytics tool to help its clients monitor, test and improve its audience targeting capabilities. With Druid, Nielsen provides its clients with in-depth consumer insights leveraging world-class Nielsen audience data.
The original list is pretty large, so it is fairly safe to say Druid has a place in many markets!