Product Recommendations made easy with Apache Druid Part 1

I have been playing with Apache Druid for a bit now, and I have to say I am very impressed with this package. Druid provides fast analytical queries, at high concurrency, on event-driven data. Druid can instantaneously ingest streaming data and provide sub-second queries to power interactive UIs. Apache Druid essentially does all of the heavy lifting of segmenting the data and putting it into high-performing indexes for super-fast queries. You can stream the data directly into Druid using APIs or Apache Kafka, or you can simply upload massive amounts of data at intervals, either appending to or replacing what is already there.
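To make the "upload at intervals, appending or replacing" option more concrete, here is a minimal sketch of a Druid native batch (`index_parallel`) ingestion spec, built as a Python dict. The datasource name `order_lines`, the input directory, and the column list are assumptions for illustration. `appendToExisting` is the switch between appending and replacing the data in the intervals being ingested. A spec like this would be POSTed to the Overlord's `/druid/indexer/v1/task` endpoint.

```python
import json


def order_lines_ingestion_spec(base_dir: str, append: bool) -> dict:
    """Build a hypothetical native batch ingestion spec for order line items."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": "order_lines",  # assumed datasource name
                # The "time" field becomes Druid's primary __time column.
                # Joda-style pattern for values like "2019-06-30 03:53:35".
                "timestampSpec": {"column": "time", "format": "yyyy-MM-dd HH:mm:ss"},
                "dimensionsSpec": {
                    "dimensions": [
                        "order_id", "shopper_id", "sku", "shipping_info",
                        "region", "gender",
                        {"type": "long", "name": "age"},
                        {"type": "double", "name": "price"},
                        {"type": "long", "name": "quantity"},
                        {"type": "double", "name": "cost"},
                    ]
                },
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,  # keep every line item as its own row
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "local", "baseDir": base_dir, "filter": "*.json"},
                "inputFormat": {"type": "json"},  # newline-delimited JSON rows
                # False replaces existing data in the covered intervals; True appends.
                "appendToExisting": append,
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


spec = order_lines_ingestion_spec("/data/orders", append=False)
print(json.dumps(spec, indent=2))
```

Keeping `rollup` off matters here: every order line stays a separate row, so later queries can still regroup by order, SKU, or shopper attributes.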

Because Druid does so much for you, you could run different campaigns using completely different data sources that are stored and indexed in Druid. Imagine running a campaign for “Hottest Items Last Fall” or “Season’s Top Sellers”. This would produce a product shelf similar to this on your eCommerce site:

[Screenshot: example product shelf on an eCommerce site]

Those products could have been returned by Druid in real time, with the resulting SKUs sorted by order value or quantity sold and even filtered by shopper attributes such as age, gender, and location.


Druid lets you store as many data sources as you want, so you could build dynamic components in CoreMedia that run the same campaigns on different data sources. This could be used for different brands and their SKUs, or even for seasonal order data.


For my use case, this means you could essentially push order line item data into Druid and get fast queries for product shelves like “Top Sellers”, “Top Weekend Sales”, or even “This Week’s Hits”, all based on the order line sales and the date and time stamp of the order.
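To show what a time-based shelf like “Top Weekend Sales” actually computes, here is the same ranking written as plain Python over in-memory rows rather than as a Druid query. It is only a reference sketch: the tuple layout, SKU names, and sample rows are hypothetical, and in practice Druid would do this aggregation server-side.

```python
from collections import Counter
from datetime import datetime

# Hypothetical minimal line item: (order timestamp, sku, quantity).
LineItem = tuple[datetime, str, int]


def top_sellers(items: list[LineItem], start: datetime, end: datetime,
                weekends_only: bool = False, limit: int = 10) -> list[str]:
    """Rank SKUs by units sold in [start, end), optionally counting only Sat/Sun."""
    units = Counter()
    for ts, sku, qty in items:
        if not (start <= ts < end):
            continue
        # datetime.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
        if weekends_only and ts.weekday() < 5:
            continue
        units[sku] += qty
    return [sku for sku, _ in units.most_common(limit)]


# Made-up sample rows for one week in June 2019.
items = [
    (datetime(2019, 6, 28, 10, 0), "SKU-A", 5),      # Friday
    (datetime(2019, 6, 29, 12, 0), "SKU-B", 2),      # Saturday
    (datetime(2019, 6, 30, 3, 53, 35), "SKU-C", 3),  # Sunday
    (datetime(2019, 6, 30, 9, 0), "SKU-B", 2),       # Sunday
]
window = (datetime(2019, 6, 24), datetime(2019, 7, 1))

print(top_sellers(items, *window))                      # whole week
print(top_sellers(items, *window, weekends_only=True))  # weekend only
```

Over the whole week SKU-A leads on Friday volume alone; restricting to the weekend drops it, which is exactly the difference between a “Top Sellers” shelf and a “Top Weekend Sales” shelf.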

Pushing this line-item-level order information should be trivial for most order management systems. I started asking myself what data I would actually need to satisfy a few use cases, so I wrote some use cases down as one-liners:

  • Most products sold
  • Total sales
  • Highest total count sold on day of week
  • Highest total count sold in month of year
  • Highest total sales on day of week
  • Highest total sales in week of year
  • Top seller by region
  • Top seller among men
  • Top seller among women in a region
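Several of these one-liners map directly onto Druid SQL. The sketch below assumes a datasource named `order_lines` whose columns follow the order fields listed later in this post, with the order time ingested as Druid's `__time` column; the `'F'` encoding for gender is also an assumption. `TIME_EXTRACT(__time, 'DOW')` and `TIME_EXTRACT(__time, 'MONTH')` pull the day of week and the month out of the timestamp.

```python
# Hypothetical datasource name; columns follow the order fields listed in this post.
DATASOURCE = "order_lines"

USE_CASE_SQL = {
    "Most products sold": f"""
        SELECT sku, SUM(quantity) AS units
        FROM {DATASOURCE}
        GROUP BY sku
        ORDER BY units DESC
        LIMIT 10""",
    "Highest total count sold in month of year": f"""
        SELECT TIME_EXTRACT(__time, 'MONTH') AS month_of_year, sku,
               SUM(quantity) AS units
        FROM {DATASOURCE}
        GROUP BY TIME_EXTRACT(__time, 'MONTH'), sku
        ORDER BY units DESC
        LIMIT 10""",
    "Highest total sales on day of week": f"""
        SELECT TIME_EXTRACT(__time, 'DOW') AS day_of_week, sku,
               SUM(price * quantity) AS total_sales
        FROM {DATASOURCE}
        GROUP BY TIME_EXTRACT(__time, 'DOW'), sku
        ORDER BY total_sales DESC
        LIMIT 10""",
    # Assumes gender is stored as 'F' / 'M'; adjust to your own encoding.
    "Top seller among women by region": f"""
        SELECT region, sku, SUM(quantity) AS units
        FROM {DATASOURCE}
        WHERE gender = 'F'
        GROUP BY region, sku
        ORDER BY units DESC
        LIMIT 10""",
}

for name, sql in USE_CASE_SQL.items():
    print(f"-- {name}{sql}\n")
```

Each query ends in `ORDER BY ... DESC`, so the `sku` column comes back already ranked by units sold or by revenue.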


When should I use Apache Druid?

Read about how Nielsen, Monetate, eBay, Airbnb, and others use Apache Druid.

I then had to figure out the minimum amount of data needed to support those use cases, and this is what I came up with:

“time”, “order_id”, “shopper_id”, “sku”, “price”, “quantity”, “cost”, “shipping_info”
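One way to pin this field list down is as a small typed record. In this Python sketch the types are assumptions (the list does not specify them), `price` and `cost` are treated as per-unit amounts, and the two helper properties are illustrative additions rather than fields anyone has to send to Druid.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OrderLine:
    """One order line item using the field names above; types are assumptions."""
    time: datetime
    order_id: str
    shopper_id: str
    sku: str
    price: float   # assumed: unit selling price
    quantity: int
    cost: float    # assumed: unit cost to the merchant
    shipping_info: str

    @property
    def line_total(self) -> float:
        # Revenue for this line; the "total sales" use cases sum this value.
        return self.price * self.quantity

    @property
    def line_margin(self) -> float:
        # Profit for this line, assuming cost is per unit.
        return (self.price - self.cost) * self.quantity


# Made-up example values.
line = OrderLine(datetime(2019, 6, 30, 3, 53, 35), "PO-1001", "S-42",
                 "SKU-123", 20.0, 3, 12.5, "ground")
print(line.line_total, line.line_margin)
```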

These fields are all pretty standard information you can get from a purchase order (PO). What is not part of that is the customer demographic information. Because Druid performs best with flat data, we will most likely have to write a routine that combines order line data with customer attribute data. We could include fields like these (if they are known):

“age”, “region”, “gender”
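The flattening routine could be as simple as a join on `shopper_id`. This is a minimal Python sketch, assuming order lines arrive as dicts and shopper attributes come from a hypothetical lookup keyed by `shopper_id`. Unknown attributes are left out of the row rather than filled with placeholders, and each flat row is then written as one compact line of JSON (newline-delimited JSON).

```python
import json


def flatten(order_lines: list[dict], shoppers: dict[str, dict]) -> list[dict]:
    """Join each order line with its shopper's demographics into one flat row.

    `shoppers` is a hypothetical lookup: shopper_id -> {"age", "region", "gender"}.
    Attributes that are unknown (missing or None) are left out of the row.
    """
    rows = []
    for line in order_lines:
        row = dict(line)  # copy so the source order data is not mutated
        attrs = shoppers.get(line["shopper_id"], {})
        for key in ("age", "region", "gender"):
            if attrs.get(key) is not None:
                row[key] = attrs[key]
        rows.append(row)
    return rows


# Made-up sample data for illustration only.
order_lines = [
    {"time": "2019-06-30 03:53:35", "order_id": "PO-1", "shopper_id": "S-1",
     "sku": "SKU-A", "price": 20.0, "quantity": 1, "cost": 12.5,
     "shipping_info": "ground"},
    {"time": "2019-06-30 04:10:02", "order_id": "PO-2", "shopper_id": "S-2",
     "sku": "SKU-B", "price": 8.0, "quantity": 2, "cost": 5.0,
     "shipping_info": "express"},
]
shoppers = {"S-1": {"age": 34, "region": "Northeast", "gender": None}}

rows = flatten(order_lines, shoppers)
# One compact JSON object per line, ready for batch ingestion.
ndjson = "\n".join(json.dumps(r, separators=(",", ":")) for r in rows)
print(ndjson)
```

Leaving unknown attributes out keeps the rows flat and honest: Druid simply treats the missing columns as null for those rows.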

Adding these attributes would allow us to ask Druid many different queries and get the proper response. In the CoreMedia extension model, the result should really be a list of SKUs that we can map to the current product catalog. Some error handling or SKU replacement code might be needed, especially if you are running against year-old data. Hopefully, for more current campaigns like “Hottest Weekend Products” or “What’s Hot This Month”, the data and SKUs will be very up to date. The resulting JSON sent in for each row would look like this:

"time":"2019-06-30 03:53:35",

Sending in each order line item separately allows Druid to dynamically rebuild orders and return SKUs based on any date and time combination, bloom filters, numeric expressions, and of course grouping (for example, total sales for a single SKU).
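The grouping case can also be written as a native JSON query, the alternative to SQL. This sketch builds a `groupBy` query body that sums units and revenue per SKU, using an expression virtual column for `price * quantity`. The datasource name and the six-month interval are placeholders. Native queries are POSTed to Druid's `/druid/v2` endpoint.

```python
import json


def total_sales_by_sku(datasource: str, interval: str, limit: int = 10) -> dict:
    """Hypothetical native groupBy: units and revenue per SKU, highest revenue first."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "all",     # one bucket for the whole interval
        "intervals": [interval],  # ISO-8601 "start/end"
        # Compute revenue per row before aggregating.
        "virtualColumns": [{
            "type": "expression",
            "name": "line_total",
            "expression": "price * quantity",
            "outputType": "DOUBLE",
        }],
        "dimensions": ["sku"],
        "aggregations": [
            {"type": "longSum", "name": "units", "fieldName": "quantity"},
            {"type": "doubleSum", "name": "total_sales", "fieldName": "line_total"},
        ],
        "limitSpec": {
            "type": "default",
            "limit": limit,
            "columns": [{"dimension": "total_sales", "direction": "descending",
                         "dimensionOrder": "numeric"}],
        },
    }


query = total_sales_by_sku("order_lines", "2019-01-01/2019-07-01")
print(json.dumps(query, indent=2))
```

Swapping `"dimensions": ["sku"]` for `["order_id"]` is how the same line-item rows can be regrouped back into whole orders.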

I created a dataset with six months of order data, broken out by line item as described above. It ended up being 431,148 line items for 4,323 SKUs across 300,000 orders.
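Those counts already say something about the shape of the data. Here is a quick worked check of average basket size and how often each SKU shows up, using only the numbers above:

```latex
\text{line items per order} = \frac{431{,}148}{300{,}000} \approx 1.44
\qquad
\text{line items per SKU} = \frac{431{,}148}{4{,}323} \approx 99.7
```

So the typical order has fewer than one and a half line items, and each SKU appears on roughly 100 order lines over the six months.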

I went ahead and created queries for each of those use cases, and I found Druid to be extremely fast (more on that in Part 2), even when running on my local machine. There are various ways to use SQL (or native JSON queries) to query Druid. The real power comes from how quickly Druid returns rows and from functions like TIME_EXTRACT. Each query essentially returns a list of SKUs sorted in descending order by either total sales or the number of items sold.
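For reference, a query like “This Week’s Hits” can be sent to Druid's SQL API over plain HTTP. This is a minimal Python sketch using only the standard library. The router address (`localhost:8888`, the router's default port), the `order_lines` datasource, and the column names are assumptions. Setting `resultFormat` to `"object"` asks Druid for a JSON array with one object per row.

```python
import json
import urllib.request

# Assumed router address; 8888 is the default port of Druid's router process.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# Hypothetical "This Week's Hits" shelf: units sold over the last seven days.
THIS_WEEKS_HITS = """
    SELECT sku, SUM(quantity) AS units
    FROM order_lines
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
    GROUP BY sku
    ORDER BY units DESC
    LIMIT 10
"""


def druid_sql_request(query: str, url: str = DRUID_SQL_URL) -> urllib.request.Request:
    """Build a POST request for Druid's SQL endpoint."""
    body = json.dumps({"query": query, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def top_skus(query: str) -> list[str]:
    """Run the query and return the sku column in the order Druid returned it."""
    with urllib.request.urlopen(druid_sql_request(query)) as resp:
        return [row["sku"] for row in json.load(resp)]
```

Calling `top_skus(THIS_WEEKS_HITS)` against a running cluster would return the SKU list, already ordered by units sold and ready to be mapped onto the product catalog.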


Stay tuned for Part 2, where I show how easily these dynamic product shelves, based on sales and shopper data, can be integrated into CoreMedia Studio. I will also demonstrate Apache Druid being accessed in real time from our Studio, so the marketing person can easily preview this dynamic behavior. Below is a little teaser showing how the authoring environment (Preview CAE) and the runtime environment could access the same Druid data, giving marketers the same products the shoppers would see.


I am really interested in hearing your thoughts on this; send me an email or leave a comment!

[Screenshot: the Preview CAE and the runtime environment showing the same Druid-driven products]

Three cool examples of dynamic segment rules in WebSphere Commerce

I have blogged about dynamic segments in WebSphere Commerce a few times because I think they are extremely powerful and allow marketing to do all kinds of very cool things to make their site more interesting to a buyer. The InfoCenter has a really good page with three great examples of how the rules for dynamic segments can be constructed. Remember, this is in the Management Center, and these flows are constructed with basic drag-and-drop gestures.


Dynamic Customer Segments and WebSphere Commerce

One very powerful feature of WebSphere Commerce (WC) is Dynamic Customer Segments. These are customer segments that can have customers dynamically added or removed by a business rule. This means any of the targets you have in your WC tooling can be used to put customers in or out of the segment. Then, other rules can use that segment to show promotions or content, change search behavior, or change the look and feel of the site.

Here is a sample segment; let’s walk through it.

The first thing you need to do is create a new customer segment in Management Center and select the “Use marketing activities to add or remove customers” option. This allows business rules to add customers to, or remove them from, the segment.

Next, we want this to be valid for customers who have made a purchase in the past 30 days and have spent over $100. So the first option we need to set is on the “Purchase Details” tab.

Next, we will create a Customer Dialog Rule that places customers who submit orders with over $100 of total value into the “High Roller” segment.


Let’s walk through what this rule does:

  • The trigger: this makes the entire rule execute when the customer places an order.
  • The condition: if the first target of a condition path is met, the rule continues. If not, the next branch is evaluated and executed. In this case, if the condition is not met, the customer is removed from the segment.
  • The target: the rule only continues if the order is at least $100.
  • The action: this dynamically adds the customer to the “High Roller” customer segment.
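Management Center builds this rule with drag-and-drop gestures, so there is nothing to code; still, the branching is easy to lose in a diagram. This Python sketch only models the flow described in the walkthrough. `Customer`, `on_order_submitted`, and the threshold constant are hypothetical names, not WebSphere Commerce APIs, and the threshold follows the target's “at least $100”.

```python
from dataclasses import dataclass, field

HIGH_ROLLER = "High Roller"
THRESHOLD = 100.00  # dollars; the target accepts orders of at least $100


@dataclass
class Customer:
    """Hypothetical stand-in for a shopper and their segment memberships."""
    customer_id: str
    segments: set[str] = field(default_factory=set)


def on_order_submitted(customer: Customer, order_total: float) -> None:
    """Model of the dialog rule: trigger -> target -> action, else remove."""
    # Trigger: this function stands for the "customer places an order" event.
    if order_total >= THRESHOLD:            # Target: order is at least $100
        customer.segments.add(HIGH_ROLLER)  # Action: add to the segment
    else:                                   # Other branch: remove from the segment
        customer.segments.discard(HIGH_ROLLER)


shopper = Customer("C-1")
on_order_submitted(shopper, 150.00)
print(shopper.segments)
on_order_submitted(shopper, 40.00)
print(shopper.segments)
```

Because the "else" branch removes the customer, membership always reflects the most recent order, which is what lets the follow-up activities key off the segment safely.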


Now we can create other web activities and dialog activities that key off that customer segment. Let’s create a few and walk through them.

The next web activity puts an advertisement on the Home page in Row 2 for all “High Roller” customers. The advertisement tells them they receive 10 percent off all purchases.

This next customer dialog activity checks the High Roller customer segment every seven days and sends an email to each customer in that segment with a 10% off promotion for being a High Roller.

Next, whenever the customer searches for anything, a search rule rearranges the default order in which products are presented, listing products in the “High Roller Sales Category” first if the customer is in the High Roller customer segment.
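The re-ranking is easiest to reason about as a stable sort. This is a minimal Python sketch, assuming each product carries a set of category names; the function and field names are hypothetical and this is not the WebSphere Commerce search API.

```python
def rank_for_shopper(products: list[dict], shopper_segments: set[str],
                     boosted_category: str = "High Roller Sales Category",
                     segment: str = "High Roller") -> list[dict]:
    """Hypothetical search rule: boosted-category products first for segment members."""
    if segment not in shopper_segments:
        return list(products)  # default order untouched
    # sorted() is stable and False sorts before True, so boosted products move
    # to the front while each group keeps its original (default) order.
    return sorted(products, key=lambda p: boosted_category not in p["categories"])


# Made-up search results in their default order.
products = [
    {"sku": "A", "categories": {"Shoes"}},
    {"sku": "B", "categories": {"High Roller Sales Category"}},
    {"sku": "C", "categories": {"Shoes", "High Roller Sales Category"}},
]
print([p["sku"] for p in rank_for_shopper(products, {"High Roller"})])
print([p["sku"] for p in rank_for_shopper(products, set())])
```

Using a stable sort rather than a separate boost score means the rule never scrambles the default relevance order; it only lifts one group above the rest.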


The precision marketing engine in WebSphere Commerce is extremely powerful and flexible. The various options (triggers, targets, actions) allow for some pretty complex rules that give each of your shoppers a unique experience. The goal of WebSphere Commerce is to provide a “live” site for each customer based on things like who they are (profile data), their purchasing behavior, external site referrals (where they came from), and their search behavior. The contents of a WebSphere Commerce site are limited only by your imagination.