Why AI models depend on product similarity for better results
If you’ve never heard of a dendrogram, you’re not alone. I’d never heard of them myself until I started working as a data scientist. Then they quickly became vital to my work.
Why? They help us assess product similarity better — one fundamental way to make our AI-driven sales forecasts more accurate. If you want to improve your own sales forecasts (or are interested in creating a similar type of algorithm yourself), you need to master this concept.
What is a dendrogram?
A dendrogram is a tree diagram that shows how similar different items are to one another. Each item starts as an individual “leaf” in a cluster of one. The most similar items or clusters are then joined step by step, so the clusters grow until everything connects into a single tree. In the end, you get a chart that displays groups of similarity. You can see which items are almost identical according to a wide array of characteristics you define, as well as which items have the least in common.
When you look at a dendrogram, the higher the node connecting two particular items, the less they have in common. You can use those nodes to divide items into groups of similarity: draw a horizontal line at a chosen height, and every branch that hangs below the line becomes one cluster. It’s up to you to decide how high a threshold you are willing to accept when defining your final clusters. Either way, the dendrogram takes a confusing mass of data and organizes it so you can analyze it.
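To make the threshold idea concrete, here is a minimal Python sketch of cutting a dendrogram at a chosen height. The function name cut_dendrogram, the product names, and the merge heights are all invented for illustration. The sketch assumes the dendrogram is already available as a list of merges, each naming one item from each of the two branches being joined plus the height of the joining node.

```python
def cut_dendrogram(leaves, merges, threshold):
    """Group leaves by applying only the merges at or below `threshold`."""
    parent = {leaf: leaf for leaf in leaves}

    def find(x):
        # Follow parent links up to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, height in merges:
        if height <= threshold:  # nodes above the cut line are ignored
            parent[find(a)] = find(b)

    groups = {}
    for leaf in leaves:
        groups.setdefault(find(leaf), []).append(leaf)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical dendrogram: (item in branch 1, item in branch 2, node height).
leaves = ["red cocktail dress", "blue dress", "yellow sundress", "men's trousers"]
merges = [
    ("red cocktail dress", "blue dress", 0.2),
    ("blue dress", "yellow sundress", 0.5),
    ("yellow sundress", "men's trousers", 0.9),
]

for threshold in (0.3, 0.6, 1.0):
    print(threshold, cut_dendrogram(leaves, merges, threshold))
```

With these made-up heights, a low threshold keeps only the two closest dresses together, a middle threshold groups all three dresses, and a high threshold puts every product in a single cluster.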
A dendrogram is generated by an algorithm that performs a hierarchical cluster analysis. Each item is represented as a vector of attribute values, and the algorithm computes a distance between every pair of items across all of those attributes. In the common bottom-up (agglomerative) form, it then repeatedly merges the two closest clusters and records the distance at which each merge happens. Those merge distances become the heights of the nodes in the dendrogram.
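For readers who want to see the mechanics, the following is a small Python sketch of bottom-up hierarchical clustering. It is an illustration under stated assumptions, not the method used in any particular forecasting system: it assumes Euclidean distance and average linkage (the mean distance between the members of two clusters), and the four points A to D are invented. In practice you would more likely rely on a library such as SciPy, whose scipy.cluster.hierarchy module offers linkage and dendrogram functions.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Straight-line distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clustering(points):
    """Return the merges as (cluster_a, cluster_b, height), in merge order."""
    clusters = [(name,) for name in points]  # every item starts alone
    merges = []
    while len(clusters) > 1:
        # Find the two clusters that are currently closest to each other.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        merges.append((a, b, average_linkage(a, b, points)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Invented example: two tight pairs that sit far apart from each other.
points = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (5.0, 0.0), "D": (5.0, 1.5)}
merges = hierarchical_clustering(points)
for a, b, height in merges:
    print(a, b, round(height, 2))
```

Each printed row corresponds to one node of the dendrogram: the two branches being joined and the height at which they meet. This brute-force version recomputes distances on every pass, so it is only suitable for small examples.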
The relationship between dendrograms and product similarity
Dendrograms can compare any items to show how similar (or how disparate) they are. This is useful for all kinds of comparisons, not only those done by data scientists. Take this dendrogram comparing the genetics of diverse populations in Russia:
The placement of the nodes shows us how genetically similar the people in Moscow are to those in Belgorod, and how different both are from the Khakas and Bashkir people, for example. As you can imagine, comparing genetics is incredibly complex. The human genome contains billions of base pairs, so there is a ton of data to compare. Hierarchical clustering and dendrograms allow us to efficiently make sense of that data to understand the relationship at a glance. (The power of data visualization at work!)
When it comes to product similarity, there may not be billions of attributes to assess, but there are still tons of factors involved. Take fashion products, for instance. A seemingly endless number of descriptors define a piece of clothing. A red dress is never just a red dress. It could be casual or formal. This dress may be designed for winter or summer wear. Its material could be cotton or silk. It may have embroidery or ruching or other style details. The dress can have long sleeves or be strapless. The longer you look at the dress, the more ways there are to describe it.
Dendrograms allow you to compare two pieces of clothing holistically, taking into account all these details. Sometimes similarity may not be as evident as it seems at first glance. While men’s trousers are obviously less similar to a red cocktail dress than a yellow sundress is, it’s not necessarily easy to decide whether a new blue dress is more like the red cocktail dress or the yellow sundress. A dendrogram gives a more definitive answer you can rely on, because it is built from a distance measured across every attribute rather than from a first impression.
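As a rough illustration of how such a comparison can be turned into a number, the Python sketch below scores two products by the fraction of attributes on which they differ. Everything here is an assumption made for the example: the attribute names, the attribute values, and the choice of a simple mismatch distance. Real catalogs typically use many more attributes and may weight them differently.

```python
def mismatch_distance(p, q):
    """Fraction of attributes (over the union of keys) on which p and q differ."""
    keys = set(p) | set(q)
    return sum(p.get(k) != q.get(k) for k in keys) / len(keys)


# Hypothetical attribute values, invented for illustration only.
products = {
    "red cocktail dress": {
        "category": "dress", "color": "red", "occasion": "formal",
        "season": "winter", "material": "silk", "sleeves": "strapless",
    },
    "yellow sundress": {
        "category": "dress", "color": "yellow", "occasion": "casual",
        "season": "summer", "material": "cotton", "sleeves": "strapless",
    },
    "blue dress": {
        "category": "dress", "color": "blue", "occasion": "formal",
        "season": "summer", "material": "silk", "sleeves": "long",
    },
    "men's trousers": {
        "category": "trousers", "color": "grey", "occasion": "casual",
        "season": "all", "material": "wool", "sleeves": "none",
    },
}

new_item = products["blue dress"]
for name, attributes in products.items():
    if name != "blue dress":
        print(name, round(mismatch_distance(new_item, attributes), 2))
```

With these invented attributes, the blue dress ends up closer to the red cocktail dress than to the yellow sundress, and the trousers differ from the cocktail dress on every attribute. A different set of attributes could easily reverse the first result, which is why the choice of characteristics matters so much.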
Why is product similarity significant for sales forecasts?
But why should you care about how similar your products are anyway? If you know two products are alike, you can anticipate the sales of one based on the performance of another.
Essentially, product similarity gives you additional actionable data to calculate the demand for items, especially new products that have no sales history of their own. While historical data should not be the sole driver of your sales forecast, it is an essential piece of the puzzle. If the dendrogram indicates that a new product is quite similar to an existing product, you can use the existing product’s historical data as one input to the model that calculates demand for the new product.
Products that fall within the same tight cluster of the dendrogram should have similar sales profiles. The more comprehensive the set of attributes behind your dendrogram, the more precise your measure of product similarity, and the more accurately you can forecast sales.
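One simple way to picture this is a similarity-weighted average of the sales of nearby products. The Python sketch below is only an illustration of that idea, not a description of any production forecasting model: the function name forecast_from_similar, the max_distance cutoff, the weighting scheme (1 minus distance), the product names, and all sales figures are assumptions invented for the example. Consistent with the point above, such an estimate would be one input to a forecast rather than the forecast itself.

```python
def forecast_from_similar(distances, sales_history, max_distance=0.5):
    """Similarity-weighted average of weekly sales of sufficiently close products.

    distances: distance from the new product to each existing product (0 to 1).
    sales_history: weekly unit sales for each existing product.
    Returns None when no existing product is close enough.
    """
    neighbours = [name for name, d in distances.items() if d <= max_distance]
    weights = {name: 1.0 - distances[name] for name in neighbours}
    total = sum(weights.values())
    if total == 0:
        return None
    n_weeks = min(len(sales_history[name]) for name in neighbours)
    return [
        sum(weights[name] * sales_history[name][week] for name in neighbours) / total
        for week in range(n_weeks)
    ]


# Invented distances and sales figures for a hypothetical new product.
distances = {"dress A": 0.25, "dress B": 0.5, "trousers C": 1.0}
sales_history = {
    "dress A": [100, 120],
    "dress B": [50, 70],
    "trousers C": [500, 500],
}

baseline = forecast_from_similar(distances, sales_history)
print(baseline)
```

Here the trousers are too far from the new product to be used at all, and the closer dress counts for more than the more distant one.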
How you can leverage dendrograms for greater success
Dendrograms may seem a simple tool, but they provide information critical to improving inventory efficiency. If you are willing to invest in understanding and using dendrograms well, you improve your ability to anticipate trends — and even potentially increase sales while reducing inventory. Dendrograms are so much more than a basic chart when used effectively.
This article gives just a brief introduction to dendrograms and product similarity. To apply the concepts effectively, you need to understand them more thoroughly than I could share here. Luckily, my team at Evo has developed a full course on the subject. It’s free and digs deep into the principles of product similarity (with a lesson just about dendrograms!). If you’re interested in this topic, I encourage you to enroll. When you understand dendrograms, you can use them to drive your own success.
Big thanks to Kaitlin Goodrich.
About the author
Elena Marocco joined Evo as a data scientist in 2016 after a very successful internship experience. A cum laude graduate in Mathematics from the University of Turin, she defended an MSc with an innovative solution for Fashion Inventory Management.
She is excited about the world of probability and statistics and, more generally, about discovering useful maths that can have a significant impact through real-life applications.