Understanding Einstein Discovery Stories in Einstein Analytics

Einstein Discovery brings analytics, machine learning, and statistical analysis together to help businesses get important insights from their data quickly. But hey, that sounds like a job for a data scientist. While data scientists can certainly handle these tasks, it’s important to remember that not every company has a dedicated one. And even if yours does, what are the chances that your data scientist is tied up with other projects just when you have a large amount of data to analyze in a short amount of time?

 


 

That’s when Einstein Discovery kicks in. Creating a Story in Einstein Discovery gives business users a quick, intelligent experience: it combines visualizations with statistical testing to find significant patterns and trends. In other words, it helps you understand complex relationships between your variables and KPIs.

To help with that, Einstein Discovery divides the analysis into three sections, allowing you to capture the critical information related to:

  • What happened

  • Why it happened

  • What could happen next

Note that Einstein Discovery not only analyzes Salesforce data, but also data from external systems and spreadsheets. For the sake of data consistency, here I am going to use this Sample Superstore data.

  1. Upload the Sample Superstore data into Salesforce. If you’re not familiar with uploading data into Salesforce, check this trail to learn more about it.

  2. Create a Story using the uploaded dataset and choose the variables that you want to include in the analysis. You can include up to 50 columns. However, since we have several variables that are related to each other (City, Postal Code, Country, Region, State), I decided to focus on one, State. Another important thing is to make sure that no column contains more than 100 distinct values, because Einstein Discovery won’t show you more than 100 bars, as they’d be very thin. Because of this, variables with too many distinct values, such as City and Postal Code, are excluded. Additionally, since Country only has one value in our dataset, the United States, it is not very relevant to us, while Region might be too broad for our analysis. The same reasoning applies to Category versus Sub-Category: Category is too broad. State and Sub-Category are the appropriate ones to include this time.
    Trailhead steps in a GIF

  3. Once it’s done preparing the story, Einstein Discovery shows you the insights it found from our dataset. Now we are ready to dig deeper into the Story insights.
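The column-selection rules from step 2 can be checked before you upload anything. Below is a minimal sketch in plain Python, assuming the data is already loaded as a list of dictionaries; the function name `screen_columns` and the toy rows are my own inventions for illustration, not part of Einstein Discovery.

```python
# Toy rows standing in for the uploaded dataset (values are invented).
rows = [
    {"Country": "United States", "State": "Delaware", "Sub-Category": "Copiers", "Postal Code": "19901"},
    {"Country": "United States", "State": "Texas", "Sub-Category": "Machines", "Postal Code": "77001"},
    {"Country": "United States", "State": "Texas", "Sub-Category": "Copiers", "Postal Code": "75001"},
    {"Country": "United States", "State": "Ohio", "Sub-Category": "Chairs", "Postal Code": "43004"},
]


def screen_columns(rows, max_distinct=100):
    """Split columns into kept and dropped, based on their number of distinct values.

    The default of 100 mirrors the limit mentioned in step 2; constant columns
    (a single distinct value) are dropped because they carry no information.
    """
    keep, drop = [], {}
    for column in rows[0]:
        distinct = {row[column] for row in rows}
        if len(distinct) <= 1:
            drop[column] = "constant"
        elif len(distinct) > max_distinct:
            drop[column] = "too many distinct values"
        else:
            keep.append(column)
    return keep, drop


# The toy data only has four rows, so the limit is lowered to 3 for the demo.
keep, drop = screen_columns(rows, max_distinct=3)
print(keep)
print(drop)
```

On the real Sample Superstore data you would keep the default limit of 100, which is what excludes City and Postal Code.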

WHAT HAPPENED


This tab gives us Einstein’s descriptive insights about what occurred in the past and how it contributed to the Story’s outcome variable, in this case “Sales”. You can choose any outcome variable that you’d like to analyze. For instance, if you care more about maximizing Quantity or Profit, or about decreasing Discount, you can choose the corresponding variable for your analysis. Here, we are going to use “Maximize Sales” as our objective.

 

First-Order Analysis

First-order analysis

 

What: The first result you’ll see is a bar chart sorted by frequency, with an average line, a color legend (blue bars are Significant, grey bars are not), and some explanatory text on the left. This bar chart represents the variable with the most statistically significant insight and the highest R-squared value in our dataset, which turns out to be Sub-Category. It shows that:

  1. Sub-Category explains 19.9% of the variation in Sales. This R-squared value is the proportion of the variance in the response variable (Sales) that is explained by the predictor (Sub-Category).

  2. In this analysis, three sub-categories have grey bars, which means that Accessories, Appliances, and Supplies are statistically insignificant. Using hypothesis testing, Einstein Discovery found no reliable evidence that Sales in these sub-categories differ from Sales in all other sub-categories, so any difference you see could be due to chance. For the sub-categories with blue bars, it is very unlikely that the difference between that sub-category and all other sub-categories happened by chance.

  3. Copiers is 1.969 above average. The explanatory text adds: “This result may have been worsened by Discount is 0,19 to 0,2.” This means that Copiers would perform better if the Discount given were not in the range of 0,19 to 0,2.

  4. Additional information regarding Copiers is given when we hover over the bars. This information includes:

    • Total: Total Sales of Copiers

    • Standard Deviation: How much the Sales of individual Copiers transactions differ from the Copiers average. Copiers has a standard deviation of 3,176, the highest of all sub-categories, which indicates greater variability in Sales than in the other sub-categories

    • Count: The number of transactions that falls into the Copiers sub-category

    • Difference from Average: The difference from the global average across all sub-categories

    • Global Average: The overall average across all sub-categories, calculated as the sum of Sales over all transactions divided by the number of transactions (Count).
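To make the hover values concrete, here is a small sketch that computes them for one sub-category. The transactions are invented toy numbers, not the real Superstore figures, and the helper name `hover_stats` is hypothetical. I also assume the sample standard deviation; the article does not say which variant Einstein Discovery displays.

```python
from statistics import mean, stdev

# Invented (sub_category, sales) pairs; not the real Superstore figures.
transactions = [
    ("Copiers", 3000.0), ("Copiers", 1000.0),
    ("Chairs", 500.0), ("Chairs", 300.0), ("Chairs", 400.0),
    ("Supplies", 50.0), ("Supplies", 30.0), ("Supplies", 40.0),
]


def hover_stats(transactions, sub_category):
    """Compute the values listed in the hover text for one sub-category."""
    all_sales = [sales for _, sales in transactions]
    group = [sales for category, sales in transactions if category == sub_category]
    global_average = mean(all_sales)
    return {
        "total": sum(group),
        "average": mean(group),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "standard_deviation": stdev(group),
        "count": len(group),
        "global_average": global_average,
        "difference_from_average": mean(group) - global_average,
    }


copiers = hover_stats(transactions, "Copiers")
for name, value in copiers.items():
    print(f"{name}: {value:.2f}")
```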

Now that you know how to interpret this bar chart, can you do the same for the following Sub-Categories (Machines, Tables, Chairs, Bookcases)? 

 

How: As we recall, Sub-Category is a categorical variable. So how does Einstein Discovery calculate the R-squared value for it? It turns the variable into a set of binary variables, a process known as one-hot encoding. After this preprocessing step, each row is represented by a vector in which all elements are 0 except for one, which is 1. The result looks like the overview below.

Overview of sub-category

 

You don’t need to do this step yourself, as Einstein Discovery does it all for you, but if you’re interested in how to easily perform one-hot encoding using Tableau Prep, check out this cool video for more information.
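If you’d like to see what one-hot encoding does without any tooling, here is a minimal sketch in plain Python. The `one_hot` helper is my own illustration of the general technique, not Einstein Discovery’s internal code.

```python
def one_hot(values):
    """Turn a list of category labels into binary indicator rows.

    Returns the sorted category names (the new columns) and, for each input
    value, a row with a single 1 in the column of its category.
    """
    categories = sorted(set(values))
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


sub_categories = ["Copiers", "Machines", "Tables", "Copiers"]
columns, encoded = one_hot(sub_categories)

print(columns)
for label, row in zip(sub_categories, encoded):
    print(label, row)
```

Every row contains exactly one 1, which is what makes the encoding “one-hot”.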

 

Why: As a rule of thumb, an R-squared value below 30% is seen as low, so why should you still pay attention to it? As you can see in the demonstration below, the lower the R-squared value, the more dispersed the points become. But that doesn’t change the fact that Sub-Category alone explains 19.9% of the variation in Sales, and values like this are common in real-life data.

 

R-Squared value overview
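For a single categorical predictor, a linear model on the one-hot columns simply predicts each category’s mean, so R-squared can be computed by hand as 1 minus the residual sum of squares divided by the total sum of squares. The sketch below illustrates that definition on invented numbers; the function name is hypothetical and this is not necessarily how Einstein Discovery computes its reported value.

```python
from collections import defaultdict


def r_squared_by_category(pairs):
    """R-squared of predicting each value by the mean of its category."""
    values = [value for _, value in pairs]
    overall_mean = sum(values) / len(values)

    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    group_mean = {c: sum(v) / len(v) for c, v in groups.items()}

    ss_total = sum((value - overall_mean) ** 2 for value in values)
    ss_residual = sum((value - group_mean[c]) ** 2 for c, value in pairs)
    return 1 - ss_residual / ss_total


# Category separates the values well: most of the variation is explained.
informative = [("A", 10.0), ("A", 20.0), ("B", 30.0), ("B", 40.0)]
# Category tells us nothing: both groups have the same mean.
uninformative = [("A", 10.0), ("A", 30.0), ("B", 10.0), ("B", 30.0)]

r2_informative = r_squared_by_category(informative)
r2_uninformative = r_squared_by_category(uninformative)
print(round(r2_informative, 3), round(r2_uninformative, 3))
```

An R-squared of 19.9% sits between these two extremes: Sub-Category explains a real part of the variation in Sales, while most of it remains to be explained by other variables.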



Second-Order Analysis

 

Sample Superstore Second-order analysis

 

What: Still building on the first chart, Sub-Category, the next three bar charts take a second variable into account. The first shows Sales by Sub-Category when State is Delaware. The second shows Sales by Sub-Category when Order Date is 2017/2. The third shows Sales by Sub-Category when Quantity is 1 to 2. This part is called a second-order analysis (also known as an interaction effect, drill-down chart, or two-variable bar chart) because the variables State, Order Date, and Quantity show a strong signal when combined with Sub-Category.

Let’s dive deeper into the third one, Sales by Sub-Category when Quantity is 1 to 2. The insights we can gather from this chart are:

  1. When Quantity is 1 to 2: Looking at the explanatory text, we know that Copiers and Machines do worse. Specifically, in the case of Machines: “Machines is 1.213 lower. This result may have been worsened by Discount is 0,41 to 0,8.” This means that Machines perform worse overall and are hurt the most when the Discount is between 0,41 and 0,8. How much lower Sales get differs from one sub-category to another, hence this chart.

  2. In this chart, the blue bars represent values when Quantity is 1 to 2, light grey bars represent All Other Quantity, and the dark bar (which in our case only applies to Supplies) represents an Insignificant value. As in the previous chart, this chart also uses hypothesis testing. We can interpret it as follows: for every sub-category, Quantity 1 to 2 gives lower Sales than All Other Quantity. That makes sense, since the larger the quantities customers buy, the higher the Sales. In the case of Supplies, however, the result is statistically insignificant, meaning that the difference between Quantity 1 to 2 and All Other Quantity in this sub-category could be due to chance.

  3. As before, by hovering over the bar chart, you can see more detailed information. It gives us the Total, Average, Standard Deviation, Count, and Difference from Average for that category. For example, the Difference From Average for Other Buckets is shown to be -1,213. Why? Because that is the difference between Machines when Quantity is 1 to 2 and Machines in all other quantities.
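The article does not say which statistical test Einstein Discovery runs, so the sketch below swaps in a simple permutation test purely to illustrate what “significant” versus “could be due to chance” means. All numbers are invented, and the function name and parameters are my own.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group, rest, n_permutations=2000, seed=0):
    """Estimate how often a random split shows a gap at least as large as observed."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    observed = abs(mean(group) - mean(rest))
    pooled = list(group) + list(rest)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_group = pooled[:len(group)]
        shuffled_rest = pooled[len(group):]
        if abs(mean(shuffled_group) - mean(shuffled_rest)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Invented Sales values for one sub-category, split by quantity bucket.
clear_gap = permutation_p_value(
    group=[100, 110, 120, 105, 115],          # e.g. All Other Quantity
    rest=[10, 12, 9, 11, 13, 8, 10, 12],      # e.g. Quantity is 1 to 2
)
no_gap = permutation_p_value(
    group=[10, 12, 9],
    rest=[11, 10, 12, 9, 11, 10],
)
print(f"clear gap p-value: {clear_gap:.4f}")
print(f"no gap p-value:    {no_gap:.4f}")
```

A small p-value corresponds to a blue (significant) bar, while a large one corresponds to the Insignificant bar we saw for Supplies.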

 

WHY IT HAPPENED

 

Now onto the next tab, you’ll see Why it Happened, a section that will help you to gather insights and deep dive into the exact factors that led to a particular outcome. This waterfall chart breaks down the difference between Copiers and the whole dataset.

At a glance, Einstein Discovery shows us that the Copiers sub-category has higher Sales than the mean. To understand why Copiers are different from the rest of our sub-categories, let’s hover over each bar to get more insight into the cause.

 

Why is it happening

 


First-Order Analysis

 

  1. Global Average:

    Global Average
    This bar simply tells us that the mean of Sales across all sub-categories (including Copiers) is $230, based on 9,994 transactions (also called observations).

  2. Sub-Category is Copiers: 

Sub-Category is Copiers

This shows the impact of Copiers on Sales without taking any other variables into account.

  • Impact: Here Einstein Discovery tells us that the Impact of Copiers alone (without considering the effect of Discount, Order Date, Quantity, State, etc.) is $3,481 more than average. This tells us that, in general, Copiers is a great product to sell and one we should focus on more, perhaps digging deeper into its potential. Should we sell more of it? If so, when, where, and what other factors should we consider? This will be explained in the next graph.

  • Coefficient: If the impact of Copiers alone is already that high, does that mean that our company should sell Copiers without taking into account other factors? The answer is No! The Coefficient here explains that if we don’t involve other factors, our Sales will be $85 lower than the mean.

  • Precluded Sum: If the global average of Sales is calculated across all sub-categories, what effects does it bring if we keep Copiers only and decide not to sell products in other sub-categories? Here we see that removing all effects for other sub-categories may increase the Sales by $3,566.

  • Frequency: However, Einstein Discovery also tells us that Copiers account for only 0.68% of our overall transactions. What? That’s very low indeed, considering how much Copiers might help to increase our average Sales. Here’s where the market research team plays its part in finding out whether the Copiers market has always been limited and unpromising, which would explain why we decided not to focus on Copiers in the first place. Or maybe it’s simply underrated and we have to upsell? Could it be that we should now focus more on selling Copiers, promoting or rebranding them, and finding new customers who are interested in buying them?

  • Conditional Frequency: 1 in Conditional Frequency shows that, obviously, 100% of the Copiers transactions are in the Copiers sub-category.
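A quick sanity check on the hover values quoted above. Note that the relationship between Impact, Coefficient, and Precluded Sum is only my observation about these particular figures, not a formula documented in this article.

```python
# Figures quoted in the hover text above.
global_transactions = 9994
copiers_frequency = 0.0068   # 0.68% of all transactions
impact = 3481                # dollars above the global average
coefficient = -85
precluded_sum = 3566

# Frequency times the number of observations gives the approximate number
# of Copiers transactions behind this bar.
approx_copiers_transactions = round(global_transactions * copiers_frequency)

# Observation (an assumption, not a documented rule): with these figures,
# Precluded Sum plus Coefficient equals Impact.
impact_matches = (precluded_sum + coefficient == impact)

print(approx_copiers_transactions, impact_matches)
```

So the large Impact rests on roughly 68 transactions, which is worth keeping in mind before making big decisions based on it.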

Second-Order Analysis


Second-order analysis impact

 

The next bars explain the impact on Sales when other variables are included, given that the Sub-Category is Copiers. The first batch (red box above) tells us about characteristics related to Copiers. The second batch tells us about other things that differ for Copiers but are not necessarily characteristics of Copiers. In combination, these two batches tell us how Copiers perform differently from other sub-categories. Let’s discuss the details of each bar.

 

First Batch

 

  1. Discount is 0 to 0,19 and Sub-Category is Copiers:

    Discount is 0 to 0.19

    When we look at the description text, it says “Discount is 0 to 0,19 occurs 49,5% of the time globally but it changes to 32,4% when it is known that Sub-Category is Copiers. Because of these cases, Sales increases by 357,9.” Generally, when we give discounts around 0-0.19 regardless of the sub-categories, our Sales perform well. But when we give that amount of discount when we sell Copiers products, our Sales perform particularly better and increase by $358 from average. This is what the Impact tells us. In addition to that, 32% of Copiers transactions are subject to 0-0.19 discount. Should we think about giving this amount of discount to more of our Copiers deals? It’s definitely something we should consider.



  2. Order Date is 2018/3 and Sub-Category is Copiers:

    Order Date is 2018/3

    The second bar tells us that Copiers performed worse in March 2018 compared to other months. 

    The explanatory text says “Order Date is 2018/3 occurs 2,4% of the time globally but it changes to 8,8% when it is known that Sub-Category is Copiers. Because of these cases, Sales decreases by 151,9”. In other words, this particular period lowered Sales by $152 relative to the average. However, almost 9% of our Copiers Sales from 2015 to 2018 were made in this period. Should we take a look at what happened to Copiers in this period so that the same mistake isn’t repeated? Or is there no need to investigate these 9% of transactions further, since other factors require more of our attention?

  3. Discount is 0,21 to 0,4 and Sub-Category is Copiers & Discount is 0,19 to 0,2 and Sub-Category is Copiers:

    Discount is 0.21 to 0.4

    Let’s take a look at the next two bars together, since both of them give us information about Copiers in combination with Discount. Here, I am going to address the second one (Discount is 0,19 to 0,2), as it has the bigger impact on our Sales. The first should be interpreted in much the same way. This bar shows that when the discount is between 0,19 and 0,2, the impact on our Sales is -$830. Apparently, giving this amount of discount had an extreme effect on reducing Sales. That’s a tremendous amount that we can save simply by not giving this discount! As if that weren’t enough, these particular discounts have been given to more than half of our Copiers transactions. Can you believe that 54% of the transactions are, in fact, dragging our average Sales down? What should we do with these transactions? Should we take a step back to assess whether there’s a way to avoid giving 0,19-0,2 discounts to these customers in the future? We should certainly take notice of this analysis.

  4. Small Terms Related to Sub-Category is Copiers:

Small Terms Related to Sub-Category is Copiers

This represents 88 other terms related to Copiers Sales that don’t appear in the other bars. In our case, the combined small terms are causing the Sales to be dragged down by $1,641. However, these terms don’t appear since the individual impact is rather small. Although this bar looks larger than any of the previous ones, remember that Einstein Discovery is aiming at bringing you a quick insight regarding the terms that cause more impact on Sales. So it is very important to note that individually, none of these 88 terms are larger than the previous bars (hence variables) we discussed before.

 

Second Batch

 

In this second batch, we are looking specifically at terms that are unrelated to the Copiers sub-category. Einstein Discovery includes these bars, even though they are unrelated to Copiers, because they show factors that may affect Sales globally, either positively or negatively. From a business perspective, we can interpret this as follows: there are times when we do good things and times when we do bad things, and this chart tells us what happens when we do good or bad things more or less often.

Einstein Discovery lays out how by doing good things more often, the impact on Sales is positive, while the impact is negative if we do less of the good things. Additionally, when we do bad things more often, the impact is then negative, while doing less bad things will have a positive effect on our Sales.
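The four combinations above boil down to a sign rule, sketched below. This is a simplification of my own that only captures the direction of the effect: Einstein Discovery’s actual impact calculation is not described in this article. The frequencies in the first two calls are the Quantity examples discussed in this section.

```python
def impact_direction(effect_on_sales, frequency_in_subset, frequency_overall):
    """Direction of the impact on the subset's Sales.

    effect_on_sales: +1 for a "good thing" (raises Sales), -1 for a "bad thing".
    frequency_in_subset: how often it occurs for the subset (e.g. Copiers).
    frequency_overall: how often it occurs across all sub-categories.
    """
    frequency_shift = frequency_in_subset - frequency_overall
    signed = effect_on_sales * frequency_shift
    if signed > 0:
        return "positive"
    if signed < 0:
        return "negative"
    return "none"


# Good thing happening more often for Copiers (Quantity is 4 to 5: 19% vs 12%).
print(impact_direction(+1, 0.19, 0.12))
# Good thing happening less often for Copiers (Quantity is 6 to 7: 7,4% vs 11,8%).
print(impact_direction(+1, 0.074, 0.118))
# Hypothetical bad thing happening more often, then less often.
print(impact_direction(-1, 0.30, 0.10))
print(impact_direction(-1, 0.10, 0.30))
```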

 

  1. Quantity is 4 to 5:

    Quantity is 4 to 5

    For instance, in the example above, we see that Quantity 4 to 5 brings a positive impact on Sales. This happens because a good thing happens more frequently (19%) for Copiers than it does for all sub-categories (12%), and that’s why the effect is positive.

    P.S. The reason we see Quantity as a range instead of a continuous variable is that Einstein Discovery groups this data into several buckets (e.g. 1-2, 2-3, 4-5, etc.). You can always change the number of buckets by going to Edit Story and selecting the variable Quantity. Here, I left it at the default.

  2. Quantity is 6 to 7:

    Quantity is 6 to 7


    The above result is quite obvious, since higher quantity may result in a more positive impact on Sales. The explanatory text says that “Quantity is 6 to 7 occurs 11,8% of the time globally but it changes to 7,4% when it is known that Sub-Category is Copiers. Because of these cases, Sales decreases by 26,39”. This means that 11,8% of the overall transactions across all sub-categories have a Quantity of 6 to 7. But this happens less in Copiers, which is down to 7,4%. Because this good thing happens less frequently (7,4%) for Copiers than it does for all sub-categories (11,8%), the effect is then negative.


  3. Unrelated Small Contributions:

    Unrelated Small Contributors

    There are 2 terms that contribute negatively to Sales. Combined, these contributions lower Sales by 6.7 relative to the average. With this information, you can check with your team and the person responsible for these terms whether they are aware that these terms bring our Sales down.

  4. Unexplained

 


Unexplained

 

The Unexplained bar compares the average of the predictions made for all observations in the requested subset with the observed average for that subset. It shows whether the observed average was higher or lower than what the other bars account for. Here, the difference between the predicted average Sales (from Einstein Discovery’s predictive model) and the actual average Sales (calculated from the dataset) is $969. The presence of this bar shows that the statistical picture painted by the previous bars does not match reality exactly. This is completely normal; a model that matched every data point perfectly would likely be overfitting (a modeling error in which the function fits a limited set of data points too closely).
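As a rough sketch of the idea, the unexplained part can be thought of as the gap between the observed and the predicted average for the subset. The sign convention (observed minus predicted), the function name, and the toy numbers below are my own assumptions.

```python
def unexplained(observed_sales, predicted_sales):
    """Gap between the observed average and the model's predicted average."""
    observed_average = sum(observed_sales) / len(observed_sales)
    predicted_average = sum(predicted_sales) / len(predicted_sales)
    # Assumption: positive means reality was higher than the model predicted.
    return observed_average - predicted_average


# Invented numbers for three transactions in the subset.
observed = [100.0, 200.0, 300.0]
predicted = [90.0, 180.0, 270.0]

gap = unexplained(observed, predicted)
print(gap)
```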

WHAT COULD HAPPEN

We discussed What Happened and Why It Happened in the previous sections. In What Could Happen, we are going to look at the analysis that predicts future outcomes. Additionally, with prescriptive analysis, Einstein Discovery helps us map out how we can improve these predictions to reach a more desirable outcome.

 

What could happen?

Remember at the very beginning, we started by uploading our dataset? We were asked to specify what type of Story we want. Since we chose Insight & Predictive Analysis, we had to select which Machine Learning algorithm we’d like to use. These are the advanced algorithms that Einstein Discovery uses to conduct a “what if” analysis based on the statistical outcome of our data.

 

 What Could Happen tab Sample Superstore

When we click on the What Could Happen tab above, we see a list of the explanatory variables that the model uses to predict Sales. This list is sorted in descending order by Correlation score with our outcome variable, Sales. As we discussed earlier, the variable with the highest correlation is Sub-Category, followed by Quantity, Discount, and the rest of the variables. Note that you can select combinations of field values to see how the interaction among these factors affects the predicted Sales.

 

Einstein Prediction

 

Now let’s say we want to focus on our Machines sub-category, as it is our second most significant product. For Actionable, we decide that our team wants to work on the Discount, so we would like to know what improvements Discount can bring. With these selections, Einstein Discovery predicts that our Machines Sales would be $1102.97. This can still be optimized by setting our Discount to 0 to 0.19, which will increase Sales to $1564.477. Not bad at all! It also shows us that selecting Machines has a positive effect on the average Sales (increasing it by $211.3).

 

Prediction of Sales

 

Scrolling down, you’ll see another waterfall chart. This chart simply tells us:

  • The Baseline of Sales, which is around $3.3k

  • Sub-Category when Machine is selected, which brings $211 increase on average Sales

  • Expected Impact of Other Fields, which shows a -$2.4k difference to average Sales when the other fields are considered

  • Predicted Outcome of $1.1k in Sales

Waterfall chart of prediction
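The waterfall reads as a running sum: start at the baseline and add each step to arrive at the predicted outcome. Here is a minimal sketch using the rounded figures from the list above, so the result only approximates the $1.1k shown in the chart; the helper name is my own.

```python
def waterfall_total(baseline, steps):
    """Add each step's contribution to the baseline, as a waterfall chart does."""
    return baseline + sum(delta for _, delta in steps)


# Rounded figures quoted above, in dollars.
baseline = 3300.0
steps = [
    ("Sub-Category is Machines", 211.0),
    ("Expected impact of other fields", -2400.0),
]

predicted = waterfall_total(baseline, steps)
print(f"Predicted outcome: about ${predicted / 1000:.1f}k")
```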

The last graph is quite self-explanatory, showing us how much Sales can be increased by setting Discount to a specific range. As seen here, the outstanding one with a $2.7k increase in Sales happens when we set up Discounts within a range of 0 to 0,19. The rest of the Discount ranges bring around a $0.5k increase in Sales.

Fantastic! Now we know how to interpret a Story in Einstein Discovery using our Superstore data. We also learned how to maximize our Sales by looking at the most statistically significant variable, Sub-Category, and gained a better understanding of how Einstein Discovery can help us improve them. All of this can be achieved in a matter of minutes, simply by uploading your dataset into Einstein Discovery and letting it do the hard work for you.

Check out this video to learn more about it.

Now, is there anything holding you back from using Machine Learning for your business case? Contact us to learn more about it and start your smart, efficient, data-driven journey with us!

Author
Issye Margaretha
