1. Prep versus Desktop: same yet different function libraries
Going back and forth between Tableau Desktop and Tableau Prep builder there may have been a point in time where you stumbled upon the reality and limitations of the two slightly imbalanced function libraries between these tools. Chances are that your flow has been killed by "Right, of course I don't have that function at my disposal in Prep" in your recent, or not so recent analytical past.
What did you do? Did you go with the flow like Spongebob and hopped to the next thing on your to-do list or did you challenge yourself to find a way to accomplish your analytical goal in Prep, whatever it took? Because, like always with Tableau, when a functionality isn’t there right out of the box, there is (most likely) a way to achieve it.
2. A LOD is happening to Tableau Prep
For example, having spent some time in from Tableau Desktop, one might have found themselves creating Level Of Detail (LODs) calculations before being instantly confronted with the fact that such a thing wasn’t possible. Alexander Walczek’s blog is a good example of how one could actually “mimic” FIX LOD statements in Tableau Prep making use of the summary node to supplement the full data with the aggregated values doing a group by at the preferred level of detail, using the dimensions one would normally put into a FIX statement. Followed by a simple row level calculation, a “percent of total of dimension x” can be established this way.
If you want to learn more about this technique, check out his blog here: How to mimic the magic of LOD expressions in Tableau Prep. However, get ready for the relief, with the arrival of Tableau Prep Builder 2019.4 LODs will make it as an actual released feature (hooray!). Please don’t disappoint Michelle Tanner here and join us here in a little celebration dance!
Michelle Tanner is happy LODs are coming to Tableau Prep
3. Next up: Table Calculations in Prep?
Although Tableau Prep Builder has become an all-round Swiss army knife for data preparation, there is this interesting Tableau feature missing at the moment: Table Calculations.
“Table calculations are a special type of calculated field that computes on the local data in Tableau. They are calculated based on what is currently in the visualization and do not consider any measures or dimensions that are filtered out of the visualization. For any Tableau visualization, there is a virtual table that is determined by the dimensions in the view. This table is not the same as the tables in your data source” - Tableau Help Documentation.
Specifically, the virtual table is determined by the dimensions within the actual level of detail, which means the dimensions on any of the following shelves or cards in a Tableau worksheet.
Red lines indicate all factors taken into account by a table calculation.
Let us help you, we are talking about all the functions below. You don’t use them? There is only a small chance that is true as something simple as a “% of total” or “Difference from” makes use of the TOTAL and the LOOKUP function.
Quite a list, isn't it?
Also, the concept of “a table” is not present in Tableau Prep, however you could conceive the full output result of a prep node as the the “table” along which the calculation needs to be performed and computed. Therefore, most Table Functions could make it to Tableau Prep. Until then, we are here to show you how you are able today to mimic most of the Tableau functions today by making use of the Python script functionality. So here we go.
4. So what about Python?
You can write scripts with Python that mimic the actions of Table Calculations from Tableau Desktop. The functions for these scripts can mostly be found in Python packages such as Numpy or Scipy.stats. Next, you can recall these scripts in Tableau Prep Builder by using Tabpy.
4.1 But wait a second, what is Tabpy?
Tabpy is the Python server from Tableau which expands the functions of Tableau by using Python scripts. Underneath, you can see a small tutorial on how to connect to the Tabpy server and import the Python script into Tableau Prep Builder.
Connecting Tableau Prep to Tabpy and importing Python script
We’ll assume that you know how to connect to TabPy and point correctly to the Python function name (case sensitive!). For more detailed information about installing and connecting to the Tabpy server have a look at this older blog we put out on using Tableau Scripts with Tableau prep: Click here. Make sure to read this before you proceed.
Python scripts in Tableau Prep
4.2 Which Table Calculations did we mimic?
- Running total
- Moving average
- Window Rank
- Percent from total
4.3 Are there things we have to pay attention to when using the scripts?
Yes, these scripts won’t work if you simply copy them from here. There are 3 column names in your scripts you that be changed to the names of the columns from your own dataset:
- Firstly, MEASURE refers to the column where you want to apply the Table Calculation. This name must be changed to the correct name of the column in your own dataset e.g. ‘Sales’.
- Secondly, If there is a variable called DATE, This must be changed to a column name in your own dataset that includes dates e.g. ‘Order Date’.
- Thirdly, when there is a variable in the script called CATEGORY, this means that there is a grouping taking place in the Table Calculation and the column name you want to use for the grouping should be entered here e.g. ‘‘Customer Segment’.
Or if you don’t want to interfere in the scripts, you can also change the column names in your own data to the column names from our scripts. This is not very practical if the column names from your data cannot be changed.
Finally, you should pay attention to the data types of these 3 columns. Make sure that the data type of the DATE-variable is ‘date’ and not ‘date & time’. Or else the script will give an error unless you have specified it in the script. The data type of MEASURE is a number and CATEGORY is a string.
4.3.1 Running Total
A running total is the summation of a sequence of numbers which is updated each time a new number is added to the sequence, by adding the value of the new number to the previous running total. Another term for it is partial sum. The purposes of a running total are twofold. Firstly, they allow the total to be stated at any point in time without having to sum the entire sequence each time. Secondly, they can save having to record the sequence itself if the particular numbers are not individually important. Source: wiki
This little script will perform a cumulative sum over every row in your dataset along the date values in your ‘table’ using the cumsum function from numpy. Note that the date values are sorted in the python function and we do not execute thesort in prep itself. If you have multiple rows per date, make sure to aggregate the date values first.
Running Total making use of numpy
4.3.2 Moving Average
A moving average (rolling average or running average) is a calculation to analyze data points by creating a series of averages on different subsets of the full data set. Given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the number series. Then the subset is modified by shifting forward; that is, excluding the first number of the series and including the next value in the subset. A moving average is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles. Source: Wiki
The script below sorts all date values chronologically and then respectively calculates the moving average based upon 10, 20, or 30 data points. Try plotting the three different moving averages to visualize the “smoothing” aspect and how this correlates with the number of points that is fed to the moving average.
Moving Average with numpy
4.3.3 Window Rank
This function does not actually exist in Tableau, but imagine you want to rank a specific category based on an aggregated measure for that category. For a top n per category analysis a function like this could be very useful. Do note that we use a “dense” rank here, a ranking function where no rank numbers are skipped after ties. This is in contrast with a “unique” rank where no rank appears for a second time for each category.
Window rank with numpy
Window Rank output: Rank of a measure within CATEGORY
In the output we can see that MEASURE is ranked densely within each CATEGORY. Do note that we are not ranking aggregates, just the mere row level values within the CATEGORY. A regular rank would rank across all CATEGORY value, sorting all values in the result set from low to high and attributing a rank to it.
This function ranks all the values from MEASURE. So, imagine that this column has 1000 records then these records will be ranked from 1 to 1000 starting with the lowest value. It is almost the same as Window Rank but without taking into account the categories. One thing to note:: when there are multiple records with the same value, they will have the same rank. But the next record in line will have the correct rank. For example, the values 1, 0, 3, 1 will be ranked as 2, 1, 4, 2.
Rank output: Rank of a measure across dataset
As an example, this rank function can easily be used to rank your salespeople according to their sales, starting with the one with the lowest sales. And by using window rank, they can be ranked within specific regions.
Rank with scipy.stats
4.3.5 Percent of Total Dataset
And last but not least, we’ll show you how to calculate a percent of total using a Python script. The row level value of the dataset gets divided by the total sum of that column. Something that would be rather straightforward in Tableau Desktop using a Table Calc with Total (dynamic percentages) or a Level Of Detail (fixed percentages) but that isn't possible at this moment in time.
Percent of total with numpy
The Python scripts we’ve shared in this blog are only the tip of the iceberg of what you can do in Tableau Prep with Python today. Are you are in need of a specific function? Why not go for it and try to make it yourself. We haven’t even talked about creating your own connectors with Python! With the arrival of Python in Prep, the possibilities are endless. At the beginning of the blog we mentioned that Fixed LOD and Rank is coming to Tableau Prep soon. This was mentioned in the last Tableau Conference in Las Vegas last week and even Rank a (table calc!) was announced!
Powerpoint slide from the Tableau Conference November 2019 in Vegas about ‘What to come’
You’ll need the newest 2019.4 version of Tableau Prep to use this functionality. And as we all know, most companies don’t immediately upgrade to the latest release for obvious, sensible, reasons (stability, potential bugs, etc.). The same probably goes for your company, so in the meantime the approach with Python scripts enable you to accomplish your analytical challenges until you’ve got your hands on Tableau Prep 2019.4.
Happy (Tableau) prepping and feel free to leave a comment below!
Click here to download the Python Scripts
Click here to download the Tableau Prep Flow using these scripts.