Tableau Prep's New Functionality in 2019.3: Start Using R & Python Scripts In Your Flows Today.
Scripts are coming to Tableau Prep Builder 2019.3, and this is why you should care.
Tableau is, in essence, a visualization tool. However, the strong analytical power of Tableau Desktop was obvious from the early versions, and it was enriched further with the arrival of external connections, which let you connect to an R, Python or MATLAB server. These external connections have been around since version 9.0. That's already almost 5 years; time flies when you are having fun, right? Thanks to the thriving online Tableau community, numerous resources and examples have become available on this topic.
The external connections are awesome and most definitely allow Tableau Desktop to be used for statistical and predictive modeling in cases that go far beyond the out-of-the-box possibilities (such as clustering, forecasting, regression, etc.). They let you run your R or Python code "live" when interacting with a dashboard.
A few basic notions of R or Python can already turn you into a Tableau wizard.
You want to let your end-user input a few predictor values and let Rserve return a predicted value based on a trained model on the fly? Sure! External Connections will do that for you!
Do you want to set up a clustering algorithm which allows the user to input the number of clusters by setting a parameter value? Sure, no problem!
You want to score the sentiment of your customer complaints directly in your Tableau dashboard, making use of a Python package and TabPy? Bring it on! Yeah but, well ...
... sometimes it is not preferred that the scoring happens after you have loaded your data into Tableau. Moreover, the downside of the external connections is that they do not really cache any received output, so your R or Python code is re-run whenever you interact with your visual or dashboard. This behavior can be preferred when real-time scoring is what you need in your dashboard. However, in a lot of cases, you'd just want to have your prediction already materialized in your dataset, just waiting to be visualized, without the burden of having the result recalculated over and over again, potentially resulting in performance issues and waiting times for your end-user. And nobody ever wants to frustrate the end-user, right?
Therefore the arrival of scripts in Tableau Prep (Builder) is a real game-changer. You can literally let your R and Python code run in your flows. The results will be saved into your output, ready to be vizzed. The sky is the limit here in terms of possibilities. If you can do it with Python, you most likely can do it with Tableau Prep from now on (and that's quite a lot).
What do you need?
-At least a few basic notions of R or Python code for creating functions and models.
-An R or Python server running locally or remotely (preferably secured) to connect to.
-A Python file (.py) or R file (.r) containing the function and/or models you want to use in your flow.
-An eye open on the command-line interface running Rserve or TabPy in order to facilitate debugging.
So how does it work?
(high level, watch the video for specifics)
-Make sure you are connected to a data source
(Note: even if you are not going to use it in your script, a dummy Excel file with one value in it would do!)
-Click the "+" button in your flow to add a script
-Connect to TabPy or Rserve
-Select a file: point Tableau to a .py or .r file
-Enter the name of the function which will transform your data (Note: this needs to be exactly the same as the function definition name in your file).
-Make sure you have a dummy/scaffold column available for receiving the output back from R or Python, or explicitly define the output schema.
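To make the steps above a bit more concrete, here is a minimal sketch of what such a .py file could look like. It assumes Prep hands your function the flow's rows as a pandas DataFrame and expects a DataFrame back; the function name `trim_customer_names` and the column `customer` are made up for illustration. The name you type into the script step would have to match the `def` name exactly.

```python
import pandas as pd


def trim_customer_names(df: pd.DataFrame) -> pd.DataFrame:
    """Receive the flow's rows as a DataFrame and return a DataFrame.

    The columns going out are the same as the columns coming in,
    so no extra schema definition is needed in this case.
    """
    # Printed text can help with debugging; keep an eye on the terminal
    # running TabPy (assumption: TabPy runs in a console you can see).
    print(f"received {len(df)} rows with columns {list(df.columns)}")

    out = df.copy()
    # 'customer' is a hypothetical column name used for illustration.
    out["customer"] = out["customer"].str.strip().str.title()
    return out
```

Because the function only needs pandas, you can try it on a small hand-made DataFrame in a plain Python session before pointing Prep at the file.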
When your script generates new column names, use the get_output_schema() function at the end of your code to let Prep know what's coming.
As this part is wicked important I'll repeat it for you: "... or explicitly define the output schema". Your code file cannot simply generate new columns without Prep knowing about them up front.
This is relevant when the schema going into your script is not identical to the schema coming out of it. One way to overcome this is to make sure the columns you refer to in your code already exist before your script starts.
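As a rough sketch of the scaffold-column approach: assume the flow already contains an empty `sentiment` column next to a `complaint` column (both names are hypothetical), so the script only fills in values and the columns stay the same. The word lists below are a toy stand-in, not a real sentiment model; in practice you would call a proper Python package here.

```python
import pandas as pd

# Toy word lists, purely illustrative; not a real sentiment model.
POSITIVE = {"great", "good", "happy", "thanks"}
NEGATIVE = {"bad", "broken", "late", "angry"}


def toy_score(text) -> float:
    """Count positive words minus negative words in a piece of text."""
    words = str(text).lower().split()
    return float(sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words))


def score_complaints(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # 'sentiment' already exists in the flow as an empty scaffold column,
    # so the columns coming out are the same as the columns going in.
    out["sentiment"] = out["complaint"].apply(toy_score)
    return out
```

The scaffold column itself could be created in a clean step before the script step, for example as a calculated field holding an empty value.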
Yet, in many cases, this can be considered a bit cumbersome, and in practice you probably want to make use of the get_output_schema() function at the end of your code.
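Here is a minimal sketch of that second option, assuming the function is spelled `get_output_schema()` as in Tableau's documentation. The column names and `add_complaint_length` are hypothetical. As far as I know, the type helpers `prep_string()` and `prep_int()` are supplied by Tableau Prep when the flow runs, so they are not imported here and `get_output_schema()` can only be called from inside a flow.

```python
import pandas as pd


def add_complaint_length(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # New column that does not exist in the flow before the script step.
    out["complaint_length"] = out["complaint"].str.len()
    return out


def get_output_schema():
    # Describes every column the script returns, including the new one.
    # prep_string() and prep_int() are provided by Tableau Prep at run time
    # (assumption based on Tableau's documentation); calling this function
    # outside of a flow would raise a NameError.
    return pd.DataFrame({
        "complaint": prep_string(),
        "complaint_length": prep_int(),
    })
```

Only the columns listed in the schema are expected to come back from the script, so any input column you still need downstream should be listed there as well.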
That's it, if you have any remarks or questions, do not hesitate to leave a comment below!
Stay tuned for more blogs on how to use Tableau Prep with Scripts in the near future.
Before watching the tutorial, make sure you have installed TabPy:
Tim Dries works as an analytics consultant and Data Science Practice lead at Biztory.