Integrating Tableau and R: Tips and Tricks for Intermediate Users
I will assume that you are comfortable enough with both tools and that you know how to connect to Rserve using Tableau Desktop. If not, make sure to check this official Tableau tutorial or this article from Bora Beran.
- Most tutorials are not particularly informative.
Most of them are not even made by R users. Consequently, you see countless examples of using the R connection to do things that Tableau can do natively or with the help of some calculated fields (e.g. K-means clustering, correlation, or concatenating strings). In my own experience, the most useful resources have been:
- TC18 session “Data science applications with TabPy/R” by Nathan Mannheimer (Product Manager of Advanced Analytics in Tableau).
- Tableau and R Integration wiki by Jonathan Drummey
- You can only get one vector back.
When I first started, I wanted R to wrangle my data source (e.g. tidyr::spread, dplyr::filter, dplyr::slice) or modify the content of multiple columns at the same time. This is, unfortunately, not yet possible in Tableau Desktop or Tableau Server. It may become possible with Tableau Prep in the near future.
For now, the R connection behaves much like dplyr::mutate when you create a new column. In other words, you can only get back a vector of length 1 (whose value is appended to every single row) or a vector that is exactly as long as your data; you cannot generate a new data source.
Also noteworthy: if you want to pull out a vector that is longer than one but shorter than your data source, you have two options. You can create one column per value and append multiple columns to your data. Alternatively, you can create a scaffold with the same number of rows as the vector you want to pull. It might seem cumbersome, but it can be worth it to improve the performance of your workbook.
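The length rule can be sketched in plain R, outside Tableau. This is only an illustration: `.arg1` is filled with made-up values to stand in for the column Tableau would pass, and `is_valid_length` is a hypothetical helper, not part of Tableau or Rserve.

```r
# Stand-in for the column Tableau would pass as .arg1 (assumed sample values).
.arg1 <- c(12, 7, 3, 25, 18, 9)
n <- length(.arg1)

# Hypothetical helper: TRUE when a result has a length Tableau can accept,
# i.e. a single value or one value per row that was passed in.
is_valid_length <- function(result, n_rows) {
  length(result) %in% c(1L, n_rows)
}

scalar_result  <- mean(.arg1)           # length 1: appended to every row
rowwise_result <- .arg1 - mean(.arg1)   # same length as the input
quartiles <- unname(quantile(.arg1, c(0.25, 0.5, 0.75)))  # length 3

is_valid_length(scalar_result, n)    # TRUE
is_valid_length(rowwise_result, n)   # TRUE
is_valid_length(quartiles, n)        # FALSE: longer than 1, shorter than the data

# First option from the text: split the short vector into separate
# length-1 results, one per calculated field (i.e. one new column each).
q25 <- quartiles[1]
q50 <- quartiles[2]
q75 <- quartiles[3]
```

In a workbook, each of the three quartiles would live in its own calculated field, which is why the scaffold approach can be the tidier choice once the vector gets longer.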
- You can tell R to do ANYTHING
R will do ANYTHING that Tableau tells it to do, even if it has nothing to do with the data that you are passing back to Tableau. This means that you could potentially call the Domino’s API inside of an R script and get a pizza at your door, even if Tableau itself returns an error.
The Domino’s API is probably not a great use case. But you could potentially:
- Send data back to a database or to Google Sheets.
- Call other programming languages inside of R.
- Use R to call APIs (I have used it to classify the language of a string and to transform coordinates into postcodes).
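As a rough sketch of such a side effect, the script below writes its inputs to a file before returning a result. Everything here is an assumption for illustration: the values in `.arg1` and `.arg2`, the column names, and the temporary CSV file, which merely stands in for a database table or a Google Sheet.

```r
# Stand-ins for two columns Tableau would pass (assumed sample values).
.arg1 <- c("Antwerp", "Ghent", "Leuven")
.arg2 <- c(120, 85, 60)

# Hypothetical destination: a temporary CSV file standing in for a
# database table or a spreadsheet.
log_path <- file.path(tempdir(), "tableau_r_log.csv")

# Side effect: runs every time Tableau evaluates the script,
# regardless of what is returned afterwards.
write.csv(data.frame(city = .arg1, sales = .arg2), log_path, row.names = FALSE)

# The value of the last expression is what Tableau receives:
# here, each row's share of total sales (one value per row).
result <- .arg2 / sum(.arg2)
result
```

Because the write happens before the return value is produced, it takes place even when the returned vector turns out to be something Tableau rejects.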
- More often than not, you will have to create a data frame inside your script
Most R functions expect a data frame, but there is no way to refer to the whole data source from your Tableau environment. You can only pass individual columns from your Tableau data source to R. Consequently, you often have to build a data frame from your arguments:
df <- data.frame(.arg1, .arg2)
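A slightly fuller sketch of the same idea, assuming two numeric columns and a simple linear model. The values and the column names `x` and `y` are made up for illustration; in Tableau, `.arg1` and `.arg2` would be filled by the fields you pass to the script.

```r
# Stand-ins for two numeric columns Tableau would pass (assumed values).
.arg1 <- c(1, 2, 3, 4, 5)
.arg2 <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Rebuild a data frame from the individual arguments, with explicit names
# so that formulas can refer to the columns.
df <- data.frame(x = .arg1, y = .arg2)

# Any function that expects a data frame can now be used, e.g. lm().
model <- lm(y ~ x, data = df)

# Return a plain numeric vector with one value per row.
fitted_values <- as.numeric(fitted(model))
fitted_values
```

Inside Tableau, a script like this would typically sit in a SCRIPT_REAL calculated field, with the two columns passed as aggregated arguments.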
- Be careful with Factors!
Tableau can only receive a boolean, integer, real, or string back from R. That’s it! You may run into problems if you try to return a factor to Tableau, so make sure to cast it to a character type. Alternatively, you could cast it to a numeric type, although I wouldn’t recommend this approach: unless you are using an ordered factor, the numbers assigned to your values are meaningless.
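The difference between the two casts can be seen in a small sketch. The category labels are made up for illustration.

```r
# Stand-in for a string column Tableau would pass (assumed values).
.arg1 <- c("low", "high", "medium", "high")

# A plain factor: levels are sorted alphabetically (high, low, medium).
sizes <- factor(.arg1)

# Recommended: return the labels as strings.
as_text <- as.character(sizes)   # "low" "high" "medium" "high"

# Casting to integer returns the internal level codes, which only
# reflect alphabetical order here.
as_codes <- as.integer(sizes)    # 2 1 3 1

# With an ordered factor, the codes follow the order you define.
sizes_ordered <- factor(.arg1, levels = c("low", "medium", "high"),
                        ordered = TRUE)
ordered_codes <- as.integer(sizes_ordered)   # 1 3 2 3
```

In the unordered case "high" gets code 1 and "low" gets code 2, which is exactly the kind of meaningless numbering the text warns about.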
- R scripts are Table Calculations
Here are some things you have to take into account:
- R scripts will be recalculated EVERY TIME you fire a query! This can be great if your script uses machine learning to predict an outcome happening in real time or when users expect the script to recalculate after using a dashboard action. Nevertheless, this is not a great approach for results that don’t change.
- You ALWAYS have to pass aggregated arguments (e.g. SUM or ATTR) and then set how the calculation is computed using “Edit Table Calculation” and “Compute Using.”
- Calculations happen very late in the order of operations, meaning that you cannot use them inside LOD calculations and it is cumbersome to filter your view based on the outcome of your R script.
To conclude this article: The R integration can be an interesting possibility for companies with limited IT capacity to bring R scripts to life. It is, after all, one of the easiest ways to deploy a machine learning model into production.
I also think that the R integration is a nice first step toward closing the gap between Tableau users and Data Scientists. I say first step because Tableau has mentioned that there are plans to revamp the advanced analytics features. In December 2018, Tableau released support for TLS/SSL-secured connections to Rserve, and there are also rumors of a native capability to run R scripts in Tableau Prep. The future seems bright!
Finally, if you have any questions, thoughts, or remarks about the R connection, feel free to contact me or any of our colleagues from the Biztory Data Science Practice. We are always happy to discuss your use cases.