In this blog, I will show you how to use the Hyper API to automate a complex data workflow in Python, straight to a published datasource on Tableau Server.

In short: you will be able to combine data from multiple sources within the same data connection in Tableau, while automatically updating the data on Tableau Server on a custom schedule.

But let's start from the beginning...

How can Tableau process such a massive number of rows with virtually no waiting time?

The introduction of the Tableau Hyper in-memory data engine (from version 10.5 on) is what made it possible to query and extract billions of rows of data with impressive performance.

The Hyper concept was first created as an academic research project at the Technical University of Munich (TUM) in 2010 and was later acquired and further developed by Tableau.


So what is this about and why is it so great? We can summarize its unique characteristics in two main points:


1) Dynamic code generation

Hyper makes the most of modern multicore hardware. A query is not interpreted as-is: the Hyper engine instead generates ad-hoc code, building an execution tree tailored to that specific query so it can be processed faster.


2) Morsel-driven parallelization

We all know that multiple cores make our life easier. In this case, Hyper chunks the work into tiny pieces (morsels) and distributes them across all available cores.

In simple terms, each core can work at its optimal speed, and overall hardware utilization stays high.
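Hyper's internals are far more sophisticated (and written in C++), but the morsel idea can be sketched in plain Python: split the input into small chunks and let a pool of workers pull them independently. Everything below (function names, morsel size, the use of a thread pool instead of real per-core scheduling) is purely illustrative, not Hyper's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def process_morsel(chunk):
    # One worker processes one small "morsel" of the data
    return sum(x * x for x in chunk)

def morsel_sum_of_squares(data, morsel_size=4, workers=4):
    # Chunk the input into morsels, then let a pool of workers pull them;
    # a real engine schedules morsels onto OS threads pinned to cores
    morsels = [data[i:i + morsel_size] for i in range(0, len(data), morsel_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_morsel, morsels))

print(morsel_sum_of_squares(list(range(10))))  # 285
```

Because a slow morsel never blocks the others, faster workers simply grab the next chunk, which is what keeps all cores busy.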


Hyper API architecture


What this means in practical terms is that extracting a dataset of a million rows can drop from around 5 minutes to 10 seconds, with the added benefit of faster data processing, which results in faster creation of complex visualizations, filter application, and so on.

Once a data connection is made in Tableau, the data can be transformed into a .hyper file by creating an Extract on the Data Source page. Data from multiple data connections can be pulled together into the same Extract using a Cross-Database Join, so that the information from all the sources can be processed quickly and efficiently. However, not all data connections support cross-database joins, so we might run into the problem of not being able to pull all the data we want into a single .hyper file.

Also, if we want to include data extracted through, e.g., an API call in Python in our multi-connection extract file, we need a way to combine this data coming from different sources in one place. Besides this, a multi-table extract file allows us to reduce the number of data connections in our workbook, which is also great for performance.


For these reasons I will now show you:
1) How to create single- and multi-table .hyper files in Python, combining dataframes extracted through API calls to an external service.

2) How to quickly push the extract file to your own Tableau Server environment.

Setting up

First of all, we need to install or update the pantab and tableauserverclient libraries in Python. To install the packages for the first time, you can run the pip commands:

pip install pantab
pip install tableauserverclient

Or to update them:

pip install --upgrade pantab
pip install --upgrade tableauserverclient


For this blog, I extracted some data about the most popular data science and data engineering videos on YouTube through the YouTube API (the googleapiclient library in Python). The idea is to create a dataframe that stores the newly updated data every time we run the Python script. If you would rather use .csv, .txt, .xlsx or .json files stored locally on your computer, you can easily read them in Python in many ways, as explained in this article.
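For local files, pandas covers all of the formats mentioned above. The snippet below reads a tiny inline CSV sample (a stand-in for a hypothetical local videos.csv; the column names are made up for illustration), and the comments show the equivalent calls for the other formats:

```python
import io
import pandas as pd

# A tiny inline sample standing in for a local videos.csv file
sample_csv = (
    "video_id,title,view_count\n"
    "abc123,Pandas basics,15000\n"
    "xyz789,Spark tutorial,9000\n"
)

# pd.read_csv accepts a file path ("videos.csv") or any file-like object
videos = pd.read_csv(io.StringIO(sample_csv))

# The same pattern applies to the other formats:
#   pd.read_csv("videos.txt", sep="\t")
#   pd.read_excel("videos.xlsx")   # requires openpyxl
#   pd.read_json("videos.json")
print(videos.shape)  # (2, 3)
```

Whatever the source, the end result is a DataFrame, which is exactly what pantab needs in the next step.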

Here is an example of what the extracted data looks like:

If you’re curious to know how I extracted the YouTube data with the googleapiclient library, you can check my code on GitHub, and if something is not clear, feel free to reach out to me on LinkedIn.

Once you have uploaded and/or extracted the data you want to send to Tableau, we need to transform it into a .hyper file. With the pantab library this is almost too easy:

import pantab

hyper_filename_1 = "data_science_info.hyper"
hyper_filename_2 = "data_eng_info.hyper"
hyper_filename_3 = "data_video.hyper"

### WAY 1: Create two .hyper files and publish two separate datasources on the Server

pantab.frame_to_hyper(data_science_videos, hyper_filename_1, table="Data science videos")
pantab.frame_to_hyper(data_engineering_videos, hyper_filename_2, table="Data engineering videos")

### WAY 2: Create a single .hyper file with two tables and publish it on the Server

dict_df = {"Data engineering videos": data_engineering_videos,
           "Data science videos": data_science_videos}
pantab.frames_to_hyper(dict_df, hyper_filename_3)


And that’s it! Our files are created locally and if we open them in Tableau, they look like this:

Now, it is time to send the newly created dataset to Tableau Server/Cloud, so that our colleagues can start analyzing it.

The tableauserverclient library allows us to do so in just a few lines of code, thanks to the datasources.publish command. The full code can be found on my GitHub, and the API documentation for this command is here.

Once this is done, the only thing missing is a way to automate the script, so we don’t have to run it manually every time we want to update the data on Tableau Server. For this, we just need the built-in Windows Task Scheduler for our Python script, but this blog is already long enough, so hold on... and jump to the next one here :D It’s going to be way shorter, I promise!

As a general note, this solution is great for small-scale use cases, but if requirements and capacity grow, I would always advise a full solution that includes loading and storing the data in a data warehouse (possibly a cloud one, such as Snowflake).

Agnese Giacomello
