Introduction
Some of you may know the feeling: a new Tableau feature is released, and you can’t wait to start working with it. But… how do you put it to good use? Do you really have a good reason to start using that new feature, or do you just want to use it for the sake of having used it?
That was the situation I was in when Tableau released their new Metadata API in version 2019.3. It was part of the release of a larger collection of features under the Data Management add-on, which includes all kinds of tools for better metadata management with Tableau.
Now, it just so happened that a few colleagues and I had been discussing the topic of calculated field naming conventions. How should we, Biztory, as a Tableau partner, motivate and reinforce the use of proper naming conventions for calculated fields? Well, one thing we can do is point out poorly named calculations. Not in a nagging way, of course, but maybe in a playful way?
The Metadata API
Enter the Metadata API. It turns out that we can get the names of each and every calculation published on Tableau Server or Online, process and analyze these, and single out the ones that are not aptly named. There’s your challenge for working with the new feature! The Metadata API can, naturally, do much more than just that. It’s your gateway to all the information on the contents of your Tableau environment, including workbooks, data sources, database connections, tables, data source fields and calculated fields, and even more. It is the foundation enabling data discovery, lineage and impact analysis.
It uses a language called GraphQL, which basically comes down to describing what you need. It is particularly suitable for working with metadata, as it is oriented towards describing things’ properties and relationships.
If you’re using Tableau Server, you would start by making sure the Metadata API is enabled in the first place, as it is not by default. On Tableau Online it is enabled by default, and everything described below works in a similar fashion.
Our Solution's Architecture
Experimenting with Tableau Server or Online’s Metadata API is easy, through an interface called GraphiQL. In my understanding, this is a GUI on top of the Metadata API, accessible through https://<your-tableau-server>/metadata/graphiql. It does require you to be authenticated (naturally), as you are accessing the contents of your Tableau environment.
Now, that GUI is all well and good, but it’s not necessarily a structured approach to obtaining the data we are looking for. It’s perfect for devising what exactly we need and for building the GraphQL query we’ll use; however, it’s not very suitable for actually exporting or accessing that data in a procedural and repeatable way.
We have a multitude of options when it comes to designing this more structured approach; we can use any tool or language capable of calling an API. In this specific case, I opted to use Python through Tableau Prep. Sure, a simple, standalone scheduled Python script would work too, but why not grab this chance to also experiment with that (back then) relatively new functionality? That would mean the data flow looks something like this:
Finding the Right GraphQL Query
The main challenge when learning about the Metadata API (at least in my case) was understanding the model and concepts it embodies. Tableau’s documentation was relatively scarce back when I puzzled this together, as this was before the official release, while the API was still in beta. Now, however, its documentation and example queries provide a very solid basis for starting to work with the API.
In my case, the whole point was to be able to tie a calculation back to its author, so that we could reach out to them and tease them about their poor naming convention choices. Hence, the query might look something like this:
query calculatedFields {
  calculatedFieldsConnection (first: 10000) {
    nodes {
      id,
      name,
      formula,
      role,
      dataCategory,
      dataType,
      sheetsConnection {
        nodes {
          name,
          workbook {
            name,
            owner {
              username,
              name
            }
          }
        }
      }
    }
  }
}
I’d recommend spending some time with the GraphiQL interface to start learning how to build these. The tool is really quite intuitive, and its autocomplete and suggestions really help when you’re trying to find just what you need.
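Once you’ve settled on a query in GraphiQL, the exact same text can be sent to the Metadata API endpoint from code. As a quick, hedged taste of that (the server URL is a placeholder, and the token is assumed to come from a REST API sign-in, which the next section covers), this is roughly what it looks like with Python’s requests library:

import requests

server = "https://your-tableau-server"                 # placeholder: your Server or Online URL
token = "<token obtained from the REST API sign-in>"   # placeholder

# A trimmed-down version of the query above; the full one works just the same.
query = """
query calculatedFields {
  calculatedFieldsConnection(first: 10000) {
    nodes { name, formula }
  }
}
"""

response = requests.post(
    f"{server}/api/metadata/graphql",
    json={"query": query},
    headers={"X-Tableau-Auth": token},
)
response.raise_for_status()
nodes = response.json()["data"]["calculatedFieldsConnection"]["nodes"]
print(f"Found {len(nodes)} calculated fields")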
The Flow and the Script
The Prep flow in itself doesn’t really amount to much: it’s essentially a wrapper around the Python script, enabling it to be scheduled, and then processing its output.
In my first iteration, I did not actually have the “Analyze Fields” step in Prep; instead, I used a calculation with a bunch of REGEXP_MATCH statements to identify inappropriately named fields. In retrospect, it’s probably better to do that in the Prep flow.
That being said, the script is really the core of the whole operation, and it does a few things (a rough sketch in Python follows the list below):
- It signs in to Tableau Server’s REST API, because the resulting token is also what authenticates the subsequent requests to the Metadata API.
- It gets a list of sites from that same REST API, so we can iterate over them and get the data from all the sites on the Server. This is optional, and would not apply if you only use the default site.
- For each site, it gets the metadata using the query devised above.
- It concatenates (unions) the data into a dataframe that goes back to Prep.
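Put together, and leaving out error handling and pagination, a rough sketch of those steps could look like the following. To be clear, this is not the original script: the server URL, REST API version and credentials are placeholders you’d adapt to your own environment.

import pandas as pd
import requests

SERVER = "https://your-tableau-server"   # placeholder
API_VERSION = "3.8"                      # adjust to the REST API version of your Server
USERNAME = "some_service_account"        # hypothetical account with access to all sites
PASSWORD = "********"

# A trimmed-down version of the query devised earlier; the full query works the same way.
GRAPHQL_QUERY = """
query calculatedFields {
  calculatedFieldsConnection(first: 10000) {
    nodes { name, formula }
  }
}
"""

def sign_in(site_content_url=""):
    # Sign in to the REST API for one site and return the credentials token.
    body = {"credentials": {"name": USERNAME, "password": PASSWORD,
                            "site": {"contentUrl": site_content_url}}}
    r = requests.post(f"{SERVER}/api/{API_VERSION}/auth/signin",
                      json=body, headers={"Accept": "application/json"})
    r.raise_for_status()
    return r.json()["credentials"]["token"]

def list_sites(token):
    # List the sites on the Server (first page only, for simplicity).
    r = requests.get(f"{SERVER}/api/{API_VERSION}/sites",
                     headers={"X-Tableau-Auth": token, "Accept": "application/json"})
    r.raise_for_status()
    return r.json()["sites"]["site"]

def run_metadata_query(token):
    # Query the Metadata API for the site the token is scoped to.
    r = requests.post(f"{SERVER}/api/metadata/graphql",
                      json={"query": GRAPHQL_QUERY},
                      headers={"X-Tableau-Auth": token})
    r.raise_for_status()
    return r.json()["data"]["calculatedFieldsConnection"]["nodes"]

# The Metadata API only returns content for the site you are signed in to,
# so we sign in to each site in turn and stack the results.
frames = []
for site in list_sites(sign_in()):
    site_token = sign_in(site["contentUrl"])
    df = pd.json_normalize(run_metadata_query(site_token))
    df["site"] = site["name"]
    frames.append(df)

all_calculations = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

In practice you’d also want the sheet, workbook and owner fields from the full query, flattened into columns, since those are what the analysis and the viz rely on later.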
The script and some more draft experiments are on GitHub, though in a very unorganized way: https://github.com/timothyvermeiren/tableau-metadata-api
Now, this is an older version of the script, and I’m almost certain that it won’t work as-is. While I’m not going to fix it completely, I’ll just point out the things that I’m pretty sure need to be adjusted for it to work, in case you did want to reuse it:
- It uses a function called get_stuff() to allow you to pull things other than just the calculated fields and their metadata, but that wasn’t really necessary. I guess I didn’t finish it up the way it was intended, but it was just an exercise after all.
- I’m pretty sure the dataframe data does not have the exact required fields and structure.
- The output schema is incomplete; you’d want to declare all the fields you need to use later on here (see the sketch after this list).
- The endpoint URL used in the script (https://<your-tableau-server>/relationship-service-war/graphql) is an old one used in the beta phase; you’d now want to use the actual URI: https://<your-tableau-server>/api/metadata/graphql
- The script might have benefited from using TSC (Tableau Server Client) for a more structured approach, though I’m not sure yet how the token would then have been used for the subsequent requests to the Metadata API after logging in to the REST API.
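On the output schema point specifically: Prep’s Script step expects the script to expose a function that takes the incoming data as a pandas DataFrame and returns one, plus a get_output_schema() function declaring the columns that come back. A minimal, hypothetical skeleton (the column names are just what the query above suggests, and the prep_string() helpers are only available when the script runs inside Prep via TabPy) could look like this:

import pandas as pd

def get_calculated_field_metadata(input_df):
    # Prep calls this function with the incoming data as a DataFrame.
    # The input is essentially ignored here; the real work is the Metadata API
    # logic sketched earlier, wrapped up so that it returns a DataFrame.
    return build_metadata_dataframe()  # hypothetical helper around the logic above

def get_output_schema():
    # Prep uses this to know which columns the function above will return.
    # prep_string() and friends are injected by the Prep/TabPy integration at runtime.
    return pd.DataFrame({
        "site": prep_string(),
        "workbook_name": prep_string(),
        "sheet_name": prep_string(),
        "calculation_name": prep_string(),
        "formula": prep_string(),
        "owner_username": prep_string(),
        "owner_name": prep_string(),
    })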
If you do plan on running an analysis like this, there’s one more thing to keep in mind. By default, the TabPy script evaluation timeout is set to 30 seconds (if I recall correctly, the TABPY_EVALUATE_TIMEOUT setting in TabPy’s configuration file). Unless you have a small Tableau Server instance, there’s a good chance that this will take longer than 30 seconds to run, so you’d want to make sure you increase that value for this purpose.
The Viz
Once we have our hands on that data, as mentioned before, we can preprocess it in Prep, or we can do… whatever we want. In this rather improvised exercise, I ended up using a calculated field in Desktop with a few REGEXP_MATCH statements to categorize the calculated fields:
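That calculation lived in Desktop, but the same kind of check is just as easy to express in Python if you’d rather flag the fields in the script or the Prep flow instead; the patterns below are merely examples of what you might consider lazy names:

import re

# Hypothetical patterns for undesirable calculated field names.
SUSPICIOUS_PATTERNS = {
    "Default name": re.compile(r"^Calculation\d*$"),
    "Copied field": re.compile(r"\(copy( \d+)?\)$"),
}

def categorize(field_name):
    # Return the first category a field name matches, or "OK" if none do.
    for category, pattern in SUSPICIOUS_PATTERNS.items():
        if pattern.search(field_name):
            return category
    return "OK"

print(categorize("Calculation1"))   # Default name
print(categorize("Sales (copy)"))   # Copied field
print(categorize("Profit Ratio"))   # OK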
This would look for field names like “Calculation1”, “Sales (copy)”, and a few other undesirable names. The viz itself can then be anything you like, and you can, for example, subscribe to it on Tableau Server to get updates on who’s not respecting the naming conventions! In our whimsical exercise, it ended up looking something like this:
In the end, working through this entertaining exercise has proven valuable for two reasons:
- We’ve found a way to initially explore the then new Metadata API, and learn about its mechanism and its model.
- We’ve shown how Prep can be used as an automation platform for a less conventional data pipeline than one fed by a regular database.
All in all, my perspective is that the Metadata API, however you choose to approach it, is a tool that can deliver an incredible amount of new insight into what is being done on Tableau Server or Online, and how.