Perhaps one of my favourite new features from the last year is the new relationship based Tableau data model. This new Data model means that joins are now smarter and aggregations more contextual to their source, and this got me thinking that this new data model might just have changed the process of using a scaffold to build out custom views or fill in missing data, simplifying and making analysis using a scaffold more dynamic and allowing us to have more confidence in that analysis.
Before we look at exploring scaffolding with the new data model lets define a little bit both the concept of scaffolding, and the properties of the new data model before we look at how these properties make scaffolding a more intuitive process.
Data scaffolding is a visualisation technique that uses a simple data structure that will support your data for analysis one of the more common use cases of scaffolding is to add in missing dates into your view. Credit should be given here to Joe Mako who pioneered the data scaffold technique and to Ken Ferlarge and others who have really helped document and popularise this process.
The other key concept to introduce is the new Tableau data model. This data model changes how tables are managed in Tableau data sources, adding a new logical layer to out data model, which in effect allows us to build multiple logical tables from the physical tables from our connections and then to relate these together.
In older versions of Tableau we had only one logical table per datasource made up of lots of sheets and tables from our connections joined and union’d together. The latest versions of Tableau introduce the ability to have multiple logical tables in our datasource. We can then define some ways these tables are linked together. Tableau will then use these rules when you are building a viz to join tables at the time of visualisation. This joining at the time of visualising follows 8 rules that result in two key features of relationships, smart aggregations and contextual joins. If you would like to learn a little bit more about the ins and outs of relationships Tableau’s own Bethany Lyons has a great series of blog posts on the topic of relationships.
8 r’s of relationships from Bethany Lyons Blog Post
These 8 R’s are important when scaffolding because we can add this new set of data into the datasource without being concerned that we are going to end up duplicating the data, with the RESOLVE and REPLICATE of smart aggregations we can have confidence that our measures will only be resolving to their native level of detail from their source data and if there is any replication it is only in the view and so will be easier to spot and deal with than if we had done a hard join between the two tables.
So lets take a look at scaffolding with the new relationship model. In this example we will be using the following dataset. In this data we are recording the occurrence of two events on a daily basis across two complete months. The number of orders placed, and number of returns received.
Now we want to build a view that uses the running total of orders placed to identify when we hit our monthly target of 650 orders, and if there was any growth of orders on a day.
Very quickly we can see the problem here, we have many days that don't appear in our data, because there were no orders placed on those days.
Now we can try using the ‘show missing values’ feature of Tableau in impute the missing dates:
But this will only get us so far. Despite March 2020 being long over Tableau will not recognise that we should have more dates in the month, this is because Tableau at the time of writing can only determining missing dates between the minimum and maximum dates in your dataset. We can also see this in the first day of our dataset. It should really be the first of February but even when we are ‘showing missing values’ we are getting the 3rd of February as our start date because that is the first day in our dataset.
If we want to fill in missing dates outside the range we will need to use a scaffold.
The scaffold will contain a simplest of dates in the range we are interested in 1st Feb to 31st March:
Now we can bring this scaffold into the datasource editor and place it so the relationship ‘noodle’ joins it to the data.
As for defining the relationship, we have a couple of options, we can either use the default clause Tableau has generated, or we can use the traditional scaffold technique of joining based on a pair of calculated fields with dummy values.
For this example though we are going to look at using the default date = date that Tableau has generated.
We can use this because relationships do not need to set a defined join type and can make use of all the different join types depending on the fields in the view, and so we don't have to worry about missing out of dates because we haven't done the right join type. So we will start with a new sheet, but here we have replaced the date field from our data with the date field from our scaffold.
This won't immediately impute the missing dates because with relationship contextual joins if there are dimensions from two tables on the view the join will show only relevant data from across the two tables, and as our measures are all from the data table only those dates with a match in the data table are going to be shown.
However we can use the features of relationships to recover the missing members of the date dimension by bringing on a measure from the dates scaffold we can expand the relevant context of the view. Now the dates scaffold doesn't have a measure that we can bring in however Tableau generates a measure for each table in a datasource, the count field. To add this into the view we are going to place this into the marks card on the detail shelf, so it doesn't alter how we are drawing the visualisation.
Now if we highlight the extreme ends of the view we can see that we have the full range of dates to match up with our data.
One thing to note with scaffolding and relationships is that when we are matching based on dates and filtering on the view we need to be careful not to filter out ‘null’ values from the view, as this would filter out those dates which we are bringing from the scaffold, because the data for those dates in all fields would be ‘null’.
If we exclude or remove null in a filter we will exclude all those dates where we don't have data, bringing us to back to the start.
So this has been a look at using data scaffolds with the new Tableau data model. The new data modelling capabilities certainly don't hinder the use of scaffolds in building your views, and in fact with the features of relationships we can have confidence that we can add scaffolds into the datasource without needing to manage unnecessary duplication or complex joins in all views, and without needing multiple data sources with and without scaffolds. The new tableau data model simplifies the process of adding and managing scaffolds for your data.
Join the Data Jam
92% of companies fail to scale their analytics, which likely includes you. We have studied the patterns in hundreds of client engagements and cracked the code for a modern data stack that guarantees success.
We'll uncover this in the most original webinar you've attended this year.