How can I use Tableau Desktop to prepare my data?
For years, Excel has been used as the data conveyor/analysis tool by providing pivot tables and graphics. Unfortunately, pivot tables don’t get along with Tableau (eg.: Ryan Sleeper, Chapter 4 of Practical Tableau). This is actually a barrier to new users as they just want to bring their huge sales per region per year table but it returns a weird result in the Tableau Data source pane.
Let’s take a report of the sales per state in the United States between 2014 and 2017 as an example:
You can see that the data comes as is in Tableau Desktop, don't forget: "Garbage In, Garbage Out"
The result in Tableau doesn’t really look good. I have different data types in different columns and the names of the fields don’t really apply to their content. For instance, the first column is named “Sum of Sales” but it actually contains the name of the states.
To solve this issue, you would have to use the data table behind this report. Unfortunately, most of the time, you will not have access to it because it is linked to another excel file in another shared drive to which you have no access. As a result, you need to restructure the data.
Tidy table as a standard format
The standard format for the data you want to load in Tableau Desktop should be ideally similar to the concept of tidy data (Wickham, 2014). The easiest way to approach is to think about your data as a proper Excel spreadsheet where:
Each variable forms a column. eg. Category, Sub-category,...
Each observation forms a row. eg. #421, Tables, €450,...
Each type of observational unit forms a table.
This is the standard way of structuring your data before using it in Tableau. Of course, there are some cases for which you data can be structured differently.
But how can I achieve this ?
To manipulate your data you have multiple options: you can either prepare your data before connecting to Tableau using Excel but I would advise you to use a proper tool such as Tableau Prep or Alteryx.
If you don’t have any of these tools at your disposal, another option is to use the built-in Tableau Desktop tools such as the Data Interpreter and the pivot option. I will describe here how to effectively use those tools and get a clean and tidy data set to work in Tableau.
The Data Interpreter
Tableau Desktop provides a set of options to adapt your source file. The first option you can use to clean your excel pivot table is the Data Interpreter:
The Data Interpreter will interpret your file and define where the fields and values of your dataset are. It can look at different tables within the same sheet and return it in different tables. Once you click on the Data Interpreter checkbox, Tableau will allow you to review the results using Microsoft Excel. The files open with a short explanation on the interpretation.
Reviewing the results can help understand how the Data Interpret sees the data. In my case as I have only one pivot table, the Data Interpreter is doing a pretty good job to identify the header of the fields and the values.
Once you have reviewed the results, you will see that Tableau has adapted the preview of the data according to the results of the Data Interpreter. From this point, I’m able to move forward as I see that I only have values in the rows and not any information related to the name of the fields. I will rename the first column “State” and align the naming of the next two columns by removing the “Years” and the “Order Date” part of the name of the field.
You might think this is over since I have a relatively clean data set but I will stumble upon other issues. The first one is the fact that my Excel pivot table had a total column for each year. I should get rid of those columns by hiding them. The same goes for the Grand Total column that is at the end of my dataset. The hide option is great if you want to limit the amount of fields you want to see in Tableau. Tableau doesn’t delete them but it helps make the interface less scattered.
I also have a grand total row which comes from the pivot table. Since the hide option is only available for the fields, I need to find another solution to get rid of that row. Tableau Desktop actually offers multiple options: I could filter it out at the source using a data source filter, or I could decide to filter it later using a quick filter. In this case, I will filter it using a data source filter.
Pivoting in Tableau Desktop
As I mentioned earlier, the data should be structured in a "tidy format". To do so, I’m going to use is the pivot option. I will use it to convert my wide dataset where each name of the fields is actually a value : e.g. 2014 Jan is January 2014 into a long dataset where you will group all the values under one column. The tidy format is the norm but it might not always be the right format for the visualisation you want to build.
To pivot columns to row in Tableau Desktop, I need to select multiple fields and then right-click to finally click on Pivot. Tableau will then generate two new columns: Pivot Field Names and Pivot Field Values. I rename those as Date and Sales. Finally, I convert the Date field as a Date.
From a raw pivot table coming out of Excel, I was able to use Tableau Desktop and its set of Data prep tools to transform it into a tidy data-set that can be used in Tableau. This set of tools might seem quite light but it happens to be very effective in cases like the ones mentioned above. If you are looking to automate or at least repeat this operation on a regular basis, you might need to look at a tool like Tableau Prep.
Join the Data Jam
92% of companies fail to scale their analytics, which likely includes you. We have studied the patterns in hundreds of client engagements and cracked the code for a modern data stack that guarantees success.
We'll uncover this in the most original webinar you've attended this year.