1. What is Tableau Prep?
1.1 Tableau Prep Builder
Tableau Prep Builder is a powerful data preparation tool that allows users to quickly and easily clean, reshape, and combine data from a variety of sources. With its intuitive interface and a wide range of built-in functions, Tableau Prep makes it easy for users to perform complex data transformations without the need for coding or scripting.
The workspace of Tableau prep builder is very visual - allowing you to easily keep track of changes made throughout the flow, and identify and fix data quality issues at each step. There are three main sections of the tableau prep workspace - the flow pane (1), the profile pane (2) and the data grid (3).
A flow is a series of steps that you take to clean and reshape your data, and shows a visual representation of what is happening along the way. The profile pane allows you to see a summarized view of your data, and provides you the ability to click on a value to highlight the other fields . Clicking somewhere here will also filter the row level data grid below which is handy if you want to drill down and investigate certain values.. The value 0.7 in the 4th column (Discount) has been clicked in the example above and you can see how the other columns are highlighted for where this value exists, and the data grid below is also filtered for this value.
At any point in your flow, you can add an output step to write the data to a file, database, published data source on Tableau Server/Tableau Cloud, or to CRM Analytics. This allows you to easily share your cleaned data with others and collaborate on your data analysis. You can also directly preview your results in tableau desktop by right clicking any step and selecting “Preview in Tableau Desktop”.
You can import whole flows, or a certain group of steps from a flow, into another flow. This can help to save time and ensure consistency in your data cleaning process.
If you would like to share your flow then you can publish it to Tableau Server/Tableau cloud and allow others to run it, edit it, or make a copy of it. This allows for easy collaboration.
1.2 Tableau Prep Conductor
Tableau prep conductor is part of the Data Management Add-On for Tableau Server, and it allows you to schedule Tableau Prep flows (which have been published to Tableau Server/Tableau Cloud) to run automatically at specific intervals such as hourly, daily, weekly or monthly. This ensures that your data is always up-to-date and ready for analysis, saving you time and effort.
Tableau Prep Conductor also allows you to monitor the performance of your flows. You can view detailed flow run information, including start and end times, status, and any error messages. This allows you to quickly identify and resolve any issues that may arise during the flow execution.
2. Preparing and Cleaning your data with Tableau Prep
Tableau Prep supports a wide range of data sources, including Excel spreadsheets, CSV files, Tableau data sources, databases, and cloud-based data sources. When you open Tableau Prep, it’ll ask if you want to connect to data or open a previous flow. When you select the “Connect to Data” option, you choose your data source, and enter the connection details if required. After successfully connecting to your desired data source, Tableau will then take you to the workspace. From there, you can start building your flow to clean your data, or you can add more data connections via the connection pane.
Start building your flow
Add another data connection
Tableau Prep offers several data cleaning options including:
Filtering: Remove unwanted rows from your data with a calculation, filter selected values, filter on a range of values, filter using a wildcard match, or filter null values. You can also easily click on a value in the profile pane or data pane and select “keep only” or “exclude”, just like in a visualization in Tableau Desktop.
- Grouping: This can be done by manual selection, pronunciation, common characters or spelling. You can also control click and directly group values, or replace with null for example. With the automatic options, you can view your results and adjust if needed.
Automatically grouping values based on spelling
- Joining: Selecting the join type is done in a visual way with a venn diagram. You can select which parts of the diagram you want to keep which will translate to a left, inner, right or outer join. You can then view the results to see how many rows from each table have been included and excluded, and how many rows the join result has.
Configuring your join
If you want to drill into the rows that didn’t join then you can click the bar to see the unmatched rows.
Viewing the unmatched rows
- Pivoting: Transposing columns into rows, or columns to rows
- Unioning: You can union two or more tables together and easily review the results. From the union configuration interface, you’ll then be able to see if there are any fields that didn’t match. The coloured squares will help you to see in which table or tables the unmatched field doesn’t appear. You can see in the example below that Name and Order Number are in the first datasource but not the second. Tableau Prep will then try to help you match the columns from the two tables that aren’t exactly the same. When I click Order Number, it highlights “Order No” in the other datasource to suggest that these could be a match. To then match them, I can ctrl click both of them and select “Merge Fields”.
Reviewing union results
- Splitting: Separating a single field into multiple fields. This can be done automatically or by manually defining the delimiter and which parts you want to keep.
- Formatting: Modifying the appearance of your data by changing the data type or format.
- Calculation: Creating new fields based on calculations performed on existing fields.
- Aggregating: Grouping your data by one or more columns and optionally adding your metrics in according to your specified aggregation type.
- Adding new rows: Fill in the gaps for sequential data (numeric or date type).
- Renaming: You can rename fields or values within the fields by simply double clicking on the name and typing in a new one. It is also possible to bulk rename fields by clicking on “rename fields” (when more than one field is selected, or no particular field at all is selected) and replace/remove text, add a prefix or add a suffix to each field.
These options can be used in combination to clean and transform your data to meet your needs.
3. Scheduling your flow with Tableau Prep Conductor
Once you’re happy with your flow, you can publish it to Tableau Server so that it can be shared with your colleagues.
As a further step, and only If you have the data management add-on for Tableau Server, you can automate your flows to run on a schedule.
Flow schedules are added in the same way as for extract refreshes and subscriptions - they are on a server level, and are set up independently from your flows to begin with, You can then select one of these predefined schedules on which to run your flow (or string of flows).
You can either select the data output steps that you want to include, or set it to include them all.
You can also select the option to send you an email each time the flow has been completed. This can be configured so that the email goes to certain groups or users and you can change the subject, add a message, and also add the data if you want.
If your flow relies on the output from another flow, you can also define this in the setup by choosing the “linked task” option. You can choose which flows should be run together and then define the running order. This means that the next flow won’t start until the previous one is finished. You can also define what to do if one of the flows fails - whether it be to stop all proceeding flows, issue data quality warnings, or send you an email.