Your first steps with Tableau Server in a Docker container

Introduction

To some people, “containerization of Tableau Server” will sound rather eerie and unfamiliar. For others, it’s like the gates to a new world were just opened. Tableau Server is over a decade old, and its long track record of enabling analytics has seen it grow into a monolithic, rather bulky web application.

When talking about containerization, we are talking about more than a feature. A team has been working on this for years, and what they are doing is effectively redefining Tableau Server’s very core, and redesigning its foundations as a web application. If the concept of containerization of applications is new to you, you might want to take 5 minutes to view this video. It explains how Docker, which is software enabling you to run applications as containers, implements these concepts.

So how does Tableau Server benefit from applying these concepts? Well, it benefits from the general principles of containerization, including the streamlined development and deployment process, as well as containers’ typical consistency, portability and isolation.

But aside from that, this is also the first step towards the ability to truly auto-scale Tableau Server with e.g. Kubernetes. In short, the idea is that Tableau Server can be scaled up or down (adding Backgrounders, workers with VizQL server processes, etc.) completely independently and based on workload.

If you want to learn more about the journey towards containerizing Tableau Server, you might want to read Bernhard Damberger’s blog post celebrating the release of Tableau 2021.2 and this specific capability. Bernhard is the Product Manager for the development of the containerized “version” of Tableau Server.

Getting Started

At Biztory, we like to be prepared for anything our clients may require from us. Our crystal ball indicates that within the next few months and years, a certain number of companies using Tableau will migrate to using Tableau Server in a container, rather than the classic deployment method.

Well, yes, we have no idea how many. But we do like to get our hands dirty, we like to play with new toys, and we like to be prepared! So here are our learnings from having migrated our own production Tableau Server to a container, and running it as such.

Understanding Tableau Server in a Container

The challenge with this undertaking is that it requires a relatively deep understanding of two topics which are, as you would expect: Docker (containers) and Tableau Server. Few are the people who are very proficient in both. Here at Biztory, we’re really good at Tableau Server. And we understand Docker, so that’s a start.

Whether you are an experienced container deployer or a seasoned Tableau Server administrator, we think that you might benefit from hearing our story and what we’ve learned in the process. We found that the very first public version of the documentation that was just released requires quite some time to read and grasp. Moreover, as it is currently a single article with all contents spread out, it can be challenging to find exactly what is relevant to you.

Hence, here is the cut-down version of the process we followed, in which we wanted to …

Migrate a single-node Tableau Server environment to a container

The gist of the process is that it takes place in two parts: building the container, and running the container. This applies to containerized applications in general, and in this case here is the breakdown of which Tableau configuration or action goes where:

Build

We build, i.e. we create the image we’ll later run. What happens in this step?

Ensuring the appropriate files and directories are available to Tableau, including drivers and certificates.
Passing the registration information for Tableau Server.
Providing scripts to be executed at different stages of the deployment process, including just before and just after the initialization.

In our case, we’ve made use of the usual procedure that relies on the build-image tool provided by Tableau, which in the end looked like this:

./build-image --accepteula -i ../tableau-server-2021-2-0.x86_64.rpm -o biztory-orca

Ha, wait, there is nothing special about this! We’re just pointing to the installer and specifying an image name, as instructed. That’s true, but here’s what is interesting about this step. In the customer-files directory, we’ve added a few things we’ll use later on:

We’ve added a few drivers.
We’ve modified the setup-script file to include the commands to install these.
We’ve created a ssl folder to place our SSL certificates. And also a saml folder for SAML-related files we’ll configure.
We’ve added a post_init_command file/script to apply these SSL and SAML settings after the initialization. It looks something like this:

#!/bin/bash

# Use absolute paths: the script is not run from the customer-files directory.
CUSTOMER_FILES=/docker/customer-files

# The only way we figured out we could configure SSL for now.
tsm security external-ssl enable \
  --cert-file "$CUSTOMER_FILES/ssl/cert.pem" \
  --key-file "$CUSTOMER_FILES/ssl/privkey.pem" \
  --chain-file "$CUSTOMER_FILES/ssl/fullchain.pem"

# SAML too
tsm authentication saml configure \
  --idp-entity-id https://penguin.biztory.com \
  --idp-metadata "$CUSTOMER_FILES/saml/GoogleIDPMetadata-tableau-server-penguin.xml" \
  --idp-return-url https://penguin.biztory.com \
  --cert-file "$CUSTOMER_FILES/saml/saml.crt" \
  --key-file "$CUSTOMER_FILES/saml/saml.key" \
  --max-auth-age -1
tsm configuration set -k wgserver.saml.iframed_idp.enabled -v true
tsm configuration set -k wgserver.saml.idpattribute.username -v "EMAIL"
tsm configuration set -k wgserver.saml.idpattribute.displayname -v "displayName"
tsm configuration set -k wgserver.saml.idpattribute.email -v "email"
tsm configuration set -k wgserver.authentication.desktop_nosaml -v false
tsm configuration set -k wgserver.authentication.app_nosaml -v false
tsm authentication saml enable

# Apply everything configured above; this triggers the final restart.
tsm pending-changes apply --ignore-prompt --ignore-warnings

The above lines are effectively executed once Tableau Server is initialized, causing it to restart a final time to apply the configuration items listed and thus setting up SSL and SAML.
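Since a missing certificate or script only shows up late in the process, it can be worth checking the customer-files directory before building. Here is a minimal sketch of such a check. The helper name check_customer_files is our own invention (it is not part of Tableau's tooling), and the list of entries simply mirrors the files mentioned in this article, so adjust it to your own setup.

```shell
# Preflight check for the customer-files directory, to run before build-image.
# check_customer_files is a hypothetical helper, not part of Tableau's tooling.
check_customer_files() {
    dir=$1
    missing=0
    # Entries used in this article; adjust the list to your own setup.
    for entry in setup-script post_init_command \
                 ssl/cert.pem ssl/privkey.pem ssl/fullchain.pem \
                 saml/saml.crt saml/saml.key; do
        if [ ! -e "$dir/$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    # Returns 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

You could then chain it with the build, for example: check_customer_files ./customer-files && ./build-image --accepteula -i ../tableau-server-2021-2-0.x86_64.rpm -o biztory-orca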

Run

We take the image we’ve built, and we run it in a container with a specific configuration. In this step, we will be:

Making sure that a backup (in .tsbak format) is available to be restored into our new environment, as we are migrating.
Making sure the configuration of our existing Tableau environment is also available as we are, once again… migrating!
Passing the licensing information we’ll be using.
Mounting other files and folders of the host environment, e.g. to persist our data directory on said host.
Setting a variety of environment variables such as the user to run the services with, bootstrap information for multi-node clusters, default Server Administrator credentials, etc.
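To make the list above a bit more tangible, here is a rough sketch of what such a docker run could look like, wrapped in a function so it can be dry-run first. Please treat the details as assumptions to verify against Tableau's documentation: the LICENSE_KEY variable, port 8080, and the mount targets inside the container are recalled from that documentation, not taken from our own command. The function name and all arguments are placeholders, and other settings (such as the administrator credentials) are left out.

```shell
# Sketch of a "docker run" for the migration scenario described above.
# ASSUMPTIONS to verify against Tableau's documentation: the LICENSE_KEY
# variable, port 8080, and the mount targets under /docker/config and
# /var/opt/tableau. run_tableau_container is a hypothetical helper.
run_tableau_container() {
    image=$1        # image name given to build-image with -o
    license_key=$2  # Tableau Server product key
    host_name=$3    # keep identical across runs that share a data directory
    data_dir=$4     # empty host directory that will persist Tableau's data
    backup=$5       # absolute host path to the .tsbak backup
    config=$6       # absolute host path to the exported configuration (JSON)
    # Set DOCKER=echo to print the arguments instead of starting a container.
    "${DOCKER:-docker}" run -d \
        --hostname "$host_name" \
        -p 8080:8080 \
        -e LICENSE_KEY="$license_key" \
        -v "$data_dir:/var/opt/tableau" \
        -v "$backup:/docker/config/backup/backup-file.tsbak" \
        -v "$config:/docker/config/config.json:ro" \
        "$image"
}
```

Running it with DOCKER=echo set prints the full argument list, which is a cheap way to review the mounts and variables before a real attempt.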

At this point, the instructions make it seem like this is a piece of cake. Now, we won’t pretend this is a super difficult task. However, we ran into a few things we’d like to share with you that might save you a deployment or two for your container!

Troubleshooting and Common Mistakes

The title says “common” mistakes, but really these are the mistakes we made and that we think others might benefit from avoiding. Anyway, here it all is, in a list for your convenience.

Troubleshooting tips:

If your build-image (the first step of all) is unsuccessful, review whether you have the right installer, and whether you’re referencing files in the right place. There isn’t a whole lot you can do wrong in this step, and the error messages are rather informative.
When you then issue your docker run command with all the necessary arguments, you might find that the container comes up, but also back down really fast. If that’s the case, running docker logs <container_id> is a good idea. Here’s how:

Use docker ps -a to list all containers, including the stopped ones, as this applies to ours.
This will reveal the (short) Docker container ID in the first column, a hexadecimal value. E.g., this might be 61a1b235710e.
You can then run any Docker command against this container by referring to its ID. A partial ID also works, as long as the prefix you use is unique. For example, we can refer to the container above as just 61, provided that no other container ID starts with that sequence.
In this case, we’ll run something like docker logs 61, which will provide you with a first set of logs for troubleshooting.
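If you find yourself repeating these steps after every failed attempt, they can be folded into one small helper. The sketch below assumes that the container you care about is the most recently created one; last_container_logs is a name we made up for illustration.

```shell
# Print the last lines of output from the most recently created container,
# whether it is still running or has already exited.
# last_container_logs is a hypothetical helper name.
last_container_logs() {
    lines=${1:-50}
    # -a includes stopped containers, -l keeps only the latest, -q prints just the ID.
    container_id=$("${DOCKER:-docker}" ps -a -l -q)
    if [ -z "$container_id" ]; then
        echo "no containers found" >&2
        return 1
    fi
    "${DOCKER:-docker}" logs --tail "$lines" "$container_id"
}
```

Calling last_container_logs 100 would then show the last 100 log lines of the container that just went down.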

When the container does start successfully, the process of setting up and initializing Tableau Server will start. The file to follow to monitor this process is /var/opt/tableau/tableau_server/supervisord/run-tableau-server.log (if your mounted data directory is /var/opt/tableau).
Then, a few additional logs that might contain hints when something goes wrong:

/var/opt/tableau/tableau_server/logs/app-install.log
/var/opt/tableau/tableau_server/data/tabsvc/logs/tabadmincontroller/tabadmincontroller_node1-0.log
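Not all of these files exist from the first second, so a small helper that lists only the ones already present can be handy. This is a sketch under the assumption that the data directory is mounted at /var/opt/tableau on the host, as in our setup; tableau_log_files is a hypothetical name.

```shell
# List the log files mentioned above that already exist on the host.
# The default data directory is the one used in this article; pass another as $1.
# tableau_log_files is a hypothetical helper name.
tableau_log_files() {
    base=${1:-/var/opt/tableau}/tableau_server
    for f in \
        "$base/supervisord/run-tableau-server.log" \
        "$base/logs/app-install.log" \
        "$base/data/tabsvc/logs/tabadmincontroller/tabadmincontroller_node1-0.log"; do
        [ -f "$f" ] && echo "$f"
    done
    return 0
}
```

Once at least one file is listed, tail -F $(tableau_log_files) follows all of them at once (this relies on the paths containing no spaces).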

Also good to know: you can “connect” to the Docker container with a bash prompt and interact with it as you’re used to.

Use docker exec -it 61 bash to “connect”; or more accurately, to start a bash prompt in the container.
The /docker directory in the container contains interesting contents such as the customer-files that were provided, configs, and your user’s TSM log at /docker/user/.tableau/tsm/tsm.log

The most convenient way to keep an eye on any of the log files mentioned above is to use tail -f (see this video for a few tips). This will show you the contents of the log files as they are being updated.
You can’t simply reuse the settings exported from your "regular" install (with tsm settings export). A few settings aren’t captured in the export, and others should be configured interactively rather than with plain keys. The SSL and SAML examples above illustrate this, and show how the post_init_command script can be used instead. Maybe there is another way, but this is the only method that ended up working properly in our case. Another example: you have to include Identity Store information, which is not exported if you use a local identity store.
When containers fail anywhere in the process of starting and initializing, it’s a good idea to clean them up with docker rm <container_id>.

Common mistakes:

Forgetting to clean up our persistent data directory.

If we use a persistent local data directory and forget to empty it before re-attempting to run a container, we’ll see an error message along the lines of "Error: Hostname is required to be same with the previous run".
To clean up the data directory after a run, simply use sudo rm -r /var/opt/tableau/tableau_server/
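Because rm -r on the wrong path is not something you want to experience, here is a slightly more defensive sketch of the same cleanup. clean_tableau_data_dir is a hypothetical helper; it only ever removes the tableau_server subdirectory of the directory you pass in, and on a real host you will likely need to run it with sudo, as in the command above.

```shell
# Remove the tableau_server subdirectory of a persistent data directory.
# clean_tableau_data_dir is a hypothetical helper name.
clean_tableau_data_dir() {
    # Abort with a usage message when no directory is given.
    data_dir=${1:?usage: clean_tableau_data_dir DATA_DIR}
    target=$data_dir/tableau_server
    if [ ! -d "$target" ]; then
        echo "nothing to clean at $target"
        return 0
    fi
    # ${target:?} guards against an empty variable expanding to a bare path.
    rm -r "${target:?}"
    echo "removed $target"
}
```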

At some point we got this problem (in the run-tableau-server.log file):

Unable to map JSON in configuration file. Cannot deserialize instance of `java.util.ArrayList<java.lang.Object>` out of VALUE_STRING token at [Source: (File); line: 7, column: 23] (through reference chain: com.tableausoftware.tabadmin.webapp.viewmodels.TsmRequest["configEntities"]->java.util.LinkedHashMap["gatewaySettings"]->com.tableausoftware.tabadmin.webapp.viewmodels.GatewaySettingsType["trustedHosts"])
It turns out the culprit was the format of the “trustedHosts” parameter, which must be a JSON list rather than a plain string:
["hostname"] or ["hostname1", "hostname2"] if there are several hosts to add.
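A quick check before starting the container could catch this one. The sketch below is a rough textual test, not a JSON parser: it only looks for the trustedHosts key followed by an opening bracket on the same line. The nesting shown in the comment is inferred from the reference chain in the error message, and trusted_hosts_is_list is a name we made up.

```shell
# Succeeds when the configuration file has "trustedHosts" followed by a list.
# Shape inferred from the error message above:
#   {"configEntities": {"gatewaySettings": {"trustedHosts": ["hostname"]}}}
# Rough textual check only; trusted_hosts_is_list is a hypothetical helper.
trusted_hosts_is_list() {
    grep -Eq '"trustedHosts"[[:space:]]*:[[:space:]]*\[' "$1"
}
```

For example: trusted_hosts_is_list config.json || echo "trustedHosts should be a list"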

One other common mistake may be related to newline characters and moving files between your Windows machine and the Linux environment we’re setting up the server in. See this anecdote if this is news to you. To avoid this, it might be a good idea to start by creating the files in the Linux environment and using your favourite text editor to paste in the contents (rather than transferring the full file).
Using relative paths in the pre_init_command and post_init_command scripts will not work. One might assume that the scripts will be run from the “context” of the customer-files directory, but this is not the case. Instead, we will want to use absolute paths to /docker/customer-files and its contents.
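Both of these last two pitfalls are easy to check for with a few lines of shell. The helpers below are a sketch with made-up names: has_crlf and strip_crlf deal with Windows line endings in a file that was transferred anyway, and require_absolute_file can be called at the top of a pre_init_command or post_init_command script to fail early on a relative or missing path.

```shell
# has_crlf FILE: succeeds when FILE contains carriage returns.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_crlf FILE: remove the carriage return at the end of each line, in place.
# Writing back with cat keeps the file's permissions (e.g. the executable bit).
strip_crlf() {
    cr=$(printf '\r')
    tmp_file=$(mktemp) || return 1
    sed "s/$cr\$//" "$1" > "$tmp_file" && cat "$tmp_file" > "$1"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# require_absolute_file PATH: fail unless PATH is absolute and is an existing file.
require_absolute_file() {
    case $1 in
        /*) ;;
        *) echo "not an absolute path: $1" >&2; return 1 ;;
    esac
    [ -f "$1" ] || { echo "file not found: $1" >&2; return 1; }
}
```

In a post_init_command script, a line such as require_absolute_file /docker/customer-files/ssl/cert.pem || exit 1 would stop the script before tsm is called with a path it cannot find.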

Final few tips:

To read logs more easily, add yourself to the polkitd group, which owns the persistent logs. Use sudo usermod -a -G polkitd $USER, then log out and back in so the new group membership applies, and you should be set.

Do not forget the configure-container-host script.
If you are stuck after all, don't hesitate to reach out and let us know; perhaps we can help!

- Timothy

Author

Timothy Vermeiren

Analytics Domain Lead at Biztory, and DataDev Ambassador. Tableau Iron Viz 2018 Global Champion, Tableau Ambassador 2020-2021, Tableau Certified Consultant & Architect. He runs in his spare time & plays various musical instruments.