20 October, 2022
Chris is a junior data engineer who has been working with Biztory for the past two years as an analytics consultant. He enjoys solving niche problems and building things in Python, and he is passionate about the environment and sustainability.
YAML is a data serialisation language that is highly effective for storing configuration profiles, which makes it perfect for what we're doing here! We won't go over the YAML standard itself, but there are plenty of resources online if you're starting from scratch.
Also, the YAML files we'll be using are mostly pre-built for you when you install the dbt package to your environment, and just require some light amendments to fit your project. The key notation we'll be using is:

```yaml
key: value
key:
  nested value
key:
  - also a nested value    # use hyphens or no hyphens, but be consistent
```
If you want to see how this fits into the bigger picture of deploying a dbt project from scratch, we're just about to release our own public training course where we'll walk you through the set-up of your first dbt project! Click here if you're interested in becoming a dbt hero!
We're assuming here that you've already got a dbt project set up, along with the associated prerequisites:
Incredibly well named, your profiles.yml file is just that: a YAML file specifying the profiles you'll use to connect to your data. It's stored in your .dbt folder for security purposes; you don't want this file deployed with your git pushes, as it can contain sensitive information about your deployment, such as login credentials. The basic structure of your profiles file is as follows:
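As a minimal sketch (the profile and target names below are placeholders), a profiles.yml skeleton looks roughly like this:

```yaml
my_profile:            # the profile name your dbt_project.yml refers to
  target: dev          # the default target dbt runs against
  outputs:
    dev:
      type: snowflake  # the adapter to use
      # ...connection details for this target go here...
```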
A quick walkthrough of what you're looking at here:
- The top-level key is the profile name, which the profile: entry in your dbt_project.yml points at.
- target names the default output dbt will run against (commonly dev).
- outputs contains one or more named targets, each holding the connection details for an environment.
Below is an example of what a typical Snowflake profile might look like:
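This is a sketch with placeholder names and credentials; swap in your own account, role, database, warehouse, and schema:

```yaml
# profiles.yml -- all values below are placeholders
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: ab12345.eu-west-1      # your Snowflake account locator
      user: my_user
      password: my_password           # placeholder; consider "{{ env_var('DBT_PASSWORD') }}" instead
      role: transformer
      database: analytics
      warehouse: transforming
      schema: dbt_chris
      threads: 4
```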
Now that you've got your data source configuration, you can set up your project.
Next, the dbt_project.yml file. This file tells dbt key information about the way you've structured your project and where to find the resources you'll be referring back to. It also gives us some additional configuration options for specifying or overriding dbt's default run settings; we'll take a brief look at some of those too.
A basic dbt_project might look something like this:
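Here's a sketch of that file; the project name is a placeholder, and the paths shown are the dbt 1.x defaults:

```yaml
# dbt_project.yml -- project name is a placeholder
name: 'my_dbt_project'
version: '1.0.0'
config-version: 2

profile: 'my_dbt_project'     # must match a profile in profiles.yml

model-paths: ["models"]       # dbt 1.x defaults; older versions use source-paths
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

models:
  my_dbt_project:
    +materialized: view       # default materialisation for the project
```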
As we mentioned before, it's mostly pre-built for you when you initialise your dbt project. Assuming you work with the default project structure, the key things we'll want to change here are the project name, the profile entry (so it matches the profile in your profiles.yml), and the project name under the models: block.
That's it, our basic set-up is done; easy as that. But dbt also ships with a heap of optional extras that we can plug into our dbt_project.yml if we want to. Other things we might want to include in our project are tests, materialisation types, or overrides for the data types dbt is parsing.
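For instance, a schema file like this sketch (the model and column names are hypothetical) applies dbt's built-in tests to a primary key:

```yaml
# models/schema.yml -- model and column names are hypothetical
version: 2

models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - not_null          # the primary key must always be populated
          - unique            # ...and unique
```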
In the above example, we’re using some out-of-the-box dbt testing to ensure that our primary key is always populated and unique.
In the example below, we're overriding the data types for the seed that dbt is ingesting, to ensure they're parsed correctly.
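As a sketch (the seed name and columns are hypothetical), the column_types config in dbt_project.yml looks like this:

```yaml
# dbt_project.yml -- seed name and columns are hypothetical
seeds:
  my_dbt_project:
    country_codes:
      +column_types:
        country_code: varchar(2)
        population: bigint
```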
There's also the optional quoting config, which is handy when your queries fail because you're using identifiers that match reserved words. Note that for Snowflake this is false by default.
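In dbt_project.yml, that configuration looks like this (shown here with Snowflake's defaults):

```yaml
# dbt_project.yml
quoting:
  database: false
  schema: false
  identifier: false
```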
One final top tip to remember when using the dbt YAML files is the + notation. The plus prefix is used to disambiguate resource paths from configs. Consider the following dastardly example where my wicked colleague (who evidently just wants to see the world burn) is using tags as a resource path while also setting the tags configuration item. In this case, the + prefix denotes the configuration item (since dbt v0.17.0).
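A sketch of that setup (the project and directory names are hypothetical): models living in a subdirectory literally named tags, configured alongside the tags config itself:

```yaml
# dbt_project.yml -- project and subdirectory names are hypothetical
models:
  my_dbt_project:
    tags:                 # resource path: the models/tags/ subdirectory
      +tags: nightly      # config: attach the "nightly" tag to those models
      +materialized: table
```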