3 April, 2023
Data Engineer at Biztory
Are you looking to get dbt certified by acing the dbt Analytics Engineering Certification Exam? Let me help you get started by sharing my experiences in a short blog in the hopes that it may help some of you who are thinking of taking the exam.
Let's get started!
Let me start by admitting that, as with all good blog posts, I tried to make the title a little bit provocative… Yes, I really had not heard of dbt before I started studying, but of course I did not actually start from zero in the broader data-ecosphere context.
In fact, the basis I started with when studying for the dbt analytics engineering certification was:
Most importantly, I should mention that I had several years of experience working in the role which dbt describes as the “Analytics Engineer”. As this is the role around which the whole of dbt's functionality is centered, I found that most of the “problems” dbt tries to “fix” felt very native to me and not at all abstract.
I always use the analogy of the lumberjack: for several years I was cutting down trees with my axe (= just executing .sql files manually from a workbench), and indeed I was pretty good at it! But then someone comes around and offers you a chainsaw (dbt)... Although it may take a few weeks to fully get the hang of how this thing works, you immediately understand how this tool will drastically change your day-to-day work for the better!
With that out of the way: what I personally struggled with the most was the black-box nature of the exam (it only launched in 2022). No practice exams are available, and the "official" documentation dbt provides is, to be honest, pretty chaotic (docs, references, guides, blogs, tutorials, ...).
Not to mention there are mountains of it, covering an insanely wide range of concepts, some of which seem to have nothing to do with dbt functionality (e.g. SQL writing best practices) but are of course crucial to becoming a true Analytics Engineer. And in the end, that's what the exam tries to certify!
All in all I spent 3-4 weeks (my eternal gratitude to my employer Biztory for letting me prepare full-time for this); roughly in the following order:
Phase 1: Started with the dbt fundamentals course.
Note that I really took my time for this (multiple days), as I assume the “5 hours” supposedly needed for the course only covers the length of the videos… I followed along with the official jaffle-shop dataset tutorial (which uses dbt Cloud and Snowflake) and then tried to build my own project in dbt Core (with a random dataset I found online and uploaded to Snowflake) to make sure I thoroughly understood the basics.
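For reference, spinning up such a dbt Core project of your own boils down to just a few CLI commands. A sketch, assuming you use the Snowflake adapter (the project name here is a placeholder):

```shell
# Install dbt Core with the Snowflake adapter (pulls in dbt-core)
pip install dbt-snowflake

# Scaffold a new project, then check the connection defined in profiles.yml
dbt init my_project
cd my_project
dbt debug

# Build the models and run the tests
dbt run
dbt test
```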
Phase 2: Follow the Learning Path in the official study guide.
This takes you through all available courses, and also points you to relevant blogs, guides, … (This step took the longest!) FYI: by this point I was already summarizing all relevant information from the courses into a Word document (more on that later), and every subsequent step was summarized into it as well.
Phase 3: Read through all available Docs.
Specifically looking for deeper and more nuanced information on the topics handled in the courses. This is also a good “recap” exercise to start consolidating the information from the courses.
Phase 4: Read and summarize a selection of relevant Guides:
Phase 5: Read a selection of relevant blog posts:
Phase 6: Read (and study!) a selection of relevant Reference pages:
For the sixth and final point it is important to emphasize the study part, as the exam does not allow you to use anything other than what’s in your head! Broadly speaking, you should, for example, be able to write a source .yml file from memory. Not all properties, of course, but the most relevant ones handled in the courses.
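To make that concrete, here is roughly what such a sources file looks like, written from the jaffle-shop names used in the fundamentals course (the database/schema values and the freshness thresholds are illustrative assumptions):

```yaml
# models/staging/jaffle_shop/_jaffle_shop__sources.yml (illustrative)
version: 2

sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    loaded_at_field: _etl_loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: customers
      - name: orders
        description: One record per order.
```

In a model you would then select from these tables with `{{ source('jaffle_shop', 'orders') }}` instead of hard-coding the table name.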
So by the start of the fourth week this left me with a pretty big summary document to study. Note that I tried to categorize all the information under the eight topics on which, according to the dbt study guide, the exam focuses.
By this time I did go into full "study-like-you're-back-at-the-university" mode (bye bye Netflix binging evenings) and really tried to learn the information by heart as much as possible.
Now onto the main reason why you are here, of course: the Exam itself. Since a) I was hyper-focused on passing the exam (and not so much on memorizing the questions) and b) I assume dbt does not allow sharing specific exam questions online, I will not be doing so.
What I can do is offer you a (non-exhaustive) list of generic topics on which I do remember questions being asked, in the hope that it gives you some final aiming points to review in those last days or even hours before you start your exam. 🙂
DISCLAIMER: The list of topics below is purely based on my own personal experience. Use it as a source of guidance, but not as your single source of truth to pass the dbt Analytics Engineering Certification Exam.
dbt Analytics Engineering Certification topics-to-remember list: 👇
| Topic | Comment |
| --- | --- |
| The 5 types of materializations | Including their use cases and how to build them. |
| Jinja basic concepts | Delimiters, variables, IF-THEN statements, FOR loops, macros. |
| Hooks | The 4 types of hooks and when/where to implement them. |
| SQL | As a programming language. This is tricky because it has little to do with dbt functionality, but I remember specific questions on e.g. window functions, CTEs, joins, unions, ... |
| The job of the Analytics Engineer | The 3 main roles in a data-ecosphere, and more specifically the main components of an Analytics Engineer’s job. |
| Legacy DML SQL commands | How to transform them into dbt functionality when migrating a legacy SQL script into dbt. |
| dbt CLI commands | The main dbt CLI commands and how they interact; e.g. “does dbt build also test source freshness?” (the answer is no). |
| Node selection syntax and graph/set operators | These include some crazy examples, y’all! |
| The `description:` YAML property | How to use and “upgrade” this property. |
| The DAG |  |
| Model naming conventions | When/how to use them (staging > intermediate > marts with fact/dim tables, …). |
| Target functionality | The use of target throughout your project, including its use as a variable within a CLI command or Jinja statement (`target.name`, for example). |
| Configurations * |  |
| dbt debugging and troubleshooting | Several questions on examples of the different errors you might encounter, as well as what the solution would be. Also specifically the (improper) use of the profiles.yml file and how this can result in errors. |
| dbt within a modern data stack | Should you use dbt seed as a loader tool within an ELT set-up? |
| Generic and singular tests | Including the exact configurations and properties needed to build them. |
| Database/schema names | Using custom database/schema names (instead of the target defaults), and how dbt handles them (concatenation). |
| Jobs and Environments | These concepts received quite a few questions, which is strange considering they are dbt Cloud features and dbt has stated several times that the exam is not specifically geared towards Core or Cloud. So expect questions on the abstract concepts and best practices rather than on the specific implementation within dbt Cloud. |
| Environment variables | The order of precedence of your environment variables (`env_var`) when they are set in several places throughout the project. |
| State and deferring | Covered in the dbt course “Advanced Deployment with dbt Cloud”. |
| Git commands, workflows and strategies |  |
| Row vs. column-based storage | What is the difference? Although I still don’t understand how that ended up in the exam. |
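For the Jinja row above, here is a minimal sketch of the constructs the exam covers: the three delimiters, `set` variables, a `for` loop with an `if`, and a `ref()` call. The model and column names are illustrative (in the style of the payment-pivot example from the dbt courses):

```sql
-- models/marts/order_payments.sql (illustrative)
{# This is a Jinja comment; {% ... %} are statements, {{ ... }} expressions #}
{% set payment_methods = ['credit_card', 'bank_transfer', 'coupon'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('stg_payments') }}
group by 1
```

A reusable `{% macro my_macro(arg) %} ... {% endmacro %}` would live in the macros/ directory instead of a model file and be called the same way as `ref` above.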
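On the database/schema concatenation behaviour: by default, dbt builds a custom schema as `<target_schema>_<custom_schema>`. The built-in `generate_schema_name` macro (sketched here after the version shown in the dbt docs) makes that explicit, and you can override it in your own project:

```sql
-- macros/generate_schema_name.sql — dbt's default behaviour, as a sketch
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {# e.g. target schema "analytics" + custom schema "marketing"
           yields "analytics_marketing" #}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```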
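And for the SQL row: a quick, self-contained refresher on CTEs and window functions. It uses Python's built-in sqlite3 purely so you can run it anywhere; the exam questions themselves are plain SQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES ('a', 10), ('a', 30), ('b', 20);
""")

# A CTE (WITH ...) feeding a window function: running total per customer.
rows = con.execute("""
WITH ranked AS (
    SELECT
        customer,
        amount,
        SUM(amount) OVER (
            PARTITION BY customer ORDER BY amount
        ) AS running_total
    FROM orders
)
SELECT customer, amount, running_total
FROM ranked
ORDER BY customer, amount
""").fetchall()

for row in rows:
    print(row)
```

Make sure you can predict the output of something like this without running it: the window restarts per `PARTITION BY` value, so customer `a` accumulates (10, 40) while `b` starts fresh at 20.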
\* Some common configurations (these are dbt_project.yml keys):

| YAML key | Value description |
| --- | --- |
| name | Your project’s name in snake case (spaces become `_`; first letters lowercase). |
| (config-)version | Version of your project (syntax). |
| require-dbt-version | Restrict your project to only work with a range of dbt Core versions. |
| profile | The profile dbt uses to connect to your data platform. |
| *-paths | Directories where your `*` files live. |
| clean-targets | List of directories to be removed by `dbt clean`; best used for artifacts only. |
| query-comment | A string to inject as a comment in each query that dbt runs against your database. |
| quoting | Whether dbt should quote databases, schemas, and identifiers (default = true, except on Snowflake). |
| [hooks] | `on-run-start` and `on-run-end` hooks can also call macros that return SQL statements. |
| vars | Project variables you want to use for data compilation. |
| dispatch | Optionally override the dispatch search locations for macros in certain namespaces (default = root project first). |
| [nodes] | “Nested” configs can be added under [node]: `key: value`. |
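To tie those keys together, here is a minimal dbt_project.yml sketch. All names and values are placeholders, and the on-run-end macro is hypothetical:

```yaml
# dbt_project.yml — minimal illustrative sketch
name: 'my_project'
config-version: 2
version: '1.0.0'
require-dbt-version: [">=1.3.0", "<2.0.0"]
profile: 'my_profile'

model-paths: ["models"]
seed-paths: ["seeds"]
clean-targets: ["target", "dbt_packages"]

vars:
  start_date: '2020-01-01'

on-run-end:
  - "{{ grant_select_to_reporting() }}"  # hypothetical macro returning SQL

models:
  my_project:
    staging:
      +materialized: view   # "nested" node config
    marts:
      +materialized: table
```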
A final tip: metrics, exposures, and Python models are brand-new features, so as of January 2023 they were not included in the exam. Check the dbt community forums to see if this is still the case!
Good luck!