Are you looking to get dbt certified by acing the dbt Analytics Engineering Certification Exam? Let me help you get started by sharing my experience in this short blog, in the hope that it helps some of you who are thinking of taking the exam.

Let's get started!


1. Preparing for the dbt Analytics Engineering Certification Exam

Let me start by admitting that, as with all good blog posts, I tried to make the title a little provocative… Yes, I really had not heard of dbt before I started studying, but of course I did not actually start from zero in the broader data ecosystem.

In fact, the basis I started with when studying for the dbt analytics engineering certification was:

  • SQL: fluent (roughly 3 years of intensive use in my day-to-day job), including multiple “flavours” (MySQL, Oracle, Redshift,...)
  • Python: 12+ months of using it as an ETL tool, which introduced me to most of the basic concepts of Python. All in all I would call it “junior” experience.
  • dbt: no prior experience or knowledge
  • Git: I followed a 2-day online Git introduction course before I started with dbt


Most importantly, I should mention that I had several years of experience working in the role which dbt describes as the “Analytics Engineer”. As this is the role around which all of dbt's functionality is centered, I found that most of the “problems” dbt tries to “fix” felt very familiar to me and not at all abstract.

I always use the analogy of the lumberjack: for several years I was cutting down trees with my axe (i.e. just executing .sql files manually from a workbench), and indeed I was pretty good at it! Then someone comes around and offers you a chainsaw (dbt)... Although it may take a few weeks to fully get the hang of how it works, you immediately understand how this tool will drastically change your day-to-day work for the better!

2. How to study for the dbt Analytics Engineering Certification?

With that out of the way: what I personally struggled with most was the black-box nature of the exam (it was only launched in 2022). No practice exams are available, and the "official" documentation dbt provides is, to be honest, pretty chaotic (docs, references, guides, blogs, tutorials,...).

Not to mention there are mountains of it, covering insanely wide-ranging concepts, some of which seem to have nothing to do with dbt functionality (e.g. SQL writing best practices) but are of course crucial to becoming a true Analytics Engineer. And in the end, that's what the exam tries to certify!

All in all I spent 3-4 weeks on it (my eternal gratitude to my employer Biztory for letting me prepare full-time), roughly in the following order:

Phase 1: Start with the dbt Fundamentals course.
Note that I really took my time for this (multiple days), as I assume the “5 hours” supposedly needed for the course covers only the length of the videos… I followed along with the official jaffle_shop tutorial (which uses dbt Cloud and Snowflake) and then tried to build my own project in dbt Core (with a random dataset I found online and uploaded to Snowflake) to make sure I thoroughly understood the basics.

Phase 2: Follow the Learning Path in the official study guide.
This takes you through all available courses and also points you to relevant blogs, guides,… (this step took the longest!). FYI: by this point I was already summarizing all relevant information from the courses into a Word document (more on that later), and every subsequent step was summarized into it as well.

Phase 3: Read through all available Docs.
Specifically looking for deeper and more nuanced information on the topics handled in the courses. This is also a good “recap” exercise to start consolidating the information from the courses.

Phase 4: Read and summarize a selection of relevant Guides:


Phase 5: Read a selection of relevant blog posts:


Phase 6: Read (and study!) a selection of relevant Reference pages:


For the sixth and final point it is important to emphasize the study part, as the exam does not allow you to use anything other than what's in your head! Broadly speaking, you should for example be able to write a source .yml file by heart; not all properties, of course, but the most relevant ones covered in the courses.
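To give an idea of the level of recall expected, here is a minimal sources file of the kind you should be able to reproduce from memory (the source, table, and column names are illustrative):

```yaml
# models/staging/src_jaffle_shop.yml (illustrative names)
version: 2

sources:
  - name: jaffle_shop          # referenced as source('jaffle_shop', ...)
    database: raw              # optional; defaults to the target database
    schema: jaffle_shop        # optional; defaults to the source name
    tables:
      - name: customers
        description: "One record per customer."
      - name: orders
        loaded_at_field: _etl_loaded_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
```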

3. Studying... but now for real

So by the start of the fourth week this left me with a pretty big summary document to study. Note that I tried to categorize all the information under the eight topics on which, according to the dbt study guide, the exam focuses.

By this time I went into full "study-like-you're-back-at-university" mode (bye bye Netflix binge evenings) and really tried to learn the information by heart as much as possible.

4. Taking the dbt Analytics Engineering Certification Exam

Now onto the main reason you are here, of course: the exam itself. Since a) I was hyper-focused on passing the exam (and not so much on memorizing the questions) and b) I assume dbt does not allow sharing specific exam questions online, I will not be doing so.

What I can do is offer you a (non-exhaustive) list of generic topics on which I do remember being asked questions, in the hope that it gives you some final pointers on what to review in those last days or even hours before you start your exam. 🙂

DISCLAIMER: The list of topics below is purely based on my own personal experience. Use it as a source of guidance, but not as your single source of truth to pass the dbt Analytics Engineering Certification Exam.


dbt Analytics Engineering Certification topics-to-remember list: 👇

  • The 5 types of materializations: including their use cases and how to build them.

  • Jinja basic concepts: delimiters, variables, if/else statements, for loops, macros.
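For reference, here is a small dbt model that touches all of those Jinja basics at once (the payment-methods pivot is a made-up example in the style of the fundamentals course):

```sql
-- {# ... #} is a comment, {{ ... }} an expression, {% ... %} a statement
{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('stg_payments') }}
group by 1
```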

  • Hooks: the 4 types of hooks (pre-hook, post-hook, on-run-start, on-run-end) and when/where to implement them.
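As a refresher, a sketch of where the four hook types live (the grant and query-tag statements are illustrative, not prescriptive):

```yaml
# dbt_project.yml
on-run-start: "create schema if not exists audit"   # runs once, before the run
on-run-end: "grant usage on schema {{ target.schema }} to role reporter"  # once, after

models:
  my_project:
    +pre-hook: "alter session set query_tag = 'dbt'"   # before each model (Snowflake-style example)
    +post-hook: "grant select on {{ this }} to role reporter"  # after each model
```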

  • SQL as a programming language: this is tricky because it has little to do with dbt functionality, but I remember specific questions on e.g. window functions, CTEs, joins, unions,...
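For example, a typical exam-level snippet combines a CTE with a window function (model and column names are made up):

```sql
with orders as (
    select * from {{ ref('stg_orders') }}
)

select
    customer_id,
    order_id,
    order_date,
    row_number() over (
        partition by customer_id
        order by order_date
    ) as customer_order_rank
from orders
```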

  • The job of the Analytics Engineer: the 3 main roles in a data ecosystem and, more specifically, the main components of an Analytics Engineer's job.

  • Legacy DML SQL commands: how to transform them into dbt functionality when migrating a legacy SQL script into dbt.
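The general recipe: strip the DDL/DML wrapper, keep only the SELECT, and let the chosen materialization generate the right DDL. A sketch (table and source names illustrative):

```sql
-- Legacy script (roughly):
--   create table analytics.customer_totals as
--   select customer_id, sum(amount) as total from payments group by 1;
--
-- dbt model models/customer_totals.sql: keep only the select,
-- point it at a source/ref, and let the materialization handle the DDL.
{{ config(materialized='table') }}

select
    customer_id,
    sum(amount) as total
from {{ source('jaffle_shop', 'payments') }}
group by 1
```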

  • dbt CLI commands: the main dbt CLI commands and how they interact, e.g. "does dbt build also test source freshness?" (the answer is no).
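A quick cheat sheet of the main commands and what they do (and don't) cover:

```shell
dbt run       # compiles and executes models
dbt test      # runs generic + singular tests
dbt seed      # loads CSV seed files into the warehouse
dbt snapshot  # runs snapshots
dbt build     # run + test + seed + snapshot, in DAG order
              # (does NOT check source freshness)
dbt source freshness   # the separate command that does
dbt compile   # compiles Jinja to raw SQL without executing it
dbt docs generate      # builds the documentation site artifacts
```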

  • Node selection syntax and graph/set operators: these include some crazy examples, y'all!
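A few of the selection patterns worth drilling (model names and the artifacts path are illustrative):

```shell
dbt run --select my_model+        # my_model and all descendants
dbt run --select +my_model        # my_model and all ancestors
dbt run --select 2+my_model       # ancestors up to 2 edges away
dbt run --select @my_model        # my_model, its descendants, and
                                  # all ancestors of those descendants
dbt run --select model_a model_b  # union (space-separated)
dbt run --select tag:nightly,config.materialized:table   # intersection (comma)
dbt run --select state:modified+ --defer --state path/to/prod-artifacts  # Slim CI pattern
```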

  • The “description:” YAML property: how to use and “upgrade” this property
    • using “>”
    • using “|”
    • using an .md file + a Jinja docs block
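A compact sketch of those three variants in one place (names illustrative):

```yaml
# schema.yml
models:
  - name: customers
    # ">" folds newlines into spaces; "|" preserves line breaks
    description: >
      One record per customer,
      with order summary statistics.

# Or define a docs block in any .md file:
#   {% docs customers_doc %}
#   One record per customer, with order history.
#   {% enddocs %}
# and reference it from the YAML:
#   description: "{{ doc('customers_doc') }}"
```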

  • The DAG:
    • why it could error (a cyclical reference, for example)
    • what to do when a node errors and you want the most efficient re-run possible (including the effect of --fail-fast on this re-run)
    • implementing the ref and source functions to create modularity
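In practice that modularity point looks something like this (model and source names illustrative):

```sql
-- staging model: the only place that reads the raw table
select * from {{ source('jaffle_shop', 'orders') }}

-- downstream models always ref other models, never raw tables,
-- so dbt can infer the DAG (and error on cyclical references):
select * from {{ ref('stg_orders') }}
```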

  • Model naming conventions: when/how to use them (staging > intermediate > marts with fact/dim tables …).
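As a sketch, the layered layout from the dbt style guide looks roughly like this (file names illustrative):

```
models/
├── staging/          # stg_<source>__<entity>.sql, 1:1 with source tables
│   └── stg_jaffle_shop__orders.sql
├── intermediate/     # int_ models: purpose-built building blocks
│   └── int_orders_pivoted.sql
└── marts/            # business-facing fct_ (fact) and dim_ (dimension) models
    ├── fct_orders.sql
    └── dim_customers.sql
```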

  • Target functionality: the use of target throughout your project, including its use as a variable within a CLI command or Jinja statement (target.name, for example).
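A common pattern using target, assuming a dev target named 'dev' and a Snowflake-style date function:

```sql
select * from {{ ref('fct_orders') }}
{% if target.name == 'dev' %}
-- limit data in development for faster runs
where order_date >= dateadd('day', -3, current_date)
{% endif %}
```

Other useful attributes include target.schema and target.type (the adapter, e.g. snowflake).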

  • Configurations:
    • the nesting of source > table > column config properties
    • some common configurations (*see below)
    • the 3 locations where you can implement configs:
      • the corresponding resource key in dbt_project.yml
      • a config property in a resource YML file
      • a config() Jinja macro within a model
    • Note that there was even a question on a fourth method, which I have not heard of to this day: applying a configuration within a CLI command… so I can't help you there (:
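The three documented locations side by side; when the same config is set in several places, the most specific one wins (a model-level config() beats the YAML property, which beats dbt_project.yml):

```yaml
# 1) dbt_project.yml (least specific)
models:
  my_project:
    staging:
      +materialized: view

# 2) a config property in a resource YML file:
# models:
#   - name: stg_orders
#     config:
#       materialized: view

# 3) a config() macro inside the model itself (most specific):
#   {{ config(materialized='view', tags=['staging']) }}
```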

  • dbt debugging and troubleshooting: several questions on examples of the different errors you might encounter, as well as their solutions; also specifically the (improper) use of the profiles.yml file and how this can result in errors.

  • dbt within a modern data stack: e.g. should you use dbt seed as a loader tool within an ELT set-up?

  • Generic and singular tests: including the exact configurations and properties needed to build them.
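A minimal sketch of both kinds (model and column names illustrative): generic tests are configured in YAML, while a singular test is a one-off SQL file in tests/ that returns the rows that should fail.

```yaml
# schema.yml: generic tests attached to columns
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']

# tests/assert_no_negative_amounts.sql (singular test):
#   select * from {{ ref('stg_payments') }} where amount < 0
```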

  • Database/schema names: using custom database/schema names (instead of the target defaults) and how dbt handles them (concatenation).
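Concretely: with +schema: marketing and a target schema of dbt_alice, the model builds into dbt_alice_marketing. That concatenation comes from dbt's default generate_schema_name macro, which you can override in your own project:

```sql
-- dbt's built-in default, roughly (override it by defining your own
-- macros/generate_schema_name.sql):
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```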

  • Jobs and Environments: these concepts actually drew quite a few questions, which is strange considering they are dbt Cloud features and dbt has stated several times that the exam is not specifically geared towards Core or Cloud. So expect questions on the abstract concepts and best practices rather than on the specific implementations within dbt Cloud.

  • Environment variables: the order of precedence of your environment variables (env_var) when they are set in multiple places throughout the project.

  • State and deferring: covered in the dbt course “Advanced Deployment with dbt Cloud”.

  • Git commands, workflows and strategies:
    • continuous deployment vs. continuous delivery
    • (Slim) CI and webhooks
    • clone vs. fork strategies

  • Row- vs. column-based storage: what is the difference? (Although I still don't understand how that ended up in the exam.)



* Some common dbt_project.yml configurations:

  • name: your project's name, in snake_case (spaces become underscores; first letters lowercase).
  • (config-)version: the version of your project and, for config-version, of the dbt_project.yml syntax.
  • require-dbt-version: restrict your project to only work with a range of dbt Core versions.
  • profile: the profile dbt uses to connect to your data platform.
  • *-paths: directories where the corresponding files live (model-paths, seed-paths, macro-paths,...).
  • clean-targets: a list of directories to be removed by dbt clean; best used for artifacts only.
  • query-comment: a string to inject as a comment in each query that dbt runs against your database
    • can also call a macro that returns a string
    • default = JSON with dbt info
    • set to null to disable
  • quoting: whether dbt should quote databases, schemas, and identifiers (default = true, except on Snowflake).
  • [hooks]: on-run-start and on-run-end hooks, which can also call macros that return SQL statements.
  • vars: project variables you want to use for data compilation.
  • dispatch: optionally override the dispatch search locations for macros in certain namespaces (default = root project first).
  • [nodes]: “nested” configs can be added under the resource key, i.e. [node]: key: value.
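To tie the keys above together, a minimal (illustrative) dbt_project.yml might look like:

```yaml
name: my_project               # snake_case
config-version: 2
version: '1.0.0'
require-dbt-version: ">=1.0.0,<2.0.0"
profile: my_warehouse

model-paths: ["models"]
seed-paths: ["seeds"]
macro-paths: ["macros"]

clean-targets: ["target", "dbt_packages"]

query-comment: "run by dbt"    # or null to disable

vars:
  start_date: '2020-01-01'

on-run-end: "grant usage on schema {{ target.schema }} to role reporter"

models:
  my_project:
    +materialized: view        # nested configs under the resource key
    marts:
      +materialized: table
```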

 

A final tip: metrics, exposures, and Python models were brand-new features, so as of January 2023 they were not included in the exam. Check the dbt community forums to see if this is still the case!

Good luck!

Author
Michiel Smulders
Data Engineer at Biztory
