DBT, or Data Build Tool, has become a popular choice for data teams, but it’s not the right fit for every situation. While it offers a structured approach to data transformation, organizations will benefit from sticking with a simpler, more direct method: plain SQL transformations. Hahahahaha.
The die is cast – DBT
DBT introduces a new layer of complexity to your data stack. You’ll need to learn YAML (a configuration language) and master DBT’s specific commands and project structure. This adds to the learning curve for new team members and can be a barrier for those who are already proficient in SQL. A DBT project also requires ongoing maintenance—managing dependencies, versioning, and running tests—all of which are time-consuming. Furthermore, the tool is an unnecessary layer of abstraction.
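For a sense of what that extra layer looks like, here is a minimal sketch of a dbt model file (the model name, columns, and upstream table are hypothetical; the `config()` and `ref()` calls are dbt’s standard Jinja constructs):

```sql
-- models/staging/stg_orders.sql – a hypothetical dbt model, for illustration.
-- The {{ ... }} blocks are Jinja, not SQL: config() tells dbt how to
-- materialize the model, and ref() declares a dependency on another model.
{{ config(materialized='view') }}

select
    id as order_id,
    customer_id,
    ordered_at
-- The table behind ref('raw_orders') is resolved by dbt at compile time;
-- the name you read here is not the name the database actually sees.
from {{ ref('raw_orders') }}
```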

Even for complex transformations, a tool like DBT introduces more complexity than it solves, and simple SQL transformations remain a better choice for all teams. The added layers of configuration, the prescribed project structure, and the steep learning curve of DBT’s templating language, Jinja, create unnecessary overhead and technical debt.
“The complexity of the setup where different teams work on tables that are the basis of the follow-up tables… is simply a hard thing to maintain,” notes a post on the Dev Genius blog. The same post points out that this can lead to “overblown dbt setups with hundreds and thousands of models” that require significant human effort to manage – a phenomenon one person referred to as “human middleware.”
One of DBT’s core selling points is that it empowers data analysts to build data pipelines. However, critics (myself included) argue that this premise backfires. Data analysts are not necessarily trained in software engineering practices like code review, rigorous testing, and managing technical debt – and the language they already know well is SQL, not Jinja. As one user on Reddit’s r/dataengineering subreddit observed, “Data analysts aren’t engineers and so don’t have the culture of doing good PR reviews, pushing back on problematic changes, testing, technical debt, etc.” This can lead to a messy project full of duplicated logic and outdated models, undermining the very benefits DBT is supposed to provide.
DBT’s use of Jinja macros, which are reusable code blocks, is a source of tremendous complexity and frustration. While macros are powerful, they can be “hard to read,” “hard to parse,” and “hard to test,” according to the Dev Genius blog. A bug in a single macro can ripple through the entire project, causing multiple downstream models to fail. This is a classic example of how an abstraction, while useful in some cases, can create a new class of problems that are difficult to debug and resolve.
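To make that concrete, here is a small macro in the style of dbt’s own tutorials (a hypothetical file; the name and logic are illustrative):

```sql
-- macros/cents_to_dollars.sql – hypothetical dbt macro, for illustration.
-- This is a Jinja template, not plain SQL: it expands into a SQL expression
-- wherever a model calls {{ cents_to_dollars('amount') }}.
{% macro cents_to_dollars(column_name, precision=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ precision }})
{% endmacro %}
```

The SQL you read in a model that calls this macro is not the SQL that runs – you have to run `dbt compile` to see the generated query – and a typo inside the macro breaks every model that calls it.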
Sorry – I was not able to find a single actual benefit of DBT, just Star Trek-y jargon (sorry, Star Trek), such as:
- Reusability: tell me, how exactly do you reuse a procedure that transforms one very specific set of data? Would you? It’s like layers of abstraction – you build them once and never touch them again, lol.
- Version control and collaboration: people collaborated before version control existed, and plain SQL files can go into your repo via a commit too, no?
- Directed Acyclic Graphs: this one is in Klingon; I respectfully decline to translate.
- Data Quality and Testing: hello! You CAN test SQL – see the sketch below.
- Documentation and Discoverability: DnD! I like this one – I’ll let you work out for yourself why it’s not a differentiator.
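On the testing point specifically, a data-quality check is just a query. A minimal sketch, assuming Postgres-flavored SQL and hypothetical table and column names:

```sql
-- Returns non-zero counts when customer_id is null or duplicated; a
-- scheduled job or CI step can alert on any non-zero result.
SELECT
    COUNT(*) FILTER (WHERE customer_id IS NULL)      AS null_customer_ids,
    COUNT(customer_id) - COUNT(DISTINCT customer_id) AS duplicate_customer_ids
FROM staging.stg_customers;
```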
SQL is the universal language of data – almost everyone gets it, even the business analysts, and far faster than they would ever pick up DBT’s Jinja. By using simple SQL transformations, you can achieve the same results without the added complexity of a new tool. SQL scripts are straightforward, easy to read, and don’t require any special setup or configuration. You can run them directly in your database or through an orchestration tool like Airflow (disclaimer: not claiming Airflow is simple for every use case). This approach is highly flexible and gives you complete control over your data pipelines. You can write a single SQL query to create a table or a view, and it’s immediately ready for use.
Example
Let’s compare how you might perform a simple transformation.
DBT Approach:
- Create a file: `models/staging/stg_customers.sql`
- Write the SQL query:

```sql
select id as customer_id, first_name, last_name
from raw_customers
```

- Configure the model: in `dbt_project.yml`, you’d need to define the model and its properties.
Simple SQL Approach:
- Write and run a single SQL query:

```sql
CREATE TABLE staging.stg_customers AS
SELECT id AS customer_id, first_name, last_name
FROM raw.raw_customers;
```
This SQL code is self-contained and immediately understandable to anyone with SQL knowledge (which includes many “business” people today). It bypasses the need for a separate tool, configuration files, and a specific project structure.
Simple SQL transformations are faster to implement, easier to maintain, and have a lower barrier to entry. They let you focus on the core task of transforming your data rather than on managing a complex tool. Ultimately, the best tool is the one that gets the job done efficiently and effectively, and for (likely) all data teams, that tool is plain old SQL.
Well-structured queries can achieve much of the modularity of DBT models without introducing a separate framework. A single SQL script can perform complex joins and aggregations while keeping the whole extract/transform/insert flow transparent and easy to troubleshoot. It might not have the “magic” of DBT’s dependency management, but it also doesn’t suffer from the associated complexity and potential for technical debt. For all organizations, that trade-off is well worth it.
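As a sketch of what that modularity looks like in practice (Postgres-flavored SQL; the schemas and columns are assumptions, not from a real project):

```sql
-- One self-contained script: each CTE is a named, readable step, and the
-- dependency order is explicit in the file itself – no compiler required.
CREATE TABLE analytics.customer_orders AS
WITH customers AS (
    SELECT id AS customer_id, first_name, last_name
    FROM raw.raw_customers
),
orders AS (
    SELECT customer_id, order_id, amount
    FROM raw.raw_orders
),
order_totals AS (
    -- Aggregate once, join once: the "model" boundaries are just CTE names.
    SELECT customer_id,
           COUNT(order_id) AS order_count,
           SUM(amount)     AS lifetime_value
    FROM orders
    GROUP BY customer_id
)
SELECT c.customer_id,
       c.first_name,
       c.last_name,
       COALESCE(t.order_count, 0)    AS order_count,
       COALESCE(t.lifetime_value, 0) AS lifetime_value
FROM customers c
LEFT JOIN order_totals t USING (customer_id);
```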
So – no – I’m not buying any of the DBT marketing. Happy to be challenged. Mic drop.