Automation in dbt for large-scale operations from Coalesce 2023

Benoit Perigaud, Staff Analytics Engineer at dbt Labs, explains how to avoid chaos when scaling dbt projects.

"This automation can really scale. Five minutes for one project. I'm pretty sure that if I was to create 20 or 30 projects, it would take me maybe 30 minutes."

In this talk, Benoit covers how automation keeps dbt projects manageable as they multiply across an organization. He introduces Terraform, an infrastructure automation tool that provisions and manages resources in any cloud or data center, and demonstrates how to use it to create a new dbt project in just five minutes.

The dbt adoption journey

Benoit gives a brief overview of the dbt adoption journey, which starts with setting up a dbt project. He explains that you need four things: a Git repository to store your code, a way to run your code (whether through dbt Cloud or dbt Core), a way to run your dbt jobs (either manually or through a deployment environment), and a data warehouse where the transformed data will be stored.

"If you use dbt Cloud, you would set up a dbt Cloud project and then a development environment," he says. "If you use dbt Core, you might be using Docker...Then we also need to set up some deployment and orchestration for our dbt jobs...dbt is nothing without the data warehouse, so when we set up a dbt project we need to make sure that the data warehouse is set up properly with roles, schemas, databases, and so on."

Different approaches to scaling dbt

Benoit discusses several approaches to scaling dbt, starting with what he calls the "silo approach," in which each team sets up its own dbt project. While this can work initially, he warns that it can quickly lead to chaos because teams may not follow the same best practices, which results in inconsistency and confusion across the organization.

"The problem is that everybody will be doing things potentially differently. Nobody will follow the same best practices," he states. "So, from the Git side of things, for example, we won't be using the same PR templates in the org. On the dbt side, we might not be using the same packages. So, it's really a mixed bag of way of working."

Infrastructure as code (IaC) tools, templates, and continuous integration/continuous delivery (CI/CD)

"Any tool that has some APIs can be integrated with Terraform."

To scale dbt projects more efficiently, Benoit suggests using infrastructure as code (IaC) tools, templates, and continuous integration/continuous delivery (CI/CD). He particularly highlights the value of Terraform, an IaC tool that can be used to define and provision infrastructure.

"Terraform is an infrastructure automation tool to provision and manage resources in any cloud or data center," he says. "It means that we declare our configuration in some files, and then Terraform translates that into calls to the API." This approach, he adds, can help take the complexity out of configuring cloud applications.
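The declarative idea in that quote can be illustrated with a toy model. The sketch below is not how Terraform is implemented; it only shows the general pattern of comparing a declared configuration with the current state and deriving the API calls needed to close the gap. The function name `plan_api_calls` and the resource names are hypothetical.

```python
def plan_api_calls(desired: dict, current: dict) -> list:
    """Return (action, resource_name) pairs that move `current` toward `desired`.

    Toy illustration of a declarative workflow; resource names are made up.
    """
    calls = []
    # Anything declared but missing must be created; anything that differs, updated.
    for name, settings in sorted(desired.items()):
        if name not in current:
            calls.append(("create", name))
        elif current[name] != settings:
            calls.append(("update", name))
    # Anything that exists but is no longer declared must be deleted.
    for name in sorted(current):
        if name not in desired:
            calls.append(("delete", name))
    return calls


# Hypothetical declared configuration and hypothetical existing state.
desired_state = {
    "repo:analytics": {"visibility": "private"},
    "schema:analytics_prod": {"owner": "transformer"},
}
current_state = {
    "repo:analytics": {"visibility": "public"},
    "schema:legacy": {"owner": "admin"},
}

planned_calls = plan_api_calls(desired_state, current_state)
```

Running the same plan against a state that already matches the declaration yields no calls, which is what makes a declarative setup safe to re-apply.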

Benoit also highlights templating tools, such as Cookiecutter and Craft, for creating and managing dbt projects, as well as CI/CD for automating the testing and deployment of code. "We already use CI/CD today in dbt," he says. "We already do CI/CD to go between our feature branch and main."
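To make the templating idea concrete, here is a minimal sketch that uses only Python's standard library rather than any of the tools named in the talk. It renders a small dbt_project.yml from a shared template so that every team starts from the same baseline. The template contents, the function name `render_dbt_project`, and the example project and profile names are assumptions for illustration.

```python
from string import Template

# Shared baseline for every new project; the fields shown are a small subset
# of what a real dbt_project.yml usually contains.
DBT_PROJECT_TEMPLATE = Template(
    "name: '$project_name'\n"
    "version: '1.0.0'\n"
    "config-version: 2\n"
    "profile: '$profile_name'\n"
    "model-paths: ['models']\n"
)


def render_dbt_project(project_name: str, profile_name: str) -> str:
    """Fill in the per-team values; raises KeyError if a placeholder is missing."""
    return DBT_PROJECT_TEMPLATE.substitute(
        project_name=project_name,
        profile_name=profile_name,
    )


# Hypothetical team project.
rendered = render_dbt_project("finance", "finance_warehouse")
```

Dedicated templating tools add prompts, directory scaffolding, and ways to pull in later template updates, but the core mechanism is the same substitution of per-project values into a shared skeleton.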

Benoit’s key insights

  • Scaling dbt projects can be challenging, especially when each team starts from scratch, leading to different practices and potential chaos
  • Terraform, an infrastructure automation tool, can be used to provision and manage resources in any cloud or data center, making it useful for scaling dbt projects
  • Terraform works with most of the main Git providers and data warehouses, which means that a lot of the dbt project setup can be automated
  • Terraform uses a declarative approach, similar to dbt. Users declare their configuration in text files and Terraform translates this into API calls
  • Automation can significantly speed up the process of setting up dbt projects. In the demonstration, a new dbt project was created in just five minutes
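As a rough sketch of why this scales, the snippet below generates one declarative configuration for 30 teams in a loop, using Terraform's JSON configuration syntax. The resource type `dbtcloud_project` and its `name` argument are assumptions about the dbt Cloud Terraform provider and should be checked against the provider documentation; the team names and the function name `terraform_config` are hypothetical.

```python
import json


def terraform_config(team_names: list) -> dict:
    """Build a Terraform JSON configuration with one project per team.

    ASSUMPTION: `dbtcloud_project` with a `name` argument is used here as an
    illustrative resource type; verify it against the provider you use.
    """
    projects = {team: {"name": f"{team} analytics"} for team in team_names}
    return {"resource": {"dbtcloud_project": projects}}


# Hypothetical list of 30 teams.
teams = [f"team_{i:02d}" for i in range(1, 31)]
config = terraform_config(teams)

# This string could be written to a file such as main.tf.json.
document = json.dumps(config, indent=2)
```

Adding a project then means adding one entry to the list, which is consistent with the talk's point that 20 or 30 projects should not take much longer than one.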