DBT Meets Mage: Powering Advanced Data Transformations

First published on September 11, 2023

Last updated on September 27, 2023

 

16 minute read

Guest post by Shashank Mishra, Data Engineer @ Expedia

TL;DR

When dbt, a renowned data transformation framework, converges with Mage, it amplifies ETL processes to new heights. This article explores this powerful synergy, which facilitates enhanced data modeling and gives businesses a competitive edge. By tapping into this combination, data professionals can supercharge their operations and bring data precision to the forefront.

Outline

  • Introduction to Mage

  • Transform DBT with Mage

  • Set up DBT models & orchestrate DBT runs

  • Conclusion

Introduction to Mage

In an age where data is the new oil, efficient and reliable data management tools are essential. Mage is a platform committed to simplifying data integration and analytics. Designed for seamless data transformation and loading, Mage is changing how businesses approach data management. Here are its key features:

  • Automated Data Pipeline: Mage automates data extraction, transformation, and loading (ETL) processes. It can extract data from multiple sources, transform it into a desirable format, and load it into a data warehouse.

  • Data Connectors: Mage offers various data connectors to widely used data sources like Shopify, Facebook Ads, Google Ads, Google Analytics, etc. This makes it easier to import data from these platforms.

  • Easy Integration: Mage provides easy integration with popular data warehouses including Google BigQuery, Amazon Redshift, and Snowflake.

  • Pre-built SQL Models: Mage comes with pre-built SQL models for popular e-commerce platforms like Shopify and WooCommerce. These models simplify the process of data analysis.

  • Incremental Loading: Mage supports incremental loading, which means only new or updated data is loaded into the data warehouse. This saves storage space and improves efficiency.

  • Data Transformations: Mage performs automatic data transformations, converting raw data into a more usable format. This makes the data ready for analysis and reporting.

  • Scheduled Refresh: Data refreshes can be scheduled in Mage, ensuring that the data in the warehouse is always up to date.

  • Data Security: Mage places a high emphasis on data security, ensuring data privacy and compliance with GDPR and other data protection regulations.


Transform DBT with Mage

By integrating dbt with Mage, data transformation reaches new heights of efficiency and precision. This synergy unlocks unparalleled ETL capabilities, setting a new standard in data processing.

  • Overview: Build, run, and manage your dbt models with Mage.

  • Analytics engineers: With Mage and dbt combined, you can expand, build, and run complex data pipelines.

  • Data engineers: Simplify your data infrastructure and reduce the number of redundant tools in your “modern” data stack. Replace dbt Cloud and Airflow with Mage’s native integration with dbt.

  • Features

    • Schedule dbt model runs: Trigger your dbt model runs on a regular schedule, from an event, or via API request.

    • Run specific dbt models and their dependencies

    • Run all models and optionally exclude others

    • dbt models can depend on non-dbt tasks. Examples:

      • Build a model after data ingestion from an API

      • Build a model after another pipeline completes

    • Preview dbt model results as you write SQL

    • Build dynamic dbt pipelines using flexible variable interpolation. Examples:

      • {{ env_var('...') }}

      • {{ variables('...') }}

      • {{ mage_secret_var('...') }}

    • Automatically run dbt tests every time a pipeline runs: Write checks using dbt tests; Mage will run them and fail the pipeline if any test produces failures.

    • Built-in observability: Monitor your dbt pipelines and get alerted when things break.

  • Supported connectors

    • dbt-bigquery

    • dbt-core

    • dbt-postgres

    • dbt-redshift

    • dbt-snowflake

    • dbt-sqlserver

    • dbt-trino

    • dbt-spark
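Conceptually, the variable interpolation syntax listed in the features above works like ordinary template rendering: placeholders such as {{ env_var('...') }} are replaced with concrete values at run time. The following is a rough, hypothetical Python sketch of that idea, not Mage's actual implementation; the `render` function, pattern, and example values are all made up for illustration:

```python
import os
import re

# Hypothetical sketch of {{ env_var('...') }} / {{ variables('...') }} rendering.
# This is NOT Mage's real implementation; it only illustrates the concept.
def render(sql: str, variables: dict) -> str:
    def substitute(match):
        func, name = match.group(1), match.group(2)
        if func == 'env_var':
            # env_var placeholders resolve from the process environment
            return os.environ.get(name, '')
        # variables placeholders resolve from pipeline variables
        return str(variables.get(name, ''))

    pattern = r"\{\{\s*(env_var|variables)\('([^']+)'\)\s*\}\}"
    return re.sub(pattern, substitute, sql)

os.environ['SCHEMA'] = 'dbt_demo'
sql = "SELECT * FROM {{ env_var('SCHEMA') }}.orders WHERE region = '{{ variables('region') }}'"
print(render(sql, {'region': 'us'}))
# SELECT * FROM dbt_demo.orders WHERE region = 'us'
```

The practical upshot is that the same dbt model text can run against different schemas, secrets, or pipeline variables without editing the SQL itself.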


Set up DBT models & orchestrate DBT runs

Configuring dbt models paves the way for robust data structures, while orchestrating dbt runs ensures seamless, automated workflows in data transformation.

1. Set up a new Mage project

a. Read the Mage setup guide to initialize a new project and start the Mage tool locally.

b. For the rest of this tutorial, we’ll use the project name demo_project.

2. Set up dbt project

a. Create dbt project

i. Open Mage and go to the terminal page: http://localhost:6789/terminal

ii. Initialize your dbt project using the init command (for this tutorial, we’ll use the dbt project name demo):

cd demo_project/dbt
dbt init demo -s
touch demo/profiles.yml

The -s flag skips dbt’s interactive profile setup, since we create profiles.yml manually in the next steps. For more information on creating a dbt project, read the dbt documentation.

3. Create standard pipeline

a. Go to the Mage dashboard, click the + New pipeline button, and select the option labeled Standard (batch).

b. Click the Pipeline settings icon in the left pane, change the pipeline’s name to dbt demo pipeline, then click the Save pipeline settings button.

4. Create dbt profile for database connections

a. On the left side of the page in the file browser, expand the folder demo_project/dbt/demo/

b. Click the file named profiles.yml

c. Paste the following credentials in that file:

demo:
  target: dev
  outputs:
    dev:
      dbname: xyleviup
      host: queenie.db.elephantsql.com
      password: edSrMWH7Mww-lTKpp-jPHX9sYSNLy7LG
      port: 5432
      schema: dbt_demo
      type: postgres
      user: xyleviup

d. Save the profiles.yml file by pressing Command (⌘) + S.

e. Close the file by pressing the X button on the right side of the file name dbt/demo/profiles.yml

5. Add data loader block to pipeline

a. Click the + Data loader button, select Python, then click API.

b. In the popup dialog Data loader block name, change the name to load data, then click the Save and add block button.

c. Paste the following code in that block:

import io

import pandas as pd
import requests
from pandas import DataFrame


@data_loader
def load_data_from_api(**kwargs) -> DataFrame:
    url = 'https://raw.githubusercontent.com/mage-ai/datasets/master/restaurant_user_transactions.csv'

    response = requests.get(url)
    df = pd.read_csv(io.StringIO(response.text), sep=',')
    # Replace spaces in column names with underscores
    df.columns = ['_'.join(col.split(' ')) for col in df.columns]
    return df
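The column-renaming line in the loader above joins the space-separated pieces of each header with underscores, so downstream SQL can reference the columns without quoting. A quick standalone check of that one line (the header names below are made up for illustration):

```python
# Standalone sketch of the column-renaming step from the data loader above.
# These example headers are hypothetical, not the real CSV's columns.
columns = ['user id', 'restaurant name', 'amount']
renamed = ['_'.join(col.split(' ')) for col in columns]
print(renamed)  # ['user_id', 'restaurant_name', 'amount']
```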

6. Add dbt model block to pipeline

a. Under the data loader block you just added, click the dbt model button, then click the option Single model.

b. In the file browser that pops up, click the file named my_second_dbt_model.sql under the folder demo/models/example/

i. This will add 2 dbt blocks to your pipeline: one for the dbt model named my_first_dbt_model and a second for the dbt model named my_second_dbt_model.

ii. The model named my_first_dbt_model was added to the pipeline because my_second_dbt_model references it.

7. Edit dbt model my_first_dbt_model

a. In the dbt block named my_first_dbt_model, next to the label Target at the top, choose dev in the dropdown list. You can also check Manually enter target and enter dev in the input field.

b. Paste the following SQL into the dbt model named my_first_dbt_model:

WITH source_data AS (
    SELECT 1 AS id
    UNION ALL
    SELECT 2 AS id
)

SELECT *
FROM source_data

c. Run the dbt model block by pressing the play button on the top right of the block or by pressing Command + Enter.

d. You should see a preview of the query execution logs. To see the query results, click the Expand table link at the bottom right corner.

e. After previewing the results, click the triple dots in the top right corner of the block to reveal a dropdown menu.

f. In the dropdown menu, click the option Run model. This executes the dbt run command and creates the table in your data source.

8. Edit dbt model my_second_dbt_model

a. In the dbt block named my_second_dbt_model, next to the label Target at the top, choose dev in the dropdown list. You can also check Manually enter target and enter dev in the input field.

b. Paste the following SQL into the dbt model named my_second_dbt_model:

SELECT
    a.*
    , b.*
FROM {{ ref('my_first_dbt_model') }} AS a

LEFT JOIN {{ source('mage_demo', 'dbt_demo_pipeline_load_data') }} AS b
ON 1 = 1

WHERE a.id = 1
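The ON 1 = 1 condition makes this a cross join: every row of a is paired with every row of b, and the WHERE clause then keeps only the pairs where a.id = 1. A minimal pure-Python sketch of that join logic (the tiny row sets here are illustrative stand-ins, not the real tables):

```python
# Sketch of LEFT JOIN ... ON 1 = 1 followed by WHERE a.id = 1.
# The rows below are made-up stand-ins for the two dbt relations.
a_rows = [{'id': 1}, {'id': 2}]
b_rows = [{'amount': 10}, {'amount': 20}]

joined = [
    {**a, **b}
    for a in a_rows
    for b in b_rows      # ON 1 = 1: pair every a-row with every b-row
    if a['id'] == 1      # WHERE a.id = 1: keep only pairs from the id-1 row
]
print(joined)  # [{'id': 1, 'amount': 10}, {'id': 1, 'amount': 20}]
```

Because every a-row matches every b-row, the result has one output row per b-row for the surviving a-row, which is why the model later fails a uniqueness test on id.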

When a dbt model depends on an upstream block that isn’t a dbt model, a source for that block is automatically added to the demo_project/dbt/demo/models/example/mage_sources.yml file.

Read more about dbt sources in the dbt documentation.

c. Run the dbt model block by pressing the play button on the top right of the block or by pressing Command + Enter.

d. You should see a preview of the query execution logs. To see the query results, click the Expand table link at the bottom right corner.

9. Add test for dbt model

a. On the right side of the screen, click the tab labeled Terminal.

b. Create a new dbt test file by running the following command:

touch demo_project/dbt/demo/tests/test_my_second_dbt_model.sql

c. On the left side of the page in the file browser, expand the folder demo_project/dbt/demo/tests/ and click the file named test_my_second_dbt_model.sql. If you don’t see it, refresh the page.

d. Paste the following SQL in the file:

SELECT id
FROM {{ ref('my_second_dbt_model') }}
GROUP BY id
HAVING (id = 0)
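A dbt test passes when its query returns zero rows and fails when it returns any. Since my_second_dbt_model only emits rows with id = 1, the HAVING (id = 0) filter matches nothing and this test passes. A quick Python sketch of that pass/fail rule (the id values are illustrative):

```python
# Sketch of dbt's rule: a test query that returns zero rows passes.
model_rows = [{'id': 1}, {'id': 1}]  # illustrative output of my_second_dbt_model

# Equivalent of: SELECT id ... GROUP BY id HAVING (id = 0)
failing_rows = sorted({r['id'] for r in model_rows if r['id'] == 0})

test_passed = len(failing_rows) == 0
print(test_passed)  # True
```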

e. Read more about dbt tests in the dbt documentation.

10. Execute pipeline end-to-end

a. Click the name of the pipeline in the header breadcrumbs to go back to the detail page.

b. Create a new trigger with the type Schedule and the frequency once. For more details, follow the Mage guide on triggers.

c. After your trigger is created, click the Start trigger button at the top of the page.

d. The pipeline will eventually fail because a dbt test failed. This means everything is working as expected.

e. Open the file demo_project/dbt/demo/models/example/schema.yml and remove the tests named unique under both models. Your file should look like this:

version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - not_null

f. Click the Failed button next to the pipeline run and click Retry run. It should complete successfully after a few minutes.

Congratulations! You’ve created a data pipeline that orchestrates your dbt models.


Conclusion

In the evolving landscape of data processing, the fusion of dbt and Mage signifies a transformative leap. By integrating dbt's modeling prowess with Mage's advanced ETL capabilities, businesses are better equipped to harness the full potential of their data. As we've explored, from setup to orchestration, this synergy not only streamlines operations but also enhances the precision of data outcomes. As data continues to be an invaluable asset, leveraging such powerful tools becomes imperative for any data-driven enterprise.