Monday, November 24, 2025

Tests in dbt

Understanding Tests in dbt

Testing is one of the most important features of dbt. It helps ensure that your data models are correct, reliable, and ready for downstream analytics. In dbt, tests are written in SQL and configured in YAML, and they run whenever you invoke dbt test or dbt build, so they fit naturally into a scheduled pipeline. This article explains the two main types of tests in dbt, singular tests and generic tests, then shows how to define a reusable generic test and how to apply it to models through YAML files.


What Are Tests in dbt?

dbt tests are SQL queries that check the quality of your data. Each query selects the rows that violate an expectation, so by default a test has one of two outcomes:

  • Pass: when the query returns zero rows

  • Fail: when the query returns one or more rows

A failing test indicates that something is wrong with the data and needs attention.


1. Singular Tests in dbt

A singular test is the simplest type of test in dbt. It is a SQL file that contains a query designed to catch bad data.

How a Singular Test Works

  • You write a SQL query.

  • The query should return only the rows that violate your expectation.

  • If the query returns rows, the test fails.

Example: Check for Negative Values

Suppose you want to ensure that the column amount never contains negative values.

Create a file in the tests/ directory:

tests/test_negative_amounts.sql

Inside this file:

    select *
    from {{ ref('sales') }}
    where amount < 0

If any row in the sales model has a negative amount, this test will fail.
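The pass/fail rule can be tried outside dbt with a small Python sketch. It is only an illustration, not how dbt itself runs tests: it assumes an in-memory SQLite table standing in for the sales model, and run_data_test is a hypothetical helper name.

```python
import sqlite3


def run_data_test(conn, test_sql):
    """Run a test query: zero rows means pass, any rows mean fail."""
    rows = conn.execute(test_sql).fetchall()
    return ("pass" if not rows else "fail", rows)


# Assumption: a tiny in-memory table stands in for the sales model.
conn = sqlite3.connect(":memory:")
conn.execute("create table sales (id integer, amount real)")
conn.executemany("insert into sales values (?, ?)", [(1, 10.0), (2, 25.5)])

# Same query as the singular test, with the ref() already resolved.
test_sql = "select * from sales where amount < 0"

status_clean, _ = run_data_test(conn, test_sql)

# Add one bad row and run the same test again.
conn.execute("insert into sales values (3, -4.0)")
status_dirty, bad_rows = run_data_test(conn, test_sql)

print(status_clean, status_dirty, bad_rows)
```

The rows returned by a failing test are exactly the records that need attention, which is why the query selects the violations rather than the valid rows.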

When To Use Singular Tests

  • When the logic is complex

  • When the condition is unique to a specific model

  • When the validation cannot be generalized


2. Generic Tests in dbt

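A generic test is a reusable, parameterized test that can be applied to multiple models and columns. Instead of writing the same SQL again and again, you define the test once and reference it by name wherever it is needed. dbt ships with four built-in generic tests (unique, not_null, accepted_values, and relationships), and you can define your own.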

Why Use Generic Tests?

  • Saves time

  • Reduces repeated code

  • Ensures consistent testing across the project


3. Creating a Generic Test Macro

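Custom generic tests are defined in SQL files that dbt can find in either tests/generic/ or a macros directory. This article keeps them in: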

macros/tests/

Example Macro: Check Non-Negative Values

Create a file:

macros/tests/non_negative.sql

Add the following:

    {% test non_negative(model, column_name) %}

    select *
    from {{ model }}
    where {{ column_name }} < 0

    {% endtest %}

Explanation:

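  • non_negative, the name that follows the test keyword, is the name you reference in YAML.

  • model and column_name are the test's arguments; dbt fills them in from the model and column the test is attached to.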

  • The SQL returns rows where the condition is violated.
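To make the substitution concrete, here is a rough Python analogy of what happens when the test body is rendered. It uses the standard library's string.Template rather than Jinja, and the relation names analytics.sales and analytics.refunds are made up for illustration.

```python
from string import Template

# Stand-in for the body of the non_negative test; $model and $column_name
# play the role of {{ model }} and {{ column_name }}.
NON_NEGATIVE = Template("select * from $model where $column_name < 0")


def render_non_negative(model, column_name):
    """Fill in the two arguments and return the final SQL text."""
    return NON_NEGATIVE.substitute(model=model, column_name=column_name)


# One definition, two different targets (hypothetical relation names).
sales_sql = render_non_negative("analytics.sales", "amount")
refunds_sql = render_non_negative("analytics.refunds", "quantity")

print(sales_sql)
print(refunds_sql)
```

The point of the analogy is that the test is plain text substitution: the same definition yields a different query for each model and column it is attached to.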


4. Using YAML to Apply Generic Tests to Models

Generic tests are applied in a YAML properties file that sits alongside your models.
The file is conventionally named schema.yml, but dbt reads any .yml file under the models directory.

Example:

models/sales/schema.yml

Inside the YAML file:

    version: 2

    models:
      - name: sales
        columns:
          - name: amount
            tests:
              - non_negative

This applies the generic non_negative test to the amount column. In dbt 1.8 and later the same key can also be written as data_tests; tests is still accepted.
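One way to picture the YAML is as a list of (test, model, column) combinations. The sketch below is a simplified illustration, not dbt's actual parser: the dictionary is written by hand in the shape a YAML parser would typically produce, and list_column_tests is a hypothetical helper.

```python
# Assumption: this dictionary mirrors the schema.yml content after parsing.
schema = {
    "version": 2,
    "models": [
        {
            "name": "sales",
            "columns": [
                {"name": "amount", "tests": ["non_negative"]},
            ],
        }
    ],
}


def list_column_tests(schema):
    """Collect (test, model, column) triples from column-level test entries."""
    found = []
    for model in schema.get("models", []):
        for column in model.get("columns", []):
            for test in column.get("tests", []):
                found.append((test, model["name"], column["name"]))
    return found


column_tests = list_column_tests(schema)
print(column_tests)
```

Each triple corresponds to one rendered test query, so adding another column or test name in the YAML adds another check without any new SQL.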


5. Passing Variables to Generic Tests

Generic tests can accept extra arguments beyond model and column_name, with the values supplied in YAML.

Example: Test Threshold

Suppose you want a test that fails whenever a value exceeds a given threshold.

Macro:

    {% test value_above_threshold(model, column_name, threshold) %}

    select *
    from {{ model }}
    where {{ column_name }} > {{ threshold }}

    {% endtest %}

YAML (the test is attached at the model level here, so column_name is passed explicitly):

    version: 2

    models:
      - name: sales
        tests:
          - value_above_threshold:
              column_name: score
              threshold: 90

This allows the same macro to be used with different threshold values on different models.
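The effect of the threshold argument can be checked with a short Python sketch. It assumes an in-memory SQLite table with made-up scores, and value_above_threshold_sql is a hypothetical stand-in for the rendered test body, not part of dbt.

```python
import sqlite3


def value_above_threshold_sql(model, column_name, threshold):
    """Build the test query as text, the way the rendered test body would read.

    Plain string interpolation is used only because the inputs are fixed
    illustration values; it is not safe for untrusted input.
    """
    return f"select * from {model} where {column_name} > {threshold}"


# Assumption: sample data invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table sales (id integer, score integer)")
conn.executemany("insert into sales values (?, ?)", [(1, 70), (2, 95)])

# Same test definition, two different thresholds.
failures_at_90 = conn.execute(
    value_above_threshold_sql("sales", "score", 90)
).fetchall()
failures_at_100 = conn.execute(
    value_above_threshold_sql("sales", "score", 100)
).fetchall()

print(failures_at_90, failures_at_100)
```

With a threshold of 90 the row scoring 95 is returned, so the test would fail; with a threshold of 100 no rows come back and it would pass.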

6. Benefits of Using YAML for dbt Tests

Using a YAML file helps in:

  • Clear organization of tests

  • Easier maintenance

  • Version-controlled configuration

  • Quick visibility of all tests related to a model

YAML also ensures that tests stay close to the model definitions, making the project easier to read and understand.


7. Summary

dbt testing is powerful yet simple. Here is a quick summary:

Singular Tests

  • Written as SQL queries

  • Best for complex or model-specific checks

  • Fail if the query returns rows

Generic Tests

  • Defined once in {% test %} blocks

  • Reusable across multiple models and columns

  • Applied using YAML

  • Accept arguments for flexibility

Using both types effectively makes your dbt project more reliable and maintainable.


Content assisted by ChatGPT
