Building a Robust Data Analytics Pipeline with Microsoft Fabric and dbt

Rakesh Gupta
6 min read · Feb 19, 2025


Introduction

In data engineering, ingesting data from multiple sources is just the beginning. The crucial next steps involve cleaning, performing data quality checks, and deriving valuable insights from the data.

This article focuses on leveraging Microsoft Fabric and dbt to build a robust data pipeline that processes ingested data into structured silver and gold layers, enabling seamless analytics with Power BI.

Background on the Fabric Data Warehouse

Microsoft Fabric Data Warehouse is a next-generation solution built on a distributed processing engine that offers high performance and scalability with minimal management overhead.

Residing within the data lake, it natively supports open formats and enables collaboration between engineers and business users while ensuring robust security.

Key features include:

  • Integration with Power BI for simplified analytics.
  • Data storage in Delta-Parquet format, ensuring ACID transactions and interoperability.
  • Support for cross-database queries, enabling fast insights without data duplication.

For an in-depth understanding, refer to the Microsoft Fabric Warehouse documentation.

Building the Fabric Data Pipeline

The best part about the Fabric Python environment is that it offers nearly all the features of a Spark notebook from within Python. Dynamic values can be passed from the pipeline orchestrator to the notebook, which makes CI/CD automation much easier.

In this setup, parameters such as Azure Key Vault name, dbt profile, and project locations are provided to the notebook via pipeline parameters.
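
For illustration, the notebook's parameters cell might look like the following; the parameter names and default values are assumptions for this walkthrough, and the pipeline overrides them at run time:

```python
# Fabric notebook "parameters" cell: the pipeline orchestrator overrides
# these values at run time. Names and defaults are illustrative assumptions.
key_vault_name = "my-keyvault"                              # Azure Key Vault holding dbt credentials
dbt_profile_dir = "/lakehouse/default/Files/dbt"            # folder containing profiles.yml
dbt_project_dir = "/lakehouse/default/Files/dbt/northwind"  # dbt project location
```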

1. Data Ingestion

The first step in the pipeline ingests the Northwind dataset via APIs and stores the raw data as JSONL files in the Fabric lakehouse, using dlt from dlthub.com.

For a detailed walkthrough of this step, see: Setting up ingestion with MS Fabric, dltHub, and Lakehouse (Rakesh Gupta, Medium, Feb 2025).
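
As a minimal sketch of the ingestion step (the endpoint URL, resource name, and destination configuration are illustrative assumptions, not the exact setup from that article):

```python
import dlt
from dlt.sources.helpers import requests  # dlt's requests wrapper with built-in retries

# One dlt resource per Northwind entity; the endpoint is a placeholder.
@dlt.resource(name="customers", write_disposition="replace")
def customers():
    yield requests.get("https://example.com/northwind/api/Customers").json()

# The filesystem destination writes JSONL files; its bucket_url should point
# at the lakehouse Files area (configured via dlt config or environment variables).
pipeline = dlt.pipeline(
    pipeline_name="northwind_raw",
    destination="filesystem",
    dataset_name="raw_northwind",
)
load_info = pipeline.run(customers(), loader_file_format="jsonl")
print(load_info)
```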

2. Creating Raw Delta Tables

Once ingested, the raw data is transformed into Delta tables for efficient querying and processing.
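
A sketch of that promotion step in a Fabric notebook, assuming the paths and table name used elsewhere in this walkthrough:

```python
# "spark" is predefined in Fabric notebooks; paths are relative to the
# default lakehouse attached to the notebook.
df = spark.read.json("Files/raw_northwind/customers/*.jsonl")

# Register the data as a managed Delta table for downstream dbt models.
(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("raw__northwind__customers"))
```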

3. Developing Silver and Gold Layers with dbt in Microsoft Fabric

Using dbt within Microsoft Fabric, we refine the raw data into silver and gold layers, ensuring data quality and structuring it for analytical insights.

Setting Up dbt-core with Microsoft Fabric

To follow along, some familiarity with dbt is required. If you are new to dbt, refer to the dbt documentation.

Configuring dbt

dbt requires setting up:

  • profiles.yml
  • dbt_project.yml

The dbt-fabric connector relies on ODBC Driver 18 for SQL Server (Driver 17 is also supported). Follow this guide for step-by-step configuration.
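
A minimal profiles.yml sketch for the dbt-fabric adapter is shown below. The profile name, server, and database are placeholders, and the exact field names should be verified against the adapter documentation for your version:

```yaml
northwind:
  target: prod
  outputs:
    prod:
      type: fabric
      driver: "ODBC Driver 18 for SQL Server"
      server: "<your-warehouse-sql-endpoint>"  # SQL connection string from the Fabric workspace
      database: "<your_warehouse>"
      schema: dbo
      threads: 4
      authentication: ServicePrincipal         # use CLI for local development
      tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"
      client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
      client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
```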

Authentication Methods

Microsoft Fabric supports:

  1. CLI Authentication (for local development)
  2. Service Principal Authentication (recommended for production setups)

For security best practices, store sensitive credentials (tenant_id, client_id, client_secret) in Azure Key Vault and access them via environment variables.
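
Inside a Fabric notebook, one way to do this is with notebookutils; the secret names below are assumptions, and key_vault_name comes from the pipeline parameters shown earlier:

```python
import os
import notebookutils  # built into Fabric notebooks

# Map Key Vault secrets to the environment variables referenced in profiles.yml.
kv_uri = f"https://{key_vault_name}.vault.azure.net/"
os.environ["AZURE_TENANT_ID"] = notebookutils.credentials.getSecret(kv_uri, "dbt-tenant-id")
os.environ["AZURE_CLIENT_ID"] = notebookutils.credentials.getSecret(kv_uri, "dbt-client-id")
os.environ["AZURE_CLIENT_SECRET"] = notebookutils.credentials.getSecret(kv_uri, "dbt-client-secret")
```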

Run dbt debug to validate the setup.

Implementing Silver and Gold Layers

Silver Layer

The Silver Layer processes and refines data from the raw lakehouse layer. The key datasets used are:

  • raw__northwind__customers
  • raw__northwind__orders
  • raw__northwind__shippers

Best practices dictate defining these sources in a source.yml file for better reusability and management.
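
A sketch of that definition (the source name, lakehouse, and schema are placeholders):

```yaml
version: 2

sources:
  - name: northwind_raw
    database: <lakehouse_name>  # the lakehouse holding the raw Delta tables
    schema: dbo
    tables:
      - name: raw__northwind__customers
      - name: raw__northwind__orders
      - name: raw__northwind__shippers
```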

During this stage, you also perform data quality checks, replacing null values where necessary.

Default values should be validated with stakeholders to maintain data integrity.
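
For example, a silver model might apply simple null handling like this; the model name, columns, and default values are illustrative:

```sql
-- models/silver/stg_orders.sql (illustrative sketch)
select
    order_id,
    customer_id,
    order_date,
    coalesce(ship_region, 'N/A') as ship_region,  -- default agreed with stakeholders
    coalesce(freight, 0)         as freight
from {{ source('northwind_raw', 'raw__northwind__orders') }}
```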

Gold Layer

The Gold Layer aggregates and structures data for analytics, primarily using order data from the Silver Layer.

To facilitate time-based analytics, a dimdate table is created using dbt seeds. This dimension table is generated annually or at an appropriate frequency based on business requirements.
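
A sketch of a gold model that joins the silver orders to the dimdate seed (all names are assumptions consistent with the sketches above):

```sql
-- models/gold/fct_orders.sql (illustrative sketch)
select
    d.date_key,
    o.order_id,
    o.customer_id,
    o.ship_region,
    o.freight
from {{ ref('stg_orders') }} as o
join {{ ref('dimdate') }} as d  -- dbt seeds are referenced with ref() as well
  on cast(o.order_date as date) = d.date_value
```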

Consuming Fabric Warehouse in Power BI

Data in the Gold Layer is stored as Delta files in Fabric Warehouse. To build analytical reports:

  1. Establish relationships between fact and dimension tables in the Power BI dataset.
  2. Ensure that datasets and reports are managed separately for streamlined collaboration across teams.
  3. Use Power BI’s data slicing and filtering features to derive business insights efficiently.
  4. Use DirectQuery mode for the report.

Let’s assume that stakeholders are after a few important insights:

  • Total Freight Collected During Given Period
  • Total Orders During Given Period
  • Top 3 Customers by Shipments
  • Top 3 Shipment Cities
  • Shipments by Region
  • Top 3 Countries by Shipments

Programmatic invocations

In v1.5, dbt-core added support for programmatic invocations. The intent is to expose the existing dbt Core CLI via a Python entry point, such that top-level commands are callable from within a Python script or application.

The entry point is a dbtRunner class, which allows you to invoke the same commands as on the CLI.

A notebook along the following lines runs the dbt project in Fabric, with the project itself hosted in the Fabric lakehouse. The location can also be Azure Blob Storage or any other location accessible by Fabric.
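
A minimal sketch of such a notebook cell, reusing the pipeline parameters and environment variables set up earlier:

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Point dbt at the project and profile locations passed in as pipeline parameters.
cli_args = [
    "build",
    "--project-dir", dbt_project_dir,
    "--profiles-dir", dbt_profile_dir,
]
res: dbtRunnerResult = dbt.invoke(cli_args)

# Surface per-model status in the notebook output and fail the pipeline on errors.
if res.success:
    for r in res.result:
        print(f"{r.node.name}: {r.status}")
else:
    raise RuntimeError(f"dbt build failed: {res.exception}")
```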

Data Lineage Using dbt docs generate

What is Data Lineage? Data lineage refers to the tracking of data as it moves through various transformations within a data pipeline. It helps understand where data originates, how it is processed, and where it ends up. This visibility is crucial for debugging, auditing, and optimizing data workflows.

The dbt docs generate command creates a documentation site for your dbt project, including data lineage graphs that visually represent the relationships between models, sources, and downstream dependencies.

Steps to Generate Data Lineage in dbt

  1. Generate the docs. In your dbt project directory, run dbt docs generate.
  2. Serve the documentation. To browse the generated docs locally, run dbt docs serve.
  3. Explore the lineage graph in the documentation site that opens.

Why is This Useful?

  • Debugging: Easily trace data issues back to their source.
  • Impact Analysis: Understand which models are affected by schema changes.
  • Collaboration: Teams can visualize dependencies without reading SQL code.

Viewing run histories and monitoring pipeline runs

Run history and pipeline monitoring are seamlessly integrated into the Microsoft ecosystem, providing visibility into pipeline execution details.

You can click on a specific run to open a dialog box with detailed insights and links for further exploration.

Conclusion

By integrating dbt with Microsoft Fabric, we can automate and scale data processing from ingestion to analytics. Implementing Silver and Gold Layers ensures data quality and structured insights, while Power BI provides an intuitive interface for visualization and decision-making.

With a well-designed Fabric pipeline, organizations can leverage a robust data architecture for high-performance analytics and better data-driven decision-making.

How to use the GitHub repo

All the necessary code can be found in the GitHub repository, organized by week (this article corresponds to week3): [https://github.com/sketchmyview/fabric-dlthub-series]

Import the pipeline and notebooks from their respective Pipeline and Notebook folders into the Fabric environment.

Unleash Your Creativity

Congratulations! You’ve successfully completed “Building a Robust Data Analytics Pipeline with Microsoft Fabric and dbt.”

If you’re eager to dive deeper, here are some self-assignments to further enhance your learning.

Feel free to share your findings with others along the way!

  • Manage the dimdate table using dbt models.
  • Assist stakeholders in identifying low-performing cities, countries, and customers, enabling them to focus their efforts on driving growth in those areas.

Written by Rakesh Gupta

Founder and IT Consultant, SketchMyView (www.sketchmyview.com). Reach me here: linkedin.com/in/grakeshk
