Staging Data Transformation with Notebooks

Follow this guide to create the staging notebooks required for your model.

🚧

Data Acquisition

This guide assumes that you have already completed the Data Acquisition process. See the Data Acquisition guide for details if necessary.

📘

What are Staging Notebooks?

Staging notebooks produce the staging data needed for modeling. They read from the silver layer of the Medallion architecture.

Each entity in the model, whether a dimension or a fact, needs its own dedicated notebook, because dimensions and facts have different data processing requirements. These source query notebooks provide a structured framework for working with each entity in the model.

In your Databricks instance, navigate to the folder /Workspace/DmBuild. Check whether source query notebooks exist for your dimensions in /Workspace/DmBuild/Dimensions and for your facts in /Workspace/DmBuild/Facts.
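If the model has many entities, the check can be scripted. The following is a minimal sketch in plain Python, not part of the product. It assumes you already have a list of notebook paths from the workspace (for example, copied from the workspace browser) and that each notebook is named after its entity. The helper `missing_notebooks` and the entity names `Customer`, `Product`, and `Sales` are hypothetical.

```python
from posixpath import basename, dirname

DIMENSIONS_DIR = "/Workspace/DmBuild/Dimensions"
FACTS_DIR = "/Workspace/DmBuild/Facts"


def missing_notebooks(existing_paths, folder, entities):
    """Return the entities that have no notebook directly inside `folder`.

    Assumption: a notebook is named exactly after its entity.
    """
    present = {basename(p) for p in existing_paths if dirname(p) == folder}
    return [entity for entity in entities if entity not in present]


# Hypothetical workspace listing and entity names, for illustration only.
existing = [
    "/Workspace/DmBuild/Dimensions/SourceQueryTemplate",
    "/Workspace/DmBuild/Dimensions/Customer",
    "/Workspace/DmBuild/Facts/SourceQueryTemplate",
]

missing_dims = missing_notebooks(existing, DIMENSIONS_DIR, ["Customer", "Product"])
missing_facts = missing_notebooks(existing, FACTS_DIR, ["Sales"])

print("Missing dimension notebooks:", missing_dims)
print("Missing fact notebooks:", missing_facts)
```

Any entity reported as missing needs a new staging notebook.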

If notebooks are missing, create a separate notebook for each entity in the model. Use the source query template, SourceQueryTemplate, located in the Dimensions and Facts folders, as the starting point for each staging notebook.
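Before cloning the template by hand, it can help to write down which copies you need. The sketch below only builds that plan as source and target paths; it does not touch the workspace. It assumes each new notebook is named after its entity and sits in the same folder as the template. The helper `plan_template_copies` and the entity names are hypothetical.

```python
from posixpath import join

TEMPLATE_NAME = "SourceQueryTemplate"


def plan_template_copies(folder, missing_entities):
    """Pair the template in `folder` with one target path per missing entity.

    Assumption: the target notebook takes the entity's name.
    """
    template = join(folder, TEMPLATE_NAME)
    return [(template, join(folder, entity)) for entity in missing_entities]


# Hypothetical entity names, for illustration only.
plan = plan_template_copies("/Workspace/DmBuild/Dimensions", ["Product"])
plan += plan_template_copies("/Workspace/DmBuild/Facts", ["Sales"])

for source, target in plan:
    print(f"Clone {source} -> {target}")
```

Each pair corresponds to one clone of SourceQueryTemplate that you then rename and adapt for the entity.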