Configuring Model Building
Follow this guide to set up entities, columns, and dependencies for model building using Advanced Options.
This process involves creating new Models, Entities, Entity Columns, and Entity Dependencies in the appropriate tabs in Advanced Options. If you want to learn more about these concepts, visit the pages below:
Staging Notebooks
This guide assumes you have already written the appropriate notebooks to stage your data transformations. Follow the Staging Data with Notebooks guide for more details if necessary.
Models
Navigate to the Advanced Options pane on the left side of your screen. On the top of Advanced Options, navigate to the Models tab.
What are models?
Models are the unit of measure for an atomic model building process. When the model building process is complete, the model is the final built list of gold-layer objects to be used for end processes, such as reporting, AI workflows, or uses.
This table displays a list of all models currently in your environment. You can view a model's name, description, and toggle it active or inactive. To create a new model, click on "NEW RECORD"
Fill out the fields for name and description. We recommend using a name that is descriptive of the model's end-purpose. Then click "CREATE" to create the new model.
You can see this new model exists in the Model tab.
Activate your Model!
Make sure to toggle the model ACTIVE in order to start using it. Inactive models will not be built by the Empower system.
Entities
What are Entities?
Entities are the gold-layer tables that are generated during the model building process. You can built any number of entities during a model build. Entities can be built using data from any bronze or silver table in your data estate. They can even be built using other gold-layer entities!
Know your Staging Notebook Locations!
This section assumes you have your staging notebooks written and locations to those notebooks are known. We will be configuring those locations to the entries in the Entities table, so have those on hand.
The Entity table defines the list of dims and facts for all your models. For each Entity, you can view the model this entity belongs to, its schema, the name of the entity and description, its type (a various assortment of dim and fact types), the source schema and source entity, and the staging notebook that this entity is build within. You can activate and deactivate any entity on demand. Deactivated entities will not be built during a model build process.
Below is a table describing each of the Entity columns and example values.
Column Name | Description | Example Value |
---|---|---|
ID | The unique generated ID for this entry (autogenerated). | 1 |
Model ID | The model this entity is attached to. When the model is triggered, this entity will be built with it, if both are active! This must be an existing model in the Models table. | financial-model-division-a |
Schema | The destination schema for this entity, i.e. which schema will this entity "live within" after being fully built. | empower-finance |
Name | The name for this entity, i.e. what will this table be called. The newly built entity will "live within" this table after being full built. | dim_product |
Description | A short description of this entity and what its for. | the product dimension for empower-finance, division A |
Type ID | The type for this entity, can be any one of several dim and fact types | TYPE_ONE |
Source Schema | The schema that the source notebook will write to and model builder will read from. | empower_stage |
Source Entity | The table that the source notebook will write to and mode builder will read from. | product |
Source Notebook | The location in DBFS where the staging notebook for this entity is. This is the notebook that will be run during the build phase for this entity, and the results placed in [schema].[name] | /DmBuild/Dimensions/sourcequery_product |
Active | A toggle to activate/inactivate this entity | On |
Last Modified | An uneditable field that displays the last time this entity was updated | 2023-08-23 19:09 (UTC) |
Create a new entity by clicking on "New Record". This will bring up the new Entity creation modal.
Entities Default to Inactive
After creating a new entity, you will have to toggle it on for it to be built during the model building phase. All entities default to inactive state upon creation.
After filling out fields, clicking "Create" will result in a new entry in this table. Create an entry for each of your staging notebooks/entities in your model.
Activate your Entity!
Make sure to toggle the Entity ACTIVE in order to start using it. Inactive Entities will not be built by the Empower system.
Entity Types
Entities can be several types, depending on their use and what kind of historical tracking is needed.
Type Name | Description |
---|---|
Type One (Dimension) | New versions of existing data points will overwrite the existing data points. Table is built as a full load (not incrementally). |
Type Two (Dimension) | New versions of existing data points will be inserted as new data points and flagged as the current version. Table is built as a full load (not incrementally). |
Fact Full | A generic fact table. Table is built as a full load (not incrementally). |
Type One Incremental (Dimension) | New versions of existing data points will overwrite the existing data points. Table is built incrementally. |
Type Two Incremental (Dimension) | New versions of existing data points will be inserted as new data points and flagged as the current version. Table is built incrementally. |
Fact Incremental | A generic fact table. Table is built incrementally. |
Understanding Full and Incremental Load Types
When building a model, it's essential to understand the difference between full and incremental load types. This section provides an overview of these two types and their use cases.
Full Loads
A Full Load involves loading all the data from the source to the destination each time the data is processed. In other words, the entire dataset is reloaded, regardless of whether the data has changed or not. This method can be a good choice when:
- The dataset is small, and the time required to process and reload the entire dataset is relatively short.
- The source data does not have a way to track changes or does not support incremental loads. **
- The latency will be roughly equal in both full and incremental, up to the processing time. Incremental will be more "real-time" in most cases.
However, Full Loads can be time-consuming and resource-intensive, especially when dealing with large datasets or frequent updates.
Incremental Loads
Incremental Load, on the other hand, only processes and loads new or modified data since the last load. This method is more efficient, as it reduces the amount of data being transferred and processed. It can be a good choice when:
- The dataset is large, and processing the entire dataset takes a long time.
- The source data has a method to track changes, such as a modified date or timestamp. **
- The data warehouse or destination can tolerate some delay in reflecting the latest changes.
Incremental Loads may require additional logic and setup to identify and process only the new or modified data. They may also require monitoring and handling of potential data conflicts or issues that may arise when merging the new data with the existing data.
Entity Columns
What are entity columns?
Entity columns allow Empower to track and maintain detailed metadata about entities.
Entity columns should be a 1:1 mapping with the fields of all gold-layer dims and facts. All entity columns must be linked to an existing Entity (defined in the previous section above).
Create a new Entity Column record by clicking "New Record". This will bring up the Entity Column creation modal.
Below is a table describing each of the fields in an Entity Column record.
Column Name | Description | Example Value |
---|---|---|
ID | The unique generated ID for this entity (autogenerated). | 1 |
Model ID | The model this entity column is associated with. Should be the same model the attached entity is associated with. Must be an existing model in the Models table. | financial-model-division-a |
Entity ID | The entity this entity column is associated with. Must be an existing entity in the Entities table. | dim_product |
Name | The name for this entity column. | productId |
Description | A short description of this entity column and what it is for. | the ID of the product |
Type ID | The type for this entity column. Must be one of several types explained below. | surrogate_key |
Ordinal | The order in which this column should be placed relative to others. | 1 |
Data Type | The data type for this entity column. May be one three primitive types: string, int, or date. | string |
Nullable | A toggle that signifies if this field is nullable or must always have a value. | TRUE |
Missing Value | The signifier for a missing value in this field. | -1 |
Source Column | This column can be left blank, it is depreciated | [depreciated] |
Active | A toggle to activate/inactivate this entity column | TRUE |
After filling out fields, hitting the create button will result in a new entry in this table.
Ordinal Column
The ordinal setting is only taken into account during the first model build. Any subsequent build will assume the order within the existing table. In other words, if Ordinal settings are changed after the first time the model is successfully built, these changes will not be respected.
Activate your Entity Column!
Make sure to toggle the Entity Column ACTIVE in order to start using it. Inactive Entity Columns will not be built by the Empower system.
Type ID
Selecting the correct Type ID for your column is as simple as selecting from a drop down list. Please refer to the following table to understand how assignments should be made.
ID | Name | Definition | Example Value |
---|---|---|---|
1 | surrogate_key | A unique identifier for each row in a table, often generated automatically by the system, which has no business meaning. | A primary key that is a system-generated number, such as an auto-incrementing integer in MySQL or a sequence in PostgreSQL. |
2 | business_key | A natural key which has a real-world meaning and relevance to the business domain. | For a Customer table, the customer's email address or social security number can be a business key. |
3 | foreign_key | A key used to establish a relationship between two tables by referencing the primary key in another table. | In an Order table, a foreign key might be Customer_ID, which links to the primary key of the Customer table. |
4 | metadata | Data that provides information about other data, such as descriptions, data types, and relationships between entities. | A description of a database table, such as the column names, data types, and constraints, is an example of metadata. |
5 | attribute | A characteristic or property of an entity within a table, represented as a column. | In a Product table, 'product_name', 'price', and 'stock_quantity' are examples of attributes. |
6 | metric | A quantifiable measurement, often used for analytical purposes to track performance and trends. | In a Sales table, total revenue, average order value, and the number of items sold can be considered metrics. |
Entity Dependency
What is an Entity Dependency?
Entity Dependencies help Empower define the order in which dims and facts get built and how they can be optimally parallelized. An accurate dependency assignment can be the difference between a fast and efficient model build and a slow or failing process.
Make sure to study your staging logic to determine what entities depend on each other.
Create a new Entity Dependency record by clicking "New Record". This will bring up the Entity Column creation modal.
Recommendations
- Assign parent-child entity relationships. In such relationships, the child table is dependent on the parent table, and it must wait for the completion of the parent table before proceeding. Make sure to assign parent and child entities correctly, and that the model_id column points to your model.
- Assign any entities that don't have a prior depency directly to the model itself. To do this: make a new record and leave the parent entity ID blank! Define the child entity to be the that entity, and define the model to be your model.
Updated 9 months ago
If you've made it this far, you are ready to define your model build process in Step Commands. You may also need a refresher in how to create a staging notebook, if you haven't created them already. See the links below for more info.