Analytics Engineering

Transforming data into analytics and information

Analytics Engineering Flows (or AE flows for short) define the intrabound movement of data (within the lakehouse) to transform the data into usable information for your business.

Empower's model building technology takes care of orchestration and parallelization for you. All you have to define as a user is the logic to transform your data, defined in SQL statements within staging notebooks.

Modeling is the process of manipulating raw data in the Lakehouse into a processed, analytics-ready model. Empower's model building technology takes care of orchestration and parallelization for you. All you have to configure AE Flows and define the transformation logic, as defined in SQL statements within staging notebooks.

📘

Quick Links

This page covers Analytics Engineering flow creation and modification.

Overview

The Analytics Engineering module is accessible from the left navigation menu.

Click Analytics Engineering to navigate to the module.

Click Analytics Engineering to navigate to the module.

From the Analytics Engineering model page, you can view the Flows or Models tabs.

Flows Tab

From the default Flows tab, you can view all Analytics Engineering flows within your chosen environment.

A list of all Analytics Engineering Flows in Neo - Augustus - Development

A list of all Analytics Engineering flows in Neo - Augustus - Development

You can search these flows by name, view configurations or historical run logs, trigger a flow to run on demand, view/set/activate scheduling, and create new flows. You can also navigate to the Models tab for data model management.

Creation

Creating an Analytics Engineering flow can be done by simply clicking on "+" at the top of the screen when on the Flows tab.

Create a new Analytics Engineering Flow by clicking "+".

Create a new Analytics Engineering flow by clicking "+".

Fill out the name of the flow and select an existing Model from the drop down. Add an optional description to help inform yourself and other users about the purpose of this flow.

Fill out the name of the flow and select an existing Model from the drop down. Add an optional description to help inform yourself and other users about the purpose of this flow.

Once you fill out the required fields (Name and Model), click "Create" to complete the creation process.

Create is now enabled, now that Name and Model (required fields) have been filled out.

Create is now enabled, now that Name and Model (required fields) have been filled out.

You can now see your newly created flow at the top of the page.

Now you can see the newly created flow!

Congratulations, you have successfully created an AE flow.

📘

INFO: 1 to Many - Models and Analytics Engineering Flows

Models and AE flows have a 1:many relationship.

This means that a single AE flow can only ever be associated with one model at any moment in time. However, a specific model may be associated with many different AE flows, all with different schedules (or even no schedule at all!).

As an addendum, you will not be able to change an AE flow's selected model post-creation.

Editing

You can modify any existing AE flow by clicking "..." near the name of the flow. Doing so will bring up a menu with one option being to Edit the flow.

Click "..." and then "Edit" to modify an existing flow.

Click "..." and then "Edit" to modify an existing flow.

📘

INFO: AE Flow Edit Restrictions

You can only ever modify the name and description of an AE flow. You will not be able to reselect an AE flow's associated model.

If you want a different model, create a new AE flow and associate the chosen model with it instead.

Make your desired edits, and "Save" your changes when ready.

Make your desired edits, and "Save" your changes when ready.

Deletion

You can delete any existing AE flow by clicking "..." near the name of the flow. Doing so will bring up a menu with one option being to Delete the flow.

Click "..." and then "Delete" to delete an existing flow.

Click "..." and then "Delete" to delete an existing flow.

A confirmation modal will pop up. You must confirm you wish to delete the flow in order to complete the deletion process.

Confirm you wish to delete the flow and all of its historical logs.

Confirm you wish to delete the flow and all of its historical logs.

❗️

DANGER: Deletion is Permanent

Flow deletion is a permanent action. Deleting a flow will also remove the entire historical log of that flow. You will not be able to reverse a flow's deletion, so make sure you actually want to perform this action!

Scheduling and Triggering Flows

To read about how to schedule and trigger Analytics Engineering flows or any other flow type, visit Scheduling Data Flows.

Configuration

Click "View Configuration" on any existing Analytics Engineering flow to visit its configuration page.

The configuration page for the SalesLt & Nation Model Analytics Engineering flow.

The configuration page for the SalesLt & Nation Model Analytics Engineering flow.

From the configuration page, you can view the Entities that will be built within the model associated with this flow, as well as the Entity Columns and Entity Dependencies.

When you make edits on this page and any child tabs, the change is automatically saved to the flow's associated Model configuration.

Entities

📘

What are Entities?

Entities are the gold-layer tables that are generated during the model building process. You can built any number of entities during a model build. Entities can be built using data from any bronze or silver table in your data estate. They can even be built using other gold-layer entities!

🚧

Know your Staging Notebook Locations!

This section assumes you have your staging notebooks written and locations to those notebooks are known. We will be configuring those locations to the entries in the Entities table, so have those on hand.

The Entity table defines the list of dims and facts for the AE flow's model. For each Entity, you can view its schema, the name of the entity and description, its type (a various assortment of dim and fact types), the source schema and source entity, and the staging notebook that this entity is build within. You can activate and deactivate any entity on demand. Deactivated entities will not be built during a model build process.

The Entity table, where you can view information about all the entities within this AE flow's model.

The Entity table, where you can view information about all the entities within this AE flow's model.

Below is a table describing each of the Entity columns and example values.

Column NameDescriptionExample Value
IDThe unique generated ID for this entry (autogenerated).1
SchemaThe destination schema for this entity, i.e. which schema will this entity "live within" after being fully built.empower-finance
NameThe name for this entity, i.e. what will this table be called. The newly built entity will "live within" this table after being full built.dim_product
DescriptionA short description of this entity and what its for.the product dimension for empower-finance, division A
Type IDThe type for this entity, can be any one of several dim and fact typesTYPE_ONE
Source SchemaThe schema that the source notebook will write to and model builder will read from.empower_stage
Source EntityThe table that the source notebook will write to and mode builder will read from.product
Source NotebookThe location in DBFS where the staging notebook for this entity is. This is the notebook that will be run during the build phase for this entity, and the results placed in [schema].[name]/DmBuild/Dimensions/sourcequery_product
ActiveA toggle to activate/inactivate this entityOn
Last ModifiedAn uneditable field that displays the last time this entity was updated2023-08-23 19:09 (UTC)

Creation

Create a new entity by clicking on "New Record". This will bring up the new Entity creation modal.

The Entity creation modal.

The Entity creation modal.

📘

Entities Default to Inactive

After creating a new entity, you will have to toggle it on for it to be built during the model building phase. All entities default to inactive state upon creation.

After filling out fields, clicking "Create" will result in a new entry in this table. Create an entry for each of your staging notebooks/entities in your model.

👍

Activate your Entity!

Make sure to toggle the Entity ACTIVE in order to start using it. Inactive Entities will not be built by the Empower system.

Modification

You can modify many of the fields for an entity by clicking into the cell you wish to edit. Press the ENTER key on your keyboard or click away from the cell to save your changes.

Deletion

You can multiselect entities from this table, then click "Delete" in the lower left corner of the screen to delete these entities.

Deletion is a permanent action that you must confirm to proceed.

Deletion is a permanent action that you must confirm to proceed.

You must confirm that you recognize the results of this action before proceeding.

Entity Types

Entities can be several types, depending on their use and what kind of historical tracking is needed.

Type NameDescription
Type One (Dimension)New versions of existing data points will overwrite the existing data points. Table is built as a full load (not incrementally).
Type Two (Dimension)New versions of existing data points will be inserted as new data points and flagged as the current version. Table is built as a full load (not incrementally).
Fact FullA generic fact table. Table is built as a full load (not incrementally).
Type One Incremental (Dimension)New versions of existing data points will overwrite the existing data points. Table is built incrementally.
Type Two Incremental (Dimension)New versions of existing data points will be inserted as new data points and flagged as the current version. Table is built incrementally.
Fact IncrementalA generic fact table. Table is built incrementally.

When building a model, it's essential to understand the difference between full and incremental load types. This section provides an overview of these two types and their use cases.

Full Loads

A Full Load involves loading all the data from the source to the destination each time the data is processed. In other words, the entire dataset is reloaded, regardless of whether the data has changed or not. This method can be a good choice when:

  • The dataset is small, and the time required to process and reload the entire dataset is relatively short.
  • The source data does not have a way to track changes or does not support incremental loads. **
  • The latency will be roughly equal in both full and incremental, up to the processing time. Incremental will be more "real-time" in most cases.

However, Full Loads can be time-consuming and resource-intensive, especially when dealing with large datasets or frequent updates.

Incremental Loads

Incremental Load, on the other hand, only processes and loads new or modified data since the last load. This method is more efficient, as it reduces the amount of data being transferred and processed. It can be a good choice when:

  • The dataset is large, and processing the entire dataset takes a long time.
  • The source data has a method to track changes, such as a modified date or timestamp. **
  • The data warehouse or destination can tolerate some delay in reflecting the latest changes.

Incremental Loads may require additional logic and setup to identify and process only the new or modified data. They may also require monitoring and handling of potential data conflicts or issues that may arise when merging the new data with the existing data.

Entity Columns

📘

What are entity columns?

Entity columns allow Empower to track and maintain detailed metadata about entities.

The Entity column table, where you can view all entity columns across all entities within this flow's modal.

The Entity column table, where you can view all entity columns across all entities within this flow's modal.

Entity columns should be a 1:1 mapping with the fields of all gold-layer dims and facts. All entity columns must be linked to an existing Entity.

Below is a table describing each of the fields in an Entity Column record.

Column NameDescriptionExample Value
IDThe unique generated ID for this entity (autogenerated).1
Entity IDThe entity this entity column is associated with. Must be an existing entity in the Entities table.dim_product
NameThe name for this entity column.productId
DescriptionA short description of this entity column and what it is for.the ID of the product
Type IDThe type for this entity column. Must be one of several types explained below.surrogate_key
OrdinalThe order in which this column should be placed relative to others.1
Data TypeThe data type for this entity column. May be one three primitive types: string, int, or date.string
NullableA toggle that signifies if this field is nullable or must always have a value.TRUE
Source ColumnThis column can be left blank, it is depreciated[depreciated]
ActiveA toggle to activate/inactivate this entity columnTRUE

Creation

Create a new Entity Column record by clicking "New Record". This will bring up the Entity Column creation modal.

The Entity Column creation modal.

The Entity Column creation modal.

After filling out fields, hitting the create button will result in a new entry in this table.

🚧

Ordinal Column

The ordinal setting is only taken into account during the first model build. Any subsequent build will assume the order within the existing table. In other words, if Ordinal settings are changed after the first time the model is successfully built, these changes will not be respected.

👍

Activate your Entity Column!

Make sure to toggle the Entity Column ACTIVE in order to start using it. Inactive Entity Columns will not be built by the Empower system.

Type ID

Selecting the correct Type ID for your column is as simple as selecting from a drop down list. Please refer to the following table to understand how assignments should be made.

IDNameDefinitionExample Value
1surrogate_keyA unique identifier for each row in a table, often generated automatically by the system, which has no business meaning.A primary key that is a system-generated number, such as an auto-incrementing integer in MySQL or a sequence in PostgreSQL.
2business_keyA natural key which has a real-world meaning and relevance to the business domain.For a Customer table, the customer's email address or social security number can be a business key.
3foreign_keyA key used to establish a relationship between two tables by referencing the primary key in another table.In an Order table, a foreign key might be Customer_ID, which links to the primary key of the Customer table.
4metadataData that provides information about other data, such as descriptions, data types, and relationships between entities.A description of a database table, such as the column names, data types, and constraints, is an example of metadata.
5attributeA characteristic or property of an entity within a table, represented as a column.In a Product table, 'product_name', 'price', and 'stock_quantity' are examples of attributes.
6metricA quantifiable measurement, often used for analytical purposes to track performance and trends.In a Sales table, total revenue, average order value, and the number of items sold can be considered metrics.

Entity Dependencies

📘

What is an Entity Dependency?

Entity Dependencies help Empower define the order in which dims and facts get built and how they can be optimally parallelized. An accurate dependency assignment can be the difference between a fast and efficient model build and a slow or failing process.

Make sure to study your staging logic to determine what entities depend on each other.

The Entity Dependency tab, where you can see all the dependency relationships for this flow's modal.

The Entity Dependency tab, where you can see all the dependency relationships for this flow's modal.

Use model entities to build the dependent relationships for entities in your model.

Creation

Create a new Entity Dependency record by clicking "New Record". This will bring up the Entity Column creation modal.

The Entity Dependency creation modal.

The Entity Dependency creation modal.

Assign parent-child entity relationships for your model entities. In such relationships, the child table is dependent on the parent table, and it must wait for the completion of the parent table before proceeding. Make sure to assign parent and child entities correctly.

Create empty parent records for any entities that don't have a prior depency. To do this: make a new record and leave the parent entity ID blank! Define the child entity to be the that entity

Models Tab

From the Models tab, you can view and manage all models within your selected environment. A model holds all the information and configuration necessary to successfully transform data into analytical models.

The Models tab within the Analytics Engineering module.

The Models tab within the Analytics Engineering module.

📘

Question: How are models used?

Models are the unit of measure for an atomic model building process. When the model building process is complete, the model is the final built list of gold-layer objects to be used for end processes, such as reporting, AI workflows, or uses.

Every AE flow must be associated with one model. The Entity, Entity Columns, and Entity dependency columns for that model will then appear within that flow's Configuration.

🚧

WARNING: A model change will be reflected for all associated flows!

For example: Flow A and Flow B are both associated with Model P.

If a user makes changes to an Entity within Flow A's configuration, those changes will also take effect for Flow B's configuration, as they share the same Model P.

This table displays a list of all models currently in your environment. You can view a model's name, description, and toggle it active or inactive.

Creation

To create a new model, click on "New Record".

Click "New Record" to make a new model.

Click "New Record" to make a new model.

This will bring up the model creation modal. Fill out the name (required) and the description (optional) before clicking create.

Fill the name and description fields, then click "CREATE".

Fill in the name and optional description fields. Then click "Create."

You can see this new model exists in the Model tab.

👍

Activate your Model!

Make sure to toggle the model ACTIVE in order to start using it. Inactive models will not be built by the Empower system, even if associated with a flow.

Data Modeling from from Advanced Options

For more advanced users and for Empower deployments earlier than v1.26, Empower's 2nd generation DB2Step command system also allows users to architect modeling configurations all through Empower's Advanced Options feature set. Read about it in the Configuring Model Building guide.

For detailed information about each Model-centric Advanced Options table, please visit the Advanced Options, specifically the Model, Model - Entity, Model - Entity Column, and Model - Entity Dependency pages.