Data Acquisition

Pulling data into the lakehouse from external sources

Data Acquisition Flows (or DAQ flows for short) define the inbound movement of data, i.e. data moving from a data source into your centralized, enterprise data lakehouse.

In Empower, you can create as many acquisition flows as you want, while configuring each of them to acquire data however often you want.

📘

Quick Links

This page gives an overview of DAQ flows and how to use them. For related topics, see:

  • Logging and Monitoring: defines how to use the log page to monitor flows ("View Runs").
  • Schedules and Triggers: describes how to trigger flows to run on demand ("Preview Run") and schedule flows to run on a repeating cadence ("Schedule").

Overview

The Data Acquisition module is accessible from the left navigation menu.

Click Data Acquisition to navigate to the module.

From here, you can view all Data Acquisition flows within your selected environment.

There are a few restrictions when it comes to DAQ flows.

  1. An Acquisition flow can only ever be affiliated with one Data Source at a time.
    1. This means you can't bring data in from multiple data sources under the same acquisition flow. If you need to, create a separate acquisition flow for each data source and schedule them to run at the same time.
  2. Archiving a Data Source will also force the archiving of all affiliated DAQ flows.
    1. You will not be able to edit, schedule, or trigger these flows until you restore the underlying Data Source.
  3. Deleting a Data Source will permanently archive all affiliated DAQ flows.
    1. You will never again be able to edit, schedule, or trigger these flows.
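
The restrictions above can be sketched as a small data model. This is only an illustration of the rules as described on this page, not Empower's API; the names `State`, `DataSource`, `DaqFlow`, `is_usable`, and `is_recoverable` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"
    DELETED = "deleted"


@dataclass
class DataSource:  # hypothetical stand-in for an Empower Data Source
    name: str
    state: State = State.ACTIVE


@dataclass
class DaqFlow:  # hypothetical stand-in for a DAQ flow
    name: str
    source: DataSource  # restriction 1: exactly one Data Source per flow

    def is_usable(self) -> bool:
        # A flow can be edited, scheduled, or triggered only while
        # its Data Source is active (restrictions 2 and 3).
        return self.source.state is State.ACTIVE

    def is_recoverable(self) -> bool:
        # Archiving can be undone by restoring the source; deletion cannot.
        return self.source.state is not State.DELETED


# One Data Source may back many flows, each with its own schedule.
crm = DataSource("crm")
erp = DataSource("erp")
daily = DaqFlow("daily-crm", crm)
weekly = DaqFlow("weekly-crm", crm)
erp_flow = DaqFlow("erp-nightly", erp)

crm.state = State.ARCHIVED  # both CRM flows become unusable until restore
erp.state = State.DELETED   # the ERP flow is permanently unusable

for flow in (daily, weekly, erp_flow):
    print(flow.name, flow.is_usable(), flow.is_recoverable())
```

In this sketch the flow's status is derived from its Data Source rather than stored on the flow, which mirrors how restoring a Data Source makes its flows usable again.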

Flows

From the default view for the Data Acquisition module, you can see all the DAQ flows within your current environment.

A list of all DAQ flows in Neo - Tiberius - Integration.

You can search these flows by name, view configurations or historical run logs, trigger a flow to run on demand, view/set/activate scheduling, and create new flows.

Creation

To create a Data Acquisition flow, click "+" at the top of the Data Acquisition module's homepage.

Create a new Data Acquisition flow by clicking "+".

Fill out the name of the flow and select an existing Data Source from the drop down.

Once you fill out the required fields (Name and Data Source), click "Create" to complete the creation process.

"Create" is enabled once the required fields (Name and Data Source) are filled out.

You can now see your newly created flow at the top of the page.

Congratulations, you have successfully created a DAQ flow.

📘

INFO: 1 to Many - Data Sources and Data Acquisition Flows

Data Sources and DAQ flows have a 1:many relationship.

This means that a single DAQ flow can only ever be associated with one Data Source at any moment in time. However, a specific Data Source may be associated with many different DAQ flows, all with different schedules and inclusion configurations.

Editing

You can modify any existing DAQ flow by clicking "..." near the name of the flow. This opens a menu that includes an Edit option.

Click "..." and then Edit to modify an existing flow.

You may edit the Name and the Data Source associated with this flow.

🚧

WARNING: changing the Data Source field will affect your configuration!

Changing a DAQ flow's Data Source will completely wipe any previous object inclusion configurations. Be mindful of this when you modify a DAQ flow!

Make your desired edits, and "Save" your changes when ready.

Deletion

You can delete any existing DAQ flow by clicking "..." near the name of the flow. This opens a menu that includes a Delete option.

Click "..." and then Delete to delete an existing flow.

A confirmation modal will pop up. You must confirm you wish to delete the flow in order to complete the deletion process.

Confirm you wish to delete the flow and all of its historical logs.

❗️

DANGER: Deletion is Permanent

Flow deletion is a permanent action. Deleting a flow will also remove the entire historical log of that flow. You will not be able to reverse a flow's deletion, so make sure you actually want to perform this action!

Scheduling and Triggering Flows

To read about how to schedule and trigger DAQ flows or any other flow type, visit Scheduling Data Flows.

Configuration

Click "View Configuration" on any existing DAQ flow to visit its configuration page.

The configuration page for the Weekly Sales Acquisition flow.

The configuration page contains a list of all available objects from the flow's associated Data Source. Toggle Include on an object to have it acquired whenever this flow is triggered.

You may search objects by name, as well as filter objects by Include, Estimated Rows and Schema Name.

When you include/exclude an object on this page, the change is automatically saved to the flow's configuration.

🚧

WARNING: Global Enablement

The Enable toggle on the object within the Data Source's Metadata Catalog will impact the ability of this flow to acquire the object.

If the object is globally disabled, it will not be acquired when this flow runs.

  1. You will not be able to include globally disabled objects in the flow.
  2. Even if the disabled object is already included in the flow, it will not be acquired when the flow runs.
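
The interaction between the two toggles can be summarized in a short sketch. It models only the rule described above (an object is acquired when it is both included in the flow and globally enabled); `CatalogObject`, `can_include`, and `objects_to_acquire` are hypothetical names, not part of Empower.

```python
from dataclasses import dataclass


@dataclass
class CatalogObject:  # hypothetical model of one object in a Data Source
    name: str
    globally_enabled: bool  # "Enable" toggle in the Metadata Catalog
    included: bool = False  # "Include" toggle on the flow's configuration page


def can_include(obj: CatalogObject) -> bool:
    # Rule 1: a globally disabled object cannot be newly included.
    return obj.globally_enabled


def objects_to_acquire(objects: list[CatalogObject]) -> list[str]:
    # Rule 2: an already-included object is skipped while globally disabled.
    return [o.name for o in objects if o.included and o.globally_enabled]


catalog = [
    CatalogObject("orders", globally_enabled=True, included=True),
    CatalogObject("customers", globally_enabled=False, included=True),
    CatalogObject("audit_log", globally_enabled=True, included=False),
]

print(objects_to_acquire(catalog))  # ['orders']
```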

How Data Acquisition Works

Data Acquisition is an umbrella term that links together three data movement concepts.

Metadata Extraction

Acquisition automatically refreshes the associated data source's metadata to ensure Schema Drift protection. Read more about Metadata Extraction and how Metadata Catalogs work.

Extraction

Extraction is the process of collecting data from a data source and writing it in the open-source Parquet file format to the Empower Landing Zone.

A handful of Empower-supported data sources for extraction.

📘

Supported Data Sources in Empower

Empower is configured to support more than 200 data sources for extraction. Some of these are entirely self-serviceable from the UI. Others require more configuration by the Empower delivery team.

For a full list of sources, check out our Source System Support page. Talk to your delivery manager for more information.

Ingestion

Data ingestion takes the data from the Landing Zone and writes it, in the open-source Delta table format, to the Bronze Layer of the Medallion Architecture.

All Empower-ingested data is tracked as part of a Type 2 data contract. This means that all changes made to the data from the original data source, including modifications and deletions, are stored as new data points in the Bronze Layer.

Data in the Bronze Layer is then ingested into the Silver Layer, which holds the most current version of every data point.
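
To make the Type 2 idea concrete, here is a minimal in-memory sketch of an append-only history and a "most current version" view derived from it. It is not how Empower implements ingestion: `BronzeRow`, `append_change`, and `current_view` are hypothetical names, the integer `version` column stands in for whatever change tracking is actually used, and omitting deleted records from the current view is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BronzeRow:  # hypothetical shape of one history record
    key: str
    value: Optional[str]  # None when the record marks a deletion
    version: int
    is_deleted: bool = False


def append_change(bronze, key, value, is_deleted=False):
    # Type 2: never update in place; every change becomes a new row.
    version = 1 + max((r.version for r in bronze if r.key == key), default=0)
    bronze.append(BronzeRow(key, value, version, is_deleted))


def current_view(bronze):
    # Keep only the latest version per key.
    latest = {}
    for row in bronze:
        if row.key not in latest or row.version > latest[row.key].version:
            latest[row.key] = row
    # Assumption: keys whose latest record is a deletion are left out.
    return {k: r.value for k, r in latest.items() if not r.is_deleted}


bronze = []
append_change(bronze, "cust-1", "Alice")
append_change(bronze, "cust-1", "Alice B.")             # modification
append_change(bronze, "cust-2", "Bob")
append_change(bronze, "cust-2", None, is_deleted=True)  # deletion

silver = current_view(bronze)
print(len(bronze))  # 4 rows of history are retained
print(silver)       # {'cust-1': 'Alice B.'}
```

The history list keeps all four changes, while the derived view contains one entry per surviving key.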


What’s Next

To configure Data Acquisition from Data Sources, learn how to work with the Metadata Catalog.