Data Acquisition
Pulling data into the lakehouse from external sources
Data Acquisition Flows (or DAQ flows for short) define the inbound movement of data, i.e. data moving from a data source into your centralized, enterprise data lakehouse.
In Empower, you can create as many acquisition flows as you want, while configuring each of them to acquire data however often you want.
Quick Links
This page gives an overview of DAQ flows and how to use them. For related topics, see:
- Logging and Monitoring: defines how to use the log page to monitor flows ("View Runs").
- Schedules and Triggers: describes how to trigger flows to run on demand ("Preview Run") and schedule flows to run on a repeating cadence ("Schedule").
Overview
The Data Acquisition module is accessible from the left navigation menu.
From here, you can view all Data Acquisition flows within your selected environment.
There are a few restrictions when it comes to DAQ flows.
- An Acquisition flow can only ever be affiliated with one Data Source at a time.
  - This means you can't bring data in from multiple data sources under the same acquisition flow. If you need to, create a separate acquisition flow for each data source and schedule them to run at the same time.
- Archiving a Data Source also archives all affiliated DAQ flows.
  - You will not be able to edit, schedule, or trigger these flows until you restore the underlying Data Source.
- Deleting a Data Source will permanently archive all affiliated DAQ flows.
  - You will never again be able to edit, schedule, or trigger these flows.
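The restrictions above can be pictured with a small model. This is only a sketch to illustrate the behavior, not how Empower is implemented: the `Status` enum, the `DataSource` and `AcquisitionFlow` classes, and every method name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"
    DELETED = "deleted"


@dataclass
class DataSource:
    """Hypothetical stand-in for a Data Source."""
    name: str
    status: Status = Status.ACTIVE

    def archive(self) -> None:
        self.status = Status.ARCHIVED

    def restore(self) -> None:
        # Deletion is permanent, so only an archived source can come back.
        if self.status is Status.DELETED:
            raise ValueError("a deleted Data Source cannot be restored")
        self.status = Status.ACTIVE

    def delete(self) -> None:
        self.status = Status.DELETED


@dataclass
class AcquisitionFlow:
    """Hypothetical stand-in for a DAQ flow: exactly one Data Source."""
    name: str
    source: DataSource

    @property
    def usable(self) -> bool:
        # A flow can be edited, scheduled, or triggered only while its
        # Data Source is active.
        return self.source.status is Status.ACTIVE
```

In this model, archiving the source makes every flow that points at it unusable until `restore()` is called, while `delete()` makes them unusable for good.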
Flows
From the default view for the Data Acquisition module, you can see all the DAQ flows within your current environment.
You can search these flows by name, view configurations or historical run logs, trigger a flow to run on demand, view/set/activate scheduling, and create new flows.
Creation
To create a Data Acquisition flow, click "+" at the top of the screen on the homepage for the Data Acquisition module.
Once you fill out the required fields (Name and Data Source), click "Create" to complete the creation process.
You can now see your newly created flow at the top of the page.
INFO: 1 to Many - Data Sources and Data Acquisition Flows
Data Sources and DAQ flows have a 1:many relationship.
This means that a single DAQ flow can only ever be associated with one Data Source at any moment in time. However, a specific Data Source may be associated with many different DAQ flows, all with different schedules and inclusion configurations.
Editing
You can modify any existing DAQ flow by clicking "..." near the name of the flow. Doing so brings up a menu that includes an option to Edit the flow.
You may edit the Name and the Data Source associated with this flow.
WARNING: changing the Data Source field will affect your configuration!
Changing a DAQ flow's Data Source will completely wipe any previous object inclusion configurations. Be mindful of this when you modify a DAQ flow!
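The effect of this warning can be sketched as follows. The `AcquisitionFlow` class, its fields, and the `edit` method are hypothetical names used for illustration only. The sketch also assumes that re-selecting the same Data Source leaves the configuration untouched, which this page does not state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AcquisitionFlow:
    """Hypothetical model of a DAQ flow and its inclusion configuration."""
    name: str
    data_source: str
    included_objects: set[str] = field(default_factory=set)

    def edit(self, name: Optional[str] = None,
             data_source: Optional[str] = None) -> None:
        # Renaming a flow does not touch its configuration.
        if name is not None:
            self.name = name
        # Pointing the flow at a different Data Source wipes every
        # object inclusion that was configured for the old one.
        if data_source is not None and data_source != self.data_source:
            self.data_source = data_source
            self.included_objects.clear()
```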
Deletion
You can delete any existing DAQ flow by clicking "..." near the name of the flow. Doing so brings up a menu that includes an option to Delete the flow.
A confirmation modal will pop up. You must confirm you wish to delete the flow in order to complete the deletion process.
DANGER: Deletion is Permanent
Flow deletion is a permanent action. Deleting a flow will also remove the entire historical log of that flow. You will not be able to reverse a flow's deletion, so make sure you actually want to perform this action!
Scheduling and Triggering Flows
To read about how to schedule and trigger DAQ flows or any other flow type, visit Scheduling Data Flows.
Configuration
Click "View Configuration" on any existing DAQ flow to visit its configuration page.
The configuration page contains a list of all available objects from the flow's associated Data Source. Toggle Include on an object to have it acquired whenever this flow is triggered.
You may search objects by name, as well as filter objects by Include, Estimated Rows and Schema Name.
When you include/exclude an object on this page, the change is automatically saved to the flow's configuration.
WARNING: Global Enablement
The Enable toggle on the object within the Data Source's Metadata Catalog will impact the ability of this flow to acquire the object.
If the object is globally disabled it will not be acquired when this flow runs.
- You will not be able to include globally disabled objects in the flow.
- Even if the disabled object is already included in the flow, it will not be acquired when the flow runs.
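Put differently, an object is acquired only when it is both included in the flow and globally enabled in the Metadata Catalog. The sketch below illustrates that rule; `CatalogObject`, `can_include`, and `objects_to_acquire` are hypothetical names, not part of Empower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogObject:
    """Hypothetical view of one object in a Data Source's Metadata Catalog."""
    name: str
    globally_enabled: bool


def can_include(obj: CatalogObject) -> bool:
    # Globally disabled objects cannot be newly included in a flow.
    return obj.globally_enabled


def objects_to_acquire(catalog: list[CatalogObject],
                       included_names: set[str]) -> list[str]:
    # An object that was included earlier is still skipped at run time
    # if it has since been globally disabled.
    return [
        obj.name
        for obj in catalog
        if obj.globally_enabled and obj.name in included_names
    ]


catalog = [
    CatalogObject("accounts", globally_enabled=True),
    CatalogObject("orders", globally_enabled=False),
    CatalogObject("invoices", globally_enabled=True),
]
# "orders" is included in the flow but globally disabled, so it is skipped.
acquired = objects_to_acquire(catalog, {"accounts", "orders"})
```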
How Data Acquisition Works
Data Acquisition is an umbrella term that links together three data movement concepts.
Metadata Extraction
Acquisition automatically refreshes the associated data source's metadata to ensure Schema Drift protection. Read up about Metadata Extraction and how Metadata Catalogs work.
Extraction
Extraction is the process of collecting data from a data source and writing it, in the open-source Parquet file format, to the Empower Landing Zone.
Supported Data Sources in Empower
Empower is configured to support more than 200 data sources for extraction. Some of these are entirely self-serviceable from the UI. Others require more configuration by the Empower delivery team.
For a full list of sources, check out our Source System Support page. Talk to your delivery manager for more information.
Ingestion
Data ingestion takes the data from the Landing Zone and ingests it, in the open-source Delta table format, into the Bronze Layer of the Medallion Architecture.
All Empower-ingested data is tracked as part of a Type 2 data contract. This means that all changes made to the data from the original data source, including modifications and deletions, are stored as new data points in the Bronze Layer.
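The Type 2 behavior described above can be sketched as an append-only history. This is a simplified illustration in plain Python, not Empower's ingestion logic: the row fields `key`, `data`, `loaded_at`, and `is_deleted`, and the `ingest_snapshot` function, are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def latest_by_key(bronze: list[Row]) -> dict[Any, Row]:
    # Bronze is append-only, so a later row for a key is the newer one.
    latest: dict[Any, Row] = {}
    for row in bronze:
        latest[row["key"]] = row
    return latest


def ingest_snapshot(bronze: list[Row], snapshot: dict[Any, Any],
                    loaded_at: str) -> list[Row]:
    """Append one row per insert, modification, or deletion.

    Existing rows are never updated or removed.
    """
    latest = latest_by_key(bronze)
    for key, data in snapshot.items():
        prev = latest.get(key)
        # New key, re-created key, or changed data: store a new version.
        if prev is None or prev["is_deleted"] or prev["data"] != data:
            bronze.append({"key": key, "data": data,
                           "loaded_at": loaded_at, "is_deleted": False})
    for key, prev in latest.items():
        # Key vanished from the source: record the deletion as a new row.
        if key not in snapshot and not prev["is_deleted"]:
            bronze.append({"key": key, "data": prev["data"],
                           "loaded_at": loaded_at, "is_deleted": True})
    return bronze
```

Ingesting the same unchanged snapshot twice adds nothing the second time, so the history grows only when the source actually changes.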
Data ingested into the Bronze Layer is then ingested into the Silver Layer, which holds only the most current version of every data point.
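As a rough illustration of how a "most current version" view can be derived from a change history, consider the sketch below. The row fields (`key`, `data`, `loaded_at`, `is_deleted`) and the `current_versions` function are hypothetical. The sketch also assumes that records whose latest version is a deletion are left out of the result, which this page does not specify.

```python
from typing import Any

Row = dict[str, Any]


def current_versions(bronze_rows: list[Row]) -> dict[Any, Any]:
    """Collapse a change history to the latest version of each key."""
    latest: dict[Any, Row] = {}
    # ISO-8601 timestamps sort chronologically as plain strings; sorted()
    # is stable, so rows with equal timestamps keep their original order.
    for row in sorted(bronze_rows, key=lambda r: r["loaded_at"]):
        latest[row["key"]] = row
    # Assumption: a key whose newest row marks a deletion is dropped.
    return {
        key: row["data"]
        for key, row in latest.items()
        if not row["is_deleted"]
    }


history = [
    {"key": 1, "data": "Ada", "loaded_at": "2024-01-01", "is_deleted": False},
    {"key": 2, "data": "Bob", "loaded_at": "2024-01-01", "is_deleted": False},
    {"key": 1, "data": "Ada L.", "loaded_at": "2024-01-02", "is_deleted": False},
    {"key": 2, "data": "Bob", "loaded_at": "2024-01-02", "is_deleted": True},
]
silver = current_versions(history)
```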
To configure Data Acquisition from Data Sources, learn about how to manipulate the Metadata Catalog.