Acquire

Pulling data into the lakehouse from external sources

Data Acquisition Flows define the inbound movement of data, i.e. data moving from a data source into your centralized, enterprise data lakehouse.

In Empower, you can create as many acquisition flows as you need and configure each one to acquire data as often as you like.

📘

Quick Links

This page only covers an overview of acquisition flows and how to use them.

  • Logging and Monitoring: defines how to use the log page to monitor flows ("View Runs").
  • Schedules and Triggers: describes how to trigger flows to run on demand ("Preview Run") and schedule flows to run on a repeating cadence ("Schedule").

Overview

The Acquire module is accessible from the left navigation menu.

Click Acquire to navigate to the module.

From here, you can view all Acquisition flows within your selected environment.

There are a few restrictions when it comes to acquisition flows.

  1. An Acquisition flow can only ever be affiliated with one Data Source Connection at a time.
    1. This means you can't bring data in from multiple data sources under the same acquisition flow. If you need to, create a separate acquisition flow for each data source and schedule them to run at the same time.
  2. Archiving a Data Source Connection will also force the archiving of all affiliated Acquisition flows.
    1. You will not be able to edit, schedule, or trigger these flows until you restore the underlying Data Source Connection.
  3. Deleting a Data Source Connection will permanently archive all affiliated Acquisition flows.
    1. You will never again be able to edit, schedule, or trigger these flows.
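
The restrictions above can be pictured as a small data model. This is only an illustrative sketch, not Empower's implementation: the class names, the `ConnectionState` enum, and the `can_modify` method are hypothetical and simply mirror the rules listed here.

```python
from dataclasses import dataclass
from enum import Enum


class ConnectionState(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"
    DELETED = "deleted"


@dataclass
class DataSourceConnection:
    name: str
    state: ConnectionState = ConnectionState.ACTIVE

    def archive(self) -> None:
        # Restriction 2: archiving the connection blocks all affiliated flows.
        if self.state is ConnectionState.ACTIVE:
            self.state = ConnectionState.ARCHIVED

    def restore(self) -> None:
        # Restoring an archived connection makes its flows usable again.
        if self.state is ConnectionState.DELETED:
            raise ValueError("a deleted connection cannot be restored")
        self.state = ConnectionState.ACTIVE

    def delete(self) -> None:
        # Restriction 3: deletion is permanent for the affiliated flows.
        self.state = ConnectionState.DELETED


@dataclass
class AcquisitionFlow:
    name: str
    # Restriction 1: a flow points at exactly one connection at a time.
    connection: DataSourceConnection

    def can_modify(self) -> bool:
        # Editing, scheduling, and triggering all require an active connection.
        return self.connection.state is ConnectionState.ACTIVE
```

Because each flow holds a single `connection` reference, pulling from two sources means creating two flows, which matches the guidance in the first restriction.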

Flows

From the default view for the Acquire module, you can see all the Acquisition flows within your current environment.

A list of all Acquisition flows.

You can search these flows by name, view configurations or historical run logs, trigger a flow to run on demand, view/set/activate scheduling, and create new flows.

Creation

To create an Acquisition flow, click "+" at the top of the Acquire module's home page.

Create a new Data Acquisition flow by clicking "+".

Fill out the name of the flow and select an existing Data Source Connection from the drop-down.

Once you fill out the required fields (Name and Data Source), click "Create" to complete the creation process.

"Create" is enabled once the required fields (Name and Data Source) are filled out.

You can now see your newly created flow at the top of the page.

Congratulations, you have successfully created an Acquisition flow.

📘

INFO: 1 to Many - Data Source Connections and Acquisition Flows

Data Source Connections and Acquisition flows have a 1:many relationship.

This means that a single Acquisition flow can only ever be associated with one Data Source Connection at any moment in time. However, a specific Data Source Connection may be associated with many different Acquisition flows, all with different schedules and inclusion configurations.

Editing

You can modify any existing Acquisition flow by clicking "..." next to the flow's name. This opens a menu that includes an Edit option.

Click "..." and then Edit to modify an existing flow.

You may edit the Name and the Data Source Connection associated with this flow.

🚧

WARNING: changing the Data Source Connection field will affect your configuration!

Changing an Acquisition flow's Data Source Connection will completely wipe any previous object inclusion configurations. Be mindful of this when you modify an Acquisition flow!

Make your desired edits, and "Save" your changes when ready.

Deletion

You can delete any existing Acquisition flow by clicking "..." next to the flow's name. This opens a menu that includes a Delete option.

Click "..." and then Delete to delete an existing flow.

A confirmation modal will pop up. You must confirm you wish to delete the flow in order to complete the deletion process.

Confirm you wish to delete the flow and all of its historical logs.

❗️

DANGER: Deletion is Permanent

Flow deletion is a permanent action. Deleting a flow will also remove the entire historical log of that flow. You will not be able to reverse a flow's deletion, so make sure you actually want to perform this action!

Scheduling and Triggering Flows

To read about how to schedule and trigger Acquisition flows or any other flow type, visit Scheduling Data Flows.

Configuration

Click "View Configuration" on any existing Acquisition flow to visit its configuration page.

The configuration page for the Weekly Sales Acquisition flow.

The configuration page contains a list of all available objects from the flow's associated Data Source. Toggle Include on an object to include it in data acquisition whenever this flow is triggered.

You may search objects by name, as well as filter objects by Include, Estimated Rows and Schema Name.

When you include/exclude an object on this page, the change is automatically saved to the flow's configuration.

🚧

WARNING: Global Enablement

The Enable toggle on the object within the Data Source's Metadata Catalog will impact the ability of this flow to acquire the object.

If the object is globally disabled it will not be acquired when this flow runs.

  1. You will not be able to include globally disabled objects in the flow.
  2. Even if the disabled object is already included in the flow, it will not be acquired when the flow runs.
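
One way to think about the two toggles is as a set intersection: an object is acquired only if it is included in the flow and enabled globally. The sketch below is illustrative only; the function name and the sample object names are hypothetical and are not part of Empower.

```python
def objects_to_acquire(flow_included, globally_enabled):
    """Return the objects a run would pick up: included in the flow AND enabled globally."""
    return sorted(set(flow_included) & set(globally_enabled))


# Hypothetical object names for illustration.
flow_included = ["dbo.Orders", "dbo.Customers", "dbo.AuditLog"]
globally_enabled = ["dbo.Orders", "dbo.Customers", "dbo.Products"]

print(objects_to_acquire(flow_included, globally_enabled))
# ['dbo.Customers', 'dbo.Orders']
```

Here `dbo.AuditLog` is included in the flow but globally disabled, so it is skipped. `dbo.Products` is globally enabled but not included in this flow, so it is skipped as well.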

How Data Acquisition Works

Data Acquisition is an umbrella term that links together three data movement concepts.

Metadata Extraction

Acquisition automatically refreshes the associated data source's metadata to ensure Schema Drift protection. Read up about Metadata Extraction and how Metadata Catalogs work.
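
To make the idea of schema drift concrete, the sketch below compares a previously stored column listing for one object with a freshly extracted one. It is an assumption-laden illustration rather than Empower's actual mechanism: the `diff_schema` and `has_drift` helpers, the dictionary layout, and the column names are all hypothetical.

```python
def diff_schema(previous, current):
    """Compare two {column_name: data_type} mappings for a single object."""
    added = {col: typ for col, typ in current.items() if col not in previous}
    removed = {col: typ for col, typ in previous.items() if col not in current}
    retyped = {
        col: (previous[col], current[col])
        for col in previous.keys() & current.keys()
        if previous[col] != current[col]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


def has_drift(drift):
    """True if any column was added, removed, or changed type."""
    return any(drift.values())


# Hypothetical catalog snapshots taken before and after a metadata refresh.
previous = {"id": "int", "email": "varchar", "legacy_code": "char"}
current = {"id": "bigint", "email": "varchar", "created_at": "datetime"}

drift = diff_schema(previous, current)
print(drift["added"], drift["removed"], drift["retyped"])
```

Refreshing metadata before each run means a difference like this is detected before data is extracted, rather than surfacing later as a failed or malformed load.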

Extraction

Extraction is the process of collecting data from a data source and writing it in the open-source Parquet file format to the Empower Landing Zone.

A handful of Empower supported data sources for extraction.

📘

Supported Data Sources in Empower

Empower is configured to support over 200 data sources for extraction. Some of these are entirely self-serviceable from the UI. Others require more configuration by the Empower delivery team.

For a full list of sources, check out our Source System Support page. Talk to your delivery manager for more information.

Ingestion

Data ingestion takes the data from the Landing Zone and writes it in the open-source Delta table format to the Bronze Layer of the Medallion Architecture.

All Empower-ingested data is tracked as part of a Type 2 data contract. This means that all changes made to the data from the original data source, including modifications and deletions, are stored as new data points in the Bronze Layer.

Bronze Layer ingested data is further ingested into the Silver Layer as the most current version of every data point.
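
The relationship between the two layers can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Empower's ingestion code: the `BronzeRow` fields, the `version` load sequence, the use of `None` to mark a source deletion, and the choice to leave deleted records out of the Silver view are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BronzeRow:
    key: str
    payload: Optional[dict]  # None marks a deletion at the source (assumption)
    version: int             # hypothetical, increasing load sequence


def ingest(bronze, key, payload, version):
    """Append a new version; existing Bronze rows are never updated in place."""
    bronze.append(BronzeRow(key, payload, version))


def silver_view(bronze):
    """Latest version per key, leaving out keys whose latest version is a deletion."""
    latest = {}
    for row in bronze:
        if row.key not in latest or row.version > latest[row.key].version:
            latest[row.key] = row
    return {k: r.payload for k, r in latest.items() if r.payload is not None}


bronze = []
ingest(bronze, "c1", {"name": "Ada"}, 1)
ingest(bronze, "c2", {"name": "Bob"}, 1)
ingest(bronze, "c1", {"name": "Ada L."}, 2)  # modification becomes a new row
ingest(bronze, "c2", None, 3)                # deletion becomes a new row

print(len(bronze))          # 4
print(silver_view(bronze))  # {'c1': {'name': 'Ada L.'}}
```

Bronze keeps all four rows as history, while the Silver view collapses them to the current state of each record.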


What’s Next

To configure Data Acquisition from Data Sources, learn about how to manipulate the Metadata Catalog.