Metadata Catalogs

View metadata about a specific Data Source

Overview

Empower extracts metadata from your data sources so that you can use the Metadata Catalog to decide which data to ingest into your deployment.


What is a Metadata Catalog?

A metadata catalog is a view of all the tables/objects in a Data Source, along with their schemas. For each table/object, you can view all of its fields and an estimated row count. Using the catalog, you can define which data is brought in and how.
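To illustrate the kind of information a metadata catalog holds (table names, field names, row counts), the sketch below extracts it from a SQLite database using Python's standard `sqlite3` module. It is a minimal illustration, not how Empower performs extraction: `CatalogEntry` and `extract_catalog` are hypothetical names. SQLite keeps no cheap row estimate, so the sketch uses an exact `COUNT(*)` where other databases may expose an estimate in their statistics.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One table/object in the catalog (hypothetical model, not Empower's schema)."""

    name: str
    fields: list[str] = field(default_factory=list)
    estimated_rows: int = 0


def extract_catalog(conn: sqlite3.Connection) -> list[CatalogEntry]:
    """List every user table with its column names and row count."""
    entries: list[CatalogEntry] = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row describes one column; index 1 is its name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # SQLite keeps no cheap row estimate, so this sketch counts exactly.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        entries.append(CatalogEntry(table, columns, count))
    return entries


# Usage: a throwaway in-memory database with one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers (name, region) VALUES (?, ?)",
    [("Ada", "EU"), ("Lin", "US")],
)
for entry in extract_catalog(conn):
    print(entry)
# CatalogEntry(name='customers', fields=['id', 'name', 'region'], estimated_rows=2)
```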

You can view the metadata catalog of any data source that has completed at least one successful metadata extraction. To do so, select the data source and click "View Data Acquisition".

Click to view the Data Acquisition Flow for this Data Source Connection.


Click to view the Metadata Catalog


Empower does not modify the source itself. It only lets you configure how data is brought into the Lakehouse.


Load Strategies

Use the "Enabled" column to select which objects Flows can acquire data from. The data acquisition method can be defined on an object-by-object basis. You can choose between two load strategies:

  1. Full Load: This is the default load strategy. It performs a complete refresh of the target table: the entire source data set is read and used to update the target table. Records in the target that are not in the source are deleted, and records in the source that have no match in the target are appended. This strategy is most useful for tables that have no watermark column to support incremental extraction.
  2. Incremental Load: Loads an incremental data set into the target table. During extraction, a watermark column is used to fetch only records that are new since the last extraction. This method does not capture deletions from the data source!
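To make the difference between the two strategies concrete, here is a minimal in-memory sketch in Python. It assumes rows are dictionaries matched on a single key column, that Incremental Load upserts new rows by that key, and that a row counts as new when its watermark value is strictly greater than the last one recorded. `full_load` and `incremental_load` are hypothetical helpers, not Empower APIs.

```python
from typing import Any

Row = dict[str, Any]


def full_load(target: dict[Any, Row], source: list[Row], key: str) -> dict[Any, Row]:
    """Refresh `target` so it mirrors the complete source data set."""
    source_by_key = {row[key]: row for row in source}
    # Delete target records that are no longer present in the source.
    for k in list(target):
        if k not in source_by_key:
            del target[k]
    # Update matched records and append unmatched ones.
    for k, row in source_by_key.items():
        target[k] = dict(row)
    return target


def incremental_load(
    target: dict[Any, Row],
    source: list[Row],
    key: str,
    watermark_column: str,
    last_watermark: Any,
) -> tuple[dict[Any, Row], Any]:
    """Upsert only rows newer than `last_watermark`; deletions go unnoticed."""
    new_rows = [r for r in source if r[watermark_column] > last_watermark]
    for row in new_rows:
        target[row[key]] = dict(row)
    # The highest watermark seen becomes the starting point for the next run.
    next_watermark = max((r[watermark_column] for r in new_rows), default=last_watermark)
    return target, next_watermark


source = [
    {"id": 1, "name": "Ada", "updated_at": 10},
    {"id": 3, "name": "Kim", "updated_at": 30},
]
target = {
    1: {"id": 1, "name": "Ada (old)", "updated_at": 5},
    2: {"id": 2, "name": "Bob", "updated_at": 8},
}
print(sorted(full_load(dict(target), source, "id")))  # [1, 3]
refreshed, watermark = incremental_load(dict(target), source, "id", "updated_at", 20)
print(sorted(refreshed), watermark)  # [1, 2, 3] 30
```

Row 2 survives the incremental run even though it is absent from the source, which is the deletion gap described above.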

With either strategy, you can select a candidate key that uniquely identifies each record. You can also include or exclude individual fields from acquisition.
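A candidate key only identifies records reliably if its values are unique, and including or excluding fields amounts to projecting each record onto the enabled columns. The hypothetical helpers below sketch both ideas in Python. They assume a candidate key may span several columns, which is an illustrative assumption rather than documented Empower behavior.

```python
from collections import Counter
from typing import Any


def duplicate_keys(rows: list[dict[str, Any]], candidate_key: tuple[str, ...]) -> list[tuple]:
    """Return candidate-key values that occur more than once (empty means unique)."""
    counts = Counter(tuple(row[c] for c in candidate_key) for row in rows)
    return [k for k, n in counts.items() if n > 1]


def select_fields(rows: list[dict[str, Any]], enabled: list[str]) -> list[dict[str, Any]]:
    """Project each row onto the enabled fields, dropping everything else."""
    return [{f: row[f] for f in enabled} for row in rows]


rows = [
    {"order_id": 1, "line": 1, "sku": "A", "internal_note": "x"},
    {"order_id": 1, "line": 2, "sku": "B", "internal_note": "y"},
]
print(duplicate_keys(rows, ("order_id",)))         # [(1,)]
print(duplicate_keys(rows, ("order_id", "line")))  # []
print(select_fields(rows, ["order_id", "line", "sku"]))
```

Here `order_id` alone is not a valid candidate key, while the pair (`order_id`, `line`) is.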

Enable or disable fields for Data Acquisition.


WhereQueryPart

You can also define a query that Empower runs to scope data acquisition before the data enters the Lakehouse. As with the Where column of Publish Entities, this is just a SQL WHERE clause. WhereQueryPart is optional: when it is blank, Empower brings in all data as defined by the Load Strategy.
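As a rough illustration of how an optional WHERE fragment scopes an acquisition query, the sketch below appends it to a `SELECT` and runs the result against an in-memory SQLite table. `build_acquisition_query` is a hypothetical helper, and the exact SQL Empower generates is not described here. The point is only that a blank WhereQueryPart leaves the query unfiltered.

```python
import sqlite3


def build_acquisition_query(table: str, columns: list[str], where_query_part: str = "") -> str:
    """Compose a SELECT for one object, filtered by an optional WHERE fragment."""
    cols = ", ".join(f'"{c}"' for c in columns)
    query = f'SELECT {cols} FROM "{table}"'
    if where_query_part.strip():  # a blank fragment means "no extra filter"
        query += f" WHERE {where_query_part}"
    return query


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 40.0)],
)
cols = ["id", "region", "amount"]

everything = build_acquisition_query("orders", cols)
scoped = build_acquisition_query("orders", cols, "region = 'EU' AND amount > 20")
print(sorted(conn.execute(everything).fetchall()))  # all three rows
print(sorted(conn.execute(scoped).fetchall()))      # [(3, 'EU', 40.0)]
```

Because this sketch splices the fragment into the SQL text verbatim, it assumes the fragment is written by a trusted author.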

Write a SQL WHERE clause to restrict acquisition to a specified subset of data.


Under the hood, Empower applies the query during acquisition to the source data selected by the Load Strategy, so only rows that match the query are brought into the Lakehouse.
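The sketch below shows one plausible way the two scopes could combine for an Incremental Load: the watermark condition from the Load Strategy and the WhereQueryPart are joined with `AND`. This is an assumption about the approach, not Empower's implementation, and `scoped_query` is a hypothetical name. Wrapping the user clause in parentheses matters: SQL's `AND` binds more tightly than `OR`, so an unparenthesized `OR` in the clause would let rows older than the watermark through.

```python
import sqlite3
from typing import Any, Optional


def scoped_query(
    table: str,
    watermark_column: Optional[str] = None,
    last_watermark: Any = None,
    where_query_part: str = "",
) -> tuple[str, list[Any]]:
    """AND together the load-strategy scope and the optional WhereQueryPart."""
    conditions: list[str] = []
    params: list[Any] = []
    if watermark_column is not None and last_watermark is not None:
        # Incremental Load scope: only rows newer than the last extraction.
        conditions.append(f'"{watermark_column}" > ?')
        params.append(last_watermark)
    if where_query_part.strip():
        # Parentheses keep an OR in the user clause from bypassing the watermark filter.
        conditions.append(f"({where_query_part})")
    query = f'SELECT * FROM "{table}"'
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, updated_at INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10), (2, "US", 30), (3, "EU", 40), (4, "APAC", 50)],
)
query, params = scoped_query("orders", "updated_at", 20, "region = 'US' OR region = 'EU'")
print(sorted(conn.execute(query, params).fetchall()))  # [(2, 'US', 30), (3, 'EU', 40)]
```

With the parentheses removed, the same clause would also return row 1, an EU row older than the watermark. For a Full Load, omitting the watermark arguments leaves only the WhereQueryPart.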


Automatic Saving!

Every change you make to the metadata catalog is automatically saved.


What’s Next

Next, you can learn how to trigger and schedule Data Acquisition using Database to Step Commands. You can also learn how to Monitor Acquisition.