Metadata Catalogs

View metadata about a specific Data Source

Overview

Metadata is "data about data". In Empower, metadata contains information about the schemas, tables, objects, and fields contained within the data source. While extracting metadata, Empower automatically detects and resolves Schema Drift.

Empower extracts metadata from your data sources so that you can decide what data to ingest into your deployment using the Metadata Catalog.

📘

What is a Metadata Catalog?

A metadata catalog is a view of all the tables/objects and their schemas from a Data Source. For each table/object, you can also view all of its fields, as well as an estimated row count. You can use the catalog to define which data is brought in and how it is brought in.

You can view the metadata catalog of any data source with at least one successful metadata extraction in its lifetime. To do so, select a data source and click "View Metadata Catalog".

You can "View Metadata Catalog" on any source that has at least one successful metadata extraction in its lifetime.

The Metadata Catalog provides a view of all metadata that has been extracted from this source using the provided credentials. With Write access, you can configure different load strategies, watermarks, field-level enablement, and even global object-level enablement.

Empower enables you to configure **how** data is brought into your Lakehouse, but it does not manipulate data on the source itself!

Enabled

Fields

From the "View Field Details" column of the metadata catalog, you can configure Empower to bring in certain object fields while ignoring others.

Click "View Field Details" to bring up the Field Details modal and select which fields you wish to enable or disable. A disabled field is no longer extracted during Data Acquisition.

Enable or disable fields for Data Acquisition.

Objects

Entire objects can also be globally enabled/disabled from the metadata catalog. Enabling or disabling an object is as easy as flipping a toggle in the "Enabled" column.

As with fields, disabling an object means it will no longer be extracted during Data Acquisition. Additionally, no new Data Acquisition Flows will be able to include this object in their flow until the object is re-enabled.
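The effect of the field-level and object-level toggles can be pictured with a small sketch. This is not Empower's implementation or API: `CatalogObject` and `extraction_plan` are hypothetical names used only to illustrate that disabled objects are skipped entirely and disabled fields are left out of enabled objects.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogObject:
    """Hypothetical stand-in for one table/object row in the catalog."""
    name: str
    enabled: bool = True
    # Maps field name -> whether the field is enabled for acquisition.
    fields: dict = field(default_factory=dict)


def extraction_plan(catalog):
    """Return {object name: [enabled field names]} for enabled objects only."""
    plan = {}
    for obj in catalog:
        if not obj.enabled:
            continue  # a disabled object is skipped entirely
        plan[obj.name] = [name for name, on in obj.fields.items() if on]
    return plan


catalog = [
    CatalogObject("customers", True, {"id": True, "email": True, "notes": False}),
    CatalogObject("audit_log", False, {"id": True, "event": True}),
]

print(extraction_plan(catalog))
```

Here `audit_log` is absent from the plan because the object is disabled, and `notes` is absent from `customers` because that field is disabled.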

Load Strategies

Use the "Enabled" column to select which objects Flows can acquire. The method of data acquisition is defined on an object-by-object basis. You can select between two load strategies:

  1. Full Load: This is the default load strategy. It does a complete refresh of the target table. The entire data set is read from the source and used to update the target table. Any records in the target that are not in the source are deleted. Any source records with no match in the target are appended. This strategy is most useful for tables that have no watermark column to use for an incremental extraction.
  2. Incremental Load: Loads an incremental dataset to the target table. A watermark column is used during extraction to only fetch records that are new relative to the last extraction. This method does not capture deletions from the data source!
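The difference between the two strategies can be sketched with plain dictionaries keyed by record ID. This is an illustration under assumptions, not Empower's code: `full_load` and `incremental_load` are hypothetical helpers, and treating an incremental batch as an upsert into the target is an assumption made for the example.

```python
def full_load(target, source):
    """Complete refresh: the target ends up mirroring the source.

    Records missing from the source disappear; unmatched source
    records are appended.
    """
    return dict(source)


def incremental_load(target, new_rows):
    """Apply only the newly extracted rows on top of the existing target.

    Assumption: rows are upserted by key. Nothing is ever removed,
    so deletions at the source are not captured.
    """
    merged = dict(target)
    merged.update(new_rows)
    return merged


target = {1: "alice", 2: "bob", 3: "carol"}
# At the source, record 3 was deleted, record 2 changed, record 4 is new.
source = {1: "alice", 2: "bobby", 4: "dave"}
changed_since_last_run = {2: "bobby", 4: "dave"}

print(full_load(target, source))
print(incremental_load(target, changed_since_last_run))
```

After the full load, record 3 is gone from the target. After the incremental load, record 3 is still present, which is the missed deletion that the Incremental Load description warns about.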

Watermarks

A critical part of incremental extraction, watermarks mark the last successful data extraction. Because the marker indicates the last record that was successfully processed, Empower can scope the next extraction to only new or updated data. Watermarks can significantly reduce the volume of data movement and processing, leading to improved performance and lower resource usage.

You can select a column within an object to act as the watermark column. Empower's UI supports the following watermark methods:

  • Timestamp: a datetime value column, e.g. modifiedDate.
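As a rough sketch of how a timestamp watermark scopes an extraction, consider the following. `extract_incremental` is a hypothetical helper, the strict "greater than" comparison is an assumption, and the `modifiedDate` column name is borrowed from the example above; Empower's actual logic may differ.

```python
from datetime import datetime


def extract_incremental(rows, watermark_column, last_watermark=None):
    """Return (rows newer than last_watermark, advanced watermark)."""
    fresh = [
        row for row in rows
        if last_watermark is None or row[watermark_column] > last_watermark
    ]
    # Advance the watermark to the newest value seen; keep the old one
    # when nothing new was found.
    new_watermark = max(
        (row[watermark_column] for row in fresh), default=last_watermark
    )
    return fresh, new_watermark


source_rows = [
    {"id": 1, "modifiedDate": datetime(2024, 1, 1)},
    {"id": 2, "modifiedDate": datetime(2024, 1, 5)},
    {"id": 3, "modifiedDate": datetime(2024, 1, 9)},
]

# First run: no watermark yet, so every row is extracted.
first_batch, watermark = extract_incremental(source_rows, "modifiedDate")

# Later, row 2 is updated and row 4 is created at the source.
source_rows[1]["modifiedDate"] = datetime(2024, 1, 11)
source_rows.append({"id": 4, "modifiedDate": datetime(2024, 1, 12)})

second_batch, watermark = extract_incremental(source_rows, "modifiedDate", watermark)
print([row["id"] for row in second_batch])  # [2, 4]
```

Only the updated and the new row are picked up on the second run; rows 1 and 3 are not read again.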

Candidate Keys

A candidate key is an attribute or a set of attributes that uniquely identify a record within a database table. Every table is defined by its ability to hold unique data entries, and candidate keys are critical for establishing this uniqueness. When Empower extracts metadata from data sources, it can use candidate keys to ensure that each record is unique and to manage updates or merges effectively.

You can select multiple columns (which will be concatenated together) from which to form a candidate key for an object. Though this field is not required, we highly recommend using it to decrease ambiguity in your data estate.
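To illustrate why a multi-column candidate key can be needed, the sketch below concatenates the selected columns and checks whether the result is unique. The `candidate_key` helper, the `|` separator, and the sample order data are all assumptions for illustration; the exact way Empower concatenates columns is not described here.

```python
def candidate_key(record, key_columns, separator="|"):
    """Concatenate the chosen column values into one key string."""
    return separator.join(str(record[column]) for column in key_columns)


def is_unique(records, key_columns):
    """True when the candidate key identifies every record uniquely."""
    keys = [candidate_key(record, key_columns) for record in records]
    return len(keys) == len(set(keys))


order_lines = [
    {"order_id": 100, "line_no": 1, "sku": "A"},
    {"order_id": 100, "line_no": 2, "sku": "B"},
    {"order_id": 101, "line_no": 1, "sku": "A"},
]

print(is_unique(order_lines, ["order_id"]))             # False
print(is_unique(order_lines, ["order_id", "line_no"]))  # True
```

Neither `order_id` nor `line_no` is unique on its own in this sample, but the two together are, so together they form a usable candidate key.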

WhereQueryPart

You can also define a query you want the Empower system to perform to scope data acquisition before it enters the Lakehouse. Similar to the Where column of Publish Entities, it's as easy as writing a SQL WHERE clause. WhereQueryPart is a completely optional field. When this field is blank, Empower will bring in all data as defined by the Load Strategy.

Write a SQL WHERE clause to restrict acquisition to a specified subset of data.

Under the hood, Empower will scope source data (defined by the Load Strategy) during Acquisition to only bring the rows that match the query into the Lakehouse.
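The scoping behavior can be approximated with an in-memory SQLite table. This is only a sketch: `build_extraction_query` is a hypothetical helper, the `orders` table and its columns are invented for the example, and it assumes the field holds the condition without the leading `WHERE` keyword. Check the field's expected format in your deployment before relying on that.

```python
import sqlite3


def build_extraction_query(table, where_query_part=""):
    """Append the optional condition to a plain SELECT.

    A blank where_query_part means no extra scoping is applied.
    """
    query = f"SELECT * FROM {table}"
    if where_query_part.strip():
        query += f" WHERE {where_query_part}"
    return query


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 50.0), (2, "APAC", 200.0), (3, "EMEA", 300.0)],
)

all_rows = conn.execute(build_extraction_query("orders")).fetchall()
scoped_rows = conn.execute(
    build_extraction_query("orders", "region = 'EMEA' AND amount > 100")
).fetchall()

print(len(all_rows))  # 3
print(scoped_rows)    # [(3, 'EMEA', 300.0)]
```

With the field left blank, all three rows are selected; with the condition set, only the matching row would reach the Lakehouse.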

Load Group (Advanced Users)

Variations (Advanced Users)

📘

Automatic Saving!

Every change you make to the metadata catalog is automatically saved.


What’s Next

Next, you can learn how to trigger and schedule Data Acquisition using Database to Step Commands. You can also learn how to Monitor Acquisition.