Metadata Catalogs
View metadata about a specific Data Source
Overview
Metadata is "data about data". In Empower, metadata describes the schemas, tables, objects, and fields within a data source. Empower extracts metadata while automatically detecting and resolving Schema Drift.
Empower extracts metadata from your data sources so that you can decide what data to ingest into your deployment using the Metadata Catalog.
What is a Metadata Catalog?
A metadata catalog is a view of all the tables/objects and their schemas from a Data Source. For each table/object, you can also view all of its fields, as well as its estimated row count. You can use the catalog to define which data is brought in and how it is brought in.
You can view the metadata catalog of any data source with at least one successful metadata extraction in its lifetime. To do so, select a data source and click "View Metadata Catalog".
The Metadata Catalog provides a view of all metadata that has been extracted from this source using the provided credentials. With Write access, you can configure different load strategies, watermarks, field-level enablement, and even global object-level enablement.
Automatic Saving!
Every change you make to the metadata catalog is automatically saved.
Enabled
Fields
From the metadata catalog View Field Details column, you may configure Empower to bring in certain object fields while ignoring others.
Click "View Field Details" to bring up the Field Details modal and select which fields you wish to enable/disable. A disabled field is no longer extracted during Data Acquisition.
Objects
Entire objects can also be globally enabled/disabled from the metadata catalog. Enabling or disabling an object is as easy as flipping a toggle in the "Enabled" column.
As with fields, disabling an object means it will no longer be extracted during Data Acquisition. Additionally, no new Data Acquisition Flows will be able to include this object in their flow until the object is re-enabled.
Bulk Enablement
You can use the select boxes on the left side of the Metadata Catalog to bulk enable/disable objects.
First, select the objects you wish to bulk edit.
Then, select whether you want all selected objects to be Enabled or Disabled.
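The enablement rules in this section can be pictured with a small sketch. This is not Empower's implementation or API: the `catalog` dictionary, the object and field names, and the helper functions `bulk_set_enabled` and `extraction_plan` are all hypothetical, chosen only to show how object-level and field-level toggles combine.

```python
# Hypothetical in-memory model of a metadata catalog (not Empower's real schema).
catalog = {
    "customers": {"enabled": True, "fields": {"id": True, "email": True, "notes": False}},
    "orders": {"enabled": True, "fields": {"id": True, "total": True}},
    "audit_log": {"enabled": True, "fields": {"id": True, "payload": True}},
}


def bulk_set_enabled(catalog, object_names, enabled):
    """Set the object-level Enabled flag for every selected object."""
    for name in object_names:
        catalog[name]["enabled"] = enabled


def extraction_plan(catalog):
    """Return only enabled objects, each with only its enabled fields."""
    return {
        name: [field for field, on in obj["fields"].items() if on]
        for name, obj in catalog.items()
        if obj["enabled"]
    }


# Bulk-disable two objects, then see what would still be extracted.
bulk_set_enabled(catalog, ["orders", "audit_log"], False)
plan = extraction_plan(catalog)
print(plan)
```

In this sketch a disabled object is skipped entirely, regardless of its field settings, and a disabled field is dropped from an otherwise enabled object.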
Load Strategies
You can select which objects Flows are allowed to acquire from using the "Enabled" column. The method of data acquisition can be defined on an object-by-object basis. You can choose between two load strategies:
- Full Load: This is the default load strategy. This strategy does a complete refresh of the target table. The entire source data set is read from the source and used to update the target table. Any records in the target that are not in the source are deleted. Any records in the source that are not matched get appended. This strategy is most useful for tables that have no watermark columns to use for an incremental extraction.
- Incremental Load: Loads an incremental dataset to the target table. A watermark column is used during extraction to only fetch records that are new relative to the last extraction. This method does not capture deletions from the data source!
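The difference between the two strategies can be illustrated with a minimal sketch over in-memory rows. It is an assumption-laden model, not Empower's code: the functions `full_load` and `incremental_load`, the `id` key, and the `modified` column are hypothetical names used only for illustration.

```python
from datetime import datetime


def full_load(source_rows, key="id"):
    """Full Load: the target becomes an exact copy of the current source."""
    # Records absent from the source disappear; unmatched source records are added.
    return {row[key]: dict(row) for row in source_rows}


def incremental_load(target, source_rows, watermark_col, last_watermark, key="id"):
    """Incremental Load: apply only rows newer than the last watermark."""
    updated = dict(target)
    for row in source_rows:
        if row[watermark_col] > last_watermark:
            updated[row[key]] = dict(row)
    # Rows deleted at the source are never read, so they stay in the target.
    return updated


# Target state after an earlier load.
target = {
    1: {"id": 1, "name": "a", "modified": datetime(2024, 1, 1)},
    2: {"id": 2, "name": "b", "modified": datetime(2024, 1, 1)},
}
# Since then, the source deleted id 1, updated id 2, and added id 3.
source_rows = [
    {"id": 2, "name": "b2", "modified": datetime(2024, 3, 1)},
    {"id": 3, "name": "c", "modified": datetime(2024, 3, 2)},
]
last_watermark = datetime(2024, 2, 1)

full = full_load(source_rows)
inc = incremental_load(target, source_rows, "modified", last_watermark)
print(sorted(full), sorted(inc))
```

After the full load, the deleted record (id 1) is gone from the target. After the incremental load it is still present, which is the missed-deletion behavior noted above.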
Watermarks
A critical part of incremental extraction, watermarks are used as a marker for the last successful data extraction. With these markers indicating the last record that was successfully processed, Empower can identify and scope the next extraction to only process new or updated data. Watermarks can significantly reduce the volume of data movement and processing, leading to improved performance and lower resource usage.
You can select a column within an object to act as the watermark column. Empower's UI supports the following watermark methods:
- Timestamp: a datetime value column, e.g. modifiedDate.
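As a rough illustration of how a timestamp watermark scopes successive extractions, consider the sketch below. The function `extract_since` and the sample rows are hypothetical; only the column name `modifiedDate` comes from the example above, and how Empower stores and advances watermarks internally is not described here.

```python
from datetime import datetime


def extract_since(rows, watermark_col, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark value."""
    batch = [
        row for row in rows
        if last_watermark is None or row[watermark_col] > last_watermark
    ]
    # If nothing new was found, the watermark stays where it was.
    new_watermark = max((row[watermark_col] for row in batch), default=last_watermark)
    return batch, new_watermark


rows = [
    {"id": 1, "modifiedDate": datetime(2024, 1, 1)},
    {"id": 2, "modifiedDate": datetime(2024, 1, 2)},
]

# First extraction: no watermark yet, so every row is fetched.
first_batch, watermark = extract_since(rows, "modifiedDate", None)

# A new record arrives at the source before the next run.
rows.append({"id": 3, "modifiedDate": datetime(2024, 1, 5)})

# Second extraction: only the row newer than the stored watermark is fetched.
second_batch, watermark_after = extract_since(rows, "modifiedDate", watermark)
print(len(first_batch), len(second_batch), watermark_after)
```

This sketch uses a strict "greater than" comparison, so a row written with exactly the same timestamp as the stored watermark would be skipped. A real pipeline has to choose how to handle that boundary.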
Candidate Keys
A candidate key is an attribute or a set of attributes that uniquely identify a record within a database table. Every table is defined by its ability to hold unique data entries, and candidate keys are critical for establishing this uniqueness. When Empower extracts metadata from data sources, it can use candidate keys to ensure that each record is unique and to manage updates or merges effectively.
You can select multiple columns (which will be concatenated together) from which to form a candidate key for an object. Though this field is not required, we highly recommend using it to decrease ambiguity in your data estate.
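To show why a candidate key helps with updates and merges, here is a minimal sketch of a composite key built by concatenating column values. The `|` separator, the helper names `candidate_key` and `merge`, and the `order_id`/`line_no` columns are assumptions for illustration. The text above only states that the selected columns are concatenated, not how.

```python
def candidate_key(row, key_columns, sep="|"):
    """Concatenate the chosen column values into a single key string."""
    return sep.join(str(row[col]) for col in key_columns)


def merge(target, incoming, key_columns):
    """Upsert incoming rows: matching keys are updated, new keys are added."""
    merged = dict(target)
    for row in incoming:
        merged[candidate_key(row, key_columns)] = row
    return merged


key_columns = ["order_id", "line_no"]

existing = [
    {"order_id": 10, "line_no": 1, "qty": 2},
    {"order_id": 10, "line_no": 2, "qty": 1},
]
incoming = [
    {"order_id": 10, "line_no": 2, "qty": 5},  # same key: updates the row
    {"order_id": 11, "line_no": 1, "qty": 3},  # new key: adds a row
]

target = merge({}, existing, key_columns)
merged = merge(target, incoming, key_columns)
print(sorted(merged))
```

Without a key, the updated order line could not be matched to its earlier version and would show up as a duplicate record.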
WhereQueryPart
You can also define a query you want the Empower system to perform to scope data acquisition before it enters the Lakehouse. Similar to the Where column of Publish Entities, it's as easy as writing a SQL WHERE clause. WhereQueryPart is a completely optional field. When this field is blank, Empower will bring in all data as defined by the Load Strategy.
Under the hood, Empower will scope source data (defined by the Load Strategy) during Acquisition to only bring the rows that match the query into the Lakehouse.
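The effect of a WhereQueryPart can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `build_query` helper, the `orders` table, and the idea that the value is appended after a `WHERE` keyword are hypothetical. Check the relevant connector pages for the exact syntax Empower expects.

```python
import sqlite3


def build_query(table, where_query_part=""):
    """Build the acquisition query; a blank WhereQueryPart means no extra scoping."""
    query = f"SELECT * FROM {table}"
    if where_query_part.strip():
        query += f" WHERE {where_query_part}"
    return query


# Stand-in source table for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 250.0), (3, "EU", 300.0)],
)

# Blank WhereQueryPart: every row is acquired.
all_rows = conn.execute(build_query("orders")).fetchall()

# With a WhereQueryPart: only matching rows are acquired.
scoped_rows = conn.execute(
    build_query("orders", "region = 'EU' AND amount > 100")
).fetchall()
print(len(all_rows), scoped_rows)
```

Only the rows that satisfy the clause leave the source, so the filter reduces what is moved rather than what is kept after loading.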
Options
Clicking "View Options" will bring up the Options modal, a key-value dictionary used to specify special object-options for acquisition purposes.
Only some data source types support object options. Today this short list includes:
To see what key-value pairs are supported today, visit the relevant connector pages.
Load Group (Advanced Users)
You can use the Load Group column to assign this object a Load Group for Step Command acquisition. This column will only be visible for users with Advanced Options enabled. Learn how to enable Advanced Options on the Advanced Options page.
Read about Load Groups and Step Commands on the Orchestration page in Advanced Options.
Variations (Advanced Users)
This column currently has no effect on objects, as the feature is in private preview for a small group of select users. Please contact the Empower product team for more information on this feature.
Next, you can learn how to trigger and schedule Data Acquisition using Database to Step Commands. You can also learn how to Monitor Acquisition.