Data Publishing
Pushing data out of the Lakehouse into other tools
Data Publishing Flows define the outbound movement of data to external systems, such as Power BI, SFTP and SQL servers, or Dynamics 365.
Publishing flows have an associated Data Source Target (Data Target, for short) which specifies where they will be writing data to.
Quick Links
This page only covers Publishing flow creation and modification.
- Logging and Monitoring: defines how to use the log page to monitor flows ("View Runs").
- Schedules and Triggers: describes how to trigger flows to run on demand ("Preview Run") and schedule flows to run on a repeating cadence ("Schedule").
- Supported Publishing Targets: lists the external systems you can publish data to.
Overview
The Data Publishing module is accessible from the left navigation menu.
From here, you can view all Data Publishing flows within your selected environment.
There are a few restrictions when it comes to Data Publishing flows.
- A Publishing flow can only ever be affiliated with one Data Source at a time.
- This means you can't push data out to multiple data sources under the same Publishing flow! If you have this need, create a separate Publishing flow for each of your data sources and schedule them to run at the same time.
- Archiving a Data Source will also force the archiving of all affiliated Publishing flows.
- You will not be able to edit, schedule, or trigger these flows until you restore the underlying Data Source.
- Deleting a Data Source will permanently archive all affiliated Publishing flows.
- You will never again be able to edit, schedule, or trigger these flows.
Flows
From the default view for the Data Publishing module, you can see all the Publishing flows within your current environment.
You can search these flows by name, view configurations or historical run logs, trigger a flow to run on demand, view/set/activate scheduling, and create new flows.
Creation
To create a Data Publishing flow, click "+" at the top of the screen on the homepage for the Data Publishing module.
Once you fill out the required fields (Name and Data Source), click "Create" to complete the creation process. Depending on the data target selected (SFTP in the example below), you may have additional fields to configure. See the Supported Data Publishing Targets section below for more details.
You can now see your newly created flow at the top of the page.
INFO: 1 to Many - Data Targets and Data Publishing Flows
Data Targets and Publishing flows have a 1:many relationship.
This means that a single Publishing flow can only ever be associated with one Data Target at any moment in time. However, a specific Data Target may be associated with many different Publishing flows, each with its own schedule and entity configuration.
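As a mental model only, the 1:many constraint above can be sketched in a few lines of Python. The class and field names here are illustrative, not part of the product's API:

```python
from dataclasses import dataclass

# Hypothetical model of the relationship described above:
# each Publishing flow references exactly one Data Target,
# while one Data Target may be referenced by many flows.

@dataclass
class DataTarget:
    name: str

@dataclass
class PublishingFlow:
    name: str
    target: DataTarget  # exactly one Data Target per flow

sftp = DataTarget("nightly-sftp")

# Many flows may share the same target, each with its own
# schedule and entity configuration.
flow_a = PublishingFlow("finance-export", target=sftp)
flow_b = PublishingFlow("sales-export", target=sftp)

assert flow_a.target is flow_b.target  # one target, many flows
```

Because the `target` field holds a single reference, a flow can never point at two targets at once, which mirrors the restriction described earlier on this page.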
Editing
You can modify any existing Publishing flow by clicking "..." near the name of the flow. Doing so brings up a menu with an option to Edit the flow.
You may edit the Name, the Data Source Target, and (if applicable) additional fields depending on the Data Source Target type.
Deletion
You can delete any existing Data Publishing flow by clicking "..." near the name of the flow. Doing so brings up a menu with an option to Delete the flow.
A confirmation modal will pop up. You must confirm you wish to delete the flow in order to complete the deletion process.
DANGER: Deletion is Permanent
Flow deletion is a permanent action. Deleting a flow will also remove the entire historical log of that flow. You will not be able to reverse a flow's deletion, so make sure you actually want to perform this action!
Scheduling and Triggering Flows
To read about how to schedule and trigger Publishing flows or any other flow type, visit Scheduling Data Flows.
Configuration
Click "View Configuration" on any existing Publishing flow to visit its configuration page.
From the configuration page, you can view the Entities that will be published when this flow is executed. Each Entity may have additional configurable settings (Options), described in the aptly named subsection below.
When you make edits on this page, the change is automatically saved to the flow's associated Model configuration.
Entities
What are Entities in Publishing flows?
Unlike Model Entities, Publish Entities are the objects within your lakehouse that are set to be published when this publishing flow is executed. You can set any number of publish entities within a publishing flow. Entities can be configured using data from any bronze, silver, or gold table in your data estate.
The Entity table defines the list of publishable objects for the Publishing flow. For each Entity, you can view its target name, its source schema and name in your lakehouse, any SQL filter to be run before publishing, the source catalog (set to your environment's catalog by default - AUTO), and any additional options for this Entity. Options are typically defined by the data target you are publishing data out to. See the Supported Data Publishing Targets below for more details on each supported target.
You can activate and deactivate any entity on demand. Deactivated entities will not be published during this flow's execution.
Below is a table describing each of the Entity columns and example values.
Column Name | Description | Example Value |
---|---|---|
ID | The global ID for this Entity (autogenerated). | 123456-abccda-1233-124abcd |
Target Entity Name | The name of the entity, or the desired path (month/date/Target_entity_name) when writing to an SFTP server (if applicable). | dim_account_revenue |
Source Schema | The source table’s schema in your lakehouse, i.e. "where is this data coming from?" | sales_gold |
Source Entity Name | The source table's name in your lakehouse, i.e. "where is this data coming from?" | dim_account_revenue |
Source Entity Filter | An optional WHERE condition, applied as a temporary view before publishing this data. The filter must be valid SQL: anything that could follow a SQL WHERE keyword, without the preceding WHERE itself. For proper behavior, reference only fields that exist in the source entity you are specifying. | account_type="Debit" |
Source Catalog | The source table's catalog in your lakehouse. Select AUTO to use your environment's default catalog. | AUTO |
Active | A toggle which activates/deactivates this entity for publishing. Only active entities will be published. | Active/Inactive |
Options | Opens a separate key/value dictionary modal for defining data-target-specific configurations on a per-entity basis. See Supported Data Publishing Targets below for more details. | N/A |
Supported Data Publishing Targets
- Power BI Datasets
- SFTP Servers
- SQL Servers
- Dynamics 365 via Empower for Dynamics -> Not supported in the UI.
- Another Empower Instance (via Unity Catalog Sharing) -> Not supported in the UI.
- (Custom) Any Target via Databricks Notebook -> Not supported in the UI.