SFTP Server Publishing
Empower can publish datasets to SFTP servers as CSV, TSV, or Parquet files. Publishing runs automatically whenever the underlying data is updated.
Prerequisites
- An SFTP server and its credentials.
- Data in your Empower delta lake to publish.
- Advanced Options is toggled on for your account.
Steps
Add the SFTP Server as a Data Source in your Empower Environment
- Navigate to your Empower deployment's Data Sources tab in the UI.
- Click + to add a new data source, then navigate to SFTP or find it with the search bar.
- Fill in the required fields:
  - Username
  - Password
  - Host
  - Port (defaults to 22)
- Click save and connect to save the new SFTP data source.
Connection Test Errors
You may see an error message when adding an SFTP source, even though the connection has been saved. This is expected behavior, because Empower cannot currently test SFTP connections. Make sure that the server's port is open and that you have entered every field correctly. SFTP connection testing will be enabled in a later Empower release.
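Because Empower cannot test the connection for you, a quick check from your own machine can confirm that the port is open. The following is a minimal sketch using only the Python standard library; the function name `sftp_port_reachable` and the placeholder host are illustrative and not part of Empower. It only verifies that a TCP connection can be opened, not that the username and password are valid.

```python
import socket


def sftp_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call (placeholder host, replace with your own server):
# sftp_port_reachable("sftp.example.com", 22)
```

A result of False usually points to a firewall rule, a wrong host name, or a wrong port number, which are the same things the note above asks you to double-check.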
Configure DatabaseToStepCommand
- Navigate to the Advanced Options tab on the left side of the screen.
- Add a new record to the DatabaseToStepCommand table.
- Use the SFTP server in the Database ID field.
- Set this record's phase to PUBLISH.
- Select a new load group number if you want this command to run independently, or use an existing load group number to have it run with the rest of that load group.
- Enter 1 in the Execution Order field if this is the only command to be executed; otherwise, enter the position you want it to have in the order.
- You can leave the Source Schema Suffix and Target Schema Suffix fields at their default values.
- You can leave the Item Name to Execute field set to None.
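To make the relationship between load group and execution order concrete, here is a small illustrative model. The `StepCommand` class and its field names mirror the UI labels above but are hypothetical, and the sketch assumes that commands within a load group run in ascending Execution Order; confirm the actual behavior in your Empower environment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepCommand:
    # Hypothetical field names that mirror the UI labels, not Empower's real schema.
    database_id: str
    phase: str
    load_group: int
    execution_order: int
    item_name_to_execute: Optional[str] = None


def run_order(commands, load_group):
    """Return the commands of one load group, sorted by execution order."""
    selected = [c for c in commands if c.load_group == load_group]
    # Assumption: lower execution_order values run first.
    return sorted(selected, key=lambda c: c.execution_order)


commands = [
    StepCommand("sftp_target", "PUBLISH", load_group=7, execution_order=2),
    StepCommand("other_target", "PUBLISH", load_group=7, execution_order=1),
    StepCommand("sftp_target", "PUBLISH", load_group=9, execution_order=1),
]

for cmd in run_order(commands, load_group=7):
    print(cmd.execution_order, cmd.database_id)
```

In this model, the command in load group 9 runs on its own, while the two commands in load group 7 run together in the order given by their Execution Order values.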
Configure Publish Entity Table
- Identify the entities (tables) you want to publish to the target SFTP Server.
- Create a new record for each table you wish to publish in the Publish Entity Table.
- Ensure each new Publish Entity entry includes at least the following columns:
  - Load Group: matches the DatabaseToStepCommand entry's load group.
  - Target Entity: the name of the entity or a desired path (for example, month/date/Target_entity_name); the entity is written to this path on the SFTP server.
  - Source Catalog: the Unity Catalog catalog that contains the source table.
  - Source Schema: the source table's schema.
  - Source Entity Name: the source table's name.
  - Source Entity Filter: an optional WHERE condition applied to the source table (for example, account_type = 'Debit').
    - The filter must be valid SQL: anything that could be part of a SQL WHERE clause, without the leading WHERE keyword.
    - For proper behavior, reference only fields that exist in the table you are specifying.
  - Target ID: the DatabaseListID of the target SFTP server (this should be the same as the DatabaseToStepCommand record's Database ID).
  - Is Active: only active entities are published; this must be set to true for the entity to be enabled.
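The rule for Source Entity Filter is easier to see with an example. The sketch below shows how a filter without the leading WHERE could be appended to a query over the source table. The `build_select` helper and the catalog, schema, and table names are hypothetical; how Empower actually builds its query is not documented here, so treat this as an illustration of the filter format only.

```python
def build_select(catalog: str, schema: str, entity: str, entity_filter: str = "") -> str:
    """Compose a SELECT over the source table, appending the filter if present."""
    query = f"SELECT * FROM {catalog}.{schema}.{entity}"
    condition = entity_filter.strip()
    # The filter is only the condition itself, so reject a leading WHERE keyword.
    if condition.upper().startswith("WHERE "):
        raise ValueError("omit the leading WHERE from the filter")
    if condition:
        query += f" WHERE {condition}"
    return query


# Hypothetical source table main.finance.accounts with an account_type column.
print(build_select("main", "finance", "accounts", "account_type = 'Debit'"))
```

Leaving the filter empty yields a query over the whole table, which matches the filter being optional.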
Configure Entity Options Table
- For each of the entities specified in the Publish Entity Table, create an entry in the Entity Options table (using the same entity ID) with name = "file_type" to define the type of file you wish to publish to SFTP. You may choose CSV, TSV, or Parquet.
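As a reminder of how the two text formats differ, the sketch below serializes the same rows as CSV (comma-delimited) and TSV (tab-delimited) using Python's standard csv module. The `render` helper and the sample rows are illustrative only and do not represent Empower's own writer. Parquet is a binary columnar format that requires a separate library, so it is left out of this example.

```python
import csv
import io

# Delimiters for the two text formats; Parquet is binary and not handled here.
DELIMITERS = {"csv": ",", "tsv": "\t"}


def render(rows, file_type: str) -> str:
    """Serialize rows (a list of lists) as CSV or TSV text."""
    key = file_type.lower()
    if key not in DELIMITERS:
        raise ValueError(f"unsupported text format: {file_type}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=DELIMITERS[key], lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["account_id", "account_type"], [1, "Debit"]]
print(render(rows, "csv"))
print(render(rows, "tsv"))
```

When choosing a file_type, consider what the system that reads from the SFTP server expects: text formats are easy to inspect, while Parquet preserves column types.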