SQL Server Publishing

Empower supports publishing datasets to SQL servers automatically when the underlying data is updated.


Prerequisites

  1. A SQL server and its credentials.
  2. Data in your Empower delta lake to publish.
  3. Advanced Options is toggled on for your account.

Steps

Add the SQL server as a Data Source in your Empower deployment

  1. Navigate to your Empower deployment's Data Sources tab in the UI.
  2. Click + to add a new data source, then navigate to SQL Server or use the search bar to find it.
  3. Fill in the required fields:
    1. Server name
    2. Database name (schema)
    3. Username
    4. Password
    5. Always Encrypted (toggle)
    6. Trust Server Certificate (toggle)
    7. Encrypt (toggle)
    8. Persist Security Info (toggle)
    9. Connection Timeout (optional, in seconds)
  4. Click Save and Connect to save the new SQL Server data source.
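
The toggles above correspond to standard SQL Server connection-string keywords. As a reference point only, here is a minimal sketch of an equivalent direct connection using pyodbc; the driver name, server, and credentials are placeholders, and this is not how Empower connects internally:

  import pyodbc

  conn_str = (
      "DRIVER={ODBC Driver 18 for SQL Server};"
      "SERVER=myserver.example.com;"   # Server name (placeholder)
      "DATABASE=my_database;"          # Database name (placeholder)
      "UID=empower_publisher;"         # Username (placeholder)
      "PWD=<password>;"                # Password (placeholder)
      "ColumnEncryption=Enabled;"      # Always Encrypted toggle
      "TrustServerCertificate=yes;"    # Trust Server Certificate toggle
      "Encrypt=yes;"                   # Encrypt toggle
      "Persist Security Info=no;"      # Persist Security Info toggle
  )

  # timeout sets the login timeout in seconds (the optional Connection Timeout field).
  conn = pyodbc.connect(conn_str, timeout=30)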

Configure DatabaseToStepCommand

  1. Navigate to the Advanced Options tab on the left side of the screen.
  2. Add a new record to the DatabaseToStepCommand table.
  3. Set the Database ID field to the SQL server data source added above.
  4. Configure this record to run in the PUBLISH phase.
  5. Select a new load group number if you want this command to run independently, or add it to an existing load group to have it run with the rest of that group.
  6. Set the Execution Order field to 1 if this is the only command to be executed; otherwise, set it to the position you want in the execution sequence.
  7. You can leave the Source Schema Suffix and Target Schema Suffix fields at their default values.
  8. You can leave the Item Name to Execute field set to None.
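
Putting these steps together, a new DatabaseToStepCommand record for a standalone publish might look like the following sketch. The field names mirror the UI labels above and are assumptions, not the exact column names in your deployment:

  # Hypothetical DatabaseToStepCommand record (field names mirror the UI labels).
  database_to_step_command = {
      "DatabaseID": 3,             # ID of the SQL server data source added above
      "Phase": "PUBLISH",          # run during the PUBLISH phase
      "LoadGroup": 10,             # a new load group, so it runs independently
      "ExecutionOrder": 1,         # the only command in this load group
      "SourceSchemaSuffix": None,  # left at the default
      "TargetSchemaSuffix": None,  # left at the default
      "ItemNameToExecute": None,   # left set to None
  }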

Configure Publish Entity Table

  1. Identify the entities (tables) you want to publish to the target SQL server.
  2. Create a new record for each table you wish to publish in the Publish Entity Table.
  3. Ensure each new Publish Entity entry includes at least the following columns:
    1. Load Group: matches DatabaseToStepCommand entry’s loadgroup.
    2. Target Entity: the name of the entity or a desired path (month/date/Target_entity_name), since the table will be written to this path on the SQL server.
    3. Source Catalog: the source table’s Unity Catalog.
    4. Source Schema: the source table’s schema.
    5. Source Entity Name: the source table’s name.
    6. Source Entity Filter: an optional WHERE condition applied when reading the source table (e.g., account_type='Debit'); see the example after this list.
      1. Keep in mind that this filter must be valid SQL: anything that could appear in a SQL WHERE clause, without the leading WHERE keyword.
      2. For proper behavior, reference only fields that exist in the table you are specifying.
    7. Target ID: The DatabaseListID of the target SQL server (this should be the same as the DatabaseToStepCommand’s DatabaseID).
    8. Is Active: only active entities are published; this must be set to true for the entity to be enabled.
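
As a concrete illustration, a Publish Entity record and the read it implies might look like the sketch below. All values and column names are hypothetical; check your deployment for the exact schema:

  # Hypothetical Publish Entity record for one table.
  publish_entity = {
      "LoadGroup": 10,                          # matches the DatabaseToStepCommand load group
      "TargetEntity": "finance/2024/accounts",  # path-style name written to the SQL server
      "SourceCatalog": "main",                  # the source table's Unity Catalog
      "SourceSchema": "gold",
      "SourceEntityName": "accounts",
      "SourceEntityFilter": "account_type = 'Debit'",  # valid SQL, no WHERE keyword
      "TargetID": 3,                            # DatabaseListID of the target SQL server
      "IsActive": True,                         # only active entities are published
  }

  # Conceptually, the filter narrows the source read to something like:
  query = (
      f"SELECT * FROM {publish_entity['SourceCatalog']}"
      f".{publish_entity['SourceSchema']}.{publish_entity['SourceEntityName']}"
      f" WHERE {publish_entity['SourceEntityFilter']}"
  )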

Configure Entity Options Table

  1. For each of the entities specified in the Publish Entity Table, create an entry in the Entity Options table (using the same entity ID) to define the following options:
    1. sql_table_mode: defines the write mode for the target table. Select between two choices:
      1. overwrite: drops the old table if it exists and completely overwrites it with the new data. Note that by default this does not preserve the old indexes.
      2. append: adds the new data to the end of the table's existing data.
    2. sql_table_truncate: a boolean value (true or false) that defines index behavior when overwrite is selected in the previous option.
      1. if true: the existing indexes are retained
      2. if false: the indexes are dropped with the table in overwrite mode
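
The SQL each combination implies can be sketched as follows; this is illustrative only, not Empower's actual implementation:

  def write_statements(sql_table_mode: str, sql_table_truncate: bool, table: str) -> list:
      """Hypothetical helper showing the semantics of the two options above."""
      if sql_table_mode == "append":
          # New rows are added after the table's existing data.
          return [f"INSERT INTO {table} SELECT ..."]
      if sql_table_mode == "overwrite":
          if sql_table_truncate:
              # true: empty the table in place, so the existing indexes are retained.
              return [f"TRUNCATE TABLE {table}", f"INSERT INTO {table} SELECT ..."]
          # false: drop and recreate the table, so the indexes are dropped with it.
          return [f"DROP TABLE IF EXISTS {table}",
                  f"CREATE TABLE {table} (...)",
                  f"INSERT INTO {table} SELECT ..."]
      raise ValueError(f"unknown sql_table_mode: {sql_table_mode}")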

Configure Retry and Batch Options

  1. Retry on Failure (retry_on_failure)

    • Defines the behavior when a batch operation fails.
    • Options:
      1. True:
        • Attempts to retry row-level publishing for failed batches.
        • Logs each failed row for detailed analysis in the row log table.
      2. False:
        • Logs only batch-level success/failures in the batch log table without attempting retries at the row level.
    • Significance: Ensures data consistency while isolating problematic rows.
  2. Batch Size (batch_size)

    • Defines the number of rows processed per batch.
    • Behavior:
      1. Large batch size: Improves throughput but requires more memory.
      2. Small batch size: Suitable for constrained environments or granular control.
    • Significance: Enhances flexibility and scalability in data processing workflows.
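
A minimal sketch of how these two options interact, assuming a DB-API cursor and hypothetical logging helpers (log_batch, log_row) standing in for the log tables described below:

  def publish(cursor, rows, insert_sql, batch_size=1000, retry_on_failure=True):
      """Publish rows in batches; optionally retry failed batches row by row."""
      for start in range(0, len(rows), batch_size):
          batch = rows[start:start + batch_size]
          try:
              cursor.executemany(insert_sql, batch)  # one operation per batch
              log_batch(rows_written=len(batch), publish_success=1)
          except Exception as batch_error:
              if not retry_on_failure:
                  # False: record the batch-level failure and move on, no row retries.
                  log_batch(rows_written=0, publish_success=0, error=str(batch_error))
                  continue
              # True: retry row by row to isolate and log the problematic rows.
              failed = 0
              for row in batch:
                  try:
                      cursor.execute(insert_sql, row)
                  except Exception as row_error:
                      failed += 1
                      log_row(row_status="FAILED", error_message=str(row_error))
              log_batch(rows_written=len(batch) - failed,
                        failed_record_count=failed,
                        publish_success=1 if failed == 0 else 0)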

Configure Logging Tables

  1. Batch Log Table (sql_batch_logs)

    • Logs metadata and outcomes of each batch operation.
    • Schema:
      • batch_log_id: Unique identifier for the batch.
      • timestamp: Timestamp indicating when the batch operation occurred.
      • target_table: The name of the target table in SQL Server.
      • batch_error_message: Error message if the batch fails.
      • sourceRowCount: Total rows in the source data.
      • rowsRead: Number of rows read during processing.
      • rowsSkipped: Number of rows skipped during processing.
      • rowsWritten: Number of rows successfully written to the target table.
      • FailedRecordCount: Count of rows that failed during processing.
      • publishSuccess: Indicates success (1) or failure (0) of the batch operation.
  2. Row Log Table (sql_row_level_logs)

    • Logs information about individual rows that fail during processing (when retry_on_failure=True).
    • Schema:
      • row_log_id: Unique identifier for the row log entry.
      • batch_log_id: Links the row entry to its parent batch in the batch log table.
      • timestamp: Timestamp indicating when the row operation occurred.
      • target_table: The name of the target table in SQL Server.
      • row_status: Status of the row operation (FAILED, SUCCESS).
      • error_message: Error message detailing the failure for the specific row.
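
Because each row-log entry carries its parent batch_log_id, you can join the two tables to investigate a failed run. A sketch of such a diagnostic query, assuming the log tables are reachable from your SQL client:

  # Hypothetical diagnostic query: failed rows with their parent batch context.
  diagnostic_sql = """
      SELECT b.batch_log_id, b.target_table, b.batch_error_message,
             r.row_log_id, r.row_status, r.error_message
      FROM sql_batch_logs AS b
      JOIN sql_row_level_logs AS r ON r.batch_log_id = b.batch_log_id
      WHERE r.row_status = 'FAILED'
      ORDER BY b.timestamp DESC
  """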