Logging and error handling

Logging and Error Handling

This section describes the behavior of how the DDU library logs messages for visibilty and debugging purposes.

Logging

The DDU library records information about each batch written to the target environment in a log table. The log table is created in the source environment with the name empower_ddu.ddu_batch_log. The log table contains the following information:

Column NameDescription
log_idA unique identifier for the log record.
job_idThe workflow job id; Null if run outside of a workflow or job.
run_idA unique identifier generated for each call. Used to group together batches from the same load of data. This is the column used to map to the quarantine records.
source_tableThe name of the table being read from in the source. Also used to name the quarantine table.
table_nameThe name of the dynamics table/object the data is loaded into.
partition_idThe streaming partition number; always 0 for non-streaming loads.
partition_sizeThe number of records in the partition or batch of data.
write_timeThe total duration of the load operation (seconds).
timeThe timestamp the write completed and the log record was created.
run_summaryA JSON object containing the counts for successful and failed records. When a global error occurs, this column will contain the error message.
quarantine_tableThe name of the database and table in the metastore failed records were written to, see the next section.

Quarantine

The Quarantine is a location where records that fail processing are stored. Failed records are recorded together with the error message and metadata to track the attempted behavior and the load run id for each record. The quarantine table is created in the source environment with the name empower_ddu.<source_table_name>, by default, at the start of the write operation.

The entire record is recorded as it was when the load to Dynamics was attempted along with the following metadata columns:

Column NameDescription
__ddu_modeThe mode of the load operation, usually insert or update.
__ddu_errorThe error message that caused the record to fail.
__ddu_run_idThe run id of the load operation. Maps to the "run_id" column in the log table.

The quarantine records are maintained across multiple pushes from the same source table, the run_id is used to distinguish records from different runs. The tables are useful for debugging, fixing broken records, and tracking failure history. In addition, the quarantine tables can be exported to a file, corrected, and re-loaded to the target environment.

# Option 1: Fetch the quarantine used by the migrator
quarantine = migrator.quarantine

# Option 2: Standalone quarantine management in a separate process
from empowerspark.target.type.dynamics.quarantine_db import QuarantineDB
quarantine = QuarantineDB("empower_ddu")

table_name = "account"
run_id = "abc-123"

# Display failed records with error messages
quarantine.load_from_quarantine(table_name, run_id).display()

# Export to a CSV file
quarantine.export_quarantine(table_name, "dbfs:/empower/ddu_artifacts/account/quarantine_export.csv", "csv", run_id)

DDU Report

TODO: PowerBI report details and usage.