Logging and error handling
Logging and Error Handling
This section describes the behavior of how the DDU library logs messages for visibilty and debugging purposes.
Logging
The DDU library records information about each batch written to the target environment in a log table. The log table is created in the source environment with the name empower_ddu.ddu_batch_log
. The log table contains the following information:
Column Name | Description |
---|---|
log_id | A unique identifier for the log record. |
job_id | The workflow job id; Null if run outside of a workflow or job. |
run_id | A unique identifier generated for each call. Used to group together batches from the same load of data. This is the column used to map to the quarantine records. |
source_table | The name of the table being read from in the source. Also used to name the quarantine table. |
table_name | The name of the dynamics table/object the data is loaded into. |
partition_id | The streaming partition number; always 0 for non-streaming loads. |
partition_size | The number of records in the partition or batch of data. |
write_time | The total duration of the load operation (seconds). |
time | The timestamp the write completed and the log record was created. |
run_summary | A JSON object containing the counts for successful and failed records. When a global error occurs, this column will contain the error message. |
quarantine_table | The name of the database and table in the metastore failed records were written to, see the next section. |
Quarantine
The Quarantine is a location where records that fail processing are stored. Failed records are recorded together with the error message and metadata to track the attempted behavior and the load run id for each record. The quarantine table is created in the source environment with the name empower_ddu.<source_table_name>
, by default, at the start of the write operation.
The entire record is recorded as it was when the load to Dynamics was attempted along with the following metadata columns:
Column Name | Description |
---|---|
__ddu_mode | The mode of the load operation, usually insert or update. |
__ddu_error | The error message that caused the record to fail. |
__ddu_run_id | The run id of the load operation. Maps to the "run_id" column in the log table. |
The quarantine records are maintained across multiple pushes from the same source table, the run_id is used to distinguish records from different runs. The tables are useful for debugging, fixing broken records, and tracking failure history. In addition, the quarantine tables can be exported to a file, corrected, and re-loaded to the target environment.
# Option 1: Fetch the quarantine used by the migrator
quarantine = migrator.quarantine
# Option 2: Standalone quarantine management in a separate process
from empowerspark.target.type.dynamics.quarantine_db import QuarantineDB
quarantine = QuarantineDB("empower_ddu")
table_name = "account"
run_id = "abc-123"
# Display failed records with error messages
quarantine.load_from_quarantine(table_name, run_id).display()
# Export to a CSV file
quarantine.export_quarantine(table_name, "dbfs:/empower/ddu_artifacts/account/quarantine_export.csv", "csv", run_id)
DDU Report
TODO: PowerBI report details and usage.
Updated 7 months ago