Pipeline Orchestration
Dynamics Publishing
The Dynamics publishing framework is the simplest way to orchestrate a data integration with Dynamics.
- Stand up a new data source as described in section 1. Be sure to create the additional secret if multiple clients are desired.
- Follow the general instructions for setting up a publishing flow.
- Set the source database and table to the table containing the staged data to push to Dynamics.
- Set the target table to the name of the Dynamics table that will receive the data.
- Set the publish entity options for each entity.
Each entity needs to be configured using these options:
mode
- The write mode for the operation: "upsert", "insert", or "append". See section 2 for details on each mode. Defaults to "upsert" if not provided.
key_columns
- The primary key columns for the entity, used to determine whether a record is new or needs to be updated. Required when mode is "upsert" or not specified.
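The option rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the framework's API: `EntityOptions`, `resolve_entity_options`, and `split_new_and_existing` are hypothetical names, and the real framework performs the new-versus-update comparison against Dynamics itself. The sketch only encodes what this page states: the three modes, the "upsert" default, the `key_columns` requirement, and the use of key columns to tell new records from updates.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Set, Tuple

VALID_MODES = {"upsert", "insert", "append"}
DEFAULT_MODE = "upsert"  # documented default when mode is not provided


@dataclass
class EntityOptions:
    """Hypothetical container for one entity's publish options."""
    mode: str = DEFAULT_MODE
    key_columns: List[str] = field(default_factory=list)


def resolve_entity_options(raw: Dict[str, Any]) -> EntityOptions:
    """Apply the documented default mode and the key_columns requirement."""
    mode = raw.get("mode") or DEFAULT_MODE
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    key_columns = list(raw.get("key_columns") or [])
    if mode == "upsert" and not key_columns:
        raise ValueError("key_columns is required when mode is 'upsert' or unset")
    return EntityOptions(mode=mode, key_columns=key_columns)


def split_new_and_existing(
    rows: Iterable[Dict[str, Any]],
    existing_keys: Set[Tuple[Any, ...]],
    key_columns: List[str],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Illustrate how key columns decide whether a row is new or an update."""
    new_rows, updated_rows = [], []
    for row in rows:
        # Build the row's key from the configured key columns, in order.
        key = tuple(row[column] for column in key_columns)
        if key in existing_keys:
            updated_rows.append(row)
        else:
            new_rows.append(row)
    return new_rows, updated_rows
```

For example, with a hypothetical key column `accountnumber` and an existing key set of `{("A1",)}`, a row whose `accountnumber` is `"A1"` is treated as an update and a row with `"A2"` as a new record.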
Dynamics File Drop
Note: At the time of writing, deployment of the Dynamics file drop workflow must be requested.
The file drop workflow provides a method to load data into Dynamics using a simple file upload procedure. The procedure is as follows:
Create a Configuration File
The configuration file is an Excel or JSON file containing the parameters that control how each data file is pushed. It only needs to be created once and can be reused for multiple drops of the same file (for example, if errors occurred).
Excel
Create an Excel file with the following columns:
| Column Name | Description |
| --- | --- |
| file_name | The name of the file to load. |
| target_entity | The name of the entity in the target environment. |
| mode | The write mode for the operation. |
| key_columns | The primary key columns for the entity. Required when mode is "upsert" or not specified. |
| legal_entity | The legal entity to use for the load (FnO loads only). |
JSON
Support is coming soon.
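Before uploading, it can help to sanity-check each configuration row. The sketch below is a hypothetical helper, not part of the DDU library: it assumes the Excel rows have already been read into dictionaries (for example with `pandas.read_excel(...).to_dict("records")`), that empty cells arrive as `None` or empty strings, and that `key_columns` holds a comma-separated list of column names. Treating `file_name` and `target_entity` as required is also an assumption. `legal_entity` is not checked, because whether a load targets FnO is not visible from the row alone.

```python
from typing import Any, Dict, List

REQUIRED_COLUMNS = ("file_name", "target_entity")  # assumption: always needed
VALID_MODES = {"upsert", "insert", "append"}


def parse_key_columns(cell: Any) -> List[str]:
    """Split a key_columns cell, assumed to be comma-separated, into names."""
    if cell is None:
        return []
    return [part.strip() for part in str(cell).split(",") if part.strip()]


def validate_config_row(row: Dict[str, Any]) -> List[str]:
    """Return the problems found in one configuration row (empty if none)."""
    problems = []
    for column in REQUIRED_COLUMNS:
        if not row.get(column):
            problems.append(f"missing {column}")
    # An empty mode falls back to the documented "upsert" default.
    mode = row.get("mode") or "upsert"
    if mode not in VALID_MODES:
        problems.append(f"unknown mode {mode!r}")
    if mode == "upsert" and not parse_key_columns(row.get("key_columns")):
        problems.append("key_columns required for upsert")
    return problems
```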
Upload the Configuration File
Upload the configuration file to the file drop location, a shared location that the DDU library can access. This is typically a folder in the FileStore on the Databricks File System (DBFS). The configuration file only needs to be uploaded once and can be used for multiple executions. To change it, delete or archive the existing file and replace it with the new version.
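The archive-and-replace option can be sketched with standard file operations. This is a hypothetical helper, assuming the cluster exposes DBFS through the local `/dbfs` mount so that ordinary file APIs work (inside a notebook, `dbutils.fs.mv` and `dbutils.fs.cp` are the DBFS-native alternatives). The archive folder location and the timestamp naming scheme are illustrative choices, not conventions of the DDU library.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def replace_config(dropbox: Path, new_config: Path, archive_dir: Path) -> Optional[Path]:
    """Archive an existing config with the same name, then copy in the new one.

    `new_config` must live outside `dropbox`. Returns the archived path,
    or None if there was no previous version to archive.
    """
    target = dropbox / new_config.name
    archived = None
    if target.exists():
        archive_dir.mkdir(parents=True, exist_ok=True)
        # Timestamp the archived copy so earlier versions are not overwritten.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        archived = archive_dir / f"{target.stem}_{stamp}{target.suffix}"
        shutil.move(str(target), str(archived))
    shutil.copy2(new_config, target)
    return archived
```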
Configure the Workflow
Locate the DDUFileDrop workflow in the "Workflows" tab of the Databricks workspace. Click "Run Now with Different Parameters" and verify that all settings are correct.
| Setting | Description |
| --- | --- |
| dropbox_folder | The file drop location containing the configuration file. |
| config_name | The name of the configuration file. |
| secret_name | The name of the Key Vault secret containing the environment details. |
| quarantine_download_folder | The location to download the quarantine file to if errors occur. |
| catalog_name | The metastore catalog to use. |
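The "verify the settings" step can be partly automated. The following is a hypothetical pre-flight check, not part of the DDUFileDrop workflow: it confirms that every parameter listed above is non-empty and that the configuration file exists in `dropbox_folder`. It assumes `dropbox_folder` is a local path (for example through the `/dbfs` mount rather than a `dbfs:/` URI). In a notebook task, values like these are commonly read with `dbutils.widgets.get`.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Parameter names as listed in the settings table for the workflow.
WORKFLOW_PARAMETERS = (
    "dropbox_folder",
    "config_name",
    "secret_name",
    "quarantine_download_folder",
    "catalog_name",
)


def check_workflow_parameters(params: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable problems with the run parameters (empty if none)."""
    problems = [
        f"{name} is empty"
        for name in WORKFLOW_PARAMETERS
        if not (params.get(name) or "").strip()
    ]
    if not problems:
        # Assumes dropbox_folder is a local path, e.g. under the /dbfs mount.
        config_path = Path(params["dropbox_folder"]) / params["config_name"]
        if not config_path.is_file():
            problems.append(f"configuration file not found: {config_path}")
    return problems
```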
Upload the Data File(s)
Upload the data file (or files) to the same file drop location. Any number of files can be loaded in a single execution. Currently, only Excel format is supported for data files.
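Since each data file is matched to a configuration row by `file_name`, a quick check before running can catch files that have no configuration. This hypothetical helper lists the Excel files in the drop folder, skips the configuration file itself, and splits the rest into configured and unconfigured files. Which extensions count as Excel (`.xlsx`, `.xls`) is an assumption, and non-Excel files are simply ignored here.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

EXCEL_SUFFIXES = {".xlsx", ".xls"}  # assumption: extensions treated as Excel


def find_data_files(
    dropbox: Path, config_name: str, configured_names: Iterable[str]
) -> Tuple[List[Path], List[Path]]:
    """Split Excel files in the drop folder into configured and unconfigured."""
    configured = set(configured_names)
    matched, unmatched = [], []
    for path in sorted(dropbox.iterdir()):
        # Skip subfolders (e.g. archive or error folders) and the config file.
        if not path.is_file() or path.name == config_name:
            continue
        if path.suffix.lower() not in EXCEL_SUFFIXES:
            continue
        (matched if path.name in configured else unmatched).append(path)
    return matched, unmatched
```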
Execute the Workflow
Run the workflow by clicking "Run Now".
The workflow reads each Excel file in the directory into a Spark DataFrame and loads it to the environment specified by the configured Key Vault secret. After the load completes, the data files are moved to an archive folder within the file drop location for reference and debugging. The log and quarantine tables are updated with the results of the load so that the success of the operation can be verified.
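The load-then-archive sequence described above can be sketched as a small loop. This is an illustration, not the workflow's code: `process_drop` is a hypothetical name, and `load` stands in for the real read-and-push step (which, for example, might read the file with `pandas.read_excel` and convert it with `spark.createDataFrame`). It returns a row count only to give the sketch something to record. Log and quarantine table updates are omitted.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, Iterable


def process_drop(
    files: Iterable[Path], load: Callable[[Path], int], archive_dir: Path
) -> Dict[str, int]:
    """Load each file with `load`, then move it to the archive folder.

    `load` is a hypothetical stand-in for the real push to Dynamics and
    returns the number of rows it loaded.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, int] = {}
    for path in files:
        results[path.name] = load(path)
        # Archive only after the load call returns; if it raises, the
        # file stays in the drop folder for another attempt.
        shutil.move(str(path), str(archive_dir / path.name))
    return results
```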
If any errors occurred, the quarantine table is exported to a file (currently CSV) and the file is moved to an error folder within the file drop location. Download this file, address the errors locally, and then re-upload the corrected file to the file drop location.
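The quarantine export can be pictured as writing the quarantined rows to a CSV file in an error folder. The sketch below is hypothetical: the row contents (including an `error` column), the `_quarantine.csv` file name, and the folder layout are illustrative assumptions, not the workflow's actual output format. Columns are the union of keys across rows, and missing values are written as empty fields.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def export_quarantine(
    rows: Sequence[Dict[str, object]], error_dir: Path, source_name: str
) -> Optional[Path]:
    """Write quarantined rows to a CSV in `error_dir`; None if there are none."""
    if not rows:
        return None
    error_dir.mkdir(parents=True, exist_ok=True)
    out_path = error_dir / f"{Path(source_name).stem}_quarantine.csv"
    # Union of keys across rows, preserving first-seen order.
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```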