FileDrop for Data Driven Upgrade

🚧

This guide is for DDU - Filedrop only

Please see the Data-Driven Upgrade Docs for more technical details and developer-level specifications of the DDU packages.

Overview

Data-Driven Upgrade (DDU) using Filedrop is the process of uploading one or more files into the Databricks File System (DBFS). Customers typically load data into CSV/Excel templates, upload those templates into their Empower deployment, and the data is then written directly to Dynamics CE and FO tables using a simple object-file mapping file as a guide.

Below is a simple process guide for DDU via Filedrop:

Data-Driven Upgrade Process

Step 1: Prepare Data
Load data into CSV/Excel templates for the Data-Driven Upgrade (DDU) process.

Step 2: Write the object-mapping file
Create the object-mapping file to tell Empower which files will be written to which Dynamics entities.

Step 3: Upload Data to DBFS
Upload the prepared files to the Databricks File System (DBFS) using Filedrop.

Step 4: Run DDU Workflow
Data is written directly to Dynamics CE and FO tables using the object-file mapping file as a guide.

Step 5: Check for errors
Identify and correct any errors that occurred during the data load using the PowerBI report and logging table. Rerun the DDU process as needed.

Prerequisites

Empower Deployment Prerequisites

All details around deploying Empower are described on the Prerequisites and Empower Deployment page. Please work with an Empower specialist if you have any questions about this process.

DDU-specific Prerequisites

Obtain Credentials via Azure AD Application

The recommended strategy to authenticate to the Dynamics environment is to use an Azure AD application. The Azure Tenant ID, Application ID, and Client Secret are required for creating a connection string.

📘

Tip: This works for both CE and FO

The same Azure application can be used to authenticate to both F&O and CRM environments if they are within the same tenant.

The application needs to have API permissions for Dynamics ERP. From the Azure Portal that hosts your AD application:

  1. "API Permissions" > "Add" > "Delegated Permissions"
  2. "Required Permissions" > "Add"
  3. Under "Select an API" choose "Dynamics ERP"
  4. Finally, choose "Delegated Permissions for OAuthGrantType = CODE"

Obtain Databricks Workspace Access

Ensure that you have access to the Empower-deployed Databricks Workspace, which comes standard to any Empower deployment. Work with the dedicated Empower engineer to gain access during the Empower deployment phase.

DDU Process

Step 1: Prepare Data

Prepare your data using the tried-and-true Excel/CSV template process that is already part of the DMF migration path today. Make sure you do all your data mapping before finalizing it into the tabular template, as the data will not be transformed further along the Empower DDU via Filedrop migration path.

Step 2: Write Object-Mapping File

A JSON configuration file can be uploaded along with the Excel data template files. A JSON file stores information in a structured, easy-to-understand format. In our case, the JSON file maps files (like Excel templates) to destination entities in Dynamics.

We can store rules for each file/entity mapping, such as whether to replace existing data in Dynamics 365 or to transfer everything in one go. We can also specify which legal entity in Dynamics 365 the data should go to.

Here's an example mapping JSON file:

{
  "entity_one.xlsx": {
    "target_entity": "dynamics_entity_one",
    "legal_entity": "",
    "mode": "append",
    "key_columns": "",
    "batch_size": 15,
    "overwrite": false,
    "atomic": true
  },
  "entity_two.xlsx": {
    "target_entity": "dynamics_entity_two",
    "legal_entity": "legal_entity_1",
    "mode": "append",
    "key_columns": "",
    "batch_size": 15,
    "overwrite": true,
    "atomic": false
  },
  "test_01.xlsx": {
    "target_entity": "test_01",
    "legal_entity": "US01",
    "mode": "upsert",
    "key_columns": "employeeid,empnumber",
    "batch_size": 15,
    "overwrite": true,
    "atomic": false
  }
}

Think of it like a recipe book, where the JSON file is the book itself, and each object (like entity_one.xlsx) is a recipe.

  • The file name, such as entity_one.xlsx, is the name of a file we want to send to Dynamics 365.
  • The target_entity is the specific entity in Dynamics 365 where the data should go. Defaults to the root name of the file.
  • The overwrite rule tells Empower whether to replace the existing data in that entity with new data (when true) or just append new data without replacing anything (when false).
  • The atomic rule designates whether to transfer all the data at once (when true) or piece by piece (when false).
  • The legal_entity is required if the target entity has a legal_entity column; otherwise it is optional.
  • The mode indicates whether to append or upsert the records in the target entity. This field is optional; the default behavior is append.
  • The key_columns parameter is a comma-delimited list of fields that represent the primary key of the table. It is mandatory when mode is 'upsert'; otherwise it is optional.
  • The batch_size is an optional field that represents the number of records to be processed in a single batch. If not specified, the batch size is distributed evenly amongst worker threads.
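
Before uploading, it can help to sanity-check the mapping file locally against these rules. The Python sketch below is a hypothetical helper, not part of Empower; it only verifies the constraints described above (for example, that key_columns is populated whenever mode is 'upsert').

import json

def validate_mapping(path):
    """Minimal sanity check for a DDU mapping file (hypothetical helper, not part of Empower)."""
    with open(path) as f:
        mapping = json.load(f)   # also catches invalid JSON such as trailing commas

    problems = []
    for file_name, rules in mapping.items():
        if not rules.get("target_entity"):
            problems.append(f"{file_name}: missing target_entity")
        mode = rules.get("mode", "append")   # append is the documented default
        if mode not in ("append", "upsert"):
            problems.append(f"{file_name}: unknown mode '{mode}'")
        if mode == "upsert" and not rules.get("key_columns"):
            problems.append(f"{file_name}: mode is 'upsert' but key_columns is empty")
    return problems

print(validate_mapping("ddu_config.json"))   # an empty list means no obvious issues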

📘

The "Atomic" Parameter

This creates a "succeed and fail together" behavior in your entity. In other words, if any part of the write fails, all of the entity write fails.

Your Dynamics modification for that specific entity would not occur unless the entire entity writes successfully.

Step 3: Upload Data

You can upload your template files using two different methods. For simple loads, use the Databricks Workspace UI, as it is intuitive to use. For much larger loads (100+ files within a single upload session), we recommend using the Databricks CLI Tool, which can quickly move a directory into DBFS with a single command.

Via DBFS UI

The first method is to use the DBFS upload utility from the Databricks UI. Currently this is found via "New" at the top left of the workspace -> "File Upload" -> "Upload to DBFS". Specify the dropbox location in the text box, then drag and drop the files.

Select "New" on the top of the left navigation pane

Select "New" on the top of the left navigation pane

Select "File upload" from the menu

Select "File upload" from the menu

Select "upload to DBFS" from the next screen

Select "upload to DBFS" from the next screen

Choose your desired location in file storage using an intuitive name, like "ddu_dropbox" for example.

Choose your desired location in file storage using an intuitive name, like "ddu_dropbox" for example.

Follow the instructions to upload files or select them from the file browser as directed. Make sure to remember the file location ("ddu_dropbox" in this example) for the upload workflow in Step 4.
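
To confirm the upload landed where you expect, you can list the dropbox folder from any notebook in the workspace. This is a minimal sketch; the path below assumes the "ddu_dropbox" location used in this example.

# Run in a Databricks notebook: list the dropbox folder to confirm the upload landed.
# The path assumes the "ddu_dropbox" location used in this example.
for f in dbutils.fs.ls("dbfs:/FileStore/tables/ddu_dropbox"):
    print(f.name, f.size)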

Via Databricks CLI Tool

The Databricks CLI Tool is a Python package that can be installed with pip. The CLI can be used to upload files to the dropbox location on DBFS directly from any machine authenticated to the Databricks workspace. See the Databricks CLI documentation for more information on this process.
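
For example, the minimal sketch below copies a whole directory of templates into the dropbox location in one command. It assumes the CLI is already installed and configured with workspace authentication; the local folder name is an example only.

import subprocess

# Copy a local folder of templates (plus the mapping file) into the DBFS dropbox.
# Assumes the Databricks CLI is installed (pip install databricks-cli) and has
# already been configured with workspace authentication.
subprocess.run(
    [
        "databricks", "fs", "cp", "--recursive",
        "./ddu_templates",                      # local folder with templates + ddu_config.json (example name)
        "dbfs:/FileStore/tables/ddu_dropbox",   # the dropbox location used in this guide
    ],
    check=True,
)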

🚧

Don't forget your mapping file!

If you built one, do not forget to upload your mapping file alongside your template files!

Step 4: Run DDU Workflow

Databricks Workflows allow you to build repeatable, modular notebook execution. Please read the official Introduction to Databricks Workflows documentation for more details on Workflows. Navigate to the Databricks Workflows page by selecting "Workflows" in the left navigation pane.

Select the DDU - Filedrop workflow, which should come pre-deployed in your Databricks workspace.

Navigate to the workflow tasks using the "Tasks" tab.

The 3 default steps in the DDU - Filedrop workflow: the tasks and their respective notebooks.

The DDU - Filedrop workflow is already deployed within your environment and contains 3 core tasks which are completed serially during the workflow's run:

  1. Init: Initializes critical DDU functions, like the logging table. Takes DDU variables as input such as the location of the file dropbox in DBFS.
  2. Ingest: Runs through every file in the dropbox folder and ingests it into the Data Lake. Maps the ingested files into respective tables using the mapping configuration file as a guide.
  3. Migrate: Publishes tables out to the Dynamics 365 deployment (CE or FO) using the mapping configuration file as a guide to determine which tables are pushed to which entities and other settings.

The whole DDU - Filedrop workflow is idempotent, meaning that each time the workflow is run, only data that has not yet been ingested will be migrated. This is ensured by counting the number of records already migrated, checking the validation output, and using the checkpointing location from the auto-loader. This means you won't need to drop any tables in the Data Lake in between runs to avoid duplications in Dynamics.

Click on the dropdown arrow next to the Run now button, then click "Run now with different parameters" to see a list of all the workflow parameters and their default values. You may change these values to produce different behaviors for the workflow.

Workflow Parameters

Below is a list of all the parameters you would find in the Run now with different parameters popup menu.

Name | Description | Default Value
database | the name of the database in the data lakehouse | ddu_dropbox_demo
state_table | the name of the table which keeps track of the migration tables | state
log_table | the name of the log table within the database where all logs will be written | batch_log
drop_database | a boolean for setting whether the entire database should be dropped at the beginning of this workflow run | FALSE
dropbox_folder | the location in DBFS where the file templates are uploaded | dbfs:/FileStore/tables/ddu_dropbox
config_name | the name of the migration's configuration mapping file. This is expected to be within the folder defined by "dropbox_folder" | ddu_config.json
migration_root | the root location of the raw tables as they are ingested from the dropbox folder | dbfs:/mnt/RAW/ddu/demo/dropb
download_quarantine | a boolean value of whether or not to automatically download the quarantine table to the migration root location in DBFS after this run | FALSE
quarantine_db | the name of the table which keeps track of all failed batches | ddu_quarantine
reset_checkpionts | a boolean value to reset any checkpoints made from any previous run | FALSE
secret | the name of the keyvault secret within the Empower keyvault that contains the connection string to the Dynamics environment | empowerdynamics-jdbc-oauth-fno-isd
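
If you want to trigger the workflow with overridden parameters programmatically rather than through the UI, you can call the Databricks Jobs run-now API. The sketch below is a minimal example that assumes the workflow parameters are exposed as notebook parameters; the workspace URL, token, and job id are placeholders you would replace with your own values.

import requests

# Trigger the DDU - Filedrop job with overridden parameters via the Databricks
# Jobs API. The workspace URL, token, and job id below are placeholders, and the
# parameter names come from the table above.
host = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"
job_id = 123   # the job id of the DDU - Filedrop workflow in your workspace

response = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": job_id,
        # Override only what you need; anything not listed keeps its default value.
        "notebook_params": {
            "dropbox_folder": "dbfs:/FileStore/tables/ddu_dropbox",
            "config_name": "ddu_config.json",
            "reset_checkpionts": "false",
        },
    },
)
response.raise_for_status()
print(response.json())   # contains the run_id of the triggered run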

📘

The "Secret" Parameter

This parameter refers to the name of the secret in your Azure Keyvault (within the same tenant as Empower) which contains the connection string for your Dynamics environment.

For examples of what FO and CE connection strings look like, please see the DDU Getting Started guide's section titled "Connection Strings".

Step 5: Identify Errors and Rerun

Log Table

The log table (named batch_log by default) contains all logged ingest and upload actions for a given run. Those with SQL experience can quickly identify records by querying the log table as defined in the Init task of Step 4. To query this table, run the following command from any notebook in Databricks (assuming default values):

SELECT * FROM [catalog].ddu.batch_log
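
If you prefer working in Python, a minimal sketch like the one below loads the same table into a dataframe for ad-hoc filtering or export. Replace <catalog> with your catalog name; the columns you filter on will depend on the log table schema in your deployment.

# From any Databricks notebook: load the batch log into a dataframe for inspection.
log_df = spark.sql("SELECT * FROM <catalog>.ddu.batch_log")
display(log_df)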

PowerBI Report

We also deploy a PowerBI report that can keep track of the log table. Please speak with your delivery manager for how to access this report in your workspace.

Quarantine Table

The quarantine table (named ddu_quarantine by default) keeps track of failed batches of data during the publish phase.

📘

Quarantined Batches

At the end of a run, the default DDU settings will download the entire quarantine table in the form of a CSV file to DBFS. From there, you can download it to your local machine, fix the items in question, then reupload via Step 1 and restart the process.

With default settings, the quarantine file will be located under the ddu_artifacts/ folder in DBFS.

You may query the quarantine table at any time by running the following SQL statement (assuming default values):

SELECT * FROM [catalog].ddu.ddu_quarantine
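
As an alternative to the automatic download, the sketch below writes the quarantine table out as a single CSV on DBFS so you can pull it down, fix the rows, and re-upload via Step 1. It assumes default names; replace <catalog> with your catalog, and note that the output path is an example only.

# Export the quarantine table to a single CSV on DBFS for download and correction.
quarantine_df = spark.sql("SELECT * FROM <catalog>.ddu.ddu_quarantine")

(quarantine_df
    .coalesce(1)                    # produce a single output file
    .write.mode("overwrite")
    .option("header", "true")
    .csv("dbfs:/FileStore/ddu_artifacts/quarantine_export"))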

FAQ

Q: What is DDU using Filedrop?

A: Data-Driven Upgrade (DDU) using Filedrop is the process of uploading one or more files into the Databricks File System (DBFS). Customers typically load data into CSV/Excel templates, upload those templates into their Empower deployment, and the data is then written directly to Dynamics CE and FO tables using a simple object-file mapping file as a guide.


Q: What are the prerequisites for DDU?

A: The prerequisites for DDU include:

  1. Empower Deployment: All details around deploying Empower are described on the Prerequisites and Empower Deployment page.
  2. Obtain Credentials via Azure AD Application: The recommended strategy to authenticate to the Dynamics environment is to use an Azure AD application.
  3. Obtain Databricks Workspace Access: Ensure that you have access to the Empower-deployed Databricks Workspace.

Q: How to upload data in the DDU process?

A: You can upload your template files using two different methods. For simple loads, use the Databricks Workspace UI. For much larger loads (100+ files within a single upload session), we recommend using the Databricks CLI Tool.


Q: What is the purpose of the object-mapping file in the DDU process?

A: A JSON configuration file can be uploaded along with the Excel data template files. It maps files (like Excel templates) to destination entities in Dynamics.


Q: How to run the DDU Workflow?

A: Databricks Workflows allow you to build repeatable, modular notebook execution. The DDU - Filedrop workflow contains 3 core tasks which are completed serially during the workflow's run: Init, Ingest, and Migrate.


Q: How to identify errors in the DDU process?

A: Errors in the DDU process can be identified by querying the log table as defined in the Init task of Step 4. You can also track the log table using a PowerBI report that comes with the deployment.


Q: Can the DDU process be repeated?

A: Yes, the DDU - Filedrop workflow is idempotent, meaning that each time the workflow is run, only data that has not been ingested yet will be migrated.