Getting Started

There are three main parts to the setup of the Empower for Dynamics package:

  1. Prerequisites and Empower Deployment
  2. Empower for Dynamics Prerequisites (below)
  3. Setting up your Notebooks (below)

General Data Flow / Process

When using Empower, customers source data from on-premises or other systems into the lakehouse ("Data Acquisition"), transform and map that data ("Analytics Engineering"), and then push data out to Dynamics and other systems ("Data Publishing").

Phase                   Activity                   Mechanism
Data Acquisition        Pulling data in            Filedrop or Direct
Analytics Engineering   Mapping and transforming   SQL
Data Publishing         Pushing data out           Dynamics Connector

  1. Data Acquisition: The data acquisition phase of using DDU is largely UI-driven. This can be done in one of two ways:
    1. Filedrop Upload: Drag and drop template files into the Lakehouse. This is explained in the filedrop guide.
    2. Extract/Ingest from Data Sources: The "traditional" Empower approach; please visit Data Acquisition for how to get data loaded into the lakehouse.
  2. Analytics Engineering: Analytics Engineering is where we begin using the interactive notebook experience in Azure Databricks and Microsoft Fabric to move data across the lakehouse. Customers can wire (orchestrate) these notebooks as needed through the Empower UX.
  3. Data Publishing: Finally, we execute the publishing capabilities. Please note that the capabilities described in this section differ slightly from the Data Publishing section, which covers how to ship data to SQL Server (for Virtual entities) and other locations, like Power BI.

Empower for Dynamics Prerequisites

🚧

This setup process assumes Empower itself has already been deployed.

Obtain Credentials via Azure AD Application

The recommended strategy for authenticating to the Dynamics environment is to use an Azure AD application. The Azure Tenant ID, Application ID, and Client Secret are required for creating the connection string. Note that the same Azure application can be used to authenticate to both F&O and CRM environments, provided they are within the same tenant.

The application needs to have API permissions for Dynamics ERP. From the Azure portal, for your application:

  1. "API Permissions" > "Add" > "Delegated Permissions"
  2. "Required Permissions" > "Add"
  3. Under "Select an API" choose "Dynamics ERP"
  4. Finally, choose "Delegated Permissions for OAuthGrantType = CODE"

Getting Started

Empower for Dynamics uses the notebook experience in Databricks or Microsoft Fabric to orchestrate data movement from the Datalake to a Dynamics 365 deployment. For more information about Databricks notebooks, see the Introduction to Databricks Notebooks guide.

Install Python Package

The code for the Empower DDU Python package is available within this repository. The package can be installed from the Empower PyPI repository:

pip install --index-url https://artifacts.empoweranalytics.io/repository/pypi-group/simple empowerddu==<version>

Connection Strings

The JDBC Dynamics driver uses Open Authorization (OAuth) to validate the connection to D365. When connecting to Dynamics for the first time, the environment URL and the Azure AD application credentials are needed. The connection strings for CRM and F&O are similar but have minor differences. An example of each is shown below.

d365_url      = ... # A CRM or F&O environment url
azure_tenant  = ... # The Azure tenant ID
client_id     = ... # The Azure AD application ID
client_secret = ... # The Azure AD application secret

# CRM
d365_connection_string = f"jdbc:dynamicscrm:;AuthScheme=OAuth;InitiateOAuth=GETANDREFRESH;URL={d365_url};AzureTenant={azure_tenant};OAuthClientID={client_id};OAuthClientSecret={client_secret};CRMVersion=CRMOnline;OAuthGrantType=CLIENT"

# F&O
d365_connection_string = f"jdbc:dynamics365:;AuthScheme=OAuth;InitiateOAuth=GETANDREFRESH;OrganizationURL={d365_url};AzureTenant={azure_tenant};OAuthClientID={client_id};OAuthClientSecret={client_secret};OAuthGrantType=CLIENT"

NOTE: Add CrossCompany=true to the F&O connection string to enable cross company queries for reading. Optionally, also add CustomUrlParams=%24filter=dataAreaId%20eq%20%27ABCDEF%27 (replace "ABCDEF" with the legal entity code) to only see data from a specific company.
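
For example, building on the F&O connection string above, a cross company connection scoped to a single legal entity (the code "USMF" below is only an illustrative value) could look like this:

# F&O with cross company reads, limited to the example legal entity "USMF"
d365_connection_string = (
    f"jdbc:dynamics365:;AuthScheme=OAuth;InitiateOAuth=GETANDREFRESH;"
    f"OrganizationURL={d365_url};AzureTenant={azure_tenant};"
    f"OAuthClientID={client_id};OAuthClientSecret={client_secret};"
    f"OAuthGrantType=CLIENT;CrossCompany=true;"
    f"CustomUrlParams=%24filter=dataAreaId%20eq%20%27USMF%27"
)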

The connection string can be tested by reading some data from the environment. If a valid connection string has been assembled, the code below will display a list of all table entities in the environment.

from empowerddu.client import DynamicsClient

# The client is constructed from the full connection string assembled above
client = DynamicsClient(d365_connection_string)
client["sys_tables"].display()

Once the connection string is ready, it is recommended to store it within a key vault for easier access, reusability, and enhanced security.
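
For example, in a Databricks workspace the assembled connection string can be stored in a secret scope backed by the key vault and retrieved at run time. The scope and secret names below are placeholders, not part of the Empower setup:

# Retrieve the connection string from a Databricks secret scope
# ("empower-secrets" and "d365-fno-connection-string" are example names)
d365_connection_string = dbutils.secrets.get(
    scope="empower-secrets",
    key="d365-fno-connection-string",
)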

Run Time Key

A run time key (RTK) is required to license the Dynamics connectors utilized by Empower. Please consult your Empower contact to obtain one.

OAuth Settings File

NOTE: This section is optional.

Users can add OAuthSettingsLocation=/dbfs/path/to/.oauth_settings_fno.txt to the connection string to persist the OAuth credentials in an encrypted file at a permanent location on DBFS.
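
For example, appending the parameter to the initial F&O connection string shown earlier:

# First run: authenticate and persist the OAuth settings to DBFS
d365_connection_string = f"jdbc:dynamics365:;AuthScheme=OAuth;InitiateOAuth=GETANDREFRESH;OrganizationURL={d365_url};AzureTenant={azure_tenant};OAuthClientID={client_id};OAuthClientSecret={client_secret};OAuthGrantType=CLIENT;OAuthSettingsLocation=/dbfs/path/to/.oauth_settings_fno.txt"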

Once the connection string has been successfully used at least once, the connection string can be simplified to just reference the generated OAuth settings file, though this is not required and no performance differences have been observed.

# CRM
d365_connection_string = f"jdbc:dynamicscrm:;AuthScheme=OAuth;InitiateOAuth=REFRESH;URL={d365_url};OAuthSettingsLocation=/dbfs/path/to/.oauth_settings_crm.txt"

# F&O
d365_connection_string = f"jdbc:dynamics365:;AuthScheme=OAuth;InitiateOAuth=REFRESH;OrganizationURL={d365_url};OAuthSettingsLocation=/dbfs/path/to/.oauth_settings_fno.txt"

Data Access Setup

External Systems

The Python package's DataClient class provides programmatic access to data in external environments for use in the data migration. The currently supported clients are a generic client for any JDBC connector and a specialized client for the Dynamics 365 connectors. Other clients will be supported in the future as needed.

To initialize a client for Dynamics 365, simply pass a valid connection string, prefixed with jdbc:dynamics365: (F&O) or jdbc:dynamicscrm: (CRM), to the client constructor.

from empowerddu.client import DynamicsClient

# Retrieve the full JDBC connection string, likely from a key vault
d365_connection_string = get_connection_string()
client = DynamicsClient(d365_connection_string)

Reading

The read_table method is used to read a table entity from the external environment into a Spark DataFrame. A list of all available tables can be obtained by reading from the sys_tables table entity. Details about table column metadata are in the sys_tablecolumns table entity.

# Read data from the table entity my_table
# into a Spark DataFrame
df = client.read_table("my_table")

# Equivalent to the above
df = client["my_table"]

# Some SQL operations are also supported
df = client.query("SELECT * FROM my_table WHERE id = 1")

# Result is always a Spark DataFrame
N = df.count()
...
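
To explore what is available before mapping, the same client can read the metadata entities mentioned above:

# List all table entities available in the environment
client["sys_tables"].display()

# Inspect column metadata for the available table entities
client["sys_tablecolumns"].display()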

Writing

To write a Spark DataFrame to the client environment, use the append_table method. This will append the DataFrame to an existing table entity in the client.

# Append a DataFrame to the table entity my_table
client.append_table(df, "my_table")

Supported Data Clients

Class Name       Description
JDBCClient       Client access to systems using JDBC drivers
DynamicsClient   Client access to D365 cloud environments, F&O, CRM, etc.

Lake House

The DataSource class is used for accessing data stored within the lakehouse. Currently supported sources are for parquet files, Delta files, and Delta tables in the Hive metastore. Support for other sources will be added as needed.

DataSources provide a common interface for reading data. The read method will return a Spark DataFrame with the latest version of the data in the source. For sources that support data versioning, a specific version can be read by passing the version number to the method, source.read(version=version). A streaming DataFrame can also be initialized using the stream() method. This will return a streaming DataFrame that can be used with an auto-loader or other functionality.

from empowerddu.source import TableSource

source = TableSource("my_database.my_table")

# Read the latest version of the data
df = source.read()

# Read a specific version of the data
df_v2 = source.read(version=2)

# Start a streaming DataFrame
df = source.stream()

The contents of a source can be overwritten using the source.overwrite(df) method. This replaces the data in the source with the provided DataFrame, which is useful for exploratory transformations.
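
For example, replacing the contents of the source with a de-duplicated version of itself (the transformation is illustrative):

# Read, transform, and overwrite the source in place
df = source.read()
df_deduplicated = df.dropDuplicates()
source.overwrite(df_deduplicated)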

Supported Data Sources

Class Name      Description
ParquetSource   Source access to parquet files in the lakehouse
DeltaSource     Source access to Delta files in the lakehouse
TableSource     Source access to Delta tables in the metastore
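
As an illustrative sketch only: assuming the file based sources accept a lakehouse path in their constructors (only the TableSource("my_database.my_table") form is confirmed above, so the exact ParquetSource/DeltaSource signatures should be verified against the package), initializing each source could look like this:

from empowerddu.source import ParquetSource, DeltaSource, TableSource

# Constructor arguments below are assumptions: a lakehouse path for the
# file based sources, a metastore table name for TableSource
parquet_source = ParquetSource("/lakehouse/raw/customers.parquet")
delta_source = DeltaSource("/lakehouse/curated/customers")
table_source = TableSource("my_database.my_table")

# All sources share the same read interface
df = delta_source.read()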