Current Empower Resources

Resources

This document provides a technical description of the Empower resources that will be deployed to the customer's tenant.

Subscription and Resource Group layout

We highly recommend separating Empower's production environment at both a subscription and resource group level. Our recommendation is two subscriptions, one non-production and one production.

We also recommend having at least three environments: a development environment and a staging environment, both placed in the non-production subscription, and a production environment placed in the production subscription.
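
As a minimal sketch of the recommendation above, the layout can be written down as a small mapping from subscription to environments. The subscription and environment labels used here are illustrative assumptions, not names Empower creates.

```python
# Illustrative labels only; the real subscription and environment names
# are chosen per customer.
RECOMMENDED_LAYOUT = {
    "non-production": ["development", "staging"],
    "production": ["production"],
}


def subscription_for(environment: str) -> str:
    """Return the subscription that should host the given environment."""
    for subscription, environments in RECOMMENDED_LAYOUT.items():
        if environment in environments:
            return subscription
    raise KeyError(f"unknown environment: {environment}")


def all_environments() -> list:
    """Flatten the layout into a single list of environments."""
    return [env for envs in RECOMMENDED_LAYOUT.values() for env in envs]
```

For example, `subscription_for("staging")` resolves to the non-production subscription, which keeps production isolated at the subscription level.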

Architecture Diagram

Note: While this diagram shows only one non-prod resource group, we highly recommend at least two. The additional groups are omitted to keep the diagram clean; all non-prod resource groups are identical.


Service Principals and Access

Please see the Empower Access Details documentation for specific details regarding what child service principals will be created in your environment.

Empower Environment

Description: Empower will deploy the following resources, which form the core of the Empower product, to each environment resource group.

| Resource | Description |
| --- | --- |
| Databricks Access Connector | Provides Databricks with a storage credential that is used to set up an external location in the Empower storage account for the Databricks catalog. |
| Databricks Workspace | Contains the compute, workflows, and notebook code required for Empower. |
| Datafactory - Client | This data factory can be used for custom data factory code and will not be overwritten by our deployments. |
| Datafactory - Empower | Contains Empower product pipelines and parameters. These are deployed with every release, and any custom changes made to this data factory will be overwritten upon release. This data factory contains a linked IR that points to the production IR host data factory. See the production-specific section below. |
| Event Grid Topic | Notifies Databricks when a storage blob has been created by Data Factory. |
| Keyvault | Holds all the secrets that a specific Empower environment requires to function. This key vault is currently managed through access policies, which must be applied to the key vault in order to access secrets. |
| Log Analytics Workspace | Holds metric logs for the storage account deployed in your environment. |
| Network Security Group | This network security group is specifically for the Databricks VNet. It contains the rules outlined by Databricks in their documentation. |
| Storage Account | Holds any data extracted by Empower in a data lake. The data will not leave your tenant. This storage account is accessible through a firewall. Databricks accesses the storage account over a private connection. |
| Virtual Network | Contains at least two subnets that each have 256 addresses. These subnets are the public and private subnets for Databricks compute. A further 512 addresses are available for future features and additional subnets. |
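
The address arithmetic for the virtual network can be checked with a short sketch. It assumes the VNet is a /22 (1,024 addresses), which is inferred from two 256-address subnets plus 512 spare addresses; the `10.0.0.0/22` range is a placeholder, and the actual range is chosen per deployment.

```python
import ipaddress

# Assumption: a /22 address space. The concrete range below is a placeholder.
vnet = ipaddress.ip_network("10.0.0.0/22")

# Split the VNet into /24 blocks of 256 addresses each.
blocks = list(vnet.subnets(new_prefix=24))

# The first two blocks stand in for the Databricks public and private subnets.
public_subnet, private_subnet = blocks[0], blocks[1]

# The remaining blocks are left free for future features and subnets.
reserved = blocks[2:]


def summarize() -> dict:
    """Report the subnet ranges and the number of unallocated addresses."""
    return {
        "public": str(public_subnet),
        "private": str(private_subnet),
        "reserved_addresses": sum(b.num_addresses for b in reserved),
    }
```

Azure reserves five addresses in every subnet, so the number of addresses usable by compute in each 256-address subnet is slightly lower than the raw count.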

Production Environment

Description: Empower will deploy these additional resources to your production environment.

| Resource | Description |
| --- | --- |
| Datafactory - Integration runtime | This data factory's job is to host the integration runtime for Empower. Each Empower data factory and each client data factory then links to the integration runtime data factory. By default, only one IR data factory exists per organization. |

Unity Catalog

Description: One Unity Catalog resource group is set up for each Azure region to which Empower has been deployed. This group contains the resources listed in the table below. If your environment already contains a metastore in this region, only the networking components of this resource group will be set up.

| Resource | Description |
| --- | --- |
| Databricks Access Connector | Used to set up the storage credential for the storage account in this resource group. |
| Private DNS Zones | This resource group contains two private DNS zones, one for blob and one for dfs. They hold the A records for the various Empower private endpoints deployed in this resource group. |
| Private Endpoints | Each Empower storage account has a dfs and a blob private endpoint within this resource group. These allow resources in the Empower resource groups to communicate with the storage accounts privately. |
| Storage Account | This storage account holds a container associated with the metastore. It can hold managed tables. Empower currently uses external tables, so this container is not used by default. |
| Vnet | This VNet contains one subnet, which holds the various private endpoints set up by Empower. Currently this subnet holds the private endpoints for the Empower storage accounts. This VNet is peered to each Databricks-injected VNet. |

Logging Resource Group

Description: One logging resource group will be set up for each Empower organization. This resource group is managed by Hitachi to provide security alerts for Empower.

| Resource | Description |
| --- | --- |
| API connections | The API connections are sub-resources of the Logic Apps and provide connections from the Logic Apps to the Sentinel and Defender for Cloud services. |
| Log analytics workspaces | The workspace collects the resource activity logs from across the Empower deployment and, with the Sentinel service enabled on it, generates security alerts. |
| Logic Apps | The Logic Apps hook the Sentinel alerts and Defender for Cloud alerts into our ticketing system for review by the team. |
| Storage Account | This storage account stores read logs for the other Empower storage accounts. A storage account is used rather than a Log Analytics workspace because storage accounts are much cheaper. |

Databricks Workspace

Description: The following resources are set up at the Databricks workspace level.

| Resource | Description |
| --- | --- |
| Catalog | Empower sets up one catalog per environment. Unless the customer requests it, these catalogs are not isolated. Each catalog is associated with the corresponding external location (see External Location in this table). |
| Compute | All compute used by Empower is job-specific compute, created and spun up when a job runs. Empower also deploys two predefined clusters named DataEngineering-Interactive and ModelBuilder. These clusters are deprecated and are awaiting safe removal from Empower. |
| External Location | Each environment-specific storage account set up by Empower has a corresponding external location created by the deployment process. This external location uses the corresponding storage credential. |
| Notebooks | Empower deploys many notebooks to the workspace. See the Databricks Code section of this article for more information about these notebooks. |
| Pools | Standard compute makes use of T-shirt-sized pools deployed as part of Empower with the following names: XS-Standard-Pool, S-Standard-Pool, M-Standard-Pool, L-Standard-Pool, XL-Standard-Pool. These pools currently use DSv3 compute for clusters and have a preloaded Spark version of 14.3 LTS. Any job clusters that are spun up make use of these pools. Empower also deploys three other pools: DataEngineeringDriverPool, DataEngineeringWorkerPool, and ModelBuilderPool. These are deprecated and are awaiting safe removal from Empower. |
| SQL Warehouses | Empower deploys one SQL warehouse named PowerBI-Interactive. It is a small classic SQL warehouse. Databricks will soon roll out features that allow this warehouse to be changed to a serverless warehouse. Once those features are generally available to all customers, this warehouse will be moved to serverless. See documentation |
| Storage Credential | Each storage account deployed as part of Empower has a corresponding access connector deployed with it. This access connector is used to set up the storage credential for that storage account. |
| Workflows | Empower deploys four workflows: IaC - GrantDatabasePermissions, IaC - OneTimeMountSetup, OptimizeDelta, Vacuum Delta Tables. Additional workflows may be created by the customer or Data Engineer/Architect. |
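
To illustrate how a job cluster might draw from one of the T-shirt-sized pools, the sketch below builds a cluster specification as a plain dictionary. The `spark_version`, `instance_pool_id`, and `num_workers` fields follow the Databricks cluster specification. The helper name, the pool ID lookup, and the default worker count are assumptions for illustration and do not describe how Empower itself configures jobs.

```python
# Pool names as listed in the table above.
STANDARD_POOLS = {
    "XS": "XS-Standard-Pool",
    "S": "S-Standard-Pool",
    "M": "M-Standard-Pool",
    "L": "L-Standard-Pool",
    "XL": "XL-Standard-Pool",
}

# Runtime key for Databricks Runtime 14.3 LTS.
SPARK_VERSION = "14.3.x-scala2.12"


def job_cluster_spec(size: str, pool_ids: dict, num_workers: int = 2) -> dict:
    """Build a job cluster spec that draws its nodes from a standard pool.

    `pool_ids` is a hypothetical mapping of pool name to pool ID; real IDs
    are workspace specific and must be looked up in the workspace.
    """
    pool_name = STANDARD_POOLS[size]
    return {
        "spark_version": SPARK_VERSION,
        "instance_pool_id": pool_ids[pool_name],
        "num_workers": num_workers,
    }
```

A call such as `job_cluster_spec("M", {"M-Standard-Pool": "<pool-id>"})` yields a specification that can be supplied as the `new_cluster` block of a job task, with the placeholder replaced by the real pool ID.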

Databricks Code

Description: This section covers the organizational structure of the Empower Databricks code.

Notebook Folder Structure


| Folder | Description |
| --- | --- |
| DataEngineering | The main organizational folder for Empower product code. |
| Dimensions | Source queries that create dimension tables for a data warehouse practice. These provide structured labeling information. |
| DmBuild | The main organizational folder for client-specific code. |
| DWH | Currently outdated assets awaiting removal. |
| EmpowerSystem | Empower product code, maintained by the Empower product team. Custom notebooks must be placed outside this folder. Please talk with your Data Engineer/Architect about a good place to put them. |
| Facts | Source queries that create fact tables for a data warehouse practice. These provide measurements, metrics, or facts about a business process. |
| UCMigration | One-time-run notebook used for implementing Unity Catalog. |

Databricks Account

Description: The following resources are set up at the Databricks account level. At least one Databricks workspace must have been deployed in an Azure tenant for the tenant to have an associated account. If your tenant already has a metastore in this region, Empower will integrate with the existing metastore.

| Resource | Description |
| --- | --- |
| Group | Empower sets up one data engineers group per Azure region that holds various Empower resources and engineers. If desired, customers can create and manage their own groups and then add them to the data engineers group, thus inheriting that group's permissions. See access documentation |
| Metastore | One Unity Catalog metastore is deployed per Azure region. The metastore is the highest level of Databricks organization and contains the Empower catalogs. |
| Network Connectivity Configuration | This feature is currently in public preview and is not part of the default Empower setup. If you would like more information on this feature, please see this documentation |

Secrets

Description: The following secrets are stored in each Empower key vault as part of the automated deployment process. Additional customer-specific secrets will be created by the Data Engineer/Architect.

| Name | Value |
| --- | --- |
| ad-tenant-id | Tenant ID of the environment |
| adb-access-token-id | A Databricks PAT associated with the deployment service principal |
| adb-app-reg-id | App ID of the Databricks service principal |
| adb-app-reg-secret | Secret of the Databricks service principal |
| adb-cluster-{cluster name}-id | The ID of a static cluster created by the Empower deployment (one secret per cluster) |
| adb-oauth-client-id | Empower client ID |
| adb-oauth-client-secret-name | Empower client secret name |
| adb-oauth-token-endpoint | Empower token endpoint |
| adb-resource-id | Resource ID of the Databricks workspace |
| adb-workspace-url | The URL of the Databricks workspace |
| empower-api-url | Base URL of the Empower API |
| keycloak-client-secret-{enterprise}-{environment}-df | Empower client secret |
| pbi-app-reg-id | App ID of the Power BI service principal |
| pbi-app-reg-secret | Secret of the Power BI service principal |
| pbi-tenant-id | Tenant ID of the Power BI workspace |
| org-pbi-app-reg-id | App ID of the org-level Power BI service principal |
| org-pbi-app-reg-secret | Secret of the org-level Power BI service principal |
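
Two of the secret names above are templates. The sketch below shows one way to expand them and to check the result against the Azure Key Vault naming rule (1 to 127 characters, alphanumerics and dashes only). The function names and the sample values are hypothetical, and the placeholders are substituted exactly as given; whether the deployment normalizes cluster names (for example, by lowercasing them) is not stated here.

```python
import re

# Azure Key Vault secret names: 1-127 characters, letters, digits, and dashes.
_VALID_SECRET_NAME = re.compile(r"^[0-9A-Za-z-]{1,127}$")


def _checked(name: str) -> str:
    """Return the name unchanged, or raise if Key Vault would reject it."""
    if not _VALID_SECRET_NAME.match(name):
        raise ValueError(f"not a valid Key Vault secret name: {name!r}")
    return name


def cluster_id_secret(cluster_name: str) -> str:
    """Expand the adb-cluster-{cluster name}-id template."""
    return _checked(f"adb-cluster-{cluster_name}-id")


def keycloak_secret(enterprise: str, environment: str) -> str:
    """Expand the keycloak-client-secret-{enterprise}-{environment}-df template."""
    return _checked(f"keycloak-client-secret-{enterprise}-{environment}-df")
```

With the hypothetical values `contoso` and `dev`, `keycloak_secret("contoso", "dev")` produces `keycloak-client-secret-contoso-dev-df`. A value containing a space or underscore raises an error instead of producing a name that Key Vault would reject.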

Integration Runtime

Description: Empower currently requires the deployment of a single integration runtime. The compute for this IR can be deployed in either Azure or non-Azure environments. The IR nodes need access to every desired data source that cannot be reached through the Auto-Resolved Integration Runtime.

After the deployment of Microsoft Fabric, we will be able to explore the option of supporting multiple integration runtimes. At present, multiple organizations are needed if multiple integration runtimes are desired.

If the integration runtime nodes are deployed in Azure, the following additional resources will be deployed in the production environment.

| Name | Value |
| --- | --- |
| Login Credentials | The login credentials will be stored in the key vault under the names irnode-username and irnode-password. |
| Virtual Machines | Two virtual machines will be deployed to serve as the two IR nodes. These may need to be scaled out to three or four nodes, depending on the desired performance. They will have the standard virtual machine resources, such as OS disks and network interfaces. |