Custom Interactive Cluster Configuration

Overview

This document outlines the process for integrating a custom interactive cluster into the Empower product, along with the necessary configuration steps.

Need for an Interactive Cluster

  • Efficiency: Instead of launching multiple job clusters, a single interactive cluster, sized to the workload, can be reused across runs.

  • Performance Boost: Reusing a running cluster avoids per-job cluster start-up, lowering both execution time and cost.

Configuration Steps

  1. Create an interactive cluster in the corresponding Databricks instance using the following script/notebook:
import requests

DATABRICKS_URL = (
    "https://adb-613269140414450.10.azuredatabricks.net"  # Replace with databricks URL
)
DATABRICKS_API_KEY = "<dapi-token>"  # Replace with API token (do not commit real tokens)


def main():
    header = {"Authorization": f"Bearer {DATABRICKS_API_KEY}"}

    body = {
        "cluster_name": "EmpowerInteractiveCluster",
        "num_workers": 1,
        "spark_version": "14.3.x-photon-scala2.12",
        "data_security_mode": "SINGLE_USER",
        "single_user_name": "de0a3ca9-5e39-4b00-9813-0e298fe12ae4",  # Replace with respective ADF service principal 
        "instance_pool_id": "0927-105040-stop6-pool-el2max83",  # Replace with pool id - this is S Pool 
        "policy_id": "001890F816051BBF",  # Replace with EmpowerSparkJobPolicy Id
        "autotermination_minutes": 60,
    }

    resp = requests.post(
        f"{DATABRICKS_URL}/api/2.1/clusters/create",
        headers=header,
        timeout=60,
        json=body,
    )
    if resp.status_code != 200:
        print(f"Error creating cluster: {resp.text}")
        return

    out = resp.json()
    print(f"Created new cluster with ID: {out['cluster_id']}")


if __name__ == "__main__":
    main()
The script above creates the cluster. Retrieve the cluster ID from the cluster's URL (highlighted in the screenshot).
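
If copying the ID from the URL is inconvenient, it can also be looked up by name through the Clusters API. The sketch below assumes the same workspace URL and API token as the creation script above.

import requests

DATABRICKS_URL = "https://<workspace>.azuredatabricks.net"  # Same workspace as above
DATABRICKS_API_KEY = "<dapi-token>"  # Same API token as above

# Sketch: find the cluster ID by name instead of copying it from the URL.
resp = requests.get(
    f"{DATABRICKS_URL}/api/2.1/clusters/list",
    headers={"Authorization": f"Bearer {DATABRICKS_API_KEY}"},
    timeout=60,
)
resp.raise_for_status()
for cluster in resp.json().get("clusters", []):
    if cluster["cluster_name"] == "EmpowerInteractiveCluster":
        print(f"Cluster ID: {cluster['cluster_id']}")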

  2. Configure the following values in the [state_config].[DatabaseToStepCommand] table (a hedged example of applying them follows this list).

    1. BatchVersion = 'v2'
    2. ClusterID = the cluster ID from step 1
    3. JobConcurrency = -1
    4. SparkWorkers = 0
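
The following is a minimal sketch of applying these values with pyodbc. The connection string and the DataSourceName filter column are assumptions, not the product's actual schema; adjust both to your environment.

import pyodbc

# Sketch: apply the step-2 configuration values.
# The connection string and the "DataSourceName" filter column are
# hypothetical; match them to the actual state database and schema.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<state-db-server>;Database=<state-db>;"  # Replace with state DB details
    "Authentication=ActiveDirectoryInteractive;"
)

with pyodbc.connect(CONN_STR) as conn:
    conn.execute(
        """
        UPDATE [state_config].[DatabaseToStepCommand]
        SET BatchVersion = 'v2',
            ClusterID = ?,
            JobConcurrency = -1,
            SparkWorkers = 0
        WHERE DataSourceName = ?  -- hypothetical filter column
        """,
        "<cluster-id-from-step-1>",  # Replace with the cluster ID from step 1
        "<data-source-name>",
    )
    conn.commit()
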
  3. How to confirm the Interactive Cluster is used for execution.
    Make sure the compute is EmpowerInteractiveCluster and that the notebook runs as the ADF service principal (see the notebook activity run parameters below).

Notebook activity run parameters
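
As a programmatic cross-check, the sketch below queries the Clusters API for the cluster created in step 1 and prints the fields that should match the configuration above. The workspace URL, token, and cluster ID placeholders are assumptions to be replaced with your environment's values.

import requests

DATABRICKS_URL = "https://<workspace>.azuredatabricks.net"  # Same workspace as above
DATABRICKS_API_KEY = "<dapi-token>"  # Same API token as above
CLUSTER_ID = "<cluster-id-from-step-1>"  # Replace with the cluster ID from step 1

# Sketch: confirm the cluster's name and single-user principal.
resp = requests.get(
    f"{DATABRICKS_URL}/api/2.1/clusters/get",
    headers={"Authorization": f"Bearer {DATABRICKS_API_KEY}"},
    params={"cluster_id": CLUSTER_ID},
    timeout=60,
)
resp.raise_for_status()
info = resp.json()
print(info.get("cluster_name"))      # expected: EmpowerInteractiveCluster
print(info.get("single_user_name"))  # expected: the ADF service principal
print(info.get("state"))             # e.g. RUNNING while a load executes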

  4. How to use this feature

    a. Triggering from the main pipeline

    Initiate the main pipeline for the respective load group of the DataSource (a sketch of triggering a run programmatically follows this list).

    b. Triggering from flows

    Triggering from Flows is currently not supported but will be available in a future release.
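
For reference, the following is a minimal sketch of starting the main pipeline through the ADF REST API. The subscription, resource group, factory, and pipeline names, and the loadGroup run parameter, are hypothetical placeholders; substitute your environment's actual names and parameters.

import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"
PIPELINE_NAME = "<main-pipeline-name>"

# Sketch: trigger a main-pipeline run via the ADF createRun REST endpoint.
token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
resp = requests.post(
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
    f"/factories/{FACTORY_NAME}/pipelines/{PIPELINE_NAME}/createRun",
    headers={"Authorization": f"Bearer {token.token}"},
    params={"api-version": "2018-06-01"},
    json={"loadGroup": "<load-group>"},  # hypothetical pipeline parameter
    timeout=60,
)
resp.raise_for_status()
print(f"Pipeline run ID: {resp.json()['runId']}")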

Supported Connectors

  • The following Spark-based connectors are supported through SPARK_PL:
    ServiceNow, Argus, Kronos, AttendanceOnDemand, GuildQuality, Five9, Workday, GoogleAnalytics, Kafka, EventHubs, DynamicsSynapseLink, Databricks, BirdEye, SAPSac, Origami, PrimaveraP6, AzureSQLServer, Clma, MicrosoftEntra