Custom Interactive Cluster Configuration
Overview
This document outlines the process for integrating a custom interactive cluster into the Empower product, along with the necessary configuration steps.
Need for an Interactive Cluster
- Efficiency: Instead of launching multiple job clusters, a single interactive cluster can be utilized, tailored to the required size.
- Performance Boost: This results in improved product performance, lowering both execution time and costs.
Configuration Steps
- Create an interactive cluster in the corresponding Databricks instance using the following script/notebook:
import requests

DATABRICKS_URL = (
    "https://adb-613269140414450.10.azuredatabricks.net"  # Replace with databricks URL
)
DATABRICKS_API_KEY = "dapi3fa77cf20ff72f4bb00eb0175857c4f5"  # Replace with API token


def main():
    header = {"Authorization": f"Bearer {DATABRICKS_API_KEY}"}
    body = {
        "cluster_name": "EmpowerInteractiveCluster",
        "num_workers": 1,
        "spark_version": "14.3.x-photon-scala2.12",
        "data_security_mode": "SINGLE_USER",
        "single_user_name": "de0a3ca9-5e39-4b00-9813-0e298fe12ae4",  # Replace with respective ADF service principal
        "instance_pool_id": "0927-105040-stop6-pool-el2max83",  # Replace with pool id - this is S Pool
        "policy_id": "001890F816051BBF",  # Replace with EmpowerSparkJobPolicy Id
        "autotermination_minutes": 60,
    }
    resp = requests.post(
        f"{DATABRICKS_URL}/api/2.1/clusters/create",
        headers=header,
        timeout=60,
        json=body,
    )
    if resp.status_code != 200:
        print(f"Error creating cluster: {resp.text}")
        return
    out = resp.json()
    print(f"Created new cluster with ID: {out['cluster_id']}")


if __name__ == "__main__":
    main()
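The script above keeps the workspace URL and API token inline. As a hedged alternative, the sketch below reads both from environment variables so the token does not need to be stored in the notebook. The variable names DATABRICKS_HOST and DATABRICKS_TOKEN and the helper names are assumptions for illustration, not part of Empower.

```python
import os


def load_databricks_settings(env=None):
    """Return (url, token) read from environment variables.

    The variable names are assumptions for this sketch.
    Raises RuntimeError if either value is missing or empty.
    """
    env = os.environ if env is None else env
    url = env.get("DATABRICKS_HOST", "").rstrip("/")
    token = env.get("DATABRICKS_TOKEN", "")
    required = (("DATABRICKS_HOST", url), ("DATABRICKS_TOKEN", token))
    missing = [name for name, value in required if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return url, token


def auth_header(token):
    """Build the same bearer-token header the creation script sends."""
    return {"Authorization": f"Bearer {token}"}
```

The returned values could stand in for DATABRICKS_URL and DATABRICKS_API_KEY in the script; everything else stays the same.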
- Configure the following values in the [state_config].[DatabaseToStepCommand] table:
  - BatchVersion = 'v2'
  - ClusterID = cluster ID from step 1
  - JobConcurrency = -1
  - SparkWorkers = 0
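The four values above could be applied with a parameterized UPDATE statement. The sketch below only builds the statement and its parameters; it does not connect to a database. The source does not say which rows to update, so the filter column and value are supplied by the caller and are purely hypothetical, as is the function name. The `?` placeholders assume a driver that uses the qmark parameter style, such as pyodbc.

```python
import re

SETTINGS_TABLE = "[state_config].[DatabaseToStepCommand]"


def build_cluster_update(cluster_id, filter_column, filter_value):
    """Return (sql, params) for a parameterized UPDATE of the four settings.

    filter_column / filter_value are hypothetical: choose whatever
    identifies the rows for your data source in your environment.
    """
    # Column names cannot be bound as parameters, so validate the name strictly.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", filter_column):
        raise ValueError(f"Unsafe column name: {filter_column!r}")
    sql = (
        f"UPDATE {SETTINGS_TABLE} "
        "SET BatchVersion = ?, ClusterID = ?, JobConcurrency = ?, SparkWorkers = ? "
        f"WHERE [{filter_column}] = ?"
    )
    params = ("v2", cluster_id, -1, 0, filter_value)
    return sql, params
```

With a DB-API cursor, the result would be passed as `cursor.execute(sql, params)`.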
- How to confirm the interactive cluster is used for execution
Make sure the compute is EmpowerInteractiveCluster and the run-as identity is the ADF service principal.
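The same two checks can be expressed as a small helper. This is a minimal sketch, assuming you already have the cluster details as a dictionary with `cluster_name` and `single_user_name` keys (the fields sent in the creation request), and treating `single_user_name` as a stand-in for the run-as identity. The function name is hypothetical.

```python
EXPECTED_CLUSTER_NAME = "EmpowerInteractiveCluster"


def check_cluster(cluster_info, service_principal_id, expected_name=EXPECTED_CLUSTER_NAME):
    """Return a list of mismatch messages; an empty list means both checks pass."""
    problems = []
    name = cluster_info.get("cluster_name")
    if name != expected_name:
        problems.append(f"compute is {name!r}, expected {expected_name!r}")
    user = cluster_info.get("single_user_name")
    if user != service_principal_id:
        problems.append(f"runs as {user!r}, expected {service_principal_id!r}")
    return problems
```

An empty result means the compute name and the assigned identity both match what was configured.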
- How to use this feature
a. Triggering from Main pipeline
Initiate the main pipeline for the respective load group of the DataSource.
b. Triggering from flows
Triggering from Flows is currently not supported but will be available in future releases.
Supported Connectors
- The following Spark-based connectors are supported through SPARK_PL:
ServiceNow, Argus, Kronos, AttendanceOnDemand, GuildQuality, Five9, Workday, GoogleAnalytics, Kafka, EventHubs, DynamicsSynapseLink, Databricks, BirdEye, SAPSac, Origami, PrimaveraP6, AzureSQLServer, Clma, MicrosoftEntra