Databricks


Connector Details

| Connector Attributes | Details |
| --- | --- |
| Name | Databricks |
| Description | The Databricks platform is a unified analytics solution designed to accelerate data-driven innovation by enabling seamless collaboration between data engineers, data scientists, and business analysts. It provides a powerful environment for processing, analyzing, and visualizing large-scale data with built-in support for machine learning and AI workloads. Leveraging Apache Spark at its core, Databricks simplifies the development of complex data pipelines, while ensuring scalability and performance. With integrated data lakes, real-time streaming, and collaborative notebooks, Databricks empowers organizations to extract actionable insights, optimize workflows, and drive transformative outcomes across various industries. |
| Connector Type | Class B |

Features

| Feature Name | Feature Details |
| --- | --- |
| Load Strategies | Full Load |
| Metadata Extraction | Supported |
| Data Acquisition | Supported |
| Data Publishing | Not Supported |
| Automated Schema Drift Handling | Not Supported |

📘

Delta Sharing vs Data Copy

Databricks Unity Catalog natively supports Delta Sharing. Our Databricks connector does not use this technology, because Delta Sharing is already available within the Databricks Workspace experience (see documentation here). If you want to use Delta Sharing instead of a data copy, follow the steps in the Delta Sharing documentation referenced above.

Our connector copies data from one catalog to another. Besides giving you a modifiable copy of the data rather than a read-only Delta Share, the Empower Databricks connector also lets you use the Type 2 history stored in your bronze layer tables instead of relying solely on Time Travel.

Source Connection Attributes

| Connection Parameters | Data Type | Example |
| --- | --- | --- |
| Connection Name | String | DATABRICKS |
| Server hostname | String | Server Hostname |
| Token | String | <your-databricks-token-here: dapi*> |
| HTTP Path | String | HTTP Path |
| Catalog | String | Catalog name |
| Silver Schema (Optional) | String | |
| Bronze Schema (Optional) | String | |
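
As a rough illustration, the connection parameters above could be collected in a small structure like the one below. The class name `DatabricksConnection`, its field names, and all example values are hypothetical and are not the connector's actual schema. The keyword names returned by `to_connect_kwargs` follow the open-source `databricks-sql-connector` package (`databricks.sql.connect`), which this connector may or may not use internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabricksConnection:
    """Hypothetical container mirroring the connection parameters table."""
    connection_name: str
    server_hostname: str
    http_path: str
    catalog: str
    token: Optional[str] = None          # PAT; left empty when OAuth is used
    silver_schema: Optional[str] = None  # optional
    bronze_schema: Optional[str] = None  # optional

    def missing_required(self) -> list:
        """Return the names of required parameters that are empty."""
        required = ("connection_name", "server_hostname", "http_path", "catalog")
        return [name for name in required if not getattr(self, name)]

    def to_connect_kwargs(self) -> dict:
        """Keyword arguments in the shape databricks.sql.connect() accepts (PAT case)."""
        return {
            "server_hostname": self.server_hostname,
            "http_path": self.http_path,
            "access_token": self.token,
            "catalog": self.catalog,
        }


# All values below are placeholders, not real credentials.
conn = DatabricksConnection(
    connection_name="DATABRICKS",
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    catalog="main",
    token="dapi-example-token",
)
print(conn.missing_required())  # empty list when all required values are set
```

Checking for missing values before saving the connection tends to surface configuration mistakes earlier than a failed extraction would.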

Connector Specific Configuration Details

  1. The Databricks connector has two optional values: Bronze Schema and Silver Schema.

  2. If you are using Delta Sharing, ensure that the PROVIDER has assigned the correct permissions on the source catalogs, tables, etc., that the RECIPIENT will access. At a minimum, read access is required. You can verify this by running a simple select query.

  3. The cluster you will use with Databricks should be set up with Unity Catalog.

  4. Set up authorization. Choose one of two strategies: personal access token authentication (PAT), or authenticating access to [Azure] Databricks with a service principal using OAuth (OAuth M2M).

4.1 Personal access token authentication (PAT)
Generate the access token for the user:

  1. Log in to Databricks. Go to your Databricks workspace URL and log in.

  2. Open User Settings. Once logged in, click your user profile icon in the upper right corner of the screen. From the dropdown menu, click User Settings.

  3. Generate a new token. Under the Access Tokens tab, click the Generate New Token button. In the dialog box, provide a comment or description for the token (optional but recommended). Optionally, set an expiration date for the token. If no expiration date is set, the token lasts for the default period, which varies by workspace configuration. Click Generate.

  4. Copy and save the token. After generating the token, copy it immediately, as it is shown only once. Store it securely (e.g., in a password manager or a secure environment).

  5. Use the token. You can use the token in various APIs, SDKs, or CLI commands to authenticate with Databricks.
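
To illustrate the last step, here is a minimal sketch of how a PAT is typically passed to the Databricks REST API as a bearer token. The helper name `build_pat_request` and the hostname and token values are placeholders. The request is only built, not sent, so the snippet runs without network access.

```python
import urllib.request


def build_pat_request(server_hostname: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request authenticated with a PAT."""
    url = f"https://{server_hostname}/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values; substitute your own workspace hostname and token.
req = build_pat_request(
    "adb-1234567890123456.7.azuredatabricks.net",
    "dapi-example-token",
)

# Sending the request would look like this (commented out on purpose):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

A successful response to such a call is a quick way to confirm that the token is valid before entering it in the connector.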

Official documentation:

4.2 Service principal authentication (OAuth M2M)
Collect the Client ID and Secret for the service principal:

  1. Log in to Databricks. Go to your Databricks workspace URL and log in.

  2. Open User Settings. Once logged in, click your user profile icon in the upper right corner of the screen. From the dropdown menu, click User Settings.

  3. Note the service principal's Application ID. This value is the client_id.

  4. Select the Secrets tab and click Generate secret.

  5. Save the Secret and the Client ID for the connector.
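
For context, the Client ID and Secret are what an OAuth M2M client exchanges for a short-lived access token. The sketch below builds that token request using the workspace-level token endpoint (`/oidc/v1/token`) with the client-credentials grant, as described in the Databricks OAuth M2M documentation. The helper name `build_oauth_token_request` and all values are placeholders, and whether the connector performs this exchange in exactly this way is an assumption. The request is built but not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_oauth_token_request(
    server_hostname: str, client_id: str, secret: str
) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    token_url = f"https://{server_hostname}/oidc/v1/token"
    # Client ID and Secret are sent via HTTP Basic authentication.
    basic = base64.b64encode(f"{client_id}:{secret}".encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values; substitute the saved Client ID and Secret.
oauth_req = build_oauth_token_request(
    "adb-1234567890123456.7.azuredatabricks.net",
    "my-client-id",
    "my-secret",
)
```

If this request succeeds when sent, the JSON response is expected to carry an `access_token` field that is then used as a bearer token.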

Official documentation:


  1. Open the list of available clusters or warehouses, choose the one you will use, and copy the following values from its settings:

    1. HTTP path, from Advanced options.
    2. Server hostname, from Advanced options.
  2. Get the Catalog name from Catalog Explorer, and check the permissions: you need sufficient permissions to read both the catalog and the schema.
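
The values copied in the steps above can be sanity-checked before they are entered in the connector. The sketch below assumes the commonly seen HTTP path shapes (`/sql/1.0/warehouses/<id>` for SQL warehouses and `sql/protocolv1/o/<workspace-id>/<cluster-id>` for clusters); the function names and patterns are illustrative and may not cover every workspace.

```python
import re

# Assumed path shapes; adjust if your workspace shows a different format.
WAREHOUSE_PATH = re.compile(r"^/?sql/1\.0/(?:warehouses|endpoints)/[\w-]+$")
CLUSTER_PATH = re.compile(r"^/?sql/protocolv1/o/\d+/[\w-]+$")


def normalize_hostname(value: str) -> str:
    """Strip the scheme and trailing slashes so only the bare hostname remains."""
    value = value.strip()
    value = re.sub(r"^https?://", "", value)
    return value.rstrip("/")


def classify_http_path(http_path: str) -> str:
    """Return 'warehouse', 'cluster', or 'unknown' for an HTTP path value."""
    path = http_path.strip()
    if WAREHOUSE_PATH.match(path):
        return "warehouse"
    if CLUSTER_PATH.match(path):
        return "cluster"
    return "unknown"


print(normalize_hostname("https://adb-1234567890123456.7.azuredatabricks.net/"))
print(classify_http_path("/sql/1.0/warehouses/abcdef1234567890"))
```

An "unknown" result does not necessarily mean the value is wrong, only that it does not match the shapes assumed here.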



Troubleshooting

PAT

  • Check access to the specific catalogs and tables you extract from. Individual tables, or a whole catalog, may require the SELECT privilege.
  • When using the extractor, you may experience connection issues related to the access granted on the catalog. This is common when extracting a subset of the ObjectName's fields: the ObjectName may have a mandatory set of "key" columns to capture, and the selected subset may be missing some of them (see Custom Extraction Mandatory keys below).
  • Common issues and solutions related to unit testing of the connector.
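
When a permission problem is suspected, it can help to spell out the Unity Catalog privileges a reader needs and a simple SELECT to verify them. The sketch below only generates SQL strings; the function names are hypothetical, the catalog, schema, table, and principal values are placeholders, and the statements would need to be run by someone with rights to grant privileges.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified_table(catalog: str, schema: str, table: str) -> str:
    """Return the fully qualified, quoted three-level table name."""
    return ".".join(quote_ident(part) for part in (catalog, schema, table))


def read_access_statements(catalog: str, schema: str, table: str, principal: str) -> list:
    """GRANT statements giving a principal read access down to one table."""
    cat = quote_ident(catalog)
    sch = f"{cat}.{quote_ident(schema)}"
    who = quote_ident(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
        f"GRANT SELECT ON TABLE {qualified_table(catalog, schema, table)} TO {who}",
    ]


def probe_query(catalog: str, schema: str, table: str) -> str:
    """A simple SELECT that fails fast if read access is missing."""
    return f"SELECT * FROM {qualified_table(catalog, schema, table)} LIMIT 1"


# Placeholder names for illustration only.
for statement in read_access_statements("main", "sales", "orders", "user@example.com"):
    print(statement)
print(probe_query("main", "sales", "orders"))
```

Running the probe query with the same token or service principal the connector uses is the closest check to what the extraction itself will attempt.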

Service principal (OAuth)

  1. Make sure your service principal has access to the catalog and sufficient privileges to work with it.

  2. If you're using a warehouse, ensure that the service principal's permissions are also extended to that warehouse.



  3. If you're using a cluster, ensure that the service principal's permissions are extended to that cluster, and that the cluster type is SHARED.



Screenshot To Use Connector

If you use the PAT auth strategy, skip the Client ID (OAuth) and Secret (OAuth) fields.

If you want to use OAuth, fill in Client ID (OAuth) and Secret (OAuth), and skip Token (PAT).
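
The either/or rule above can be expressed as a small check. This is only a sketch: the function name `choose_auth_strategy` and the returned labels are hypothetical, and how the connector itself behaves when both sets of fields are filled in is not documented here, so the sketch simply rejects that case.

```python
from typing import Optional


def choose_auth_strategy(
    token: Optional[str] = None,
    client_id: Optional[str] = None,
    secret: Optional[str] = None,
) -> str:
    """Return 'PAT' or 'OAUTH_M2M' based on which fields are filled in."""
    has_pat = bool(token)
    has_any_oauth = bool(client_id) or bool(secret)
    has_full_oauth = bool(client_id) and bool(secret)

    if has_pat and has_any_oauth:
        # Assumption: mixing strategies is treated as a configuration error.
        raise ValueError("Fill in either Token (PAT) or Client ID/Secret (OAuth), not both")
    if has_pat:
        return "PAT"
    if has_full_oauth:
        return "OAUTH_M2M"
    if has_any_oauth:
        raise ValueError("OAuth requires both Client ID and Secret")
    raise ValueError("No credentials provided")


print(choose_auth_strategy(token="dapi-example-token"))
print(choose_auth_strategy(client_id="my-client-id", secret="my-secret"))
```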