How-To: Pipeline Migration

Migration Sequence

A full environment migration involves several dependent steps. Luckily, we provide a step-by-step guide that walks through a full environment migration based on real-world scenarios. This guide is meant to complement the feature documentation for Configuration Migration.

  1. Migrate Connections
  2. Update Connections in Target Environment
  3. Migrate Acquire Tasks
  4. Migrate Transform Tasks
  5. Migrate Notebooks/Scripts
  6. Migrate Custom ADF Pipelines
  7. Validation and Final Checks
  8. Troubleshooting & Rollback Procedures

Step-by-step Walkthrough

Migrate Connections

First you must migrate connections from your source environment to your target environment.

  • Ensure you are in the source environment.
  • Navigate to Settings -> Data Migration.
  • Select the target environment.
  • Select Connection Credentials.
  • Choose relevant Connections and click Migrate.

Update Connections in Target Environment

Gather the required connection details for your connectors. These may include usernames and passwords for the correct dev/stage/prod connection, as well as API keys, auth strings, etc.
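
The exact fields vary by connector; the sketch below is a hypothetical pre-migration inventory for a single connection (every name and field is illustrative, not an Empower schema):

```python
# Hypothetical pre-migration inventory for one connection. Every name and
# field here is illustrative -- this is a planning checklist, not an Empower
# schema. Pull secrets from your vault rather than hard-coding them.
connection_details = {
    "dev": {
        "username": "svc-connector-dev",      # illustrative
        "password": "<from secret vault>",    # never store in plain text
        "auth_token": "<dev API/auth string>",
    },
    "prod": {
        "username": "svc-connector-prod",
        "password": "<from secret vault>",
        "auth_token": "<prod API/auth string>",
    },
}
```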

  • In Empower, ensure you are in the target environment.
  • Navigate to the Connect module.
  • For each relevant connection, update fields and click "Save and Test".

Migrate Acquire Tasks

After connections are set up in the target environment, you can begin to migrate Tasks.

🚧

Tasks for Connections not yet migrated will not be visible for migration!

If a task is built on a connection that you have not yet migrated to the target environment, that task will not appear as available for migration. You must migrate the connection first, then migrate the task.

  • Ensure you are in the source environment.
  • In Empower, navigate to Settings -> Data Migration.
  • Create a new migration by clicking "+".
  • Select Tasks from the radio button menu.
  • Choose which Acquire Tasks to migrate over.
  • Preview, where applicable, to ensure configs line up with source expectations.
  • Edit individual tasks to confirm environment-specific settings.
    • This may include configuration-specific items, like certain objects being enabled/disabled.
    • This may also include schedule triggers; schedules are not migrated across environments.

Migrate Transform Tasks

  • Ensure you are in the source environment.
  • In Empower, navigate to Settings -> Data Migration.
  • Create a new migration by clicking "+".
  • Select Tasks from the radio button menu.
  • Choose which Transform Tasks to migrate over.
  • Preview, where applicable, to ensure configs line up with source expectations.
  • Edit individual model tasks to confirm environment-specific settings.
    • This may include configuration-specific items, like certain entities being enabled/disabled.
    • This may also include schedule triggers; schedules are not migrated across environments.

Migrate Notebooks and Scripts

If you are migrating Extend Tasks or Transform Tasks, you will need to manually migrate any notebooks and scripts between Databricks workspaces. You have two options for doing so.

Using Git Folders

Databricks supports CI/CD for notebooks and scripts using Git Folders (https://docs.databricks.com/aws/en/repos). We recommend setting this up before you begin development so you can clone, branch, and migrate notebooks across environments without hassle.

If you use this method for file migration, the move is as simple as checking out the appropriate branch or commit of the repo in your target environment.
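
If you want to script this checkout, it can also be done against the Databricks Repos API (the API behind Git Folders). A minimal sketch in Python; the host, token, repo ID, and branch below are placeholders, not values from this guide:

```python
# Minimal sketch: check out a branch in a target-workspace Git Folder (repo)
# via the Databricks Repos API. Host, token, repo ID, and branch are
# placeholders -- substitute your own values.
import requests

TARGET_HOST = "https://<target-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<databricks-pat>"                                       # placeholder
REPO_ID = 123456789          # placeholder; list repos via GET /api/2.0/repos
BRANCH = "release"           # placeholder branch to check out

resp = requests.patch(
    f"{TARGET_HOST}/api/2.0/repos/{REPO_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"branch": BRANCH},  # switches the repo to this branch
)
resp.raise_for_status()
print(resp.json())  # updated repo state
```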

Without Git Folders

If you have not set up Git integration, or do not wish to (a scripted alternative using the Workspace API is sketched after this list):

  • In Databricks, navigate to the source workspace.
  • In a separate tab (also in Databricks) navigate to the target workspace.
  • Copy relevant entity notebooks from the source workspace tab.
  • Upload these entity notebooks to your target workspace tab.
  • In Empower, navigate to the target environment's model entity pane.
  • Ensure that the filepath in each entity's notebook field matches the corresponding path in the target workspace's Databricks filesystem.
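
For more than a handful of files, the copy and upload steps can be scripted against the Databricks Workspace API instead of using the UI. A minimal sketch in Python; the hosts, tokens, and path are placeholders, and a Python notebook is assumed:

```python
# Minimal sketch: copy one notebook from the source workspace to the target
# workspace via the Databricks Workspace API. Hosts, tokens, and the path are
# placeholders -- substitute your own values. Assumes the parent folder already
# exists in the target (create it with /api/2.0/workspace/mkdirs if not).
import requests

SRC_HOST = "https://<source-workspace>.cloud.databricks.com"  # placeholder
TGT_HOST = "https://<target-workspace>.cloud.databricks.com"  # placeholder
SRC_TOKEN = "<source-pat>"                                    # placeholder
TGT_TOKEN = "<target-pat>"                                    # placeholder
NOTEBOOK_PATH = "/Shared/entity_notebooks/my_entity"          # placeholder

# Export the notebook; the response body carries base64-encoded source.
export = requests.get(
    f"{SRC_HOST}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {SRC_TOKEN}"},
    params={"path": NOTEBOOK_PATH, "format": "SOURCE"},
)
export.raise_for_status()

# Import into the target at the same path, overwriting any existing copy.
# "language" is required for SOURCE-format imports; PYTHON is assumed here.
imp = requests.post(
    f"{TGT_HOST}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {TGT_TOKEN}"},
    json={
        "path": NOTEBOOK_PATH,
        "format": "SOURCE",
        "language": "PYTHON",
        "content": export.json()["content"],  # already base64-encoded
        "overwrite": True,
    },
)
imp.raise_for_status()
```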

Note that this method makes it harder to track changes across notebooks. We highly recommend using Git Folders with Databricks for clean CI/CD on notebooks.