Integrate Databricks with Cascade

This guide explains how to integrate Databricks with Cascade to automatically sync KPI values from Databricks into Cascade Measures/Metrics. The integration uses Databricks SQL Warehouses and the Databricks Statement Execution API to securely run SQL queries and synchronize the results into Cascade on a scheduled basis.

Method

Databricks SQL + Statement Execution API (Required). Cascade connects to Databricks by:

  1. Executing SQL statements against a Databricks SQL Warehouse
  2. Retrieving query results via the Statement Execution API
  3. Transforming results and loading them into Cascade

1) Authentication Setup

Option A — Personal Access Token (POC Only)

For quick proofs of concept, a Databricks Personal Access Token (PAT) can be used. PATs are user-scoped and not recommended for long-term production integrations.

Refer to the following link to set up a Personal Access Token:
https://docs.databricks.com/aws/en/dev-tools/auth/pat

Customer steps

  1. In your Databricks workspace, click your username in the top bar and select Settings.
  2. Click Developer.
  3. Next to Access tokens, click Manage.
  4. Click Generate new token and enter a comment that helps you identify this token
  5. Set the token's lifetime in days (set it to the maximum lifetime allowed)
  6. Click Generate and then click Done.

What the customer provides to Cascade

  • A Databricks PAT
  • Confirmation of warehouse and dataset access

Option B — OAuth (Recommended for Production)

Databricks supports machine-to-machine OAuth using a service principal.

Refer to the following link to authorize service principal access to Databricks with OAuth: https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m

Refer to the following link to add a Workspace Level Service Principal: https://docs.databricks.com/aws/en/admin/users-groups/manage-service-principals

Customer steps (Databricks Admin)

  1. Create or identify a workspace service principal
  2. Grant the service principal:
    • Permission to use the target SQL Warehouse (see section 2)
    • Read access to the required catalogs, schemas, and tables/views (see section 3)
  3. Enable OAuth access for the service principal

What the customer provides to Cascade

  • OAuth credentials for the Databricks service principal
  • Confirmation of the SQL Warehouse and datasets the service principal can access
  • Databricks Workspace URL

Cascade will use the provided credentials, together with the Databricks workspace URL, to make an API call that retrieves a bearer token.
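
For reference, that token request looks roughly like the sketch below. This is illustrative only: the workspace host, client ID, and secret are placeholders, and the requests library is used purely for demonstration.

    import requests

    WORKSPACE_HOST = "https://<workspace-host>"          # e.g. https://dbc-xxxxxxxx-xxxx.cloud.databricks.com
    CLIENT_ID = "<service-principal-client-id>"          # the service principal's application (client) ID
    CLIENT_SECRET = "<service-principal-oauth-secret>"   # the OAuth secret generated for the service principal

    # Request a short-lived bearer token using the client-credentials grant.
    response = requests.post(
        f"{WORKSPACE_HOST}/oidc/v1/token",
        auth=(CLIENT_ID, CLIENT_SECRET),
        data={"grant_type": "client_credentials", "scope": "all-apis"},
        timeout=30,
    )
    response.raise_for_status()
    access_token = response.json()["access_token"]   # sent as "Authorization: Bearer <token>" on later calls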


2) SQL Warehouse Identification

All queries are executed against a Databricks SQL Warehouse. If the Databricks service principal has been granted access, Cascade can automatically identify available SQL Warehouses within the workspace and select the appropriate execution target.
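
As an illustration of that discovery step, a minimal sketch using the SQL Warehouses list API is shown below; the workspace host and bearer token are placeholders, and the requests library is an assumption.

    import requests

    WORKSPACE_HOST = "https://<workspace-host>"
    HEADERS = {"Authorization": "Bearer <access-token>"}   # PAT or OAuth token from section 1

    # List the SQL Warehouses visible to the authenticated principal.
    response = requests.get(f"{WORKSPACE_HOST}/api/2.0/sql/warehouses", headers=HEADERS, timeout=30)
    response.raise_for_status()

    for warehouse in response.json().get("warehouses", []):
        print(warehouse["id"], warehouse["name"], warehouse.get("state"))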

Customer steps

  1. Grant the Databricks service principal CAN_USE permission on the intended SQL Warehouse
  2. Confirm which SQL Warehouse Cascade should use if multiple warehouses are available

Note: Customers do not need to manually provide a Warehouse ID unless they want to explicitly control which warehouse is used.

What the customer provides to Cascade

  • Confirmation that the service principal has CAN_USE access to at least one SQL Warehouse
  • The name of the preferred SQL Warehouse if multiple are available

3) Define the Data to Sync

Cascade pulls data using SQL queries defined by the customer.

Create Curated Views in Databricks

Curated views act as a stable contract between Databricks and Cascade.

Why views are recommended

Views allow customers to:

  • Encapsulate business logic in one place
  • Maintain stable schemas even if underlying tables change
  • Update logic without changing the integration
  • Restrict access to only approved fields

Customer steps (Databricks)

  1. Open Databricks SQL in the Databricks workspace
  2. Navigate to a schema intended for integrations (for example analytics or reporting)
  3. Create a SQL view that returns exactly the data Cascade needs

Note: Cascade executes the SQL query exactly as defined and does not automatically add WHERE clauses, date filters, or offsets.

It is highly recommended to add incremental behavior (a WHERE clause with a date restriction), implemented directly in the SQL view or query provided, as shown in the sketch below.
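
As a sketch only, a curated view with such an incremental window might look like the following. The catalog, schema, view, source table, and column names are all assumptions. The DDL is shown as a Python string so the same text could also be submitted through the Statement Execution API described in section 4, although it would normally be run once in the Databricks SQL editor.

    # Illustrative only: catalog, schema, view, and source table names are assumptions.
    CREATE_VIEW_SQL = """
    CREATE OR REPLACE VIEW analytics.reporting.cascade_kpis AS
    SELECT
      'monthly_revenue'                 AS metric_key,
      CAST(order_date AS DATE)          AS metric_date,
      CAST(SUM(order_total) AS DOUBLE)  AS metric_value
    FROM analytics.sales.orders
    -- Incremental behavior: return only a recent window rather than full history.
    WHERE order_date >= date_sub(current_date(), 90)
    GROUP BY CAST(order_date AS DATE)
    """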

Grant Cascade read-only access to the view

Once the view is created, grant the service principal SELECT access. Cascade will only be able to read from the explicitly shared view.
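
Assuming Unity Catalog, those grants might look like the following sketch; the object names and the service principal's application ID are placeholders, and a Databricks admin or the object owner would run them.

    # Illustrative grants, assuming Unity Catalog. The principal also needs
    # USE CATALOG / USE SCHEMA on the parent objects, and views are granted
    # through the TABLE securable.
    GRANT_STATEMENTS = [
        "GRANT USE CATALOG ON CATALOG analytics TO `<service-principal-application-id>`",
        "GRANT USE SCHEMA ON SCHEMA analytics.reporting TO `<service-principal-application-id>`",
        "GRANT SELECT ON TABLE analytics.reporting.cascade_kpis TO `<service-principal-application-id>`",
    ]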

What the Customer Provides to Cascade

  • Catalog, Schema and View name(s)
  • Confirmation that the service principal has SELECT access
  • Expected refresh cadence (daily, hourly, etc.)

4) Pulling the Data

The steps below walk through, at a high level, how Cascade pulls the data.

1. Getting the Statement ID

Endpoint

  • POST https://<workspace-host>/api/2.0/sql/statements/

Required inputs

  • Authorization: Bearer token (OAuth or PAT)
  • warehouse_id: the SQL Warehouse you’re executing against
  • statement: the SQL text (your SELECT * FROM <view>) 
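
A minimal sketch of this request is shown below; the warehouse ID, view name, and token are placeholders, and the requests library is an assumption.

    import requests

    WORKSPACE_HOST = "https://<workspace-host>"
    HEADERS = {"Authorization": "Bearer <access-token>"}

    # Submit the SQL statement to the Statement Execution API.
    response = requests.post(
        f"{WORKSPACE_HOST}/api/2.0/sql/statements/",
        headers=HEADERS,
        json={
            "warehouse_id": "<warehouse-id>",
            "statement": "SELECT metric_key, metric_date, metric_value FROM analytics.reporting.cascade_kpis",
            "wait_timeout": "30s",       # wait up to 30 seconds before returning asynchronously
            "format": "JSON_ARRAY",      # rows come back as JSON arrays of string values
        },
        timeout=60,
    )
    response.raise_for_status()
    statement = response.json()
    statement_id = statement["statement_id"]
    print(statement_id, statement["status"]["state"])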

2. Getting Data

Endpoint

  • GET https://<workspace-host>/api/2.0/sql/statements/{statement_id}
  • When status.state becomes SUCCEEDED, this response also includes:
    • a manifest (metadata about the results)
    • the first chunk of result data (chunk 0)
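
Continuing from the statement_id returned in step 1, a polling sketch might look like this (again with a placeholder host and token):

    import time
    import requests

    WORKSPACE_HOST = "https://<workspace-host>"
    HEADERS = {"Authorization": "Bearer <access-token>"}
    statement_id = "<statement-id>"   # returned by the POST in step 1

    # Poll until the statement reaches a terminal state.
    while True:
        response = requests.get(
            f"{WORKSPACE_HOST}/api/2.0/sql/statements/{statement_id}",
            headers=HEADERS,
            timeout=30,
        )
        response.raise_for_status()
        payload = response.json()
        state = payload["status"]["state"]
        if state not in ("PENDING", "RUNNING"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        manifest = payload["manifest"]                       # schema, row counts, chunk count
        first_chunk_rows = payload["result"]["data_array"]   # chunk 0 of the result data
        print(manifest["total_row_count"], len(first_chunk_rows))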

3. Getting Chunked Data

Databricks can split large results into multiple chunks.

Endpoint for chunk N

  • GET https://<workspace-host>/api/2.0/sql/statements/{statement_id}/result/chunks/{chunk_index}
  • Databricks documents this as the way to fetch additional chunks beyond the first.
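
A sketch of fetching the remaining chunks, driven by the total_chunk_count reported in the manifest from step 2 (all values shown are placeholders):

    import requests

    WORKSPACE_HOST = "https://<workspace-host>"
    HEADERS = {"Authorization": "Bearer <access-token>"}
    statement_id = "<statement-id>"
    total_chunk_count = 3   # taken from manifest["total_chunk_count"] in step 2

    rows = []
    # Chunk 0 arrived with the statement response; fetch chunks 1..N-1 here.
    for chunk_index in range(1, total_chunk_count):
        response = requests.get(
            f"{WORKSPACE_HOST}/api/2.0/sql/statements/{statement_id}/result/chunks/{chunk_index}",
            headers=HEADERS,
            timeout=30,
        )
        response.raise_for_status()
        rows.extend(response.json()["data_array"])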

5) Mapping Databricks Data to Cascade

KPIs > Cascade Metrics

Each view should return:

  • metric_key
  • metric_date
  • metric_value

How Cascade maps this:

  • metric_key > Cascade Metric
  • metric_date + metric_value > Data Point Upsert
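
Conceptually, each result row becomes a data point upsert against the metric identified by its metric_key. The sketch below illustrates only that grouping step with made-up rows; how Cascade persists the upserts is internal to Cascade.

    from collections import defaultdict

    # Each row follows the view contract: [metric_key, metric_date, metric_value].
    # With format JSON_ARRAY, the Statement Execution API returns values as strings.
    rows = [
        ["monthly_revenue", "2024-01-31", "125000.50"],
        ["monthly_revenue", "2024-02-29", "131200.00"],
        ["active_users",    "2024-02-29", "8421"],
    ]

    # Group data points by metric_key; each (date, value) pair becomes an upsert
    # against the Cascade Metric identified by that key.
    data_points = defaultdict(list)
    for metric_key, metric_date, metric_value in rows:
        data_points[metric_key].append({"date": metric_date, "value": float(metric_value)})

    print(dict(data_points))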

6) Common Troubleshooting Scenarios

Permission errors

  • Service principal lacks access to the SQL Warehouse
  • Missing read permissions on catalog/schema/table

Data issues

  • Non-numeric values for KPI fields
  • Unexpected schema changes