Integrate Databricks with Cascade
This guide explains how to integrate Databricks with Cascade so that KPI values are automatically synced from Databricks into Cascade Measures/Metrics. The integration uses Databricks SQL Warehouses and the Databricks Statement Execution API to securely run SQL queries and synchronize the results into Cascade on a schedule.
Method
Databricks SQL + Statement Execution API (Required). Cascade connects to Databricks by:
- Executing SQL statements against a Databricks SQL Warehouse
- Retrieving query results via the Statement Execution API
- Transforming results and loading them into Cascade
1) Authentication Setup
Option A — Personal Access Token (POC Only)
For quick proofs of concept, a Databricks Personal Access Token (PAT) can be used. PATs are user-scoped and not recommended for long-term production integrations.
Refer to the following link to set up a Personal Access Token:
https://docs.databricks.com/aws/en/dev-tools/auth/pat
Customer steps
- In your Databricks workspace, click your username in the top bar and select Settings.
- Click Developer.
- Next to Access tokens, click Manage.
- Click Generate new token and enter a comment that helps you identify this token
- Set the token's lifetime in days (choose the maximum lifetime so the token does not expire mid-integration)
- Click Generate, copy the displayed token (it is shown only once), and then click Done
What the customer provides to Cascade
- A Databricks PAT
- Confirmation of warehouse and dataset access
Option B — OAuth (Recommended for Production)
Databricks supports machine-to-machine OAuth using a service principal.
Refer to the following link to authorize service principal access to Databricks with OAuth: https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m
Refer to the following link to add a Workspace Level Service Principal: https://docs.databricks.com/aws/en/admin/users-groups/manage-service-principals
Customer steps (Databricks Admin)
- Create or identify a workspace service principal
- Grant the service principal:
  - Permission to use the target SQL Warehouse (refer to step 2)
  - Read access to the required catalogs, schemas, and tables/views (refer to step 3)
- Enable OAuth access for the service principal
What the customer provides to Cascade
- OAuth credentials for the Databricks service principal
- Confirmation of the SQL Warehouse and datasets the service principal can access
- Databricks Workspace URL
Cascade will use the provided credentials, together with the Databricks workspace URL, to make an API call that retrieves a short-lived bearer token, as sketched below.
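A minimal sketch of that token request (Python with the requests library; the workspace host and credential values are placeholders), based on Databricks' documented machine-to-machine OAuth flow:

```python
import requests

# Placeholders: substitute the real workspace URL and the service
# principal's OAuth client ID/secret provided to Cascade.
WORKSPACE_URL = "https://<workspace-host>"
CLIENT_ID = "<service-principal-client-id>"
CLIENT_SECRET = "<service-principal-oauth-secret>"

# Databricks M2M OAuth: client_credentials grant against the workspace's
# /oidc/v1/token endpoint, scoped to all APIs.
resp = requests.post(
    f"{WORKSPACE_URL}/oidc/v1/token",
    auth=(CLIENT_ID, CLIENT_SECRET),  # HTTP Basic auth with the client credentials
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
access_token = resp.json()["access_token"]  # short-lived bearer token
```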
2) SQL Warehouse Identification
All queries are executed against a Databricks SQL Warehouse. If the Databricks service principal has been granted access, Cascade can automatically identify the available SQL Warehouses in the workspace and select the appropriate execution target (a sketch of this lookup appears at the end of this step).
Customer steps
- Grant the Databricks service principal CAN_USE permission on the intended SQL Warehouse
- Confirm which SQL Warehouse Cascade should use if multiple warehouses are available
What the customer provides to Cascade
- Confirmation that the service principal has CAN_USE access to at least one SQL Warehouse
- The name of the preferred SQL Warehouse if multiple are available
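A minimal sketch of how that lookup can be done via the Databricks List Warehouses API, reusing the access_token from the sketch in step 1:

```python
import requests

WORKSPACE_URL = "https://<workspace-host>"
HEADERS = {"Authorization": f"Bearer {access_token}"}  # token from step 1

# List the SQL Warehouses visible to the service principal so the confirmed
# warehouse can be matched by name and its id recorded for query execution.
resp = requests.get(f"{WORKSPACE_URL}/api/2.0/sql/warehouses", headers=HEADERS)
resp.raise_for_status()
for warehouse in resp.json().get("warehouses", []):
    print(warehouse["id"], warehouse["name"], warehouse["state"])
```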
3) Define the Data to Sync
Cascade pulls data using SQL queries defined by the customer.
Create Curated Views in Databricks
Curated views act as a stable contract between Databricks and Cascade.
Why views are recommended
Views allow customers to:
- Encapsulate business logic in one place
- Maintain stable schemas even if underlying tables change
- Update logic without changing the integration
- Restrict access to only approved fields
Customer steps (Databricks)
- Open Databricks SQL in the Databricks workspace
- Navigate to a schema intended for integrations (for example analytics or reporting)
- Create a SQL view that returns exactly the data Cascade needs
Note: Cascade executes the SQL query exactly as defined and does not automatically add WHERE clauses, date filters, or offsets.
It is highly recommended to implement incremental behavior (a WHERE clause with a date restriction) directly in the SQL view or query provided, as in the sketch below.
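A sketch of such a view, with hypothetical catalog, schema, table, and column names, returning the three columns Cascade expects (see step 5) and baking in a 30-day date restriction:

```sql
-- Hypothetical names throughout: adjust the catalog, schema, source table,
-- and column names to your environment.
CREATE OR REPLACE VIEW analytics.reporting.cascade_kpis AS
SELECT
  kpi_name                  AS metric_key,   -- identifies the Cascade Metric (see step 5)
  CAST(kpi_date  AS DATE)   AS metric_date,
  CAST(kpi_value AS DOUBLE) AS metric_value
FROM analytics.reporting.kpi_facts
-- Incremental behavior lives in the view itself:
WHERE kpi_date >= current_date() - INTERVAL 30 DAYS;
```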
Grant Cascade read-only access to the view
Once the view is created, grant the service principal SELECT access. Cascade will only be able to read from the explicitly shared view.
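Assuming Unity Catalog and the hypothetical view sketched above, the grants would look roughly like this (the service principal is addressed by its application ID):

```sql
-- Hypothetical object names; <application-id> is the service principal's UUID.
GRANT USE CATALOG ON CATALOG analytics TO `<application-id>`;
GRANT USE SCHEMA ON SCHEMA analytics.reporting TO `<application-id>`;
GRANT SELECT ON VIEW analytics.reporting.cascade_kpis TO `<application-id>`;
```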
What the Customer Provides to Cascade
- Catalog, schema, and view name(s)
- Confirmation that the service principal has SELECT access
- Expected refresh cadence (daily, hourly, etc.)
4) Pulling the Data
The steps below walk through, at a high level, how Cascade pulls the data.
1. Getting the Statement ID
Endpoint
- POST https://<workspace-host>/api/2.0/sql/statements/
Required inputs
- Authorization: Bearer token (OAuth or PAT)
- warehouse_id: the SQL Warehouse you’re executing against
- statement: the SQL text (your SELECT * FROM <view>)
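A minimal Python sketch of this request, reusing the placeholders from the earlier sketches; the warehouse ID and view name are assumptions:

```python
import requests

WORKSPACE_URL = "https://<workspace-host>"
HEADERS = {"Authorization": f"Bearer {access_token}"}  # OAuth or PAT

# Submit the statement asynchronously; wait_timeout "0s" returns immediately
# with a statement_id that step 2 below polls for completion.
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/statements/",
    headers=HEADERS,
    json={
        "warehouse_id": "<warehouse-id>",  # from step 2 above
        "statement": "SELECT * FROM analytics.reporting.cascade_kpis",
        "wait_timeout": "0s",
    },
)
resp.raise_for_status()
statement_id = resp.json()["statement_id"]
```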
2. Getting Data
Endpoint
- GET https://<workspace-host>/api/2.0/sql/statements/{statement_id}
- When status.state becomes SUCCEEDED, this response also includes:
- a manifest (metadata about the results)
- the first chunk of result data (chunk 0)
Databricks can split large results into multiple chunks.
Endpoint for chunk N
- GET https://<workspace-host>/api/2.0/sql/statements/{statement_id}/result/chunks/{chunk_index}
- Databricks documents this as the way to fetch additional chunks beyond the first.
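Continuing the sketch above, a rough polling-and-chunking loop, assuming the API's default INLINE disposition and JSON_ARRAY result format:

```python
import time

# Poll until the statement reaches a terminal state.
while True:
    resp = requests.get(
        f"{WORKSPACE_URL}/api/2.0/sql/statements/{statement_id}",
        headers=HEADERS,
    )
    resp.raise_for_status()
    body = resp.json()
    state = body["status"]["state"]
    if state == "SUCCEEDED":
        break
    if state in ("FAILED", "CANCELED", "CLOSED"):
        raise RuntimeError(f"Statement ended in state {state}")
    time.sleep(2)  # PENDING or RUNNING: wait and re-poll

# Chunk 0 of the result rows arrives inline alongside the manifest.
rows = body["result"].get("data_array", [])
total_chunks = body["manifest"]["total_chunk_count"]

# Fetch any additional chunks beyond the first.
for chunk_index in range(1, total_chunks):
    resp = requests.get(
        f"{WORKSPACE_URL}/api/2.0/sql/statements/{statement_id}"
        f"/result/chunks/{chunk_index}",
        headers=HEADERS,
    )
    resp.raise_for_status()
    rows.extend(resp.json().get("data_array", []))
```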
5) Mapping Databricks Data to Cascade
KPIs > Cascade Metrics
Each view should return:
- metric_key
- metric_date
- metric_value
How Cascade maps this:
- metric_key > Cascade Metric
- metric_date + metric_value > Data Point Upsert
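For example, a result set like the following (illustrative values only) would upsert two data points onto the Metric keyed by monthly_recurring_revenue and one onto the Metric keyed by nps_score:

```
metric_key                 | metric_date | metric_value
---------------------------+-------------+-------------
monthly_recurring_revenue  | 2024-05-31  | 178200.00
monthly_recurring_revenue  | 2024-06-30  | 182500.00
nps_score                  | 2024-06-30  | 42
```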
6) Common Troubleshooting Scenarios
Permission errors
- Service principal lacks access to the SQL Warehouse
- Missing read permissions on catalog/schema/table
Data issues
- Non-numeric values for KPI fields
- Unexpected schema changes
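Two quick SQL checks that can help diagnose these, using the hypothetical object names from step 3:

```sql
-- Confirm the service principal's privileges on the shared view.
SHOW GRANTS ON VIEW analytics.reporting.cascade_kpis;

-- Surface source rows whose KPI value would fail numeric conversion;
-- TRY_CAST returns NULL instead of raising an error on bad input.
SELECT *
FROM analytics.reporting.kpi_facts
WHERE TRY_CAST(kpi_value AS DOUBLE) IS NULL
  AND kpi_value IS NOT NULL;
```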