Integrate Databricks with Cascade
This guide explains how to integrate Databricks with Cascade so that KPI values are automatically synced from Databricks into Cascade Measures/Metrics. The integration uses Databricks SQL Warehouses and the Databricks Statement Execution API to securely run SQL queries and synchronize the results into Cascade on a schedule.
Method
Databricks SQL + Statement Execution API (Required). Cascade connects to Databricks by:
- Executing SQL statements against a Databricks SQL Warehouse
- Retrieving query results via the Statement Execution API
- Transforming results and loading them into Cascade
1) Authentication Setup
Option A — Personal Access Token (POC Only)
For quick proofs of concept, a Databricks Personal Access Token (PAT) can be used. PATs are user-scoped and not recommended for long-term production integrations.
Refer to the following link to set up a Personal Access Token:
https://docs.databricks.com/aws/en/dev-tools/auth/pat
Customer steps
- In your Databricks workspace, click your username in the top bar and select Settings.
- Click Developer.
- Next to Access tokens, click Manage.
- Click Generate new token and enter a comment that helps you identify this token
- Set the token's lifetime in days (choose the maximum lifetime so the token does not expire mid-integration)
- Click Generate, copy the displayed token (it is shown only once), and then click Done
What the customer provides to Cascade
- A Databricks PAT
- Confirmation of warehouse and dataset access
Option B — OAuth (Recommended for Production)
Databricks supports machine-to-machine OAuth using a service principal.
Refer to the following link to authorize service principal access to Databricks with OAuth: https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m
Refer to the following link to add a Workspace Level Service Principal: https://docs.databricks.com/aws/en/admin/users-groups/manage-service-principals
Customer steps (Databricks Admin)
- Create or identify a workspace service principal
- Grant the service principal:
  - Permission to use the target SQL Warehouse (refer to step 2)
  - Read access to the required catalogs, schemas, and tables/views (refer to step 3)
- Enable OAuth access for the service principal
What the customer provides to Cascade
- OAuth credentials for the Databricks service principal
- Confirmation of the SQL Warehouse and datasets the service principal can access
- Databricks Workspace URL
Cascade will use the provided credentials, together with the Databricks workspace URL, to make an API call that retrieves a short-lived bearer token, as sketched below.
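A minimal sketch of that token request (Python with the requests library; the workspace host and credential values are placeholders), based on Databricks' documented machine-to-machine OAuth flow:

```python
import requests

# Placeholders: substitute the real workspace URL and the service
# principal's OAuth client ID/secret provided to Cascade.
WORKSPACE_URL = "https://<workspace-host>"
CLIENT_ID = "<service-principal-client-id>"
CLIENT_SECRET = "<service-principal-oauth-secret>"

# Databricks M2M OAuth: client_credentials grant against the workspace's
# /oidc/v1/token endpoint, scoped to all APIs.
resp = requests.post(
    f"{WORKSPACE_URL}/oidc/v1/token",
    auth=(CLIENT_ID, CLIENT_SECRET),  # HTTP Basic auth with the client credentials
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
access_token = resp.json()["access_token"]  # short-lived bearer token
```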
2) SQL Warehouse Identification
All queries are executed against a Databricks SQL Warehouse. If the Databricks service principal has been granted access, Cascade can automatically identify the available SQL Warehouses in the workspace and select the appropriate execution target (a sketch of this lookup appears at the end of this step).
Customer steps
- Grant the Databricks service principal CAN_USE permission on the intended SQL Warehouse
- Confirm which SQL Warehouse Cascade should use if multiple warehouses are available
What the customer provides to Cascade
- Confirmation that the service principal has CAN_USE access to at least one SQL Warehouse
- The name of the preferred SQL Warehouse if multiple are available
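A minimal sketch of how that lookup can be done via the Databricks List Warehouses API, reusing the access_token from the sketch in step 1:

```python
import requests

WORKSPACE_URL = "https://<workspace-host>"
HEADERS = {"Authorization": f"Bearer {access_token}"}  # token from step 1

# List the SQL Warehouses visible to the service principal so the confirmed
# warehouse can be matched by name and its id recorded for query execution.
resp = requests.get(f"{WORKSPACE_URL}/api/2.0/sql/warehouses", headers=HEADERS)
resp.raise_for_status()
for warehouse in resp.json().get("warehouses", []):
    print(warehouse["id"], warehouse["name"], warehouse["state"])
```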
3) Define the Data to Sync
Cascade pulls data using SQL queries defined by the customer.
Create Curated Views in Databricks
Curated views act as a stable contract between Databricks and Cascade.
Why views are recommended
Views allow customers to:
- Encapsulate business logic in one place
- Maintain stable schemas even if underlying tables change
- Update logic without changing the integration
- Restrict access to only approved fields
Customer steps (Databricks)
- Open Databricks SQL in the Databricks workspace
- Navigate to a schema intended for integrations (for example analytics or reporting)
- Create a SQL view that returns exactly the data Cascade needs
Note: Cascade executes the SQL query exactly as defined and does not automatically add WHERE clauses, date filters, or offsets.
It is highly recommended to implement incremental behavior (a WHERE clause with a date restriction) directly in the SQL view or query provided, as in the sketch below.
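A sketch of such a view, with hypothetical catalog, schema, table, and column names, returning the three columns Cascade expects (see step 5) and baking in a 30-day date restriction:

```sql
-- Hypothetical names throughout: adjust the catalog, schema, source table,
-- and column names to your environment.
CREATE OR REPLACE VIEW analytics.reporting.cascade_kpis AS
SELECT
  kpi_name                  AS metric_key,   -- identifies the Cascade Metric (see step 5)
  CAST(kpi_date  AS DATE)   AS metric_date,
  CAST(kpi_value AS DOUBLE) AS metric_value
FROM analytics.reporting.kpi_facts
-- Incremental behavior lives in the view itself:
WHERE kpi_date >= current_date() - INTERVAL 30 DAYS;
```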
Grant Cascade read-only access to the view
Once the view is created, grant the service principal SELECT access. Cascade will only be able to read from the explicitly shared view.
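Assuming Unity Catalog and the hypothetical view sketched above, the grants would look roughly like this (the service principal is addressed by its application ID):

```sql
-- Hypothetical object names; <application-id> is the service principal's UUID.
GRANT USE CATALOG ON CATALOG analytics TO `<application-id>`;
GRANT USE SCHEMA ON SCHEMA analytics.reporting TO `<application-id>`;
GRANT SELECT ON VIEW analytics.reporting.cascade_kpis TO `<application-id>`;
```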
What the Customer Provides to Cascade
- Catalog, schema, and view name(s)
- Confirmation that the service principal has SELECT access
- Expected refresh cadence (daily, hourly, etc.)
4) Pulling the Data
The steps below walk through, at a high level, how Cascade pulls the data.
1. Getting the Statement ID
Endpoint
- POST https://<workspace-host>/api/2.0/sql/statements/
Required inputs
- Authorization: Bearer token (OAuth or PAT)
- warehouse_id: the SQL Warehouse you’re executing against
- statement: the SQL text (your SELECT * FROM <view>)
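A minimal Python sketch of this request, reusing the placeholders from the earlier sketches; the warehouse ID and view name are assumptions:

```python
import requests

WORKSPACE_URL = "https://<workspace-host>"
HEADERS = {"Authorization": f"Bearer {access_token}"}  # OAuth or PAT

# Submit the statement asynchronously; wait_timeout "0s" returns immediately
# with a statement_id that step 2 below polls for completion.
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/statements/",
    headers=HEADERS,
    json={
        "warehouse_id": "<warehouse-id>",  # from step 2 above
        "statement": "SELECT * FROM analytics.reporting.cascade_kpis",
        "wait_timeout": "0s",
    },
)
resp.raise_for_status()
statement_id = resp.json()["statement_id"]
```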
2. Getting Data
Endpoint
- GET https://<workspace-host>/api/2.0/sql/statements/{statement_id}
- When status.state becomes SUCCEEDED, this response also includes:
- a manifest (metadata about the results)
- the first chunk of result data (chunk 0)
Databricks can split large results into multiple chunks.
Endpoint for chunk N
- GET https://<workspace-host>/api/2.0/sql/statements/{statement_id}/result/chunks/{chunk_index}
- Databricks documents this as the way to fetch additional chunks beyond the first.
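Continuing the sketch above, a rough polling-and-chunking loop, assuming the API's default INLINE disposition and JSON_ARRAY result format:

```python
import time

# Poll until the statement reaches a terminal state.
while True:
    resp = requests.get(
        f"{WORKSPACE_URL}/api/2.0/sql/statements/{statement_id}",
        headers=HEADERS,
    )
    resp.raise_for_status()
    body = resp.json()
    state = body["status"]["state"]
    if state == "SUCCEEDED":
        break
    if state in ("FAILED", "CANCELED", "CLOSED"):
        raise RuntimeError(f"Statement ended in state {state}")
    time.sleep(2)  # PENDING or RUNNING: wait and re-poll

# Chunk 0 of the result rows arrives inline alongside the manifest.
rows = body["result"].get("data_array", [])
total_chunks = body["manifest"]["total_chunk_count"]

# Fetch any additional chunks beyond the first.
for chunk_index in range(1, total_chunks):
    resp = requests.get(
        f"{WORKSPACE_URL}/api/2.0/sql/statements/{statement_id}"
        f"/result/chunks/{chunk_index}",
        headers=HEADERS,
    )
    resp.raise_for_status()
    rows.extend(resp.json().get("data_array", []))
```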
5) Mapping Databricks Data to Cascade
KPIs > Cascade Metrics
Each view should return:
- metric_key
- metric_date
- metric_value
How Cascade maps this:
- metric_key > Cascade Metric
- metric_date + metric_value > Data Point Upsert
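For example, a result set like the following (illustrative values only) would upsert two data points onto the Metric keyed by monthly_recurring_revenue and one onto the Metric keyed by nps_score:

```
metric_key                 | metric_date | metric_value
---------------------------+-------------+-------------
monthly_recurring_revenue  | 2024-05-31  | 178200.00
monthly_recurring_revenue  | 2024-06-30  | 182500.00
nps_score                  | 2024-06-30  | 42
```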
6) Common Troubleshooting Scenarios
Permission errors
- Service principal lacks access to the SQL Warehouse
- Missing read permissions on catalog/schema/table
Data issues
- Non-numeric values for KPI fields
- Unexpected schema changes
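Two quick SQL checks that can help diagnose these, using the hypothetical object names from step 3:

```sql
-- Confirm the service principal's privileges on the shared view.
SHOW GRANTS ON VIEW analytics.reporting.cascade_kpis;

-- Surface source rows whose KPI value would fail numeric conversion;
-- TRY_CAST returns NULL instead of raising an error on bad input.
SELECT *
FROM analytics.reporting.kpi_facts
WHERE TRY_CAST(kpi_value AS DOUBLE) IS NULL
  AND kpi_value IS NOT NULL;
```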