Integrate Databricks with Cascade
Sync Cloud Data with Cascade Metrics and Measures
Overview
The Databricks integration allows you to automatically sync KPI and performance data from Databricks into Cascade, ensuring your strategy dashboards always reflect the latest data from your data warehouse.
This integration is designed to be:
- Read-only — no changes are made to your Databricks data
- Stable — built on curated views with controlled schemas
- Secure — uses least-privilege access and governed data access controls
- Automated — eliminates manual exports or spreadsheet uploads
What This Integration Supports
Typical use cases include:
- Updating Cascade Metrics from warehouse-driven KPIs (e.g., Revenue, OEE, FPY, On-time Delivery)
- Syncing time-series data (daily, weekly, monthly performance)
- Connecting Cascade directly to your source-of-truth analytics layer in Databricks
- Replacing manual reporting workflows with automated data syncs
Recommended Data Approach
Use a Curated View (Strongly Recommended). This approach ensures:
- Business logic remains controlled within Databricks
- Schema changes do not break the integration
- Only approved data is exposed to Cascade
- Integration maintenance is minimized
Authentication Setup
Option A — Personal Access Token (POC Only)
For quick proofs of concept, a Databricks Personal Access Token (PAT) can be used. PATs are user-scoped and not recommended for long-term production integrations.
Refer to the following link to set up a Personal Access Token:
https://docs.databricks.com/aws/en/dev-tools/auth/pat
Customer steps
- In your Databricks workspace, click your username in the top bar and select Settings.
- Click Developer.
- Next to Access tokens, click Manage.
- Click Generate new token and enter a comment that helps you identify the token
- Set the token's lifetime in days (use the maximum allowed lifetime to minimize token rotation)
- Click Generate and then click Done.
What the customer provides to Cascade
- A Databricks PAT
- Confirmation of warehouse and dataset access
Option B — OAuth (Recommended for Production)
Databricks supports machine-to-machine OAuth using a service principal.
Refer to the following link to authorize service principal access to Databricks with OAuth: https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m
Refer to the following link to add a Workspace Level Service Principal: https://docs.databricks.com/aws/en/admin/users-groups/manage-service-principals
Customer steps (Databricks Admin)
- Create or identify a workspace service principal
- Grant the service principal:
  - Permission to use the target SQL Warehouse (see section 1 below)
  - Read access to the required catalogs, schemas, and tables/views (see section 2 below)
- Enable OAuth access for the service principal
What the customer provides to Cascade
- OAuth credentials for the Databricks service principal
- Confirmation of the SQL Warehouse and datasets the service principal can access
- Databricks Workspace URL
Cascade uses the provided credentials, together with the Databricks workspace URL, to make an API call that retrieves a bearer token.
1) SQL Warehouse Access
All queries are executed against a Databricks SQL Warehouse.
Cascade can automatically identify available warehouses once access is granted.
Customer steps
- Grant the service principal CAN_USE access to at least one SQL Warehouse
- (Optional) Specify a preferred warehouse if multiple are available
What the customer provides
- Confirmation that warehouse access is configured
- (Optional) Preferred SQL Warehouse name
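As a quick sanity check once the service principal can authenticate and reach a warehouse, run a trivial query through the SQL Warehouse while connected as that principal; this is a generic Databricks SQL check, not a Cascade-specific step:

-- Should return the service principal's identity rather than a human user.
SELECT current_user();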
2) Data Access (Unity Catalog Permissions)
Access to the SQL Warehouse alone is not sufficient.
The service principal must also be granted read access to the underlying data.
Required permissions
- USE CATALOG on the catalog
- USE SCHEMA on the schema
- SELECT on the required tables or views
Example
GRANT USE CATALOG ON CATALOG main TO <service_principal>;
GRANT USE SCHEMA ON SCHEMA main.reporting TO <service_principal>;
GRANT SELECT ON VIEW main.reporting.cascade_kpis TO <service_principal>;
3) Define the Data to Sync
Cascade retrieves data using SQL queries defined by the customer.
Create Curated Views in Databricks (Recommended)
Customers should create views that return only the data required for Cascade.
Customer steps
- Open Databricks SQL
- Navigate to a reporting or analytics schema
- Create a view with the required output
Note: Cascade executes the SQL query exactly as defined and does not automatically add WHERE clauses, date filters, or offsets.
It is highly recommended to implement incremental behavior (a WHERE clause with a date restriction) directly in the SQL view or query you provide, as shown in the sketch below.
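A minimal sketch of such a view follows. The view name matches the grants example in section 2; the source table main.gold.kpi_facts, the column names, and the 90-day window are illustrative assumptions, not requirements:

CREATE OR REPLACE VIEW main.reporting.cascade_kpis AS
SELECT
  metric_name,   -- stable identifier Cascade maps to a Metric
  metric_value,  -- numeric value to sync
  metric_date    -- date of the data point (time series)
FROM main.gold.kpi_facts  -- illustrative source table
-- Incremental filter: only recent rows are returned, keeping each sync small.
WHERE metric_date >= date_sub(current_date(), 90);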
Grant Cascade read-only access to the view
Once the view is created, grant the service principal SELECT access. Cascade will only be able to read from the explicitly shared view.
What the Customer Provides to Cascade
- Catalog, Schema and View name(s)
- Confirmation that the service principal has SELECT access
- Expected refresh cadence (daily, hourly, etc.)
4) How Data is Retrieved
Once access is configured, Cascade automatically retrieves data from Databricks on a scheduled basis.
- Cascade runs a query against your Databricks view
- Databricks executes the query in a SQL Warehouse
- Cascade retrieves the results via API
- The data is mapped and synced into Cascade Metrics
This process is fully automated and requires no manual intervention.
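In practice the scheduled sync amounts to a plain read of the shared view. A sketch of the kind of query Cascade runs, using the illustrative view name from earlier:

-- Read-only query against the curated view; no writes are issued.
SELECT metric_name, metric_value, metric_date
FROM main.reporting.cascade_kpis;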
5) Mapping Databricks Data to Cascade
Recommended Output Schema
To map cleanly into Cascade, your view should return:
| Field | Type | Description |
|---|---|---|
| metric_name | string | Unique metric identifier (e.g., revenue, oee) |
| metric_value | number | Value to sync |
| metric_date | date | Date of the data point (time series) |
Mapping Logic
- metric_name → Cascade Metric
- metric_date + metric_value → Data point update
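For reference, a query like the following produces the expected output shape; the rows are fabricated purely for illustration:

-- Illustrative rows only; real values come from your warehouse.
SELECT 'revenue' AS metric_name, 1250000.00 AS metric_value, DATE '2025-01-31' AS metric_date
UNION ALL
SELECT 'oee', 0.87, DATE '2025-01-31';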
6) Common Troubleshooting Scenarios
Permission errors
- Confirm the service principal has:
  - CAN_USE on the SQL Warehouse
  - SELECT on views
  - USE CATALOG and USE SCHEMA
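To inspect what has actually been granted, you can run SHOW GRANTS against the securable in question, here using the illustrative view from section 2:

-- Lists the privileges granted on the view and the principals that hold them.
SHOW GRANTS ON VIEW main.reporting.cascade_kpis;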
Query failures
- Confirm view exists and is accessible
- Confirm schema and catalog are correctly referenced
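A quick way to confirm the view resolves under the expected catalog and schema (again using the illustrative name):

-- Errors out if the view does not exist or is not accessible.
DESCRIBE TABLE main.reporting.cascade_kpis;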
Data issues
- Ensure numeric fields are properly formatted
- Avoid schema changes without coordination
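If numeric formatting is a concern, casting defensively inside the view avoids type surprises; try_cast returns NULL instead of failing on malformed values. A sketch using the illustrative columns and source table from earlier:

SELECT
  metric_name,
  TRY_CAST(metric_value AS DOUBLE) AS metric_value,  -- malformed values become NULL
  metric_date
FROM main.gold.kpi_facts;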
Summary
The Databricks integration provides a secure, scalable, and automated way to connect your data warehouse to Cascade.
By using curated views and controlled access, customers can ensure:
- Reliable data syncing
- Minimal maintenance
- Full control over business logic