Use the same `@task` decorator as other workflow tasks, but pass `task_config=DatabricksJobTaskConfig(...)` so the task runs by calling the Databricks Jobs API. The task triggers the job with `run_now()`, and by default waits for the run to complete (or times out and cancels the run).
Example
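A minimal sketch of a task definition, assuming a Flyte-style plugin module (the import paths, the `image` helper, and the parameter values shown are placeholders; use the names from your actual plugin installation):

```python
# Sketch only -- the import paths below are placeholders; substitute the
# module names from your actual plugin installation.
from flytekit import task, workflow  # placeholder import
from flytekitplugins.databricks import DatabricksJobTaskConfig  # placeholder import

@task(
    task_config=DatabricksJobTaskConfig(
        image=...,  # required: image spec, e.g. a TaskPythonBuild with pip_packages
        workspace_host="https://<workspace>.cloud.databricks.com",
        job_id="123456789",  # the Databricks job ID from your workspace Jobs
        job_parameters={"notebook_param": "value"},  # optional
        timeout_seconds=1200,  # optional; defaults to 20 minutes
        skip_wait_for_completion=False,  # default: wait for the run to finish
    )
)
def run_databricks_job() -> None:
    ...


@workflow
def wf() -> None:
    run_databricks_job()
```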
How it runs
- The task calls Databricks `jobs.run_now()` with an idempotency token derived from the Flyte execution ID, so the same logical run is not submitted twice.
- If `skip_wait_for_completion` is `False`: the task polls until the run terminates or is skipped, or until `timeout_seconds` elapses. On timeout, the run is canceled and a `RuntimeError` is raised. On success, the task completes after the Databricks run finishes.
- If `skip_wait_for_completion` is `True`: the task returns immediately after triggering the job; it does not wait for completion.
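The trigger/poll/cancel behavior above can be sketched in plain Python (a simplified model, not the plugin's actual implementation; `run_now`, `get_run_state`, and `cancel_run` are injected stand-ins for Databricks Jobs API calls):

```python
import time

# Life-cycle states after which polling stops (illustrative subset).
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def execute(run_now, get_run_state, cancel_run, idempotency_token,
            timeout_seconds=1200, skip_wait_for_completion=False,
            poll_interval=1.0):
    """Trigger a Databricks job run and (optionally) wait for it to finish.

    run_now(idempotency_token) -> run_id
    get_run_state(run_id)      -> life-cycle state string
    cancel_run(run_id)         -> None
    """
    # The idempotency token (derived from the Flyte execution ID) ensures a
    # retried task does not submit a second Databricks run for the same
    # logical execution.
    run_id = run_now(idempotency_token)
    if skip_wait_for_completion:
        return run_id  # fire-and-forget: return immediately after triggering
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if get_run_state(run_id) in TERMINAL_STATES:
            return run_id  # run finished (or was skipped)
        time.sleep(poll_interval)
    # Timeout: cancel the Databricks run, then fail the task.
    cancel_run(run_id)
    raise RuntimeError(f"Databricks run {run_id} timed out after {timeout_seconds}s")
```

Injecting the API calls as functions keeps the control flow (idempotent trigger, bounded polling, cancel-on-timeout) separate from HTTP details.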
DatabricksJobTaskConfig
| Field | Description |
|---|---|
| `image` | Image spec for the task (e.g. `TaskPythonBuild` with `pip_packages`). The task process runs in this image; the actual job execution happens in Databricks. |
| `workspace_host` | Databricks workspace URL (e.g. `https://<workspace>.cloud.databricks.com`). |
| `job_id` | The Databricks job ID to run (from your Databricks workspace Jobs). |
| `service_account` | Kubernetes service account name for the task pod. Used for OIDC token exchange when `DATABRICKS_PERSONAL_ACCESS_TOKEN` is not set (workload identity federation). |
| `job_parameters` | Optional parameters to pass to the job run (e.g. notebook params, jar params). |
| `timeout_seconds` | Maximum seconds to wait for the job run to complete. If not set, defaults to 20 minutes. When the timeout is reached, the run is canceled and a `RuntimeError` is raised. |
| `skip_wait_for_completion` | If `False` (default), the task triggers the job and waits for it to complete (or time out). If `True`, the task only triggers the job and returns immediately without waiting. |
| `env` | Environment variables for the task (plain or secret refs). Use this to inject credentials such as `DATABRICKS_PERSONAL_ACCESS_TOKEN` so the task can authenticate to Databricks. See Environment variables. |
| `resources` | Optional CPU/memory resources for the task pod. |
Authentication
The task supports two ways to authenticate to Databricks:

- Personal Access Token (PAT): if `DATABRICKS_PERSONAL_ACCESS_TOKEN` is set in the task's environment, the task uses it to authenticate. You can provide the PAT via the task config `env` (e.g. a secret reference) or via Flyte Propeller's Kubernetes default env vars so it is available to the task pod.
- OAuth token federation: if `DATABRICKS_PERSONAL_ACCESS_TOKEN` is not set, the task uses OAuth token federation (workload identity). This requires `DATABRICKS_SERVICE_PRINCIPAL_CLIENT_ID` in the task's environment and a service account on the task pod (set `service_account` in `DatabricksJobTaskConfig`). The Kubernetes service account token is exchanged for a Databricks access token via the workspace OIDC endpoint; the Databricks workspace must have OIDC federation configured.
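The selection between the two modes can be sketched as a small helper (illustrative only; `resolve_databricks_auth` and `exchange_oidc_token` are hypothetical names, and the token exchange itself is stubbed out):

```python
import os

def resolve_databricks_auth(env=None, exchange_oidc_token=None):
    """Pick PAT auth if DATABRICKS_PERSONAL_ACCESS_TOKEN is set, else OIDC federation."""
    env = os.environ if env is None else env
    pat = env.get("DATABRICKS_PERSONAL_ACCESS_TOKEN")
    if pat:
        return ("pat", pat)  # Personal Access Token auth takes precedence
    client_id = env.get("DATABRICKS_SERVICE_PRINCIPAL_CLIENT_ID")
    if not client_id:
        raise RuntimeError(
            "Set DATABRICKS_PERSONAL_ACCESS_TOKEN for PAT auth, or "
            "DATABRICKS_SERVICE_PRINCIPAL_CLIENT_ID (plus a pod service "
            "account) for OAuth token federation."
        )
    # Exchange the Kubernetes service account token for a Databricks access
    # token via the workspace OIDC endpoint (hypothetical helper, stubbed here).
    return ("oidc", exchange_oidc_token(client_id))
```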
| Variable | Required | Description |
|---|---|---|
| `DATABRICKS_PERSONAL_ACCESS_TOKEN` | No* | Databricks Personal Access Token. If set, the task uses PAT auth. Can be provided via task `env` (e.g. secret ref) or cluster default env vars. |
| `DATABRICKS_SERVICE_PRINCIPAL_CLIENT_ID` | For OIDC only | Client ID of the Databricks service principal. Required when using OAuth token federation (i.e. when PAT is not set). |
- Authorize service principal access with OAuth (OAuth M2M) — Creating an OAuth secret and using client ID + secret for workspace-level operations.
- Authenticate using OAuth token federation — Using workload identity tokens (e.g. Kubernetes OIDC) instead of Databricks secrets.
Checklist
- Use `@task(task_config=DatabricksJobTaskConfig(...))` with required `image`, `workspace_host`, and `job_id`.
- For auth: set `DATABRICKS_PERSONAL_ACCESS_TOKEN` (e.g. in task `env` as a secret ref) for PAT auth, or use OAuth token federation with `DATABRICKS_SERVICE_PRINCIPAL_CLIENT_ID` and `service_account`.
- Optionally set `env`, `job_parameters`, `timeout_seconds`, `skip_wait_for_completion`, and `resources` as needed.
- Register the task in a `@workflow` and call it like any other task (e.g. `run_databricks_job()`).