
Jobs

A job represents a past, present or future "task" or "work" to be executed by a worker. Future jobs, or jobs waiting for a worker, are called "queued jobs", and are ordered by the time they are scheduled for (scheduled_for). Jobs created without an explicit future scheduled_for are scheduled for the time at which they were created.

Workers fetch jobs from the queue, atomically set their state in the queue to "running", stream the logs while executing them, and then, once completed, remove them from the queue and create a new "completed job".

Every job has a unique UUID attached to it. As long as you have visibility over the execution of the script, you can inspect the execution logs, output and metadata on the job's dedicated details page.
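
As a sketch of how this maps to the HTTP API, the snippet below schedules a script run for a future time and then fetches the job by its UUID. The endpoint paths, the scheduled_for query parameter and the workspace/script paths are assumptions based on Windmill's API and are illustrative only; adapt them to your instance.

```python
# Minimal sketch, assuming Windmill's HTTP API (endpoint paths and the
# `scheduled_for` query parameter are assumptions; adapt to your instance).
import datetime
import requests

BASE = "https://app.windmill.dev/api"   # or your self-hosted URL
WORKSPACE = "my-workspace"              # hypothetical workspace
SCRIPT_PATH = "u/alice/hello"           # hypothetical script path
TOKEN = "..."                           # a Windmill API token

headers = {"Authorization": f"Bearer {TOKEN}"}

# Schedule the job one hour from now; without `scheduled_for`,
# the job is scheduled for the time at which it is created.
scheduled_for = (
    datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1)
).isoformat()

resp = requests.post(
    f"{BASE}/w/{WORKSPACE}/jobs/run/p/{SCRIPT_PATH}",
    params={"scheduled_for": scheduled_for},
    json={"name": "world"},             # job input (a JSON object)
    headers=headers,
)
job_uuid = resp.text.strip()            # the job's unique UUID

# Later: inspect the job (queued, running or completed) by its UUID.
job = requests.get(
    f"{BASE}/w/{WORKSPACE}/jobs_u/get/{job_uuid}", headers=headers
).json()
print(job_uuid, job.get("job_kind"), job.get("scheduled_for"))
```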

Job kinds

There are 5 main kinds of jobs, each of which has a dedicated tab on the Runs page:

  • Script Jobs: Run a script as defined by the hash of the script (which uniquely and immutably identifies a specific version of the script), its input arguments (args) and the permissioned_as user or group on whose behalf it acts and from which it inherits visibility to other items such as resources and variables. A user can NEVER escalate their privileges; they can only de-escalate them, by launching a script with either the same permissions as themselves or a subset of them (by using the permissions of a group they are a member of).

  • Preview Jobs: Similar to script jobs but instead of a hash, they contain the whole raw code they will run. These are the jobs launched from the script editors. Even when code is executed as a preview, you keep a trace of its execution.

  • Dependencies Jobs: Scripts written in Python generate a lockfile when they are saved/created. This lockfile ensures that executions of the same hash always use the same dependency versions. Generating this lockfile is itself a job, so you can easily inspect any issues encountered while generating it. See Dependency Management for more information.

  • Flow Jobs: A flow job is the "meta" job that orchestrates the execution of every step. The executions of the steps are themselves jobs. It is defined similarly to a script job, but instead of being defined by a path to a script, it is defined by a path to the flow.

  • Preview Flow Jobs: A preview flow job contains the raw JSON definition of the flow instead of merely a path to it. It is the underlying job for flow previews in the flow editor interface.

Run jobs on behalf of

The permissioned_as value of script and preview jobs is the most important concept to grasp in order to understand what makes Windmill's security and permission model consistent, predictable and safe. permissioned_as is distinct from the created_by value, even though in the vast majority of jobs they are the same. It represents the level of permissions the job executes with. As a direct consequence, the variables (including secrets) accessible to the script are only those that the permissioned user or group has visibility on, given their permissions.

The same applies to the contextual variable WM_TOKEN, which contains an ephemeral token (scoped to the script execution) with the same privileges and permissions as the owner set in permissioned_as. The Python client inside the script implicitly uses that token to be granted the privilege to perform Windmill operations (like running other scripts or getting resources). This means the same script run by 2 different users will run differently and may be unauthorized to perform some or all of its operations. This is what enables anyone to safely share scripts that perform sensitive operations, as long as the resources and secrets the script relies on are permissioned correctly.
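
For illustration, here is a minimal sketch of a Windmill Python script that relies on this behavior: the wmill client picks up the ephemeral WM_TOKEN of the execution, so the same calls succeed or fail depending on who the job is permissioned as. The variable and resource paths are hypothetical, and the exact client function names should be checked against the wmill package version you have installed.

```python
# Minimal sketch of a Windmill Python script. The `wmill` client implicitly
# uses the ephemeral WM_TOKEN of this execution, so its permissions are those
# of the `permissioned_as` user or group, not of whoever wrote the script.
import os
import wmill


def main():
    # WM_TOKEN is exposed to the script as a contextual variable; it only
    # lives for the duration of this job (assumption: it is available as an
    # environment variable in the execution context).
    token_present = "WM_TOKEN" in os.environ

    # Hypothetical paths: these calls succeed only if the permissioned_as
    # user/group has visibility on the variable and resource.
    secret = wmill.get_variable("u/alice/openai_api_key")
    database = wmill.get_resource("f/payments/prod_postgres")

    return {
        "token_present": token_present,
        "secret_length": len(secret),
        "database_keys": list(database.keys()),
    }
```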

A user can only run a script permissioned as either themselves or one of the groups they are a member of.

Job inputs and Script parameters

Jobs take a JSON object as input, which can be empty. That input is passed as the payload of the POST request that triggers the script. The key-value pairs of the object are passed as the parameters of the main function, with just a few language-specific transformations to more adequate types in the target language where necessary (e.g. base64/datetime encoding). Values can themselves be nested JSON objects, but we recommend keeping the input flat when possible.

If the payload contains keys that are not defined as parameters in the main function, they will be ignored. This allows you to handle arbitrary JSON payloads, as you can choose which keys to define as parameters in your script and process the data accordingly.
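
As a concrete illustration (the payload and parameter names below are hypothetical), the JSON payload that triggers a script maps its keys onto the parameters of the main function, and keys that are not parameters are simply dropped:

```python
# A Windmill Python script: each key of the JSON input payload is passed as
# the main function parameter of the same name.
def main(name: str, count: int = 1, options: dict | None = None):
    # POSTing {"name": "Ada", "count": 3, "ignored_key": true} calls
    # main(name="Ada", count=3); "ignored_key" is not a parameter, so it is
    # ignored. "options" shows that values can be nested JSON objects.
    greeting = " ".join([f"Hello {name}!"] * count)
    return {"greeting": greeting, "options": options or {}}
```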

Retention Policy

The retention policy for job run details varies depending on your plan:

  • Community plan (cloud): Job run details are retained for 60 days.
  • Team plan (cloud): Job run details are retained for 60 days.
  • Enterprise plan (cloud): Unlimited retention period.
  • Open Source (self-hosted): Job run details are retained for a maximum of 30 days.
  • Enterprise plan (self-hosted): Unlimited retention period.

You can set a custom retention period for job run details. The retention period can be configured in the instance settings, under the "Core" tab.

Set Retention Period

Large logs management with S3

To optimize log storage and performance, Windmill leverages S3 for log management. This approach minimizes database load by treating the database as a temporary buffer for up to 5000 characters of logs per job.

For jobs with extensive logging needs, Windmill Enterprise Edition users benefit from seamless log streaming to S3. This ensures logs, regardless of size, are stored efficiently without overwhelming local resources.

This allows the handling of large-scale logs with minimal database impact, supporting more efficient and scalable workflows.

For large log storage (and display) and as a cache for distributed Python jobs, you can connect your instance to a bucket from the instance settings.

S3/Azure for Python Cache & Large Logs

This feature has no overlap with the Workspace object storage.

You can choose to use either S3 or Azure Blob Storage. For each, you will find a button to test the settings from a server or from a worker.

S3

Name | Type | Description
Bucket | string | Name of your S3 bucket.
Region | string | If left empty, will be derived automatically from $AWS_REGION.
Access key ID | string | If left empty, will be derived automatically from $AWS_ACCESS_KEY_ID, pod or EC2 profile.
Secret key | string | If left empty, will be derived automatically from $AWS_SECRET_KEY, pod or EC2 profile.
Endpoint | string | Only needed for non-AWS S3 providers like R2 or MinIO.
Allow http | boolean | Disable if using an HTTPS-only policy.

Azure Blob

Name | Type | Description
Account name | string | The name of your Azure Storage account. It uniquely identifies your Azure Storage account within Azure and is required to authenticate with Azure Blob Storage.
Container name | string | The name of the specific blob container within your storage account. Blob containers are used to organize blobs, similar to a directory structure.
Access key | string | The primary or secondary access key for the storage account. This key is used to authenticate and provide access to Azure Blob Storage.
Tenant ID | string | (optional) The unique identifier (GUID) for your Azure Active Directory (AAD) tenant. Required if using Azure Active Directory for authentication.
Client ID | string | (optional) The unique identifier (GUID) for your application registered in Azure AD. Required if using service principal authentication via Azure AD.
Endpoint | string | (optional) The specific endpoint for Azure Blob Storage, typically used when interacting with non-Azure Blob providers like Azurite or other emulators. For Azure Blob Storage, this is auto-generated and not usually needed.