What is a Temporal Worker?
There is a tight coupling between Temporal Task Queues and Worker Processes.
What is a Worker?
In day-to-day conversations, the term Worker is used to denote either a Worker Program, a Worker Process, or a Worker Entity. Temporal documentation aims to be explicit and differentiate between them.
What is a Task?
A Task is the context that a Worker needs to progress with a specific Workflow Execution, Activity Execution, or a Nexus Task Execution.
There are three types of Tasks:
What is a Worker Program?
A Worker Program is the static code that defines the constraints of the Worker Process, developed using the APIs of a Temporal SDK.
What is a Worker Entity?
A Worker Entity is the individual Worker within a Worker Process that listens to a specific Task Queue.
A Worker Entity listens and polls on a single Task Queue. A Worker Entity contains a Workflow Worker and/or an Activity Worker, which makes progress on Workflow Executions and Activity Executions, respectively.
Can a Worker handle more Workflow Executions than its cache size or number of supported threads?
Yes it can. However, the trade off is added latency.
Workers are stateless, so any Workflow Execution in a blocked state can be safely removed from a Worker. Later on, it can be resurrected on the same or different Worker when the need arises (in the form of an external event). Therefore, a single Worker can handle millions of open Workflow Executions, assuming it can handle the update rate and that a slightly higher latency is not a concern.
Operation guides:
What is a Worker Identity?
Workers have an associated identifier that helps identify the specific Worker instance.
By default, Temporal SDKs set a Worker Identity to ${process.pid}@${os.hostname()}
, which combines the Worker's process ID (process.pid
) and the hostname of the machine running the Worker (os.hostname()
).
The Worker Identity is visible in various contexts, such as Workflow History and the list of pollers on a Task Queue.
You can use the Worker Identity to aid in debugging operational issues. By providing a user assigned identifier, you can trace issues back to specific Worker instances.
What are some limitations of the default identity?
While the default identity format may seem sensible, it often proves to be of limited usefulness in cloud environments. Some common issues include:
- Docker containers: When running Workers inside Docker containers, the process ID is always
1
, as each container typically runs a single process. This makes the process identifier meaningless for identification purposes. - Random hostnames: In some cloud environments, such as Amazon ECS (Elastic Container Service), the hostname is a randomly generated string that does not provide any meaningful information about the Worker's execution context.
- Ephemeral IP addresses: In certain cases, the hostname might be set to an ephemeral IP address, which can change over time and does not uniquely identify a Worker instance.
What are some recommended approaches?
It is recommended that you ensure that the Worker Identity can be linked back to the corresponding machine, process, execution context, or log stream. In some execution environment, this might require that you explicitly specify the Worker Identity.
Here are some approaches:
- Use environment-specific identifiers: Choose an identifier that is specific to your execution environment. For example, when running Workers on Amazon ECS, you can set the Worker Identity to the ECS Task ID, which uniquely identifies the task running the Worker.
- Include relevant context: Incorporate information that helps establish the context of the Worker, such as the deployment environment (
staging
orproduction
), region, or any other relevant details. - Ensure uniqueness: Make sure that the Worker Identity is unique within your system to avoid ambiguity when debugging issues.
- Keep it concise: While including relevant information is important, try to keep the Worker Identity concise and easily readable to facilitate quick identification and troubleshooting.
What is a Worker Process?
Component diagram of a Worker Process and the Temporal Server
A Worker Process is responsible for polling a Task Queue, dequeueing a Task, executing your code in response to a Task, and responding to the Temporal Service with the results.
More formally, a Worker Process is any process that implements the Task Queue Protocol and the Task Execution Protocol.
- A Worker Process is a Workflow Worker Process if the process implements the Workflow Task Queue Protocol and executes the Workflow Task Execution Protocol to make progress on a Workflow Execution. A Workflow Worker Process can listen on an arbitrary number of Workflow Task Queues and can execute an arbitrary number of Workflow Tasks.
- A Worker Process is an Activity Worker Process if the process implements the Activity Task Queue Protocol and executes the Activity Task Processing Protocol to make progress on an Activity Execution. An Activity Worker Process can listen on an arbitrary number of Activity Task Queues and can execute an arbitrary number of Activity Tasks.
Worker Processes are external to a Temporal Service. Temporal Application developers are responsible for developing Worker Programs and operating Worker Processes. Said another way, the Temporal Service (including the Temporal Cloud) doesn't execute any of your code (Workflow and Activity Definitions) on Temporal Service machines. The Temporal Service is solely responsible for orchestrating State Transitions and providing Tasks to the next available Worker Entity.
While data transferred in Event Histories is secured by mTLS, by default, it is still readable at rest in the Temporal Service.
To solve this, Temporal SDKs offer a Data Converter API that you can use to customize the serialization of data going out of and coming back in to a Worker Entity, with the net effect of guaranteeing that the Temporal Service cannot read sensitive business data.
In many of our tutorials, we show you how to run both a Temporal Service and one Worker on the same machine for local development. However, a production-grade Temporal Application typically has a fleet of Worker Processes, all running on hosts external to the Temporal Service. A Temporal Application can have as many Worker Processes as needed.
A Worker Process can be both a Workflow Worker Process and an Activity Worker Process. Many SDKs support the ability to have multiple Worker Entities in a single Worker Process. (Worker Entity creation and management differ between SDKs.) A single Worker Entity can listen to only a single Task Queue. But if a Worker Process has multiple Worker Entities, the Worker Process could be listening to multiple Task Queues.
Worker Processes executing Activity Tasks must have access to any resources needed to execute the actions that are defined in Activity Definitions, such as the following:
- Network access for external API calls.
- Credentials for infrastructure provisioning.
- Specialized GPUs for machine learning utilities.
The Temporal Service itself has internal workers for system Workflow Executions. However, these internal workers are not visible to the developer.
What is a Workflow Task?
A Workflow Task is a Task that contains the context needed to make progress with a Workflow Execution.
- Every time a new external event that might affect a Workflow state is recorded, a Workflow Task that contains the event is added to a Task Queue and then picked up by a Workflow Worker.
- After the new event is handled, the Workflow Task is completed with a list of Commands.
- Handling of a Workflow Task is usually very fast and is not related to the duration of operations that the Workflow invokes.
What is a Workflow Task Execution?
A Workflow Task Execution occurs when a Worker picks up a Workflow Task and uses it to make progress on the execution of a Workflow Definition (also known as a Workflow function).
What is an Activity Task?
An Activity Task contains the context needed to proceed with an Activity Task Execution. Activity Tasks largely represent the Activity Task Scheduled Event, which contains the data needed to execute an Activity Function.
If Heartbeat data is being passed, an Activity Task will also contain the latest Heartbeat details.
What is an Activity Task Execution?
An Activity Task Execution occurs when a Worker uses the context provided from the Activity Task and executes the Activity Definition (also known as the Activity Function).
The ActivityTaskScheduled Event corresponds to when the Temporal Service puts the Activity Task into the Task Queue.
The ActivityTaskStarted Event corresponds to when the Worker picks up the Activity Task from the Task Queue.
Either ActivityTaskCompleted or one of the other Closed Activity Task Events corresponds to when the Worker has yielded back to the Temporal Service.
The API to schedule an Activity Execution provides an "effectively once" experience, even though there may be several Activity Task Executions that take place to successfully complete an Activity.
Once an Activity Task finishes execution, the Worker responds to the Temporal Service with a specific Event:
- ActivityTaskCanceled
- ActivityTaskCompleted
- ActivityTaskFailed
- ActivityTaskTerminated
- ActivityTaskTimedOut
What is a Nexus Task?
A Nexus Task represents a single Nexus request to start or cancel a Nexus Operation. The Nexus Task includes details such as the Nexus Service and Nexus Operation names, and other information required to process the Nexus request. The Temporal Worker triggers the registered Operation handler based on the Nexus task information.
What is a Nexus Task Execution?
A Nexus Task Execution occurs when a Worker uses the context provided from the Nexus Task and executes an action associated with a Nexus Operation which commonly includes starting a Nexus Operation using it's Nexus Operation handler plus many additional actions that may be performed on a Nexus Operation.
The NexusOperationScheduled Event corresponds to when the Temporal Service records the Workflow's intent to schedule an operation.
The NexusOperationStarted Event corresponds to when the Worker picks up the Nexus Task from the Task Queue, starts an asynchronous Nexus Operation, and returns an Operation ID to the caller indicating the asynchronous Nexus Operation has started.
Either NexusOperationCompleted or one of the other Closed Nexus Operation Events corresponds to when the Nexus Operation has reached a final state due to successfully completing the operation or unsuccessfully completing the operation in the case of a failure, timeout, or cancellation.
A Nexus Operation Execution appears to the caller Workflow as a single RPC, while under the hood the Temporal Service may issue several Nexus Tasks to attempt to start the Operation. Hence, a Nexus Operation Handler implementation should be idempotent. The WorkflowRunOperation provided by the SDK leverages Workflow ID based deduplication to ensures idempotency and provide an "effectively once" experience.
A Nexus Task Execution completes when a Worker responds to the Temporal Service with either a RespondNexusTaskCompleted or RespondNexusTaskFailed call, or when the Task times out.
The Temporal Service interprets the outcome and determines whether to retry the Task or record the progress in a History Event:
- NexusTaskCompleted
- NexusTaskFailed
What is a Task Queue?
A Task Queue is a lightweight, dynamically allocated queue that one or more Worker Entities poll for Tasks. For each Task Queue name, Temporal creates separate queues for each Task Queue type, specifically:
- Activity Task Queue: A queue that holds Activity Tasks. Activity Tasks represent units of work within a larger business process. Each Activity Task contains the context required by an Activity Definition. Workers poll Activity Tasks from the Activity Task Queue and use them to initiate Activity Executions.
- Workflow Task Queue: A queue that holds Workflow Tasks. Workflow Tasks contain the context needed by a Workflow Definition. Workers poll Workflow Tasks from the Workflow Task Queue and use them to initiate Workflow Executions.
- Nexus Task Queue (Public Preview): A queue that holds Nexus Tasks. Nexus Tasks are used for units of work passed between Namespaces. Workers configured with the Nexus Service poll the Nexus Task Queue for available tasks and handle them by initiating Workflow Executions in the target Namespace. Each Nexus Task contains the context required by the target Workflow Definition.
Each Task Queue type provides unaggregated data.
Task Queue component
Task Queues are very lightweight components. Task Queues do not require explicit registration but instead are created on demand when a Workflow Execution or Activity spawns or when a Worker Process subscribes to it. When a Task Queue is created, both a Workflow Task Queue and an Activity Task Queue are created under the same name. There is no limit to the number of Task Queues a Temporal Application can use or a Temporal Service can maintain.
Workers poll for Tasks in Task Queues via synchronous RPC. This implementation offers several benefits:
- A Worker Process polls for a message only when it has spare capacity, avoiding overloading itself.
- In effect, Task Queues enable load balancing across many Worker Processes.
- Task Queues enable what we call Task Routing, which is the routing of specific Tasks to specific Worker Processes or even a specific process.
- Task Queues support server-side throttling, which enables you to limit the Task dispatching rate to the pool of Worker Processes while still supporting Task dispatching at higher rates when spikes happen.
- When all Worker Processes are down, messages simply persist in a Task Queue, waiting for the Worker Processes to recover.
- Worker Processes do not need to advertise themselves through DNS or any other network discovery mechanism.
- Worker Processes do not need to have any open ports, which is more secure.
All Workers listening to a given Task Queue must have identical registrations of Activities and/or Workflows. The one exception is during a Server upgrade, where it is okay to have registration temporarily misaligned while the binary rolls out.
Where to set Task Queues
There are four places where the name of the Task Queue can be set by the developer.
- A Task Queue must be set when spawning a Workflow Execution:
- How to start a Workflow Execution using the Temporal CLI
- How to start a Workflow Execution using the Go SDK
- How to start a Workflow Execution using the Java SDK
- How to start a Workflow Execution using the PHP SDK
- How to start a Workflow Execution using the Python SDK
- How to start a Workflow Execution using the TypeScript SDK
- How to start a Workflow Execution using the .NET SDK
- A Task Queue name must be set when creating a Worker Entity and when running a Worker Process:
Note that all Worker Entities listening to the same Task Queue name must be registered to handle the exact same Workflows Types and Activity Types.
If a Worker Entity polls a Task for a Workflow Type or Activity Type it does not know about, it will fail that Task. However, the failure of the Task will not cause the associated Workflow Execution to fail.
- A Task Queue name can be provided when spawning an Activity Execution:
This is optional. An Activity Execution inherits the Task Queue name from its Workflow Execution if one is not provided.
- How to start an Activity Execution using the Go SDK
- How to start an Activity Execution using the Java SDK
- How to start an Activity Execution using the PHP SDK
- How to start an Activity Execution using the Python SDK
- How to start an Activity Execution using the TypeScript SDK
- How to start an Activity Execution using the .NET SDK
- A Task Queue name can be provided when spawning a Child Workflow Execution:
This is optional. A Child Workflow Execution inherits the Task Queue name from its Parent Workflow Execution if one is not provided.
- How to start a Child Workflow Execution using the Go SDK
- How to start a Child Workflow Execution using the Java SDK
- How to start a Child Workflow Execution using the PHP SDK
- How to start a Child Workflow Execution using the Python SDK
- How to start a Child Workflow Execution using the TypeScript SDK
- How to start a Child Workflow Execution using the .NET SDK
Task ordering
Task Queues can be scaled by adding partitions. The default number of partitions is 4.
Task Queues with multiple partitions do not have any ordering guarantees. Once there is a backlog of Tasks that have been written to disk, Tasks that can be dispatched immediately (“sync matches”) are delivered before tasks from the backlog (“async matches”). This approach optimizes throughput.
Task Queues with a single partition are almost always first-in, first-out, with rare edge case exceptions. However, using a single partition limits you to low- and medium-throughput use cases.
This section is on the ordering of individual Tasks, and does not apply to the ordering of Workflow Executions, Activity Executions, or Events in a single Workflow Execution. The order of Events in a Workflow Execution is guaranteed to remain constant once they have been written to that Workflow Execution's History.
What is a Sticky Execution?
A Sticky Execution is when a Worker Entity caches the Workflow in memory and creates a dedicated Task Queue to listen on.
A Sticky Execution occurs after a Worker Entity completes the first Workflow Task in the chain of Workflow Tasks for the Workflow Execution.
The Worker Entity caches the Workflow in memory and begins polling the dedicated Task Queue for Workflow Tasks that contain updates, rather than the entire Event History.
If the Worker Entity does not pick up a Workflow Task from the dedicated Task Queue in an appropriate amount of time, the Temporal Service will resume Scheduling Workflow Tasks on the original Task Queue. Another Worker Entity can then resume the Workflow Execution, and can set up its own Sticky Execution for future Workflow Tasks.
Sticky Executions are the default behavior of the Temporal Platform.
What is Task Routing?
Task Routing is simply when a Task Queue is paired with one or more Workers, primarily for Activity Task Executions.
This could also mean employing multiple Task Queues, each one paired with a Worker Process.
Task Routing has many applicable use cases.
Some SDKs provide a Session API that provides a straightforward way to ensure that Activity Tasks are executed with the same Worker without requiring you to manually specify Task Queue names. It also includes features like concurrent session limitations and worker failure detection.
Flow control
A Worker that consumes from a Task Queue asks for an Activity Task only when it has available capacity, so it is never overloaded by request spikes. If Activity Tasks get created faster than Workers can process them, they are backlogged in the Task Queue.
Throttling
The rate at which each Activity Worker polls for and processes Activity Tasks is configurable per Worker. Workers do not exceed this rate even if it has spare capacity. There is also support for global Task Queue rate limiting. This limit works across all Workers for the given Task Queue. It is frequently used to limit load on a downstream service that an Activity calls into.
Specific environments
In some cases, you might need to execute Activities in a dedicated environment. To send Activity Tasks to this environment, use a dedicated Task Queue.
Route Activity Tasks to a specific host
In some use cases, such as file processing or machine learning model training, an Activity Task must be routed to a specific Worker Process or Worker Entity.
For example, suppose that you have a Workflow with the following three separate Activities:
- Download a file.
- Process the file in some way.
- Upload a file to another location.
The first Activity, to download the file, could occur on any Worker on any host. However, the second and third Activities must be executed by a Worker on the same host where the first Activity downloaded the file.
In a real-life scenario, you might have many Worker Processes scaled over many hosts. You would need to develop your Temporal Application to route Tasks to specific Worker Processes when needed.
Code samples:
Route Activity Tasks to a specific process
Some Activities load large datasets and cache them in the process. The Activities that rely on those datasets should be routed to the same process.
In this case, a unique Task Queue would exist for each Worker Process involved.
Workers with different capabilities
Some Workers might exist on GPU boxes versus non-GPU boxes. In this case, each type of box would have its own Task Queue and a Workflow can pick one to send Activity Tasks.
Multiple priorities
If your use case involves more than one priority, you can create one Task Queue per priority, with a Worker pool per priority.
Versioning
Task Routing is the simplest way to version your code.
If you have a new backward-incompatible Activity Definition, start by using a different Task Queue.
What is a Worker Session?
A Worker Session is a feature provided by some SDKs that provides a straightforward API for Task Routing to ensure that Activity Tasks are executed with the same Worker without requiring you to manually specify Task Queue names. It also includes features like concurrent session limitations and Worker failure detection.
What is Worker Versioning?
- In Pre-release
- Introduced in Temporal Server version 1.21.0
- CLI, recommended and minimum 0.13.1
- Go SDK recommended v1.27.0, minimum v1.23.0
- Java SDK minimum v1.20.0
- TypeScript SDK minimum v1.8.0
- Python SDK minimum v1.3.0
- .NET SDK minimum v0.1.0-beta1
Worker Versioning simplifies the process of deploying changes to Workflow Definitions.
See the Pre-release README.md for more information.
If you're using the older version of the Worker Versioning pre-release that was released in late 2023, go here.