Upgrade Zeebe

This section describes how to upgrade Zeebe to a new version.

caution

There is currently an issue that can corrupt the data when upgrading to a new version. The issue affects the reprocessing (i.e. rehydrating the data from the records on the log stream) and can be avoided by restoring the data from a snapshot. Please follow the recommended procedure to minimize the risk of losing data. This issue only affects users upgrading from a version lower than 0.24.4 to 0.24.4 or newer.

Rolling upgrade

Zeebe is designed to allow a rolling upgrade of a cluster. The brokers can be upgraded one after the other. The other brokers in the cluster continue processing until the whole upgrade is done.

Upgrade the first broker and wait until it is ready again
Continue with the next broker until all brokers are upgraded
Upgrade the standalone gateways
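
The broker part of the steps above can be sketched as a small loop. This is only an illustration: the `upgrade` and `is_ready` callables are hypothetical hooks that you would implement for your own deployment (for example, replacing the distribution and polling the broker's ready check), and the timeout values are arbitrary assumptions. The standalone gateways would be upgraded after the loop finishes.

```python
import time
from typing import Callable, Iterable


def rolling_upgrade(
    brokers: Iterable[str],
    upgrade: Callable[[str], None],
    is_ready: Callable[[str], bool],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> None:
    """Upgrade brokers one at a time, waiting for each to become ready.

    `upgrade` and `is_ready` are hypothetical, deployment-specific hooks.
    """
    for broker in brokers:
        upgrade(broker)
        deadline = time.monotonic() + timeout_s
        # Do not touch the next broker until this one reports ready again.
        while not is_ready(broker):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{broker} not ready after {timeout_s}s")
            time.sleep(poll_s)
```

Stopping at the first broker that does not become ready keeps the remaining brokers on the old version, so the cluster can continue processing while you investigate.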

Helm Charts

If you are using the Helm charts, simply update your values file and change the image tag to the new version you wish to upgrade to, then follow the Helm upgrade guide.

caution

If you are upgrading from a Zeebe version lower than 0.24.4, it is not recommended to perform a rolling upgrade. Please follow the recommended upgrade procedure instead.

Upgrade procedure for Zeebe < 0.24.4

The following procedure describes how to upgrade a Zeebe broker from a version lower than 0.24.4. If the cluster contains multiple brokers, these steps can be done for all brokers in parallel. Standalone gateways should be upgraded after all brokers in the cluster are upgraded to avoid mismatches in the protocol version.

caution

This procedure results in a downtime of the whole cluster.

Experimental: Detect reprocessing inconsistency

Zeebe 0.24.5 and 0.25.1 introduced a new experimental feature that detects inconsistencies in the log stream on upgrade, to mitigate the reprocessing issue.

We recommend enabling it on the first run after upgrading Zeebe from a version lower than 0.24.4 to a version greater than or equal to 0.24.4, as described in the upgrade procedure. You can enable it using the following environment variable:

ZEEBE_BROKER_EXPERIMENTAL_DETECTREPROCESSINGINCONSISTENCY="true"

After you have verified that the upgrade was successful, we recommend disabling it again by removing the environment variable and restarting your brokers.
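
If the broker is started from a script, toggling the flag can be sketched as follows. The helper name `broker_env` is hypothetical; only the environment variable name comes from this page.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable documented above.
DETECT_FLAG = "ZEEBE_BROKER_EXPERIMENTAL_DETECTREPROCESSINGINCONSISTENCY"


def broker_env(
    detect_inconsistency: bool,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the detection flag set or removed."""
    env = dict(os.environ if base is None else base)
    if detect_inconsistency:
        env[DETECT_FLAG] = "true"
    else:
        # Disabling means removing the variable, as recommended above.
        env.pop(DETECT_FLAG, None)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.Popen` when launching the broker's start script: `broker_env(True)` for the first run after the upgrade, and `broker_env(False)` for the restart afterwards.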

Preparing the upgrade

Stop the workflow processing
- Close all job workers
- Interrupt the incoming connections to avoid user commands
Wait until a snapshot is created for all partitions
- By default, a snapshot is created every 15 minutes
- Verify that a snapshot is created by looking at the Metric zeebe_snapshot_count on the leader and the followers
- Note that no snapshot is created if no processing happened since the last snapshot
Make a backup of the data folder
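
The backup step can be as simple as a recursive copy of the data folder. The sketch below assumes that the data folder and the backup location are on a local file system; the function name and the timestamped naming scheme are illustrative choices, not part of Zeebe.

```python
import shutil
import time
from pathlib import Path


def backup_data_folder(data_dir: str, backup_root: str) -> Path:
    """Copy the broker's data folder into a timestamped folder under backup_root."""
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"data folder not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree creates dest (and missing parents) and fails if dest already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

Keep the returned path: it is the folder to restore from if the upgrade has to be rolled back.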

Performing the upgrade

With inconsistency detection

Shut down the broker
Replace the /bin and /lib folders with the versions of the new distribution
Start up the broker with the experimental inconsistency detection enabled
Verify the upgrade
Restart the broker with experimental inconsistency detection disabled

Without inconsistency detection

Shut down the broker
Replace the /bin and /lib folders with the versions of the new distribution
Start up the broker
Verify the upgrade

Verifying the upgrade

The upgrade is successful if the following conditions are met:

the broker is ready (see Ready Check)
the broker is healthy (see Health Check)
all partitions are healthy (see the Metric zeebe_health)
the stream processors of the partition leaders are in the phase PROCESSING (see Partitions Admin Endpoint)

If the upgrade failed because of a known issue, a partition changes its status to unhealthy, and the log output may contain the following error message:

Sample Upgrade Error Message

Unexpected error on recovery happens.
io.zeebe.engine.processor.InconsistentReprocessingException: Reprocessing issue detected!
  Restore the data from a backup and follow the recommended upgrade procedure. [cause:
  "The key of the record on the log stream doesn't match to the record from reprocessing.",
  log-stream-record: {"partitionId":1,"value":{"version":1,"bpmnProcessId":"parallel-tasks",
  "workflowKey":2251799813685249,"parentElementInstanceKey":-1,"parentWorkflowInstanceKey":-1,
  "bpmnElementType":"PARALLEL_GATEWAY","flowScopeKey":2251799813685251,
  "elementId":"ExclusiveGateway_0tkgnd5","workflowInstanceKey":2251799813685251},
  "key":2251799813685256,"sourceRecordPosition":4294997784,"valueType":"WORKFLOW_INSTANCE",
  "timestamp":1601025180728,"recordType":"EVENT","intent":"ELEMENT_ACTIVATING",
  "rejectionType":"NULL_VAL","rejectionReason":"","position":4294998112},
  reprocessing-record: {key=2251799813685255, sourceRecordPosition=4294997784,
  intent=WorkflowInstanceIntent:ELEMENT_ACTIVATING, recordType=EVENT}]

In this case, roll the broker back to the previous version and restore the backup. Ensure that the upgrade was prepared correctly. If it is still unclear why the upgrade was not successful, please contact the Zeebe team and ask for guidance.
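
To check a broker's log output for this error without reading it by hand, a simple scan for the exception name is usually enough. This is a minimal sketch; it assumes that the log is available as lines of text and that the exception class name appears as in the sample above.

```python
from typing import Iterable, List

# Exception name taken from the sample error message above.
MARKER = "InconsistentReprocessingException"


def find_reprocessing_errors(log_lines: Iterable[str]) -> List[int]:
    """Return the 1-based line numbers that mention the reprocessing exception."""
    return [n for n, line in enumerate(log_lines, start=1) if MARKER in line]
```

An empty result does not prove that the upgrade succeeded; the conditions listed under "Verifying the upgrade" still need to be checked.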

Partitions admin endpoint

This endpoint allows querying the status of the partitions and performing operations to prepare an upgrade.

In version 0.23

The endpoint is available under http://{zeebe-broker}:{zeebe.broker.network.monitoringApi.port}/partitions (default port: 9600).

It is enabled by default and cannot be disabled.

In version >= 0.24

The endpoint is available under http://{zeebe-broker}:{zeebe.broker.network.monitoringApi.port}/actuator/partitions (default port: 9600).

It is enabled by default. It can be disabled in the configuration by setting:

management.endpoint.partitions.enabled=false

Query the partition status

The status of the partitions can be queried by a GET request:

/actuator/partitions

The response contains all partitions of the broker, keyed by partition ID.

Full Response

{
    "1":{
        "role":"LEADER",
        "snapshotId":"399-1-1601275126554-490-490",
        "processedPosition":490,
        "processedPositionInSnapshot":490,
        "streamProcessorPhase":"PROCESSING"
    }
}
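
A response like the one above can be used to automate the check that the stream processors of all partition leaders are in the phase PROCESSING. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the default path assumes version 0.24 or newer (for 0.23, pass `path="/partitions"`).

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def fetch_partitions(
    base_url: str,
    path: str = "/actuator/partitions",
    timeout_s: float = 5.0,
) -> Dict[str, Dict[str, Any]]:
    """GET the partition status from a broker's monitoring API."""
    with urlopen(f"{base_url}{path}", timeout=timeout_s) as resp:
        return json.load(resp)


def leaders_not_processing(partitions: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the IDs of partitions led by this broker that are not PROCESSING."""
    return sorted(
        pid
        for pid, status in partitions.items()
        if status.get("role") == "LEADER"
        and status.get("streamProcessorPhase") != "PROCESSING"
    )
```

For example, `leaders_not_processing(fetch_partitions("http://localhost:9600"))` returns an empty list when every partition led by that broker is processing. The host name here is an assumption; 9600 is the default monitoring port mentioned above.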
