In order to define and size your environment for Camunda Platform 8 appropriately, you need to understand the factors that influence hardware requirements. Then you can apply this knowledge to select the appropriate Camunda Platform 8 SaaS hardware package or size your self-managed Kubernetes cluster.
This best practice targets Camunda Platform 8 only! If you are looking at Camunda Platform 7, please visit Sizing your Camunda 7 environment.
Understanding influencing factors
Let's understand the important numbers.
Throughput defines how many process instances can be executed in a certain time frame.
It is typically easy to estimate the number of process instances per day you need to execute. If you only know the number of process instances per year, we recommend dividing this number by 250 (the average number of working days in a year).
But the hardware sizing depends more on the number of BPMN tasks in a process model. For example, you will have a much higher throughput for processes with one service task than for processes with 30 service tasks.
If you already know your future process model, you can simply count its tasks. For example, the following onboarding process contains five service tasks in a typical execution.
If you don't yet know the number of service tasks, we recommend assuming 10 service tasks as a rule of thumb.
The number of tasks per process allows you to calculate the required number of tasks per day (tasks/day), which can also be converted into tasks per second (tasks/s) by dividing by 24 hours * 60 minutes * 60 seconds.
| | Value | Calculation | Comment |
|---|---|---|---|
| Onboarding instances per year | 5,000,000 | | Business input |
| Process instances per business day | 20,000 | / 250 | Average number of working days in a year |
| Tasks per day | 100,000 | * 5 | Tasks in the process model as counted above |
| Tasks per second | 1.16 | / (24*60*60) | Seconds per day |
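As a sketch, the arithmetic of this example can be reproduced in a few lines (the numbers are the business inputs from the table above):

```python
# Back-of-the-envelope throughput calculation, using the example numbers.
instances_per_year = 5_000_000
working_days_per_year = 250   # average number of working days in a year
tasks_per_instance = 5        # service tasks counted in the process model

instances_per_day = instances_per_year / working_days_per_year   # 20,000
tasks_per_day = instances_per_day * tasks_per_instance           # 100,000
tasks_per_second = tasks_per_day / (24 * 60 * 60)                # ~1.16

print(f"{instances_per_day:,.0f} PI/day, {tasks_per_day:,.0f} tasks/day, "
      f"{tasks_per_second:.2f} tasks/s")
```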
In most cases, we define throughput per day, as this time frame is easier to understand. But in high-performance use cases you might need to define the throughput per second.
In most scenarios, your load will be volatile rather than constant. For example, your company might start 90% of its monthly process instances on the same day of the month. The ability to handle those peaks is the more crucial requirement and should drive your decision, rather than the average load.
In the above example, that one day with the peak load defines your overall throughput requirements.
Sometimes, looking at peaks also means not considering all 24 hours of a day, but only 8 business hours, or perhaps the busiest 2 hours, depending on your typical workload.
Latency and cycle time
In some use cases, the cycle time of a process (or sometimes even of single tasks) matters. For example, you want to provide a REST endpoint that starts a process instance to calculate a score for a customer. This process needs to execute four service tasks, but the REST request should return a response synchronously, no later than 250 milliseconds after the request.
While the cycle time of service tasks depends very much on what you do in these tasks, the overhead of the workflow engine itself can be measured. In an experiment with Camunda Platform 8 1.2.4, running all worker code in the same GCP zone as Camunda Platform 8, we measured around 10 ms processing time per process node and approximately 50 ms latency to process service tasks in remote workers. Hence, executing four service tasks results in roughly 240 ms of workflow engine overhead.
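A minimal sketch of this estimate, using the averages measured above:

```python
# Rough estimate of the workflow engine overhead for the scoring example,
# based on ~10 ms processing per node and ~50 ms remote-worker latency.
ms_processing_per_node = 10
ms_remote_worker_latency = 50
service_tasks = 4

# Each service task adds its node processing time plus the worker round trip.
engine_overhead_ms = service_tasks * (ms_processing_per_node + ms_remote_worker_latency)
print(engine_overhead_ms)  # 240
```

This leaves almost no headroom within the 250 ms budget for the work the service tasks themselves perform.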
The closer you push throughput to its limits, the higher your latency will get. This is because the different requests compete for hardware resources, especially disk write operations. As a consequence, whenever cycle time and latency matter to you, plan for a hardware buffer so you do not utilize your cluster too much. This ensures your latency does not go up because of resource contention. A good rule of thumb is to multiply your average load by 20. This way, you can not only accommodate unexpected peak loads, but also have more free resources on average, keeping latency down.
| | Value | Calculation | Comment |
|---|---|---|---|
| Onboarding instances per year | 5,000,000 | | Business input, but irrelevant here |
| Expected process instances on peak day | 150,000 | | Business input |
| Process instances per second within business hours on peak day | 5.20 | / (8*60*60) | Only looking at the seconds of the 8 business hours of the day |
| Process instances per second including buffer | 104.16 | * 20 | Adding a buffer is recommended in critical high-performance or low-latency use cases |
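The peak-load calculation above can be sketched as:

```python
# Peak-load estimate: 150,000 process instances on the peak day, all arriving
# within 8 business hours, plus the x20 buffer for latency-sensitive scenarios.
peak_instances_per_day = 150_000
business_hours = 8
buffer_factor = 20  # rule of thumb from above

peak_per_second = peak_instances_per_day / (business_hours * 60 * 60)  # ~5.2
with_buffer = peak_per_second * buffer_factor                          # ~104.2
print(f"{peak_per_second:.2f} PI/s peak, {with_buffer:.2f} PI/s with buffer")
```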
Every process instance can hold a payload (known as process variables). The payload of all running process instances must be managed by the runtime workflow engine, and all data of running and ended process instances is also forwarded to Operate and Optimize.
The data you attach to a process instance (process variables) influences resource requirements. For example, it makes a big difference if you only add one or two strings (requiring around 1 KB of space) to your process instances, or a full JSON document containing 1 MB. Hence, the payload size is an important factor when looking at sizing.
There are a few general rules regarding payload size:
- The maximum variable size per process instance is limited, currently to roughly 3 MB.
- We don't recommend storing much data in your process context. See our best practice on handling data in processes.
- Every partition of the Zeebe installation can typically handle up to 1 GB of payload in total. Larger payloads can lead to slower processing. For example, if you run one million process instances with 4 KB of data each, you end up with 3.9 GB of data, and you should run at least four partitions. In reality, this typically means six partitions, as you want to run the number of partitions as a multiple of the replication factor, which by default is three.
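The partition rule of thumb from the last bullet can be sketched as a small calculation (the 1 GB-per-partition figure is the guideline stated above):

```python
import math

# Rule-of-thumb partition count for the example: one million running process
# instances with 4 KB of payload each, ~1 GB of payload per partition.
running_instances = 1_000_000
payload_kb = 4
gb_per_partition = 1
replication_factor = 3  # Zeebe default

total_gb = running_instances * payload_kb / 1024 / 1024   # ~3.81 GB
min_partitions = math.ceil(total_gb / gb_per_partition)   # 4
# Round up to the next multiple of the replication factor:
partitions = math.ceil(min_partitions / replication_factor) * replication_factor
print(partitions)  # 6
```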
The payload size also affects disk space requirements, as described in the next section.
The workflow engine itself stores data for every process instance, especially to keep the current state persistent. This is unavoidable. If there are human tasks, data is also sent to Tasklist and kept there until the tasks are completed.
Furthermore, data is also sent to Operate and Optimize, which store data in Elasticsearch. These tools keep historical audit data for some time. The total amount of disk space can be reduced by using data retention settings. We typically delete data in Operate after 30 to 90 days, but keep it in Optimize for a longer period to allow more analysis. A good rule of thumb is something between 6 and 18 months.
Elasticsearch needs enough memory available to load a large amount of this data into memory.
Assuming a typical payload of 15 process variables (simple strings, numbers, or booleans), we measured the following approximate disk space requirements using Camunda Platform 8 SaaS 1.2.4. Note that these are not exact numbers, but they give you an idea of what to expect:
- Zeebe: 75 KiB per process instance
- Operate: 57 KiB per process instance
- Optimize: 21 KiB per process instance
- Tasklist: 21 KiB per process instance
- Sum: 174 KiB per process instance
Using your throughput and retention settings, you can now calculate the required disk space for your scenario. Example:
| | Calculation | Result | Comment |
|---|---|---|---|
| Process instances per day | | 20,000 | |
| Active process instances | * 5 days | 100,000 | Typical process cycle time: how long a process instance is typically active |
| Disk space for Zeebe | * 75 KiB | 7.15 GiB | Converted into GiB by / 1024 / 1024 |
| Disk space for Tasklist | * 21 KiB | 0.67 GiB | |
| PI within retention time (Operate) | * 30 days | 600,000 | |
| Disk space for Operate | * 57 KiB | 32.62 GiB | |
| PI within retention time (Optimize) | * 6 months | 3,600,000 | |
| Disk space for Optimize | * 21 KiB | 72.10 GiB | |
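The disk space arithmetic can be sketched as follows (Tasklist is omitted here, as its footprint depends on the share of human tasks in your processes):

```python
# Disk space estimate for the example: 20,000 process instances per day,
# per-instance footprints as measured above (in KiB).
pi_per_day = 20_000
active_days = 5                 # typical process cycle time
operate_retention_days = 30
optimize_retention_days = 180   # ~6 months

def gib(instances: int, kib_per_instance: int) -> float:
    """Convert a per-instance KiB footprint into total GiB."""
    return instances * kib_per_instance / 1024 / 1024

zeebe_gib = gib(pi_per_day * active_days, 75)               # ~7.15 GiB
operate_gib = gib(pi_per_day * operate_retention_days, 57)  # ~32.62 GiB
optimize_gib = gib(pi_per_day * optimize_retention_days, 21)  # ~72.10 GiB
print(f"Zeebe {zeebe_gib:.2f} GiB, Operate {operate_gib:.2f} GiB, "
      f"Optimize {optimize_gib:.2f} GiB")
```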
Understanding sizing and scalability behavior
Spinning up a Camunda Platform 8 cluster means running multiple components that all need resources in the background, such as the Zeebe broker, Elasticsearch (as the database for Operate, Tasklist, and Optimize), Operate, Tasklist, and Optimize. All of these components need to be equipped with resources.
All components are clustered to provide high-availability, fault-tolerance and resiliency.
Zeebe scales horizontally by adding more cluster nodes (pods). This is limited by the number of partitions configured for a Zeebe cluster, as the work within one partition cannot be parallelized by design. Hence, you need to define enough partitions to utilize your hardware. The number of partitions cannot be changed after the cluster is initially provisioned (at least not yet); elastic scaling of partitions is not yet possible.
If you anticipate the load increasing over time, prepare by configuring more partitions than you currently need as a buffer. For example, you could multiply the number of partitions you need for your current load by four to add a buffer. This typically has just a small impact on performance.
Camunda Platform 8 runs on Kubernetes. Every component is operated as a so-called pod that gets resources assigned. These resources can be scaled vertically (i.e., dynamically assigned more or less hardware) within certain limits. Note that vertical scaling does not always result in more throughput, as the various components have dependencies on each other. This is a complex topic and requires running experiments with benchmarks. In general, we recommend starting with the minimalistic hardware package described below. If you have further requirements, use this as a starting point to increase resources.
Note that Camunda licensing does not depend on the provisioned hardware resources, making it easy to size according to your needs.
Sizing your runtime environment
First, calculate your requirements using the information provided above. For the example calculations, this yields:
- Throughput: 20,000 process instances / day
- Disk space: 114 GB
Now you can select a hardware package that covers these requirements. In this example, they fit well into a cluster of size S.
Camunda Platform 8 SaaS
Camunda Platform 8 defines three fixed hardware packages you can select from. The table below gives you an indication of which requirements you can fulfill with these. If your requirements exceed the numbers mentioned, please contact us to discuss a customized sizing.
| | S | M | L |
|---|---|---|---|
| Max throughput tasks/day * | 5.9 M | 23 M | 43 M |
| Max throughput tasks/second * | 65 | 270 | 500 |
| Max throughput process instances/day * | 0.5 M | 2.3 M | 15 M |
| Max total number of process instances stored (in Elasticsearch in total) | 100 k | 5.4 M | |
| Approx. resources provisioned ** | 15 vCPU, 20 GB mem, 640 GB disk | 28 vCPU, 50 GB mem, 640 GB disk | 56 vCPU, 85 GB mem, 1320 GB disk |
* The numbers in the table were measured using Camunda Platform 8 (version 8.0) and the benchmark project, which uses a ten-task process. To calculate day-based metrics, an equal distribution over 24 hours is assumed.
** These are the resource limits configured in the Kubernetes cluster and are always subject to change.
You might wonder why the total number of process instances stored is that low. This is related to limited resources provided to Elasticsearch, yielding performance problems with too much data stored there. By increasing the available memory to Elasticsearch you can also increase that number. At the same time, even with this rather low number, you can always guarantee the throughput of the core workflow engine during peak loads, as this performance is not influenced. Also, you can always increase memory for Elasticsearch later on if it is required.
Camunda Platform 8 self-managed
Provisioning Camunda Platform 8 onto your self-managed Kubernetes cluster might depend on various factors. For example, most customers already have their own teams providing Elasticsearch for them as a service. However, the following example shows a possible configuration which is close to a cluster of size S in Camunda Platform 8 SaaS and can serve as a starting point for your own sizing. Such a cluster can serve 500,000 process instances per day and store up to 100,000 process instances in Elasticsearch (in-flight and history).
| | Request | Limit |
|---|---|---|
| gateway | embedded in broker | |
| Mem [GB] | 0.2 | 1 |
| Mem [GB] | 0.2 | 1 |
| Mem [GB] | 0.2 | 1 |
| Mem [GB] | 0.2 | 2 |
| Mem [GB] | 0.4 | 1 |
| Mem [GB] | 0.4 | 1 |
| Mem [GB] | 3 | 6 |
| Disk [GB] | 64 | 100 |
| Mem [GB] | 0.25 | 0.5 |
| Other (Worker, Analytics, ...) | | |
| Mem [GB] | 0.45 | 0.45 |
Planning non-production environments
All clusters can be used for development, testing, integration, QA, and production. In Camunda Platform 8 SaaS, production and test environments are organized via separate organizations within Camunda Platform 8 to ease the management of clusters, while also minimizing the risk of accidentally accessing a production cluster.
Note that functional unit tests written in Java that use zeebe-process-test run against an in-memory broker, so no development cluster is needed for this use case.
For typical integration or functional test environments, you can normally deploy a small cluster, like the one shown above, even if your production environment is sized bigger. This is usually sufficient, as functional tests run much smaller workloads.
Load or performance tests ideally run on the same sizing configuration as your production instance to yield reliable results.
A typical customer set-up consists of:
- 1 Production cluster
- 1 Integration or pre-prod cluster (equal in size to your anticipated production cluster if you want to run load tests or benchmarks)
- 1 Test cluster
- Multiple developer clusters
Ideally, every active developer runs their own cluster, so that the workflow engine does not need to be shared among developers. Otherwise, clusters are not isolated, which can lead to errors, for example if developer A deploys a new version of the same process as developer B. Developer clusters can typically be deleted when they are no longer used, as no data needs to be kept, so you might not need one cluster for every developer who works with Camunda Platform 8 at some point in time. Using in-memory unit tests further reduces the contention on developer clusters.
However, some customers do share a Camunda Platform 8 cluster amongst various developers for economic reasons. This can work well if everybody is aware of the problems that can arise.
Running experiments and benchmarks
If you are in doubt about which package to choose, you can do a load test with a representative workload with the target hardware package. This will help you decide if the specific package can serve your needs.
This is especially recommended if you exceed the numbers above, such as three million process instances per day.
Take a look at the Camunda Platform 8 benchmark project as a starting point for your own benchmarks.