# Test your AI agents with CPT
Test your AI agent processes in Camunda 8 with Camunda Process Test (CPT).
## About
AI agent processes are non-deterministic: the AI Agent connector inside an ad-hoc sub-process decides at runtime which tools to invoke and in what order, and its free-text output varies across runs.
In this guide, you will write integration tests that keep the AI agent and LLM interaction real while mocking external tool executions, using two CPT features:
- Conditional behavior reacts to whichever tasks the agent activates, instead of blocking on a single hard-coded execution order. This addresses the non-deterministic control flow.
- Judge assertions verify AI-generated output or tool execution results with a judge LLM that scores whether a value satisfies a natural-language expectation, replacing brittle exact-match checks.
After completing this guide, you will be able to test your AI agents using CPT.
## Prerequisites
- You use Camunda 8.9+.
- You use the Camunda Process Test Spring Boot Starter.
- You have Camunda Process Test set up.
- You have downloaded the AI Agent Chat With Tools process to your local machine.
This guide is a follow-up to Build your first AI agent, in which you use the same example AI agent process. Completing that guide first is recommended. However, you can also apply this guide to other AI agent process implementations.
## Step 1: Prepare the example AI agent blueprint
Place the BPMN file and any associated forms for your AI agent process in the `src/main/resources` directory of your Spring Boot project. Create the directory if it does not already exist.
You can organize files into subdirectories such as `bpmn/` and `forms/`.
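For orientation, the resulting layout might look like the following sketch (file names are illustrative, not required):

```
src/main/resources/
├── bpmn/
│   └── ai-agent-chat-with-tools.bpmn
└── forms/
    └── ask-human-to-send-email.form
```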
## Step 2: Configure the LLM provider and connectors
Judge assertions send a process variable and a natural language expectation to a configured LLM, which scores how well they match. The assertion passes if the score meets a configurable threshold. This avoids brittle string-matching on free-text AI output.
For this testing style, first configure both the connector runtime and the judge LLM. The goal is to keep the AI agent and LLM interaction real while disabling outbound connector execution for the tool calls you want to control in the test.
### Configure the connector runtime
Add the following connector runtime configuration to your test configuration, for example in src/test/resources/application.yaml or as inline properties on @SpringBootTest. For the full property reference, see the CPT configuration docs.
```yaml
camunda:
  process-test:
    assertion:
      timeout: PT1M
    connectors-enabled: true
    connectors-env-vars:
      CAMUNDA_CONNECTOR_POLLING_ENABLED: "false"
      CONNECTOR_OUTBOUND_DISCOVERY_DISABLED: "true"
      CONNECTOR_OUTBOUND_DISABLED: "io.camunda:http-json:1"
```
With this setup:
- The assertion timeout is increased to one minute. AI agent processes involve LLM interactions and typically take longer than standard BPMN processes.
- CPT starts the connector runtime needed by the AI agent process.
- Outbound connector executions, such as the HTTP JSON connector, are disabled so tool behavior can be controlled by the test with conditional behavior.
If your AI agent tools use different outbound connectors, adjust CONNECTOR_OUTBOUND_DISABLED accordingly.
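As a sketch, if your tools also invoke another outbound connector, you might list both connector types. The second type identifier below is an example, and the comma-separated format is an assumption; verify both against your connector runtime's documentation:

```yaml
camunda:
  process-test:
    connectors-env-vars:
      # Example values only: replace with the connector types your tools actually use.
      # Comma separation is assumed; check the connector runtime docs for your version.
      CONNECTOR_OUTBOUND_DISABLED: "io.camunda:http-json:1,io.camunda:sendgrid:1"
```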
### Configure the LLM provider
Configure the LLM provider for the judge. The judge does not need the same provider or model as your AI agent. A lighter model often works well since the judge context is much smaller.
Use this configuration for Amazon Bedrock:

```yaml
camunda:
  process-test:
    connectors-secrets:
      AWS_BEDROCK_ACCESS_KEY: ${AWS_LLM_BEDROCK_ACCESS_KEY}
      AWS_BEDROCK_SECRET_KEY: ${AWS_LLM_BEDROCK_SECRET_KEY}
    judge:
      chat-model:
        provider: "amazon-bedrock"
        model: "eu.anthropic.claude-haiku-4-5-20251001-v1:0"
        region: "eu-central-1"
        credentials:
          access-key: ${AWS_LLM_BEDROCK_ACCESS_KEY}
          secret-key: ${AWS_LLM_BEDROCK_SECRET_KEY}
```
Use this configuration for Ollama:

```yaml
camunda:
  process-test:
    judge:
      chat-model:
        provider: "openai-compatible"
        model: "gpt-oss:20b"
        base-url: "http://localhost:11434/v1"
```
Avoid committing credentials to your test configuration files. CPT properties support Spring's external configuration, so you can inject secrets through environment variables, CI/CD secret stores, or other techniques. See the CPT configuration reference for details.
The AI agent can still interact with the configured LLM provider, while the test controls the tool executions.
For the full property reference, see judge configuration.
## Step 3: Set up the test class
Add the @Deployment annotation to your Spring Boot application class to declare which resources CPT should deploy:
```java
@SpringBootApplication
@Deployment(resources = {"classpath*:/bpmn/**/*.bpmn", "classpath*:/forms/**/*.form"})
public class MyApplication {}
```
Then create a test class annotated with @SpringBootTest and @CamundaSpringProcessTest, and inject the CamundaClient and CamundaProcessTestContext:
```java
@SpringBootTest(classes = MyApplication.class)
@CamundaSpringProcessTest
class AiAgentProcessTest {

  @Autowired private CamundaClient client;

  @Autowired private CamundaProcessTestContext processTestContext;
}
```
For the full setup including dependencies and project structure, see Getting started with Camunda Process Test.
## Step 4: Handle non-deterministic flow paths
The test uses the prompt "Send Ervin a joke". In response, the agent:
- Calls `ListUsers`, `LoadUserByID`, and `Jokes_API` in any order.
- Presents an email for review via `AskHumanToSendEmail`.
- Collects feedback through `User_Feedback`.
With conditional behavior, you can register background reactions that monitor the process state and execute actions as conditions are met, without blocking the test thread. Register behaviors before starting the process; they then react independently as the process progresses.
Each behavior watches for a specific element to become active and then completes it with test data. If the agent never activates that element, the behavior simply never triggers and the test does not stall.
### Complete tool tasks
Register a behavior for each tool task the agent might invoke. In this integration test, these behaviors stand in for external tool executions such as REST connector calls.
First, define records for the tool call results:
```java
record User(int id, String name, String username) {}

record UserDetail(int id, String name, String username, String email) {}
```
Register a behavior that completes the ListUsers tool with a mock user list when the agent invokes it:
```java
processTestContext
    .when(
        () ->
            assertThatProcessInstance(ProcessInstanceSelectors.byProcessId("ai-agent-chat-with-tools"))
                .hasActiveElements("ListUsers"))
    .as("complete ListUsers")
    .then(
        () ->
            processTestContext.completeJob(
                JobSelectors.byElementId("ListUsers"),
                Map.of(
                    "toolCallResult",
                    List.of(
                        new User(1, "Leanne Graham", "Bret"),
                        new User(2, "Ervin Howell", "Antonette")))));
```
Register a behavior that completes the LoadUserByID tool with Ervin's details:
```java
processTestContext
    .when(
        () ->
            assertThatProcessInstance(ProcessInstanceSelectors.byProcessId("ai-agent-chat-with-tools"))
                .hasActiveElements("LoadUserByID"))
    .as("complete LoadUserByID")
    .then(
        () ->
            processTestContext.completeJob(
                JobSelectors.byElementId("LoadUserByID"),
                Map.of(
                    "toolCallResult",
                    new UserDetail(2, "Ervin Howell", "Antonette", "123@abc.local"))));
```
Register a behavior that completes the Jokes_API tool. This behavior uses chained .then() calls to return different jokes on repeated invocations:
```java
String firstJoke = "Why did the workflow cross the road? To get to the happy path.";
String secondJoke = "Why did the BPMN diagram apply for a job? It had excellent flow experience.";

processTestContext
    .when(
        () ->
            assertThatProcessInstance(ProcessInstanceSelectors.byProcessId("ai-agent-chat-with-tools"))
                .hasActiveElements("Jokes_API"))
    .as("complete jokes tool")
    .then(
        () ->
            processTestContext.completeJob(
                JobSelectors.byElementId("Jokes_API"), Map.of("toolCallResult", firstJoke)))
    .then(
        () ->
            processTestContext.completeJob(
                JobSelectors.byElementId("Jokes_API"), Map.of("toolCallResult", secondJoke)));
```
### Complete user tasks
The AskHumanToSendEmail user task requires human approval. Register a behavior that auto-approves the email when the task appears:
```java
processTestContext
    .when(
        () ->
            assertThatProcessInstance(ProcessInstanceSelectors.byProcessId("ai-agent-chat-with-tools"))
                .hasActiveElements("AskHumanToSendEmail"))
    .as("approve email")
    .then(
        () ->
            processTestContext.completeUserTask(
                "AskHumanToSendEmail", Map.of("emailOk", true)));
```
Each behavior's action should resolve the process state that the condition checks for. For example, if the condition checks for an active user task, the action should complete that task. Otherwise the behavior may execute repeatedly.
### Handle repeated invocations
Use chained .then() calls when a behavior should produce different results on repeated invocations. The first action is consumed on the first invocation, and the last action repeats for all subsequent invocations.
In this example, the first feedback rejection sends the agent back with a follow-up request, and the second feedback loop approves the result:
```java
processTestContext
    .when(
        () ->
            assertThatProcessInstance(ProcessInstanceSelectors.byProcessId("ai-agent-chat-with-tools"))
                .hasActiveElements("User_Feedback"))
    .as("feedback loop")
    .then(
        () ->
            processTestContext.completeUserTask(
                "User_Feedback",
                Map.of(
                    "userSatisfied", false,
                    "followUpInput", "This joke is bad, send Ervin a better joke")))
    .then(
        () ->
            processTestContext.completeUserTask(
                "User_Feedback", Map.of("userSatisfied", true)));
```
For the full conditional behavior API, see Utilities.
## Step 5: Verify agent output with judge assertions
After the process completes, use a judge assertion to verify that the agent's output satisfies a natural language expectation. The following example puts the full test together: it registers the conditional behaviors from Step 4, starts the process with the prompt "Send Ervin a joke", and then asserts that the agent completed the scenario correctly:
```java
@Test
void shouldSendErvinAJoke() {
  // Register the conditional behaviors from Step 4 here, before starting the process.

  ProcessInstanceEvent processInstance =
      client
          .newCreateInstanceCommand()
          .bpmnProcessId("ai-agent-chat-with-tools")
          .latestVersion()
          .variables(Map.of("inputText", "Send Ervin a joke"))
          .send()
          .join();

  assertThat(processInstance).isCompleted();

  assertThat(processInstance)
      .hasVariableSatisfiesJudge(
          "agent",
          """
          The agent correctly identified Ervin by calling the following tools:
          1. ListUsers
          2. LoadUserByID with id=2.
          Furthermore, the agent called AskHumanToSendEmail and the email
          should have been sent successfully!
          The mail must contain a joke.
          After the user rejected the first joke and asked for another one, the
          agent offered a second, different joke.
          """);
}
```
The expectation is a plain-text description of what the agent should have done. The judge does not compare strings literally. It evaluates whether the actual variable content satisfies the expectation semantically, so different phrasing or formatting in the agent's output does not cause false failures.
The judge evaluates matches using the following scoring scale:
| Score | Meaning |
|---|---|
| 1.0 | Fully satisfied semantically. Different wording or formatting that conveys the same meaning counts as fully satisfied. |
| 0.75 | Satisfied in substance with only minor differences that do not affect correctness. |
| 0.5 | Partially satisfied. Some required elements are present but others are missing or incorrect. |
| 0.25 | Mostly not satisfied. Only marginal relevance. |
| 0.0 | Not satisfied at all, or the actual value is empty. |
The LLM may return any value between these anchor points (for example, 0.6 or 0.85). The default threshold is 0.5. This means the assertion passes when the response is at least partially satisfied according to the rubric, which is a practical default for AI-generated output that may vary in wording or completeness across runs. Use a higher threshold when the response must satisfy stricter semantic requirements. You can change the threshold globally in the judge configuration or per assertion using withJudgeConfig.
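As a sketch, a global threshold would sit alongside the other judge properties in your test configuration. The property name `threshold` is an assumption here; confirm it in the judge configuration reference:

```yaml
camunda:
  process-test:
    judge:
      # Assumed property name; verify against the judge configuration reference.
      threshold: 0.75
```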
If the assertion fails, for example because the agent never called LoadUserByID or sent the email to the wrong address, the judge returns a low score with an explanation of which parts of the expectation were not met. This gives you a clear, human-readable failure message instead of a generic assertion error.
## Step 6: Tune the judge evaluation
Use withJudgeConfig to set a stricter threshold for individual assertions:
```java
assertThat(processInstance)
    .withJudgeConfig(config -> config.withThreshold(0.8))
    .hasVariableSatisfiesJudge(
        "agent", "The email body contains a joke addressed to Ervin.");
```
You can also replace the default evaluation criteria with a custom prompt. The custom prompt replaces only the evaluation criteria. The system still controls the expectation and value injection, the scoring rubric, and the JSON output format.
Set a custom prompt globally in configuration:
```yaml
camunda:
  process-test:
    judge:
      custom-prompt: "You are evaluating whether an AI agent correctly identified the intended recipient, used the right tools, and produced an appropriate email response."
```
Or override the prompt for a single assertion:
```java
assertThat(processInstance)
    .withJudgeConfig(
        config ->
            config.withCustomPrompt(
                "You are evaluating whether an AI agent correctly identified the intended recipient, used the right tools, and produced an appropriate email response."))
    .hasVariableSatisfiesJudge("agent", "The email body contains a joke addressed to Ervin.");
```
For the full assertion API, see Assertions.
## Limitations
Judge assertions evaluate the serialized JSON string of a process variable. The judge LLM receives this plain-text representation and reasons over it to produce a score. This works well for structured data and natural-language text, but the judge cannot reason about non-textual content such as Camunda documents or other embedded binaries. In these cases, the judge sees only metadata or encoded strings, not the actual content.
## Next steps
Now that you know how to test your AI agents, you can:
- Learn more about Camunda Process Test assertions, including all judge assertion methods.
- Review judge configuration for the full property reference and chat model provider settings.
- Explore conditional behavior, including chained actions and lifecycle details.
- Learn more about Camunda agentic orchestration and the AI Agent connector.