Version: 8.8 (unreleased)

IDP reference

Technical reference information for IDP, including technical architecture, supported documents, and known limitations.

Technical architecture

IDP offers a composable architecture that allows you to customize and extend IDP capabilities as needed. This flexibility enables you to adapt quickly to evolving business needs while maintaining a streamlined and manageable system.

IDP allows you to create, configure, and publish a document extraction template. This is a type of connector template.

The document extraction template integrates with Camunda document handling connectors and APIs such as Amazon S3, Amazon Textract, Amazon Comprehend, and Amazon Bedrock to retrieve, analyze, and process documents.

Document upload: The template accepts uploaded documents as input. These documents can be uploaded to a local document store, and their references used in the extraction process. For example, the connector uploads the document to an Amazon S3 bucket for extraction.
Amazon Textract: Uploaded documents are analyzed by Amazon Textract, which extracts text data and returns the results. The template configuration includes specifying the document, the S3 bucket name for temporary storage during Amazon Textract analysis, and other required parameters such as extraction fields and Amazon Bedrock Converse parameters.
Amazon Bedrock: Your extraction field prompts are used by Amazon Bedrock to extract data from the document. The extracted content is mapped to process variables, and the results stored in a specified result variable.

note

You may encounter errors during extraction and validation if you have not added your Amazon AWS IAM account access key and secret key as a connector secret to your cluster. See configuring IDP.

Document storage

IDP stores documents as follows during the different extraction stages:

Web Modeler: Uploaded sample documents are stored within Web Modeler itself (SaaS) or the database (Self-Managed).
Cluster: During extraction testing (for example, when you click Extract document) the document is stored in the cluster using the document handling API.
Extraction: Finally, when you extract content using a document extraction template, it is stored in an Amazon AWS S3 bucket, where it can be accessed by AWS Textract.

info

To learn more about storing, tracking, and managing documents in Camunda 8, see document handling.

Document file formats

IDP currently only supports data extraction from the following uploaded document file formats.

File format	Description
PDF	PDF documents must not be password protected. Maximum document file size is 4MB for all IDP operations. Both text and image content can be extracted from a PDF document. For example, data can be extracted from a scanned image that has been converted to PDF.

Document language support

IDP supports data extraction and processing of documents in multiple languages.

IDP integrates with Amazon Textract, which supports multilingual text extraction and is capable of detecting and extracting text in multiple languages. This ensures that the extracted text can be accurately mapped to process variables and used within your workflows, regardless of document language.

note

At the time of the 8.7 release (April 2025), Amazon Textract can detect printed text and handwriting from the Standard English alphabet and ASCII symbols, and can extract printed text, forms and tables in English, German, French, Spanish, Italian and Portuguese. Refer to Amazon Textract FAQs for current information on supported languages.

Extraction field data types

Specify the extraction field data type to indicate to the LLM what type of data it should be trying to extract. This helps the LLM more accurately analyze and extract the correct data.

For example, if you want to extract an expected numeric value (such as a monetary value), select the Number data type for the extraction field.

Supported data types

You can specify the following extraction field data types.

Data type	Description
Boolean	The LLM should expect a true or false value, such as "yes" or "no".
Number	The LLM should expect to extract a numeric value.
String	The LLM should expect to extract a sequence of characters.

Extraction models

You can choose from the following supported LLM extraction models during data extraction.

Extraction model	Model provider	Documentation
Claude Sonnet 4	Anthropic	Anthropic's Claude in Amazon Bedrock
Claude 3.5 Sonnet	Anthropic	Anthropic's Claude in Amazon Bedrock
Claude 3 Sonnet	Anthropic	Anthropic's Claude in Amazon Bedrock
Claude 3 Haiku	Anthropic	Anthropic's Claude in Amazon Bedrock
Llama 3 70B Instruct	Meta	Meta's Llama in Amazon Bedrock
Llama 3 8B Instruct	Meta	Meta's Llama in Amazon Bedrock
Titan Text Premier	Amazon AWS	Amazon Titan Text models

note

Amazon Bedrock LLM extraction models are only available in specific regions.

You must ensure your selected cluster region supports the LLM extraction model you want to use. For example, if you are using the eu-central-1 region, you cannot use Claude 3 Haiku as it is only available in US regions.
If you have chosen a model not supported in your region, you will receive a 403 "You don't have access to the model with the specified model ID" exception error.
Some newer models (including Claude Sonnet 4) require cross-region inference profiles and are automatically handled by IDP. When you select these models, IDP infers the appropriate regional prefix (us., eu., apac., or us-gov.) from your configured AWS region and adds it to enable access across supported regions within your geographic area.

For current regional support information, refer to supported foundation models in Amazon Bedrock. For more details about cross-region inference, see inference profiles.

Table data extraction

IDP can extract table data using LLM foundation models to identify and structure tabular data based on your prompts.

Default JSON extraction format

When extracting repeated elements from a document, the extraction defaults to JSON format unless instructed.

In this format:

Table data is represented as an array of objects.
Each object corresponds to a row.
Column names are used as object keys, with values mapped accordingly.

Example JSON output

Prompt: "Extract a list of name and ages of patients on floor 1".

[
  {
    "name": "Kaitlin Jones",
    "age": 41
  },
  {
    "name": "Thomas Hampton",
    "age": 57
  }
]

CSV extraction

To extract table data in CSV format, specify this in the prompt. The output is then structured in a CSV-compatible format.

Example CSV output

Prompt: "Extract a list of name and ages of patients on floor 1 as CSV".

Name,Age
Kaitlin Jones,41
Thomas Hampton,57

Customize table data extraction

You can further refine table extraction by:

Explicitly specifying column headers.
Defining delimiter preferences for CSV.
Requesting additional context for ambiguous data.

Access rights and permissions

Access to IDP features is determined by your Web Modeler user role and associated access rights and permissions.

For example, users with a Viewer or Commenter role only have read-only access to IDP features, and cannot upload documents, manage extraction fields, or publish document extraction templates.

Feature	Viewer/Commenter	Editor/Project Admin	Super-user
View IDP application
View document extraction
View documents
View extraction fields/prompts
View validate extraction
Create/edit/delete IDP application
Create/edit/delete document extraction
Upload/delete documents
Add/edit/delete extraction fields/prompts
Extract data
Save as test case
Validate extraction (test documents)
Publish template
View versions
Manage versions (edit, restore, delete)

Key: Can access Full access | Cannot access Read-only access

Validation status

During validation, a validation status is shown for extraction fields to indicate the accuracy of the extracted data.

Icon	Status	Description
	Pass	The document validation passed with accurate and expected results.
	Caution	A test case is missing for comparison. Click Save test case to create a test case for this field.
	Fail	The validation results do not match the expected output for the document. Click Review document to investigate and resolve.

Example

The following example shows the results of a partially successful extraction against three documents.

The expanded contract_start_date field shows that each document returned different validation results.

The first document passed the validation, with the Extracted value matching the Expected test case output.
The second document could not be validated as a test case was not found for comparison. Click Save test case to create a test case for the document.
The third document failed validation as the Extracted value did not match the Expected test case output. Click Review document to open the document again and check the prompt for this field.

Technical architecture​

Document storage​

Document file formats​

Document language support​

Extraction field data types​

Supported data types​

Extraction models​

Table data extraction​

Default JSON extraction format​

Example JSON output​

CSV extraction​

Example CSV output​

Customize table data extraction​

Access rights and permissions​

Validation status​

Example​

Technical architecture

Document storage

Document file formats

Document language support

Extraction field data types

Supported data types

Extraction models

Table data extraction

Default JSON extraction format

Example JSON output

CSV extraction

Example CSV output

Customize table data extraction

Access rights and permissions

Validation status

Example