Version: 8.8 (unreleased)

Vector database connector

The vector database connector allows embedding, storing, and retrieving Large Language Model (LLM) embeddings. This enables building AI-based solutions for your organizations, such as context document search, long-term LLM memory, and agentic AI interaction.

note

The vector database connector uses the LangChain4j library. Data models and possible implementations are limited to those available in the latest stable release of the LangChain4j library.

Prerequisites

Before using the vector database connector, ensure you understand the concept of LLM embeddings.

To start using the vector database connector, ensure you have access to a supported LLM embeddings API to convert document content into vectorized embedding form. You will also need to have write access to a supported database.

Create a connector task

You can apply a connector to a task or event via the append menu. For example:

  • From the canvas: Select an element and click the Change element icon to change an existing element, or use the append feature to add a new element to the diagram.
  • From the properties panel: Navigate to the Template section and click Select.
  • From the side palette: Click the Create element icon.


After you have applied a connector to your element, follow the configuration steps or see using connectors to learn more.

Operations

The embed document operation performs the following steps:

  1. Consume a document.
  2. Parse the document according to its file format (optionally splitting it into text chunks).
  3. Convert the chunks into vector form using the embedding model.
  4. Store produced vectors in a vector database.

To perform this operation, enter the following:

As a result of this operation, you will get an array of created embedding chunk IDs, for example ["d599ec62-fe51-4a91-bbf0-26e1241f9079", "a1fad021-5148-42b4-aa02-7de9d590e69c"].
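The steps above can be sketched as a small pipeline. This is an illustrative sketch, not the connector's actual code; parse, split, and embed are hypothetical stand-ins for Apache Tika parsing, the splitter, and the embedding model call:

```python
import uuid

def embed_document(raw_bytes, parse, split, embed, store):
    """Illustrative sketch of the embed document operation."""
    text = parse(raw_bytes)                # 1-2. consume and parse the document
    chunks = split(text)                   # 2. optionally split into text chunks
    ids = []
    for chunk in chunks:
        vector = embed(chunk)              # 3. convert the chunk into vector form
        chunk_id = str(uuid.uuid4())
        store[chunk_id] = (vector, chunk)  # 4. store the vector in the database
        ids.append(chunk_id)
    return ids                             # the operation returns the chunk IDs

# Toy usage with trivial stand-ins:
store = {}
ids = embed_document(
    b"hello world",
    parse=lambda b: b.decode("utf-8"),
    split=lambda t: [t],                   # no splitting
    embed=lambda c: [float(len(c))],       # placeholder "embedding"
    store=store,
)
```

The returned ids list corresponds to the array of chunk IDs described above.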

Updating embedded documents

Each time you embed a document, the connector generates a new set of chunks and stores them in the vector database.
If the document was previously embedded, this creates duplicate chunks.

To prevent duplicates:

  1. Delete the existing chunks before re-embedding the document.
  2. To identify them, use the chunk IDs returned by the previous embedding operation.
  3. If the embedded document came from Camunda, you can also use the filename metadata field to find the chunk IDs.
  4. Follow your vector store’s documentation for deleting chunks.
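The update flow can be sketched with a toy in-memory store. The embed_document function below is a stand-in for the connector operation, and the deletion loop stands in for your vector store's own delete API:

```python
import uuid

def embed_document(text, store):
    """Toy stand-in: embeds a document as a single chunk."""
    chunk_id = str(uuid.uuid4())
    store[chunk_id] = text
    return [chunk_id]

def reembed_without_duplicates(text, previous_ids, store):
    # 1. Delete the chunks created by the previous embedding run.
    for chunk_id in previous_ids:
        store.pop(chunk_id, None)          # real deletion is store-specific
    # 2. Embed again and keep the new IDs for the next update.
    return embed_document(text, store)

store = {}
ids_v1 = embed_document("draft text", store)
ids_v2 = reembed_without_duplicates("final text", ids_v1, store)
# store now holds only the chunks from the latest embedding run
```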

Embedding models

The vector database connector supports Amazon Titan V1 and V2 models.
You can also specify any custom model that supports text embedding and is available in your Amazon Bedrock account.

To use Amazon Bedrock as an embedding model, provide:

  • Access key – Access key for a user with permissions for the Amazon Bedrock InvokeModel action.
  • Secret key – Secret key for the user associated with the provided access key.
  • Region – AWS region where the model is hosted (for example, us-east-1). See AWS model region support for details.
  • Model name – One of:
    • Amazon Titan V1 – amazon.titan-embed-text-v1
    • Amazon Titan V2 – amazon.titan-embed-text-v2:0
    • Custom model – Name of your custom Amazon Bedrock embedding model.

When using Amazon Titan V2, you can also specify:

  • Embedding dimensions – Number of dimensions for the embedding vector.
  • Normalize – Whether to normalize the embedding vector. See the AWS blog for more details.
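Normalization rescales the embedding vector to unit (L2) length, which lets cosine similarity be computed as a plain dot product. The sketch below shows the math behind the Normalize option; it is illustrative, not Titan's implementation:

```python
import math

def normalize(vector):
    """Rescale a vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

v = normalize([3.0, 4.0])
# v is [0.6, 0.8], a unit-length vector
```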

For all models, the following parameter is optional:

  • Max retries – Maximum number of retries for the embedding request in case of failure.

Vector stores

To use Amazon OpenSearch as a vector store, enter the following parameters:

  • Access key and Secret key – AWS IAM credentials for a user with read/write access to the OpenSearch instance.
  • Server URL – An Amazon OpenSearch URL without protocol, for example my-opensearch.aws.com:port.
  • Region – Region of the Amazon OpenSearch instance.
  • Index name – Name of the index where you wish to store embeddings.
    • When embedding: If the index is not present, the connector will create a new one.
    • When retrieving: If the index is absent, the connector will raise an error.
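This asymmetry can be pictured with a toy in-memory store. The class below is purely illustrative; the real connector talks to Amazon OpenSearch:

```python
class ToyVectorStore:
    """Illustrates the connector's index semantics; not a real client."""

    def __init__(self):
        self.indexes = {}

    def embed(self, index_name, vectors):
        # Embedding creates the index on first use.
        self.indexes.setdefault(index_name, []).extend(vectors)

    def retrieve(self, index_name):
        # Retrieving from a missing index raises an error.
        if index_name not in self.indexes:
            raise LookupError(f"index {index_name!r} does not exist")
        return self.indexes[index_name]

store = ToyVectorStore()
store.embed("docs", [[0.1, 0.2]])   # creates the "docs" index implicitly
vectors = store.retrieve("docs")
```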

Embedding document configuration

Document source

The Document source can be either Plain text or a Camunda document.

Plain text is useful for small amounts of data that fit into a text field or a process instance variable. The input is handled as regular UTF-8 text.

note

A FEEL string conversion function, such as string(), might be useful if you have JSON input.

The Camunda document option is useful for larger document pipelines, such as documents arriving via webhooks or user tasks. Input documents are parsed with Apache Tika, so files can be in any Apache Tika-supported format.

Splitting

Splitting breaks large documents into smaller pieces. It can be recursive, or disabled entirely. Seek guidance from your local data scientist to determine whether you require splitting.

Learn more about splitting in the LangChain4j documentation.
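As an illustration, a much-simplified recursive character splitter might look like this; it is not LangChain4j's actual implementation:

```python
def recursive_split(text, max_len, separators=("\n\n", "\n", " ")):
    """Split text on the coarsest separator that works, recursing into
    pieces that are still too long; fall back to hard cuts."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = [p for p in text.split(sep) if p]
        if len(parts) > 1:
            chunks = []
            for part in parts:
                chunks.extend(recursive_split(part, max_len, separators))
            return chunks
    # No separator helped: cut at the length limit.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

chunks = recursive_split("alpha beta\n\ngamma delta epsilon", max_len=12)
# chunks: ["alpha beta", "gamma", "delta", "epsilon"]
```

Paragraph breaks are tried first, then line breaks, then spaces, so chunks tend to respect natural document boundaries.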