
Vector database connector

The vector database connector allows you to embed, store, and retrieve Large Language Model (LLM) embeddings. This enables building AI-based solutions for your organization, such as context document search, long-term LLM memory, and agentic AI interaction.

note

The vector database connector uses the LangChain4j library. Data models and available implementations are limited to those provided by the latest stable LangChain4j release.

Prerequisites

Before using the vector database connector, ensure you understand the concept of LLM embeddings.

To start using the vector database connector, ensure you have access to a supported LLM embedding API to convert document content into vector embeddings. You also need write access to a supported vector database.

Create a connector task

You can apply a connector to a task or event via the append menu. For example:

  • From the canvas: Select an element and click the Change element icon to change an existing element, or use the append feature to add a new element to the diagram.
  • From the properties panel: Navigate to the Template section and click Select.
  • From the side palette: Click the Create element icon.


After you have applied a connector to your element, follow the configuration steps or see using connectors to learn more.

Operations

The embed document operation performs the following steps:

  1. Consume a document.
  2. Parse the document according to its file format (optionally splitting it into text chunks).
  3. Convert the chunks into vector form using an LLM embedding model.
  4. Store the produced vectors in a vector database.

To perform this operation, configure the parameters described in the Embedding models, Vector stores, and Embedding document configuration sections below.

As a result, the operation returns an array of created embedding chunk IDs, for example ["d599ec62-fe51-4a91-bbf0-26e1241f9079", "a1fad021-5148-42b4-aa02-7de9d590e69c"].
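
Under the hood, these steps correspond closely to LangChain4j's ingestion primitives. The following is a minimal sketch of steps 2-4 under that assumption, not the connector's actual implementation; the embedDocument helper and the chunk sizes are illustrative:

    import dev.langchain4j.data.document.Document;
    import dev.langchain4j.data.document.splitter.DocumentSplitters;
    import dev.langchain4j.data.embedding.Embedding;
    import dev.langchain4j.data.segment.TextSegment;
    import dev.langchain4j.model.embedding.EmbeddingModel;
    import dev.langchain4j.store.embedding.EmbeddingStore;

    import java.util.List;

    public class EmbedDocumentSketch {

        // Hypothetical helper: the model and store are configured as described
        // in the "Embedding models" and "Vector stores" sections below.
        static List<String> embedDocument(Document document,
                                          EmbeddingModel model,
                                          EmbeddingStore<TextSegment> store) {
            // Step 2 (optional): split the parsed document into text chunks
            // of at most 500 characters with a 50-character overlap.
            List<TextSegment> segments =
                    DocumentSplitters.recursive(500, 50).split(document);

            // Step 3: convert the chunks into vector form with the embedding model.
            List<Embedding> embeddings = model.embedAll(segments).content();

            // Step 4: store the vectors; the store returns the generated chunk IDs,
            // which correspond to the operation's result array.
            return store.addAll(embeddings, segments);
        }
    }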

Embedding models

Amazon Bedrock

The vector database connector currently supports only Amazon Titan V1/V2 models. Review the official Amazon documentation to understand how to choose request parameters.

The vector database connector uses the LangChain4j implementation of these models.
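
For illustration, a Titan embedding model can be instantiated directly through LangChain4j along the following lines. This is a sketch, not connector configuration; the class and builder names come from the langchain4j-bedrock module, and the model ID shown is one of several valid choices:

    import dev.langchain4j.model.bedrock.BedrockTitanEmbeddingModel;
    import dev.langchain4j.model.embedding.EmbeddingModel;
    import software.amazon.awssdk.regions.Region;

    public class BedrockEmbeddingSketch {

        public static void main(String[] args) {
            // AWS credentials are resolved via the default AWS SDK provider chain.
            EmbeddingModel model = BedrockTitanEmbeddingModel.builder()
                    .region(Region.US_EAST_1)
                    .model("amazon.titan-embed-text-v1") // Titan V2 uses a different model ID
                    .build();

            // Embed a single string and inspect the resulting vector's dimension.
            float[] vector = model.embed("Hello, embeddings!").content().vector();
            System.out.println("Dimension: " + vector.length);
        }
    }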

Vector stores

The vector database connector can use Elasticsearch as a vector store. Elasticsearch version 8 or later is required.

Enter the following parameters:

  • Base URL: The Elasticsearch base URL, including protocol, for example https://host:port.
  • Username: Username of an Elasticsearch user with read/write access.
  • Password: Password of that Elasticsearch user.
  • Index name: Name of the index where you want to store embeddings.
    • When embedding: If the index does not exist, the connector creates a new one.
    • When retrieving: If the index does not exist, the connector raises an error.
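
For intuition, these parameters map directly onto LangChain4j's Elasticsearch embedding store builder. A minimal sketch with placeholder values (builder method names may vary between LangChain4j versions):

    import dev.langchain4j.data.segment.TextSegment;
    import dev.langchain4j.store.embedding.EmbeddingStore;
    import dev.langchain4j.store.embedding.elasticsearch.ElasticsearchEmbeddingStore;

    public class ElasticsearchStoreSketch {

        public static void main(String[] args) {
            // All values below are placeholders; supply your own cluster details.
            EmbeddingStore<TextSegment> store = ElasticsearchEmbeddingStore.builder()
                    .serverUrl("https://host:port")      // Base URL, including protocol
                    .userName("elastic")                 // User with read/write access
                    .password("changeme")
                    .indexName("camunda-embeddings")     // Created on first embed if absent
                    .build();
        }
    }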

Embedding document configuration

Document source

The Document source can be either Plain text or a Camunda document.

Plain text can be useful when you deal with small data that fits into a text field or a process instance variable. The input is handled as regular UTF-8 text.

note

A FEEL string conversion function, such as string(), might be useful if you have JSON input.

A Camunda document might be useful when you deal with larger document pipelines, for example those fed by webhooks or user tasks. Input documents are parsed with Apache Tika, so files can be in any Apache Tika-supported format.
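
To illustrate what that parsing step amounts to, here is a sketch using LangChain4j's Apache Tika document parser; the file name is a placeholder:

    import dev.langchain4j.data.document.Document;
    import dev.langchain4j.data.document.parser.apache.tika.ApacheTikaDocumentParser;

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class TikaParsingSketch {

        public static void main(String[] args) throws Exception {
            // Any Tika-supported format works: PDF, DOCX, HTML, plain text, and more.
            try (InputStream in = Files.newInputStream(Path.of("contract.pdf"))) {
                Document document = new ApacheTikaDocumentParser().parse(in);
                String text = document.text();
                System.out.println(text.substring(0, Math.min(200, text.length())));
            }
        }
    }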

Splitting

Splitting breaks large documents into smaller pieces. The connector supports recursive splitting or no splitting at all. If you are unsure whether your use case requires splitting, seek guidance from a data scientist.

Learn more about splitting in the LangChain4j documentation.
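
As a rough illustration of what recursive splitting produces, consider this sketch using LangChain4j's document splitters; the chunk size and overlap are arbitrary example values:

    import dev.langchain4j.data.document.Document;
    import dev.langchain4j.data.document.splitter.DocumentSplitters;
    import dev.langchain4j.data.segment.TextSegment;

    import java.util.List;

    public class SplittingSketch {

        public static void main(String[] args) {
            Document document = Document.from("A long text... imagine many paragraphs here.");

            // Recursively split into chunks of at most 300 characters,
            // with a 30-character overlap between neighboring chunks.
            List<TextSegment> segments =
                    DocumentSplitters.recursive(300, 30).split(document);

            segments.forEach(segment -> System.out.println(segment.text()));
        }
    }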