Version: 8.8 (unreleased)

IDP concepts

When using intelligent document processing (IDP), it is helpful to understand the following key concepts and terms.

Structured and unstructured documents

Documents are typically classified as containing either structured or unstructured data.

Structured documents

Structured documents have a predefined, consistent layout and fixed format, such as rows and columns in a database or spreadsheet, or fields in a standardized form.

Data in a structured document has a fixed location. For example, the ID, date, and company name are always located in the same place.

Example structured documents include:

Invoices and customer records
Forms
Identity documents
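
To illustrate what a fixed location means in practice, here is a minimal Python sketch that reads values from a fixed-width record by position. The field names and character offsets in `FIELD_POSITIONS` are hypothetical and chosen only for illustration; this is not how IDP itself is implemented.

```python
# Hypothetical fixed-width layout: each field occupies the same character
# columns on every record. Field names and offsets are illustrative only.
FIELD_POSITIONS = {
    "id": (0, 8),
    "date": (8, 18),
    "company": (18, 38),
}


def extract_fixed_fields(record: str) -> dict[str, str]:
    """Slice each field out of its known position and trim the padding."""
    return {
        name: record[start:end].strip()
        for name, (start, end) in FIELD_POSITIONS.items()
    }


# Build a sample record that follows the layout above.
record = "INV-0042" + "2024-05-01" + "Example Corp".ljust(20)
fields = extract_fixed_fields(record)
```

Because every record follows the same layout, the extraction logic needs no knowledge of the content itself, only of where each value sits.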

Unstructured documents

Unstructured documents have a loosely defined, free-form layout, which makes structured data harder to extract. For example, in free-text paragraphs, key information can appear in unpredictable places.

IDP uses an LLM foundation model to extract data from this document type.

Example unstructured documents include:

Emails
Reports
Memos

Document classification

Document classification is performed as part of document automation.

Documents are analyzed, classified, and assigned to the relevant document extraction template, based on the document content.
Classification ensures that documents processed through IDP are organized into the correct type, so that extracted data is mapped to the correct property.
Classification accuracy is improved with a well-defined taxonomy (set of extraction fields) and a set of example documents that accurately represents each type of document you want to process.
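
The idea of assigning a document to a template based on its content can be sketched with a deliberately simple keyword-overlap score. This is a stand-in technique for illustration only, not the method IDP uses; the template names and keywords in `TEMPLATE_KEYWORDS` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical templates, each described by a few keywords that might be
# drawn from example documents. Real classification is not limited to this.
TEMPLATE_KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "total"},
    "identity_document": {"passport", "nationality", "birth", "expiry"},
}


def classify(text: str) -> Optional[str]:
    """Return the template whose keywords overlap most with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        name: len(words & keywords)
        for name, keywords in TEMPLATE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No keyword matched at all: leave the document unclassified.
    return best if scores[best] > 0 else None
```

Even in this toy version, the quality of the result depends on how well the keywords represent each document type, which mirrors why representative example documents matter.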

Extraction model/Large Language Models (LLM)

LLM foundation models are large-scale, pre-trained AI models that can be adapted for various document processing tasks without extensive retraining.

For IDP, these models serve as a powerful base for extracting, understanding, and processing data from diverse document types. Algorithms are used to learn document patterns and to improve data extraction accuracy over time.
IDP allows you to work with and test different extraction models until you find the model that best suits your budget and accuracy requirements.
See extraction models for a list of currently supported LLM extraction models.
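
One way to reason about the trade-off between budget and accuracy is to record test results for each model and pick the cheapest one that meets your accuracy target. The sketch below assumes you have measured these numbers yourself; the `ModelResult` type, the `pick_model` helper, and the model names are hypothetical and not part of IDP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelResult:
    model: str                # placeholder model name
    accuracy: float           # share of fields extracted correctly, 0.0 to 1.0
    cost_per_document: float  # any consistent currency unit


def pick_model(
    results: list[ModelResult], min_accuracy: float, max_cost: float
) -> Optional[ModelResult]:
    """Return the cheapest model that meets both limits, or None."""
    candidates = [
        r for r in results
        if r.accuracy >= min_accuracy and r.cost_per_document <= max_cost
    ]
    return min(candidates, key=lambda r: r.cost_per_document, default=None)


# Placeholder measurements, for illustration only.
results = [
    ModelResult("model-a", accuracy=0.97, cost_per_document=0.05),
    ModelResult("model-b", accuracy=0.93, cost_per_document=0.01),
    ModelResult("model-c", accuracy=0.95, cost_per_document=0.02),
]
choice = pick_model(results, min_accuracy=0.95, max_cost=0.04)
```

With these placeholder values, `model-c` is selected: `model-a` exceeds the cost limit and `model-b` falls short of the accuracy target.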

Extraction fields

Extraction fields are the data fields you want to extract from a document, such as an invoice ID, date, or customer name.

You must add a separate field for each piece of information you want to extract from a document.
For example, for an invoice, add a separate field for the invoice ID, date, customer name, invoice amount, and so on.
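
As a rough illustration of the one-field-per-value rule, the invoice example could be modeled as a list of field definitions. The `ExtractionField` class and its attributes are hypothetical and do not reflect the actual IDP data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionField:
    name: str         # hypothetical identifier for the extracted value
    description: str  # what the field should capture


# One separate field per piece of information on the invoice.
invoice_fields = [
    ExtractionField("invoice_id", "Unique identifier printed on the invoice"),
    ExtractionField("date", "Date the invoice was issued"),
    ExtractionField("customer_name", "Name of the billed customer"),
    ExtractionField("invoice_amount", "Total amount charged"),
]

field_names = [field.name for field in invoice_fields]
```

Keeping each value in its own field, instead of combining several values into one, means each extracted value can be mapped to its own property.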

Info: To learn more about extraction field data types, see extraction field data types.
