The new Content Understanding and how it compares to Document Intelligence

Generative AI is already well known for powering chatbots, but it is also very efficient at extracting information from media such as documents, video, and audio. It goes much further than OCR: you can move from extracting fields from a single document to comparing data across multiple documents.

Content Understanding leverages generative AI for document data extraction, eliminating the need for LLMOps and the complexities of training LLMs.

To play with the preview version of Content Understanding, look for it in the Azure AI Foundry:

The documentation is right here: https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/

If you work with Document Intelligence, you can spot a banner at the top of the Document Intelligence Studio:

Generative AI unlocks long-awaited enhancements for custom models designed for document data extraction, significantly reducing the need for retraining when document layouts change.

So, what do we need to run Content Understanding? The project creation wizard shows the services it will generate:

The setup is similar to Document Intelligence except for the key vault. Nevertheless, the price is much lower than Document Intelligence custom extraction.

You can check the cost here: https://azure.microsoft.com/en-us/pricing/details/content-understanding/

So, what does Content Understanding bring to the table that Document Intelligence lacks?

Document Intelligence uses neural networks, which are an improvement over the old layout model but still struggle with heavy layout and label variation.

Azure Document Intelligence (DI) is great for extracting structured data from unstructured documents in most scenarios. However, as in our case, dealing with tax documents that have thousands of different templates makes it challenging for DI to capture specific tax information across different tax forms.

More here: https://techcommunity.microsoft.com/blog/azurearchitectureblog/complex-data-extraction-using-document-intelligence-and-rag/4267718

This dependency on document structure can be observed while training the model: Document Intelligence recognizes the text, and we link that text to a label. The label, in this case, is a data field.

To extract data using Content Understanding, you start with a “schema”. The schema is essentially a set of field definitions where each field description works like a prompt.

  1. Define your schema by specifying the fields you want to extract from the input files. Choose clear and simple field names. Use field descriptions to provide explanations, exceptions, rules of thumb, and other details to clarify the desired behavior.
  2. For each field, indicate the value type of the desired output. Besides basic types like strings, dates, and numbers, you can define more complex structures such as tables (repeated items with subfields) and fixed tables (groups of fields with common subfields).

Think of the field description as a prompt where you specify what the field is and where to find it in the document.

Let’s extract the “Professional Experience” section of a Curriculum Vitae:

CVs are an example we can all relate to. We know each section by heart, and regardless of the format, the basic idea remains the same: to make professional experience easy to find and crystal clear to the reader. While we try to add some flair, the core purpose is always to present information effectively. No matter the layout, if we were to create a dataset representing professional experience, it would look like this:

| Field Name | Field Description | Value Type |
|---|---|---|
| PROFESSIONAL_EXPERIENCE | Extract the ‘Professional Experience’ section from the CV. Ensure the output includes all relevant details such as job title, description, start and end dates, location, and company name for each position held. | table |
| TITLE | Extract the job title or role title for each professional experience entry listed in the ‘Professional Experience’ section of the curriculum vitae. | string |
| DESCRIPTION | Extract the job descriptions from the ‘Professional Experience’ section of the CV. The description should include details about the professional experience, outlining key responsibilities, accomplishments, and any relevant contributions made in each role. | string |
| START_DATE | Extract the start dates for each job experience in the CV, ensuring proper formatting. | string |
| END_DATE | Extract the end dates for each job experience in the CV, maintaining consistency in formatting. | string |
| LOCATION | Extract the work locations (city and country) specified for each role in the CV. | string |
| COMPANY | Extract the company names associated with each job experience listed in the CV. | string |

The idea is to tell the model about the data and where to find it.

More here: https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/concepts/best-practices#use-field-descriptions-to-guide-output
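For instance, following that guidance, a field description can carry prompt-style rules such as output format and fallback behavior. A hypothetical refinement of the START_DATE field (the wording is my own illustration, not from the product docs) could look like this:

```python
# Hypothetical field definition: the description doubles as a prompt,
# spelling out format rules and fallback behavior for the model.
start_date_field = {
    "type": "string",
    "method": "extract",
    "description": (
        "Extract the start date of each role in the 'Professional Experience' "
        "section. Normalize to YYYY-MM; if only a year is given, return YYYY; "
        "if the date is missing, return an empty string."
    ),
}
```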

Another major improvement over Document Intelligence: you can export the schema and commit it to a code repository.

```json
{
  "scenario": "document",
  "fieldSchema": {
    "fields": {
      "PROFESSIONAL_EXPERIENCE": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "TITLE": {
              "type": "string",
              "method": "extract",
              "description": "Extract the job titles from the 'Professional Experience' section of the CV."
            },
            "DESCRIPTION": {
              "type": "string",
              "method": "extract",
              "description": "Extract the job descriptions from the 'Professional Experience' section of the CV. The description should include details about the professional experience, outlining key responsibilities, accomplishments, and any relevant contributions made in each role. This information is found under the 'Professional Experience' section of the curriculum vitae, providing insights into the candidate's career history."
            },
            "START_DATE": {
              "type": "string",
              "method": "extract",
              "description": "Extract the start dates for each job experience in the CV, ensuring proper formatting."
            },
            "END_DATE": {
              "type": "string",
              "method": "extract",
              "description": "Extract the end dates for each job experience in the CV, maintaining consistency in formatting"
            },
            "LOCATION": {
              "type": "string",
              "method": "extract",
              "description": "Extract the work locations (city and country) specified for each role in the CV."
            },
            "COMPANY": {
              "type": "string",
              "method": "extract",
              "description": "Extract the company names associated with each job experience listed in the CV."
            }
          },
          "method": "extract"
        },
        "method": "generate",
        "description": "Extract the 'Professional Experience' section from the CV. Ensure the output includes all relevant details such as job title, description, start and end dates, location, and company name for each position held"
      }
    },
    "definitions": {}
  }
}
```
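Because the exported schema is plain JSON, the same file you commit to the repository can be used to create an analyzer programmatically. Here is a minimal sketch in Python against the preview REST API; the endpoint shape, api-version, resource name, and analyzer id follow the public quickstart but may change before GA, so treat them as assumptions and check the docs:

```python
import time
import requests

# Placeholders: your Azure AI Services resource endpoint and key.
ENDPOINT = "https://<your-resource>.services.ai.azure.com"
KEY = "<your-key>"
API_VERSION = "2024-12-01-preview"  # preview version at the time of writing
HEADERS = {"Ocp-Apim-Subscription-Key": KEY, "Content-Type": "application/json"}

# Create (or update) an analyzer from the exported schema file.
# Note: creation is asynchronous too; a production script should poll the
# returned Operation-Location header before analyzing.
with open("cv_schema.json") as f:
    schema = f.read()
resp = requests.put(
    f"{ENDPOINT}/contentunderstanding/analyzers/cv-analyzer"
    f"?api-version={API_VERSION}",
    headers=HEADERS,
    data=schema,
)
resp.raise_for_status()

# Run an analysis on a document by URL and poll until it completes.
resp = requests.post(
    f"{ENDPOINT}/contentunderstanding/analyzers/cv-analyzer:analyze"
    f"?api-version={API_VERSION}",
    headers=HEADERS,
    json={"url": "https://example.com/sample-cv.pdf"},
)
resp.raise_for_status()
poll_url = resp.headers["Operation-Location"]
while True:
    result = requests.get(poll_url, headers=HEADERS).json()
    if result.get("status", "").lower() not in ("notstarted", "running"):
        break
    time.sleep(2)
print(result)
```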

This schema successfully extracted the data from 10 different CVs regardless of layout variation.

This CV has some differences from the one used for training. The confidence score is low, but remember this is still not a GA-grade service. The important thing for now is to check accuracy using something like a confusion matrix.
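As a sketch of what that check could look like, here is a hypothetical Python helper (my own illustration, not part of the service) that compares extracted rows against hand-labeled ground truth and counts per-field hits and misses, the raw ingredients of a confusion matrix:

```python
from collections import Counter

# Hypothetical evaluation helper: expected_rows and extracted_rows are
# lists of dicts, one per professional-experience entry, aligned by
# position. A real evaluation would also need to align entries and
# handle missing or extra rows.
def field_accuracy(expected_rows, extracted_rows, fields):
    counts = {f: Counter() for f in fields}
    for exp, got in zip(expected_rows, extracted_rows):
        for f in fields:
            e = (exp.get(f) or "").strip().lower()
            g = (got.get(f) or "").strip().lower()
            counts[f]["correct" if e == g else "wrong"] += 1
    return counts

fields = ["TITLE", "START_DATE", "END_DATE", "LOCATION", "COMPANY"]
# Example usage: field_accuracy(ground_truth_rows, extracted_rows, fields)
```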

Now, trying the same in Document Intelligence did not perform as well:

Of course, this can be remediated with Document Intelligence composed models, but every time a new layout challenges the model we need to train for the differences, while generative AI will search for the field and extract the data.

Document Intelligence can also add the cost of maintaining models in cases where layouts vary constantly.

When Content Understanding moves into GA, I'll explore it in a more detailed post. For now, it is good to see a product that brings the ability to extract content from documents using generative AI without the complexity of working with LLMs.
