Using AWS Comprehend to generate AI-powered insights from data
Extracting insights and specific information from mixed & unstructured data with AWS Comprehend AI
Organizations rely on data to illuminate their path in a complex world. Because customer demands and market conditions can shift so rapidly, it’s essential to get the full value from your data.
However, many data sources are hard to use because they exist in physical formats such as paper documents, or are stored in mixed or unstructured sources such as transcripts and emails.
In a previous blog post, we showed you how to use AWS AI services like Textract to automatically extract data from documents such as scans or images. In this post, we examine the value of AWS’s other AI services for extracting insights and specific data from unstructured data sources.
AI data analysis and discovery from unstructured data
If you have a large amount of unstructured data, it's reasonable to assume that you don’t already know what kind of information it contains. This is especially true for data from multiple sources.
You can’t use conventional search tools for this because you don’t necessarily know what you’re looking for.
This is why you need an AI-powered tool that can help discover specific types of information within a large pool of data: AWS Comprehend.
AWS Comprehend identifies the key information contained in unstructured data sources, using an Analysis Job to squeeze all the goodies from each document. The Analysis Job can do several things, depending on how you configure it: it can identify events, key phrases, entities, and sentiment in raw text.
The best first step is entity detection: the DetectEntities API for individual documents, or an entities detection Analysis Job for a whole dataset. It discovers the entities in your text and assigns each one a confidence score. Its built-in entity types cover things like events, dates, locations, organizations, people, commercial items (products), quantities, and titles. Once you have a handle on what you’re working with, you can customize Comprehend further in the Comprehend console, for example by training a custom entity recognizer.
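As a first pass, the synchronous DetectEntities call can be made from Python with boto3, the AWS SDK. The sketch below assumes you pass in a client created with `boto3.client("comprehend")`; the helper names and the 0.8 confidence cut-off are illustrative choices, not Comprehend defaults.

```python
from collections import defaultdict


def group_entities(response: dict, min_score: float = 0.8) -> dict:
    """Group entity texts by type, keeping only entities at or above min_score.

    `response` is the dict returned by DetectEntities; its "Entities" list holds
    items with "Text", "Type", "Score", "BeginOffset" and "EndOffset".
    """
    grouped = defaultdict(list)
    for entity in response.get("Entities", []):
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


def detect_entities(client, text: str, language_code: str = "en") -> dict:
    """Run DetectEntities on one document and group the confident results by type.

    `client` is assumed to be boto3.client("comprehend"); passing it in lets the
    parsing logic be exercised without AWS credentials.
    """
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    return group_entities(response)
```

Keeping the response parsing separate from the API call means you can tune the threshold, or swap in a custom recognizer's output, without touching the call itself.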
For Analysis Jobs, both the input data and the output are stored in S3 locations you specify, so you need to make sure all of your data is accessible in S3.
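For a whole dataset, an asynchronous Analysis Job reads from and writes to S3. This is a minimal sketch of starting an entities detection job and polling until it finishes, assuming one document per file under an S3 prefix. The bucket URIs, job name, and IAM role ARN (which must grant Comprehend access to those buckets) are placeholders you would supply, and the client is assumed to be `boto3.client("comprehend")`.

```python
import time


def start_entities_job(client, input_s3_uri: str, output_s3_uri: str,
                       role_arn: str, job_name: str) -> str:
    """Start an asynchronous entities detection job over documents in S3; return its JobId."""
    response = client.start_entities_detection_job(
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,  # IAM role Comprehend assumes to read/write the buckets
        JobName=job_name,
        LanguageCode="en",
    )
    return response["JobId"]


def wait_for_entities_job(client, job_id: str, poll_seconds: float = 30.0) -> dict:
    """Poll the job until it reaches a terminal state, then return its properties."""
    while True:
        props = client.describe_entities_detection_job(JobId=job_id)[
            "EntitiesDetectionJobProperties"]
        if props["JobStatus"] in ("COMPLETED", "FAILED", "STOPPED"):
            return props
        time.sleep(poll_seconds)
```

When the job completes, its results are written as a compressed archive under the output S3 location; the returned properties include the exact output path. The 30-second polling interval is an arbitrary choice for the sketch.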
Examples of how to extract important data with AI using AWS Comprehend
Get insights per topic or project
Used as an advanced search and insights tool, Comprehend can find specific documents relating to a project or topic. It can also analyze sentiment toward specific entities like colleagues, customers, or products.
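Sentiment toward specific entities corresponds to Comprehend's targeted sentiment feature. A minimal sketch, assuming the response shape of the DetectTargetedSentiment API: each entity groups its co-referring mentions (for example "Dana" and "she"), and each mention carries its own sentiment. The tallying logic is our own illustration, and the client is assumed to be `boto3.client("comprehend")`.

```python
from collections import Counter, defaultdict


def sentiment_by_entity(response: dict) -> dict:
    """Tally mention-level sentiment labels per entity from a targeted sentiment response."""
    tallies = defaultdict(Counter)
    for entity in response.get("Entities", []):
        mentions = entity["Mentions"]
        # DescriptiveMentionIndex points at the mention that best names the entity.
        name = mentions[entity["DescriptiveMentionIndex"][0]]["Text"]
        for mention in mentions:
            tallies[name][mention["MentionSentiment"]["Sentiment"]] += 1
    return {name: dict(counts) for name, counts in tallies.items()}


def entity_sentiment(client, text: str, language_code: str = "en") -> dict:
    """Call DetectTargetedSentiment on one document and tally the results."""
    response = client.detect_targeted_sentiment(Text=text, LanguageCode=language_code)
    return sentiment_by_entity(response)
```

Entities that share a display name are merged, which is usually what you want when aggregating feedback about a colleague or product across many documents.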
Identify and redact personal information
When instructed to, Comprehend can identify and redact sensitive personal information from documents. This means you can clean up data and minimize the risk of data leaks by applying a need-to-know policy.
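Detection and redaction can be handled separately: the DetectPiiEntities API returns the type and character offsets of each PII span (but not the text itself), and you decide how to mask it. A minimal sketch that replaces each span with its type label, assuming the detected spans do not overlap; the 0.5 score threshold is an assumed cut-off, and the client is assumed to be `boto3.client("comprehend")`.

```python
def redact_pii(text: str, response: dict, min_score: float = 0.5) -> str:
    """Replace each detected PII span with a label such as [NAME] or [PHONE]."""
    entities = [e for e in response.get("Entities", []) if e["Score"] >= min_score]
    # Replace from the end of the string backwards so earlier offsets stay valid.
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text


def detect_and_redact(client, text: str, language_code: str = "en") -> str:
    """Call DetectPiiEntities on one document and redact the spans it reports."""
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return redact_pii(text, response)
```

For large document sets, an asynchronous PII detection job can perform the redaction itself (its `ONLY_REDACTION` mode writes redacted copies to S3), so client-side masking like this is mainly useful for small, synchronous workloads.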
Topic modeling
Want to know about the most relevant topics being discussed today? Comprehend can trawl through documents such as scientific or industry journals and give you a clear overview of the most significant topics and their associated keywords, which can include the names of people or organizations.
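Topic modeling runs as an asynchronous job. The sketch below assumes one document per file in S3 and uses placeholder URIs and a placeholder role ARN. It starts the job with a chosen number of topics. Once the job completes, it reads the `topic-terms.csv` file from the output archive, assumed here to have the header `topic,term,weight`, and keeps the strongest terms for each topic. The client is assumed to be `boto3.client("comprehend")`.

```python
import csv
import io
from collections import defaultdict


def start_topics_job(client, input_s3_uri: str, output_s3_uri: str,
                     role_arn: str, num_topics: int = 10) -> str:
    """Start a topic modeling job over documents in S3; return its JobId."""
    response = client.start_topics_detection_job(
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
        NumberOfTopics=num_topics,
    )
    return response["JobId"]


def top_terms_per_topic(topic_terms_csv: str, n: int = 5) -> dict:
    """Return the n highest-weighted terms per topic from topic-terms.csv content."""
    weighted = defaultdict(list)
    for row in csv.DictReader(io.StringIO(topic_terms_csv)):
        weighted[int(row["topic"])].append((float(row["weight"]), row["term"]))
    # Sort each topic's (weight, term) pairs from strongest to weakest.
    return {topic: [term for _, term in sorted(pairs, reverse=True)[:n]]
            for topic, pairs in sorted(weighted.items())}
```

The number of topics is a modeling choice you make up front; trying a few values and reading the resulting term lists is a pragmatic way to pick one.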
Safer interactions
AWS Comprehend has useful trust and safety features, including toxicity detection and prompt safety classification, which can detect unsafe or triggering content within documents. These can help ensure that LLMs and chatbots are trained on sanitized content.
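One way to use toxicity detection as a pre-training filter is to score candidate text segments and drop those above a threshold. A minimal sketch, assuming the DetectToxicContent response returns one result per submitted segment, in the same order, each with an overall `Toxicity` score. The 0.5 threshold is an arbitrary illustrative value, and the client is assumed to be `boto3.client("comprehend")`. The API limits how much text a single call accepts, so a large corpus would need batching.

```python
def filter_toxic(segments: list, response: dict, max_toxicity: float = 0.5) -> list:
    """Keep only segments whose overall Toxicity score is below max_toxicity.

    Assumes response["ResultList"] is aligned one-to-one with `segments`.
    """
    return [segment for segment, result in zip(segments, response["ResultList"])
            if result["Toxicity"] < max_toxicity]


def screen_segments(client, segments: list, language_code: str = "en") -> list:
    """Score segments with DetectToxicContent and return the ones that pass."""
    response = client.detect_toxic_content(
        TextSegments=[{"Text": s} for s in segments], LanguageCode=language_code)
    return filter_toxic(segments, response)
```

Each result also carries per-category `Labels` (such as profanity or hate speech), which you could use for finer-grained rules than a single overall score.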
FAQ discovery and creation
There’s huge potential for Comprehend in customer service and customer satisfaction. Perhaps the simplest example is using it to identify the most common topics in customer service transcripts, then using this information to create new FAQs or to redesign contact center menus and flows.
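A simple starting point for FAQ discovery is to count which key phrases recur across transcripts. This minimal sketch uses the DetectKeyPhrases API, with the client assumed to be `boto3.client("comprehend")`. Counting each phrase once per transcript and the 0.9 score cut-off are our own assumptions. In practice you would probably also merge near-duplicate phrases.

```python
from collections import Counter


def common_key_phrases(responses: list, min_score: float = 0.9, top_n: int = 10) -> list:
    """Rank key phrases by the number of transcripts that mention them."""
    counts = Counter()
    for response in responses:
        # Use a set so each phrase counts at most once per transcript.
        phrases = {kp["Text"].lower() for kp in response.get("KeyPhrases", [])
                   if kp["Score"] >= min_score}
        counts.update(phrases)
    return counts.most_common(top_n)


def key_phrase_responses(client, transcripts: list, language_code: str = "en") -> list:
    """Call DetectKeyPhrases once per transcript."""
    return [client.detect_key_phrases(Text=t, LanguageCode=language_code)
            for t in transcripts]
```

Counting per transcript rather than per mention stops one long, repetitive call from dominating the ranking.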
Going deeper: AI-powered insights from data lakes and other sources
So, AWS Comprehend is already a powerful way of extracting key information and simple insights from unstructured and mixed data sources. Data lakes often contain a wide variety of data types and formats, so being able to identify what’s down there is already a huge leap towards getting more value from it.
However, by using Comprehend as a microservice in combination with other tooling, you can do even more. In this setup, Comprehend pre-digests the raw data so it can be used in more sophisticated processes like generating deep insights.
For example, you can connect it to CloudWatch to search and index your logs and metrics, or use it with OpenSearch, Kendra, and AWS Lambda to enrich and index large volumes of text-based data.
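In such a pipeline, a Lambda function might attach Comprehend's output to each document before it is indexed. This is a minimal sketch of that enrichment step, assuming a hypothetical index schema with `entities` and `key_phrases` fields; sending the record to OpenSearch or Kendra is left out. The client is assumed to be `boto3.client("comprehend")`. Comprehend's synchronous APIs have a per-document size limit, so long texts would need to be split first.

```python
def build_index_document(doc_id: str, text: str, entities_response: dict,
                         key_phrases_response: dict, min_score: float = 0.8) -> dict:
    """Combine raw text with confident Comprehend results into one indexable record."""
    entities = {f'{e["Type"]}:{e["Text"]}' for e in entities_response.get("Entities", [])
                if e["Score"] >= min_score}
    phrases = {kp["Text"].lower() for kp in key_phrases_response.get("KeyPhrases", [])
               if kp["Score"] >= min_score}
    # Sorted lists keep the output deterministic and JSON-serializable.
    return {"id": doc_id, "body": text,
            "entities": sorted(entities), "key_phrases": sorted(phrases)}


def enrich(client, doc_id: str, text: str, language_code: str = "en") -> dict:
    """Call DetectEntities and DetectKeyPhrases, then build the index record."""
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    return build_index_document(doc_id, text, entities, phrases)
```

Storing entities as `TYPE:text` strings is one simple way to make them filterable as keyword fields in a search index.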
But what else can you do by combining different AWS Services with Comprehend?
Gathering insights from raw web data
By using a web crawler to discover and gather webpage text and links, you can create insights from open-source data such as webpages. This requires some help from AWS Lambda to process the raw data into plain text stored in S3, but then you have a massive data source that’s ready to crack open and deconstruct. Comprehend can then analyze this open-source data to track important topics, new discoveries or innovations, sentiment, and keywords. It can also help model the relationships between these, which can be visualized in QuickSight. By comparing the results of successive crawls (a simple delta), you can track changes in sentiment or trends over time.
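A delta step can be as simple as comparing the share of each sentiment label between two crawls. A minimal sketch, assuming you have already collected the `Sentiment` label (POSITIVE, NEGATIVE, NEUTRAL, or MIXED) that the DetectSentiment API returns for each page in each crawl.

```python
from collections import Counter

SENTIMENTS = ("POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED")


def sentiment_shares(labels: list) -> dict:
    """Fraction of documents carrying each sentiment label."""
    counts = Counter(labels)
    total = len(labels) or 1  # avoid dividing by zero for an empty crawl
    return {s: counts[s] / total for s in SENTIMENTS}


def sentiment_delta(previous: list, current: list) -> dict:
    """Change in each label's share between two crawls (current minus previous)."""
    before, after = sentiment_shares(previous), sentiment_shares(current)
    return {s: round(after[s] - before[s], 4) for s in SENTIMENTS}
```

A positive value means that label's share grew since the previous crawl. Shares rather than raw counts are compared so that crawls of different sizes remain comparable.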
Making sense of siloed medical data
Medical data has incredible value, but that value is rarely easy to extract or turn into meaningful insights. Important information is trapped in doctors’ letters, text summaries of appointments, and lab results with varying annotation schemas. To make sense of data this complex, you may want to use a knowledge graph, which can be stored and queried in AWS Neptune. Comprehend’s JSON output (the entities it detects in each document) can be used to build the graph’s nodes and relationships. Using SageMaker, you can explore the data and identify the relevant variables before designing your data model.
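This is a minimal sketch of turning per-document entity JSON into graph nodes and edges. It assumes DetectEntities-style output and uses a simple co-occurrence rule: two entities are linked if they appear in the same document. A real clinical model would need richer, domain-specific relationships, and the node keys here are a hypothetical schema.

```python
from itertools import combinations


def entities_to_graph(docs: dict, min_score: float = 0.8):
    """Build nodes and co-occurrence edges from per-document entity responses.

    `docs` maps a document ID to a DetectEntities-style response. Each node is a
    (Type, Text) pair mapped to the set of documents it appears in; each edge is a
    sorted pair of nodes mapped to the number of documents in which both appear.
    """
    nodes = {}
    edges = {}
    for doc_id, response in docs.items():
        found = {(e["Type"], e["Text"]) for e in response.get("Entities", [])
                 if e["Score"] >= min_score}
        for node in found:
            nodes.setdefault(node, set()).add(doc_id)
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(found), 2):
            edges[pair] = edges.get(pair, 0) + 1
    return nodes, edges
```

The resulting nodes and edges could then be written out in a format Neptune’s bulk loader accepts, such as its CSV format for property graphs.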
Sentiment analysis from customer service transcripts
Sentiment analysis is a native feature of AWS Comprehend, so you don’t need much extra tooling here unless you want to visualize the output or reprocess it. With a properly configured Analysis Job, Comprehend can analyze text for sentiment and connect it to specific entities, events, or topics. This could be used in a wide variety of situations, such as improving product design based on customer service transcripts and emails, or understanding common reasons for returns or negative reviews.
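For a batch of transcripts, the BatchDetectSentiment API scores up to 25 documents per call. This minimal sketch chunks a longer list, maps each result back to its original position, and tallies the labels. Documents that the API reports in its `ErrorList` are left as `None`. The client is assumed to be `boto3.client("comprehend")`.

```python
from collections import Counter

BATCH_LIMIT = 25  # maximum number of documents BatchDetectSentiment accepts per call


def batch_sentiment(client, texts: list, language_code: str = "en") -> list:
    """Return one sentiment label per text (None where the API reported an error)."""
    labels = [None] * len(texts)
    for start in range(0, len(texts), BATCH_LIMIT):
        chunk = texts[start:start + BATCH_LIMIT]
        response = client.batch_detect_sentiment(TextList=chunk, LanguageCode=language_code)
        for result in response["ResultList"]:
            # "Index" is relative to the chunk, so shift it back into the full list.
            labels[start + result["Index"]] = result["Sentiment"]
    return labels


def summarize(labels: list) -> Counter:
    """Count sentiment labels, skipping documents that failed."""
    return Counter(label for label in labels if label is not None)
```

Keeping labels aligned with the input list makes it easy to join them back to transcript metadata such as product, date, or agent.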
Querying unstructured data with Amazon Q
Amazon Q is a hot new product in the AI services family. It’s a powerful conversational AI that’s AWS’s answer to Microsoft Copilot and Google Duet. As the service is still in preview, we can expect many exciting developments over the coming years.
The first service to be integrated with Amazon Q is QuickSight, which means that you can use natural language to instruct the conversational AI to query your data and visualize insights. For unstructured data, this requires some additional help from services such as Comprehend and SageMaker to analyze and pre-process raw text and build an appropriate data model. The payoff is an easy-to-use interface that allows non-technical users to produce custom insights from mixed data sources.
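Part of that pre-processing is reshaping Comprehend’s nested JSON into a flat table that QuickSight can import. This minimal sketch assumes hypothetical per-document records that already hold a sentiment label and a list of (type, text) entity pairs. It writes one CSV row per document-entity pair; the column names are illustrative.

```python
import csv
import io


def to_quicksight_csv(records: list) -> str:
    """Flatten per-document results into one CSV row per (document, entity) pair.

    Each record is a dict with hypothetical keys "doc_id", "source", "sentiment"
    and "entities", where "entities" is a list of (type, text) pairs.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["doc_id", "source", "sentiment", "entity_type", "entity_text"])
    for rec in records:
        # Documents with no entities still get one row so their sentiment is counted.
        for entity_type, entity_text in rec["entities"] or [("", "")]:
            writer.writerow([rec["doc_id"], rec["source"], rec["sentiment"],
                             entity_type, entity_text])
    return buffer.getvalue()
```

This one-row-per-entity layout lets a dashboard filter or group by entity type while still counting sentiment per document, for example with a distinct count of `doc_id`.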
What’s next?
Hopefully these examples will give you some idea of the scope of what is possible with AWS AI Services like Comprehend, Kendra, and Amazon Q. But this is really just scratching the surface of what might be possible.
Recent developments in AI are fantastic news for organizations that want to get to grips with their data. Current tooling already allows a tremendous breadth of possibilities for harnessing your data sources to help navigate a complex, competitive, and fast-moving world. And, best of all, it’s relatively easy to get started.
Amazon has made it incredibly easy to start using AI-powered processes by shipping tools like AWS Comprehend pre-trained and ready to use out of the box. This means you can get results from your earliest efforts, and then improve on them.