Text analytics on AWS: implementing a data lake architecture with OpenSearch

Text data is a common type of unstructured data found in analytics. It is often stored without a predefined format and can be hard to obtain and process.

For example, web pages contain text data that data analysts collect through web scraping and pre-process using lowercasing, stemming, and lemmatization. After pre-processing, data scientists and analysts analyze the cleaned text to extract relevant insights.
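To make the pre-processing steps concrete, here is a minimal, standard-library-only sketch. The lemma dictionary (`LEMMAS`) and suffix list (`SUFFIXES`) are hypothetical toy data, and the stemmer is a naive suffix stripper, not the Porter algorithm; a production pipeline would typically use an NLP library with a full lexicon. The function names are illustrative, not part of any AWS service.

```python
import re

# Hypothetical, tiny lemma dictionary for illustration only; a real
# lemmatizer relies on a full lexicon (for example, WordNet).
LEMMAS = {"ran": "run", "better": "good", "mice": "mouse", "was": "be"}

# Toy suffix list for a naive stemmer; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def lowercase_tokens(text: str) -> list[str]:
    """Lowercase the text and split it into ASCII alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize(token: str) -> str:
    """Map a token to its dictionary form if it is known, else keep it."""
    return LEMMAS.get(token, token)


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, resolve known irregular forms, then stem what remains."""
    return [stem(lemmatize(tok)) for tok in lowercase_tokens(text)]


print(preprocess("The Mice RAN past crawling dogs"))
```

Running the lemmatizer before the stemmer is a design choice in this sketch: irregular forms such as "mice" are resolved from the dictionary first, so the crude suffix rules only see regular words.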

This blog post covers how to handle text data effectively using a data lake architecture on Amazon Web Services (AWS). We explain how data teams can independently extract insights from text documents using OpenSearch as the central search and analytics service. We also discuss how to index and update text data in OpenSearch and how to evolve the architecture toward automation.
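One common way to index and update many documents at once is the OpenSearch `_bulk` API, which takes a newline-delimited JSON (NDJSON) payload: an action line (`index`, `update`, and so on) followed by a source line, with the whole payload ending in a newline. The sketch below only builds that payload; the index name `web-pages`, the document IDs, and the helper name `bulk_body` are hypothetical. The resulting string could then be sent to the `_bulk` endpoint, for example with the opensearch-py client's `bulk` method.

```python
import json
from typing import Any


def bulk_body(index: str,
              new_docs: dict[str, dict[str, Any]],
              updates: dict[str, dict[str, Any]]) -> str:
    """Build an NDJSON payload for the OpenSearch _bulk API.

    new_docs: document ID -> full document to index (create or overwrite).
    updates:  document ID -> fields to merge into an existing document.
    """
    lines: list[str] = []
    for doc_id, doc in new_docs.items():
        # "index" action line, followed by the full document source.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    for doc_id, fields in updates.items():
        # "update" action line, followed by a partial document under "doc".
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": fields}))
    if not lines:
        return ""
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body(
    "web-pages",  # hypothetical index name
    {"page-1": {"url": "https://example.com", "text": "cleaned page text"}},
    {"page-0": {"text": "re-scraped and cleaned text"}},
)
print(body)
```

Using partial `update` actions for changed pages avoids re-sending unchanged fields, while `index` fully replaces a document, which suits freshly scraped pages.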

Architecture overview

This architecture uses AWS services to build an end-to-end text analytics solution, from data collection and ingestion through to data consumption in OpenSearch (Figure 1).

Figure 1. Data lake architecture with OpenSearch

Collect data from various sources, such as SaaS applications, edge devices, logs, streaming media, and social networks.
Depending on the data source type, use tools like AWS Database Migration Service (AWS DMS), AWS DataSync, Amazon Kinesis, Amazon Managed Streaming for Apache Kafka (Amazon MSK), AWS IoT Core, and Amazon AppFlow to ingest the data into the AWS data lake.
Store the ingested data in the raw zone of…
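The choice of ingestion tool "depending on the data source type" can be pictured as a simple routing table. The sketch below is illustrative only: the `SourceType` categories, the mapping, and the `prefer_kafka` flag are assumptions for this example, not an AWS rule. The pairings follow each service's general purpose (AppFlow for SaaS applications, IoT Core for devices, DMS for databases, DataSync for file transfers, Kinesis or MSK for streams), and routing social network feeds through the streaming services is likewise an assumption.

```python
from enum import Enum


class SourceType(Enum):
    """Hypothetical source categories, loosely based on the steps above."""
    SAAS_APP = "saas_app"
    EDGE_DEVICE = "edge_device"
    LOG_STREAM = "log_stream"
    EVENT_STREAM = "event_stream"  # assumed: streaming media, social feeds
    DATABASE = "database"
    FILE_SHARE = "file_share"


# Illustrative routing table (an assumption, not prescriptive guidance):
# each source type maps to candidate AWS ingestion services, in order.
INGESTION_SERVICES: dict[SourceType, tuple[str, ...]] = {
    SourceType.SAAS_APP: ("Amazon AppFlow",),
    SourceType.EDGE_DEVICE: ("AWS IoT Core",),
    SourceType.LOG_STREAM: ("Amazon Kinesis",),
    SourceType.EVENT_STREAM: ("Amazon Kinesis", "Amazon MSK"),
    SourceType.DATABASE: ("AWS DMS",),
    SourceType.FILE_SHARE: ("AWS DataSync",),
}


def choose_ingestion(source: SourceType, prefer_kafka: bool = False) -> str:
    """Pick one ingestion service for a source type.

    prefer_kafka is a hypothetical knob: teams already running Apache Kafka
    may prefer Amazon MSK over Amazon Kinesis for streaming sources.
    """
    candidates = INGESTION_SERVICES[source]
    if prefer_kafka and "Amazon MSK" in candidates:
        return "Amazon MSK"
    return candidates[0]


print(choose_ingestion(SourceType.EVENT_STREAM, prefer_kafka=True))
```

Keeping the mapping as data rather than branching logic makes it easy to extend as new sources are added, which fits the goal of evolving the architecture toward automation.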
