Large language models (LLMs) such as OpenAI’s GPT-4 are the building blocks for an increasing number of AI applications. But some enterprises have been reluctant to adopt them because the models can’t access first-party and proprietary data.
It’s not necessarily an easy problem to solve, considering that this sort of data tends to sit behind firewalls and comes in formats that LLMs can’t tap. But a relatively new startup, Unstructured.io, is trying to remove the roadblocks with a platform that extracts and stages enterprise data in a way that LLMs can understand and leverage.
Brian Raymond, Matt Robinson and Crag Wolfe co-founded Unstructured in 2022 after working together at Primer AI, which focused on building and deploying natural language processing (NLP) solutions for business customers.
“While at Primer, time and again, we encountered a bottleneck ingesting and pre-processing raw customer files containing NLP data (e.g., PDFs, emails, PPTX, XML,