Knowledge Mining – Explore What’s Possible

Does your organization own and produce documents that contain valuable information? Can you easily search and analyze the data they contain and extract value from them, or do they instead end up hidden away somewhere on a drive or in an archive, never to be seen again? If it’s the latter, you aren’t alone.

According to a report from Harvard Business Review, 82% of organizations experience significant difficulty in exploring and understanding content in a timely manner.[i] That’s no surprise, since knowledge workers spend roughly a quarter of their time searching for documents and another quarter creating documents.[ii]

The challenge in getting full value from all of this effort is compounded by the fact that 80% of data exists in unstructured formats (PDFs, spreadsheets, images, audio files, etc.),[iii] is spread across multiple systems, and therefore cannot easily be captured by most databases or analytics platforms.

Considering the number of new documents being created daily, it’s only getting harder to search through them. However, in the aggregate, the value of the insights that they might provide only grows. If one document can tell you something worthwhile, then it’s likely that 500, 1,000 or 100,000 documents can tell you much more – about trends, changing market forces, customer needs, etc. They might even help you discover untapped opportunities or organizational blind spots that are holding you back.

Just how might the insights buried within thousands of documents be unearthed? The answer is Knowledge Mining.

What is Knowledge Mining?

Knowledge Mining is primarily about extracting value from existing information. This “value” is subjective and varies by organization, but in a general sense, the goal is to produce something useful, profitable or beneficial. This could mean generating efficiencies or highlighting organizational weaknesses, cutting waste or identifying new opportunities.

Knowledge Mining has been described as the “next-wave of AI-led transformation”[iv], and uses a combination of AI services to make sense of unstructured, semi-structured and structured data. Unstructured data is usually found in Word or PDF documents, but also in audio or video recordings, photos, scanned forms and archival documents. If it isn’t a spreadsheet, a database or a table with rows and columns, it’s what we’d typically consider “unstructured”.

Unlike the first wave of AI, which focused on narrow applications of machine learning to address a specific task by training a single model, Knowledge Mining leans on pre-trained AI services to discover patterns in unstructured data and extract valuable meaning from them. Developers can still apply their own custom AI models, but they no longer need to start from scratch, piecing together various technologies, to provide a deep understanding of your content.

Step 1 – Extraction and Indexing

The first step in Knowledge Mining involves pulling information from the documents and putting it into a search index. This index is fronted by a polished, responsive UI and is what we’d consider the “first win”. This search often includes facets based on entities within the documents, like people, places and organizations. Custom facets might be created as well. Some organizations stop here since their main goal was making the documents searchable.
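To make this step concrete, here is a minimal sketch in Python of extraction, indexing and faceted search. It is not how Azure Cognitive Search works internally. The sample documents, the `KNOWN_ENTITIES` dictionary (a toy stand-in for a pre-trained entity recognition service) and all function names are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample documents; real text would be extracted from PDFs, Word files, etc.
DOCUMENTS = {
    "doc1": "Alice Martin met Contoso in Calgary to discuss the audit.",
    "doc2": "Contoso opened a new office in Toronto.",
    "doc3": "Alice Martin reviewed the Toronto audit report.",
}

# Toy lookup standing in for a pre-trained entity recognition service (assumption).
KNOWN_ENTITIES = {
    "people": ["Alice Martin"],
    "places": ["Calgary", "Toronto"],
    "organizations": ["Contoso"],
}


def extract_entities(text):
    """Return the known entities found in the text, grouped by facet name."""
    return {
        facet: [name for name in names if name in text]
        for facet, names in KNOWN_ENTITIES.items()
    }


def build_index(documents):
    """Build a word -> document ids inverted index plus per-document facets."""
    inverted = defaultdict(set)
    facets = {}
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            inverted[token].add(doc_id)
        facets[doc_id] = extract_entities(text)
    return inverted, facets


def search(inverted, facets, term, facet_filter=None):
    """Find documents containing the term, optionally narrowed by (facet, value)."""
    hits = set(inverted.get(term.lower(), set()))
    if facet_filter is not None:
        facet, value = facet_filter
        hits = {doc_id for doc_id in hits if value in facets[doc_id][facet]}
    return sorted(hits)


inverted, facets = build_index(DOCUMENTS)
print(search(inverted, facets, "audit"))                         # ['doc1', 'doc3']
print(search(inverted, facets, "audit", ("places", "Toronto")))  # ['doc3']
```

The second query shows what a facet adds: the same keyword search, narrowed to documents that mention a particular place.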

Step 2 – Enrichment

The next step is to take the information we’ve surfaced and enrich it by cross-referencing it against data within the broader document set or from other systems. A document’s data can be fed to trained models and used to populate a custom metadata schema. This additional information and the broader context it reflects can make for a very powerful, increasingly customized index. This step can be repeated multiple times, each one adding something new to the data mix.
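As a rough illustration of enrichment, the Python sketch below cross-references the organizations found in each document against reference data from another system, and then against the rest of the document set. The `CRM_ACCOUNTS` table, the metadata field names and the `IndexedDocument` class are all hypothetical; a real project would define its own metadata schema and might use trained models rather than simple lookups.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    doc_id: str
    organizations: list                           # entities surfaced in Step 1
    metadata: dict = field(default_factory=dict)  # custom metadata schema


# Hypothetical reference data exported from another system (assumption).
CRM_ACCOUNTS = {
    "Contoso": {"industry": "Retail", "account_manager": "J. Lee"},
    "Fabrikam": {"industry": "Manufacturing", "account_manager": "P. Singh"},
}


def enrich_with_accounts(doc, accounts):
    """Enrichment pass 1: add industry and account manager from external data."""
    matched = [org for org in doc.organizations if org in accounts]
    doc.metadata["industries"] = sorted({accounts[o]["industry"] for o in matched})
    doc.metadata["account_managers"] = sorted(
        {accounts[o]["account_manager"] for o in matched}
    )
    return doc


def enrich_with_related(docs):
    """Enrichment pass 2: link documents that mention a common organization."""
    for doc in docs:
        doc.metadata["related_docs"] = sorted(
            other.doc_id
            for other in docs
            if other.doc_id != doc.doc_id
            and set(other.organizations) & set(doc.organizations)
        )
    return docs


docs = [
    IndexedDocument("doc1", ["Contoso"]),
    IndexedDocument("doc2", ["Contoso", "Fabrikam"]),
    IndexedDocument("doc3", ["Northwind"]),
]
for doc in docs:
    enrich_with_accounts(doc, CRM_ACCOUNTS)
enrich_with_related(docs)

for doc in docs:
    print(doc.doc_id, doc.metadata)
```

Each pass adds new fields without touching the ones already there, which is why the step can be repeated as often as needed.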

Steps 3 + 4 – Iteration and Elevation

At this point, we have a friendly, faceted search UI on top of a fully populated index. We can continue to “Iterate” on this tool, improving the index by adding support for more document types, pulling additional information from the underlying documents or by further enriching that information. The search UI can also be extended to include additional facets and features.

“Elevation” involves taking the information we’re pulling and feeding it to a parallel datastore (called a “Knowledge Store”) that can be used by data processes outside our initial search. This makes our enriched data useful to custom applications, dashboards (e.g. Power BI) or highly interactive user experiences and rich visualizations.
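The idea of a parallel datastore can be sketched in Python by projecting enriched documents into plain tables that a dashboard or custom application could query. This example uses an in-memory SQLite database purely as a stand-in; it is not the Azure Knowledge Store API, and the table names, columns and sample records are assumptions.

```python
import sqlite3

# Hypothetical enriched documents as they might leave the enrichment step.
ENRICHED_DOCS = [
    {"doc_id": "doc1", "year": 2019, "organizations": ["Contoso"]},
    {"doc_id": "doc2", "year": 2020, "organizations": ["Contoso", "Fabrikam"]},
    {"doc_id": "doc3", "year": 2020, "organizations": ["Fabrikam"]},
]


def project_to_store(docs, conn):
    """Flatten nested documents into two relational tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (doc_id TEXT PRIMARY KEY, year INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_organizations (doc_id TEXT, organization TEXT)"
    )
    for doc in docs:
        conn.execute(
            "INSERT INTO documents VALUES (?, ?)", (doc["doc_id"], doc["year"])
        )
        conn.executemany(
            "INSERT INTO doc_organizations VALUES (?, ?)",
            [(doc["doc_id"], org) for org in doc["organizations"]],
        )
    conn.commit()


def mentions_per_year(conn):
    """Example downstream query: how often is each organization mentioned per year?"""
    cursor = conn.execute(
        "SELECT o.organization, d.year, COUNT(*) "
        "FROM doc_organizations AS o "
        "JOIN documents AS d ON d.doc_id = o.doc_id "
        "GROUP BY o.organization, d.year "
        "ORDER BY o.organization, d.year"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
project_to_store(ENRICHED_DOCS, conn)
print(mentions_per_year(conn))
# [('Contoso', 2019, 1), ('Contoso', 2020, 1), ('Fabrikam', 2020, 2)]
```

Once the data sits in tabular form, reporting tools can read it directly without going through the search UI.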

“The basis for some of the greatest future innovations may be locked, unseen, in spreadsheets, PDFs, presentations and documents.”[v]

Explore the Possibilities of Knowledge Mining

Your documents typically follow a lifecycle of usefulness. From creation to disposal, they serve a purpose. With Knowledge Mining, your documents can find a new purpose and extend their ROI window.

[Figure: Inception to Disposal graph]

Before Knowledge Mining
Document life is finite.

[Figure: Useful life span of data graph]

After Knowledge Mining
Document life is extended indefinitely.

Raw data (think thousands of files on a shared drive) isn’t enough. You need to identify and extract value from that data. Start by asking yourself the following questions:

  • What is the meaning of the information that we have?
  • What can it tell us?
  • What can it tell us in the aggregate?
  • What can it tell us in the aggregate, over time?
  • What can it tell us in the aggregate, across space and circumstances?

By applying Knowledge Mining, you can condense, transform and refine unstructured information into something more valuable.

Do You Have a Knowledge Mining Opportunity?

Ask yourself these questions. Do you have:

  • Answers that are locked inside your documents?
  • Human workflows that involve inefficient content (re-)creation?
  • Strategic or tactical decisions that would benefit from hindsight, insight and foresight?
  • Onerous human processes for retrieving historical information from documents?
  • Regulatory processes that require the creation and storage of large amounts of document files?
  • Human bottlenecks to gaining insights from your unstructured data?
  • Critical intuitions held by one-of-a-kind people, whose unique organizational knowledge is reflected in documents that are now too numerous to read, understand and memorize?
  • Decisions that are made over and over again and are reflected inside documents?

If the answer to any of these is “yes”, then perhaps Knowledge Mining should play a role in your broader applied data strategy. It might give a new purpose to idle documents and, through the valuable insights locked within them, help your organization achieve important business goals.

MNP is a go-to Microsoft partner for Knowledge Mining using Azure Cognitive Search. Our applied data team can help you unlock hidden value from a wide range of structured and unstructured data and documents. We will gladly sit down with you to identify Knowledge Mining opportunities and determine if you are sitting on an untapped goldmine.

To learn more about MNP’s Knowledge Mining solutions, contact us today.

[i] https://azure.microsoft.com/en-us/resources/knowledge-mining-the-next-wave-of-artificial-intelligence-led-transformation/

[ii] https://www.slideshare.net/PingElizabeth/the-hidden-costs-of-information-work-2005-idc-report

[iii] https://www.forbes.com/sites/traceywelsonrossman/2019/01/28/i-see-data-forge-ai-mines-the-worlds-unstructured-data/#1a0b0b6a1067

[iv] https://azure.microsoft.com/en-us/resources/knowledge-mining-the-next-wave-of-artificial-intelligence-led-transformation/

[v] https://azure.microsoft.com/en-us/resources/knowledge-mining-the-next-wave-of-artificial-intelligence-led-transformation/

Taylor Bastien

Taylor is a Solutions Architect working closely with clients to bring their true needs into focus and forming the right team of professionals to deliver quality solutions. He takes a strategic view of each client’s challenges, helping them to make informed technology investments. When he’s not on the clock, he enjoys staying fit, learning languages, and spending time with his family.