AI could help government agencies unlock the data in their documents
Despite years of investment in better storage and analytics, many organizations, especially in government, still struggle to use their data. Too often, agencies have an abundance of “dark data”: data that is undiscovered, underused, or otherwise untapped. Even when these organizations have fully embraced digitization, for example by converting all paper forms to electronic forms, much of their valuable data remains trapped in documents such as contracts, invoices, policies, and meeting minutes, and they have no efficient way to get it out and use it.
To be clear, this problem is not entirely new. Government agencies have long struggled with how to use the unstructured data found in most documents. There are two general ways organizations can solve this problem, and both have serious shortcomings.
One option is to manually extract data from traditional electronic documents, such as PDF files, Word files, or HTML documents. For structured data, such as the amount owed on an invoice, this can be simple and automated. But for unstructured data, it is less straightforward. For example, an invoice can include a description of the services provided. To process this information, a project manager would need to review it and verify that the services match the work in an approved contract and describe the work that was actually performed. This likely involves reviewing several other documents, all of which also contain unstructured data, and may require the specialist skills of additional officials, such as lawyers or procurement officers.
The other option is to use structured documents, meaning electronic documents in which the various elements carry meaningful labels. The most common method would be to use a standard like XML. In XML, the creator of a document can use a schema that defines the elements of the document, the data types of those elements, and any default values or attributes of those elements. Unfortunately, as many software engineers have discovered, creating structured documents is easier said than done. The process can be tedious and technical, and changes to schemas need to be closely monitored and validated or nothing will work.
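To make the idea of a structured document concrete, here is a minimal sketch in Python. The invoice layout, element names, and the `read_invoice` helper are all hypothetical and invented for illustration. Python's standard library does not validate against an XML schema, so the sketch simply converts types by hand and applies a default for a missing attribute, roughly the kind of rule a schema would otherwise declare.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical invoice; the element and attribute names are illustrative only.
INVOICE_XML = """
<invoice id="INV-001">
  <vendor>Example Vendor LLC</vendor>
  <amountOwed>1250.00</amountOwed>
  <description>Network maintenance for March</description>
</invoice>
"""


def read_invoice(xml_text):
    """Read labeled fields from a structured invoice document."""
    root = ET.fromstring(xml_text.strip())
    amount = root.find("amountOwed")
    return {
        "id": root.get("id"),
        "vendor": root.findtext("vendor"),
        # A schema would declare this element as a decimal; here we convert by hand.
        "amount_owed": Decimal(amount.text),
        # A schema could declare a default value; here we assume "USD" when absent.
        "currency": amount.get("currency", "USD"),
        "description": root.findtext("description"),
    }


invoice = read_invoice(INVOICE_XML)
print(invoice["amount_owed"], invoice["currency"])
```

Because every field has a label, the amount owed can be read directly. The free-text description is still unstructured, which is the part that remains hard to automate.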
However, artificial intelligence is creating a new option for organizations to make better use of data in their documents. Using natural language processing, deep learning, and other methods, AI can help recognize and categorize data in documents, and then tag that data to create a structured document. For example, NASA and the National Science Foundation have teamed up with AI startup Docugami to explore how to use its technology to automatically scrape, structure, and categorize documents and their elements.
Again, the challenge is not just to extract data from documents, but to obtain both data and metadata from them so that the information can be understood in context. Indeed, Tim Berners-Lee (the inventor of the World Wide Web) and others promoted a vision of linked data called the “Semantic Web” more than two decades ago, and the World Wide Web Consortium (W3C) has promoted various standards for encoding semantics with data, such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). However, much of this vision for the web has not been realized, for the same reasons organizations have struggled to switch to structured data.
Using AI could help solve this problem. For example, the founder of Docugami notes that his company is focused on understanding not only “big data” but also “small data”. So, for example, if analysts look up the word “penicillin” in thousands of unstructured medical documents, they are able to distinguish between cases where the drug is listed with reference to an allergy and others where it is listed as a prescription. For government agencies, this also opens up new possibilities, as more semantic data could help an agency not only to better manage a wide variety of documents, such as invoices, contracts, and proposals, but also potentially to use the technology to answer questions using the data contained in them, in effect a search engine on steroids.
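The penicillin example can be illustrated with a deliberately simple sketch. This is not Docugami's method, which the article does not describe. It is a toy keyword rule standing in for a trained language model, and the cue lists, the `classify_mentions` function, and the sample notes are all assumptions made up for illustration.

```python
import re

# Hypothetical cue words; a real system would learn context rather than use fixed lists.
ALLERGY_CUES = ("allergy", "allergic", "reaction to")
PRESCRIPTION_CUES = ("prescribed", "prescription", "dose")


def classify_mentions(text, term="penicillin"):
    """Label each sentence that mentions `term` by the context it appears in."""
    results = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        if term not in lowered:
            continue
        if any(cue in lowered for cue in ALLERGY_CUES):
            label = "allergy"
        elif any(cue in lowered for cue in PRESCRIPTION_CUES):
            label = "prescription"
        else:
            label = "unknown"
        results.append((label, sentence.strip()))
    return results


notes = (
    "Patient reports an allergy to penicillin. "
    "Prescribed penicillin 500 mg twice daily. "
    "Penicillin was discussed."
)
labels = [label for label, _ in classify_mentions(notes)]
print(labels)
```

A plain keyword search would return all three sentences as identical hits. Tagging each mention with its context is what turns the search result into usable data, and fixed rules like these break down quickly on real documents, which is where the AI methods described above come in.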
Many government agencies still have a lot of work ahead of them to fully digitize before these new technologies are likely to be of great value to them. But these tools show the potential of the technology and the possibilities that will emerge as AI continues to make inroads into government.