Publication date: 11-03-2021
Number: US20210073532A1
A document extraction system, executed by a processor, may process documents using manual and automated systems. The document extraction system may efficiently route tasks to the manual and automated systems based on a predicted probability that the results generated by the automated system meet some baseline level of accuracy. To increase document processing speed, documents having a high likelihood of accurate automated processing may be routed to an automated system. To ensure a baseline level of accuracy, documents having a smaller likelihood of accurate automated processing may be routed to a manual system.

1. A method of improving document processing, comprising:
acquiring text data from a plurality of documents;
generating, by one or more base models included in a machine learning (ML) system, a document type and a set of extraction predictions for each document included in the plurality of documents, the set of extraction predictions including a plurality of text values and a plurality of labels describing the plurality of text values;
training a meta model included in the ML system using a training dataset that includes the document type and the extraction predictions for each document included in the plurality of documents;
acquiring input text data from a new document;
generating, by the one or more base models, a set of extraction predictions for the new document;
generating, by the meta model, an extraction confidence prediction for the set of extraction predictions for the new document;
evaluating the accuracy of the set of extraction predictions for the new document by comparing the extraction confidence prediction to a confidence threshold; and
in response to determining the extraction confidence prediction is above a confidence threshold, arranging the new document to be automatically extracted using the set of extraction predictions for the new document.

2. The method of claim 1, wherein the training further comprises:
generating a ground truth dataset ...
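
The routing step described in the abstract and in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `ExtractionPrediction`, `DocumentResult`, `route_document`, and `toy_meta_model`, as well as the 0.9 threshold, are hypothetical and chosen only for illustration. The toy meta model is a stand-in heuristic; the claims describe a trained meta model, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExtractionPrediction:
    """One base-model output: a label describing an extracted text value."""
    label: str
    value: str


@dataclass
class DocumentResult:
    """Base-model outputs for one document (hypothetical container)."""
    doc_type: str
    predictions: List[ExtractionPrediction]


def toy_meta_model(result: DocumentResult) -> float:
    """Stand-in for a trained meta model (assumption, not from the source).

    Returns the fraction of predictions with a non-empty text value,
    used here only so the example produces a number between 0 and 1.
    """
    if not result.predictions:
        return 0.0
    filled = sum(1 for p in result.predictions if p.value.strip())
    return filled / len(result.predictions)


def route_document(
    result: DocumentResult,
    meta_model: Callable[[DocumentResult], float],
    threshold: float = 0.9,  # illustrative value, not from the source
) -> str:
    """Route to automated extraction only when confidence is above the threshold."""
    confidence = meta_model(result)
    return "automated" if confidence > threshold else "manual"


complete = DocumentResult(
    doc_type="invoice",
    predictions=[
        ExtractionPrediction("invoice_number", "INV-001"),
        ExtractionPrediction("total", "125.00"),
    ],
)
partial = DocumentResult(
    doc_type="invoice",
    predictions=[
        ExtractionPrediction("invoice_number", "INV-002"),
        ExtractionPrediction("total", ""),
    ],
)

routes = [route_document(d, toy_meta_model) for d in (complete, partial)]
```

In this sketch a confidence exactly equal to the threshold is routed to manual processing, following the claim's wording that the prediction must be "above" the threshold.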