Total found: 2100. Displayed: 100.
Publication date: 20-06-2013

Systems And Methods For Locating Characters On A Document

Number: US20130156288A1
Assignee: DE LA RUE NORTH AMERICA INC.

Systems and methods for locating characters on a document are provided. In one embodiment, a method for locating characters on a document includes receiving an image of a document, identifying a set of character candidate forms in the image based on image intensity data, identifying a set of characters from the set of character candidate forms based on spatial characteristics of the set of character candidate forms, and outputting a location of the set of characters.

1. A method for locating characters on a document, the method comprising: receiving an image of a document; identifying a set of character candidate forms in the image based on image intensity data; identifying a set of characters from the set of character candidate forms based on spatial characteristics of the set of character candidate forms; and outputting a location of the set of characters.
2. The method of claim 1, further comprising: converting the image of the document into a composite image; wherein identifying the set of character candidate forms in the image based on image intensity data comprises identifying the set of character candidate forms in the composite image based on image intensity data.
3. The method of claim 2, wherein the composite image is a grayscale image.
4. The method of claim 1, further comprising: cropping the image to a character analysis region; wherein identifying the set of character candidate forms in the image based on image intensity data comprises identifying the set of character candidate forms in the character analysis region based on image intensity data.
5. The method of claim 1, wherein identifying the set of character candidate forms in the image based on image intensity data comprises identifying the set of character candidate forms in the image using a set of threshold parameters.
6. The method of claim 5, wherein the image intensity data comprises grayscale intensity values; wherein the set of threshold parameters comprises at least one of a minimum grayscale ...
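The two identification steps in this abstract (intensity-based candidates, then a spatial filter) can be illustrated with a minimal Python sketch. It is not the patented method: the function names, the 4-connected flood fill, and the height-only spatial test are all assumptions chosen for illustration.

```python
from collections import deque

def find_candidates(gray, min_val, max_val):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    regions whose grayscale values fall inside [min_val, max_val]."""
    h, w = len(gray), len(gray[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or not (min_val <= gray[y][x] <= max_val):
                continue
            # Flood-fill one region and track its bounding box.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and min_val <= gray[ny][nx] <= max_val):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def filter_characters(boxes, min_h, max_h):
    """Hypothetical spatial filter: keep boxes of plausible character
    height and return them ordered left to right."""
    kept = [b for b in boxes if min_h <= b[2] - b[0] + 1 <= max_h]
    return sorted(kept, key=lambda b: b[1])
```

A real system would likely combine several spatial cues (spacing, alignment, aspect ratio); height alone is used here only to keep the example short.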

Publication date: 27-06-2013

Detecting Separator Lines in a Web Page

Number: US20130163873A1
Assignee: (not listed)

A system and method of detecting separator lines in a web page may include determining coordinates of visible web elements on a web page, generating an edge image of the web page based on the coordinates of the web elements, filtering edges belonging to non-separator line elements within the edge image, detecting horizontal lines within the edge image, detecting vertical lines within the edge image, and filtering short lines within the edge image. A system for detecting separator lines in a web page may include a memory device, and a processor communicatively coupled to the memory, in which the processor determines coordinates of visible web elements on a web page, generates an edge image of the web page based on the coordinates of the web elements, filters edges belonging to non-separator line elements within the edge image, detects horizontal lines within the edge image, detects vertical lines within the edge image, and filters short lines within the edge image.

1. A method performed by a physical computing device comprising at least one processor for detecting separator lines in a web page comprising: determining, with the computing system, coordinates of visible web elements on a web page; generating, with the computing system, an edge image of the web page based on the coordinates of the web elements; filtering, with the computing system, edges belonging to non-separator line elements within the edge image; detecting, with the computing system, horizontal lines within the edge image; detecting, with the computing system, vertical lines within the edge image; and filtering, with the computing system, short lines within the edge image to provide an indication of the separator lines.
2. The method of claim 1, in which determining coordinates of visible web elements on a web page comprises: querying each of a plurality of text nodes of a data structure representing the web page; wrapping each of the plurality of text nodes in a pair of mark-up language tags; obtaining the co- ...
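The pipeline in this abstract (edge image from element coordinates, horizontal and vertical line detection, short-line filtering) can be sketched roughly as follows. This is an illustrative Python sketch under assumptions: boxes are given as hypothetical `(left, top, right, bottom)` tuples with inclusive pixel coordinates, lines are detected as runs of consecutive edge pixels, and the step that filters edges of non-separator elements is omitted.

```python
def draw_edges(boxes, width, height):
    """Rasterize the outline of each element box into a binary edge image."""
    edge = [[0] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        for x in range(left, right + 1):
            edge[top][x] = edge[bottom][x] = 1
        for y in range(top, bottom + 1):
            edge[y][left] = edge[y][right] = 1
    return edge

def long_runs(rows, min_len):
    """Return (row_index, start, end) for runs of 1s at least min_len long."""
    runs = []
    for i, row in enumerate(rows):
        start = None
        # A trailing 0 closes any run that reaches the end of the row.
        for j, value in enumerate(list(row) + [0]):
            if value and start is None:
                start = j
            elif not value and start is not None:
                if j - start >= min_len:
                    runs.append((i, start, j - 1))
                start = None
    return runs

def separator_lines(edge, min_len):
    """Detect horizontal and vertical lines, dropping runs shorter than min_len."""
    horizontal = long_runs(edge, min_len)        # (y, x_start, x_end)
    vertical = long_runs(zip(*edge), min_len)    # transposed: (x, y_start, y_end)
    return horizontal, vertical
```

For a single 6 by 3 box and `min_len=4`, only the top and bottom edges survive; the shorter left and right sides are filtered out as too short to be separators.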

Publication date: 03-10-2013

Digitizing apparatus

Number: US20130258419A1
Author: Kozo Shimazu
Assignee: Kyocera Document Solutions Inc

A page numbering unit assigns an electronic document page number to each of image data of a plurality of pages stored in a storage unit. An image analysis unit extracts a page number described in each of the image data of the plurality of pages stored in the storage unit. The image analysis unit identifies image data that describes page numbers for searching for other pages, from among the image data of the plurality of pages stored in the storage unit. A page number comparator compares the assigned electronic document page number with the extracted page number, for each of the image data of the plurality of pages. A page number conversion unit converts the page numbers for searching into the corresponding electronic document page numbers, based on a result of comparison by the page number comparator.
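The comparison and conversion steps can be pictured with a small Python sketch. The abstract does not say how the comparison result is used, so the approach here (take the most common difference between assigned and printed page numbers as a fixed offset) is an assumption, and both function names are hypothetical.

```python
from collections import Counter

def page_offset(printed_by_electronic):
    """Map of electronic page number -> printed page number (or None when
    no number was extracted). Returns the most common difference."""
    diffs = Counter(
        electronic - printed
        for electronic, printed in printed_by_electronic.items()
        if printed is not None
    )
    if not diffs:
        raise ValueError("no printed page numbers were extracted")
    return diffs.most_common(1)[0][0]

def to_electronic(printed_page, offset):
    """Convert a printed page reference (e.g. from a table of contents)
    into the electronic document page number."""
    return printed_page + offset
```

For a scan whose first two pages are unnumbered front matter, `page_offset({1: None, 2: None, 3: 1, 4: 2, 5: 3})` gives an offset of 2, so a reference to printed page 10 resolves to electronic page 12.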

Publication date: 26-12-2013

DOCUMENT UNBENDING AND RECOLORING SYSTEMS AND METHODS

Number: US20130343609A1
Assignee: Polyvision Corporation

According to one aspect, a system for processing a document image is disclosed. In an exemplary embodiment, the system includes an edge-detection unit configured to identify an edge of a document from a document image. The system also includes a keystone-correction unit and a flattening unit. The keystone-correction unit is configured to correct keystone distortion in the document image. The flattening unit is configured to flatten content of the document in the document image.

1. A system for processing a document image, comprising: an edge-detection unit configured to identify an edge of a document from the document image, wherein identifying the edge of the document comprises at least one of applying an edge-finding filter to the document image and selecting a ranked potential edge of the document image; a keystone-correction unit configured to correct keystone distortion in the document image; and a flattening unit configured to flatten content of the document in the document image.
2. The system of claim 1, wherein the edge of the document image corresponds to an edge of content of the document in the document image, the content comprising at least one of text and objects.
3. The system of claim 1, wherein applying the edge-finding filter to the document image comprises applying a convolution filter to a single channel of pixel color values of the document image.
4. The system of claim 1, wherein a potential edge is ranked based on at least one of: length of the potential edge; smoothness of the potential edge; vertical or horizontal orientation of the potential edge; and intersection of the potential edge with at least one other potential edge.
5. The system of claim 1, wherein correcting the keystone distortion comprises: determining a field of view of an image capture device used to capture the document image; based on the field of view, determining at least one rotation to apply to the document image to render the document flat in the document image; ...

Publication date: 06-03-2014

IMAGE PROCESSING APPARATUS FOR PROCESSING PHOTOGRAPHED IMAGES

Number: US20140063569A1
Assignee: CASIO COMPUTER CO., LTD.

An image processing apparatus of the present invention acquires a photographed image of an object on which a plurality of indicators have been arranged, recognizes an indicator in the photographed image, detects the position of the indicator, judges the state of the object based on the detected position of the indicator, and judges the state of the photographed image based on the detected positional relationship of the plurality of indicators, when the judged state of the object is a predetermined state.

1. An image processing apparatus comprising: an acquisition section which acquires a photographed image of an object on which a plurality of indicators have been arranged; a detection section which recognizes an indicator in the photographed image acquired by the acquisition section and detects a position of the indicator; a first judgment section which judges a state of the object based on the position of the indicator detected by the detection section; and a second judgment section which, when the state of the object judged by the first judgment section is a predetermined state, judges a state of the photographed image based on a positional relationship of the plurality of indicators detected by the detection section.
2. The image processing apparatus according to claim 1, wherein the first judgment section judges a state where a photographing distance or a photographing angle with respect to the object is within a predetermined range, as the state of the object, and wherein the second judgment section judges a status of geometrical distortion of the photographed image which is partly or wholly distorted, as the state of the photographed image.
3. The image processing apparatus according to claim 1, wherein the first judgment section judges timing at which the state of the object is changed to the predetermined state, based on a change of the position of the indicator sequentially detected by the detection section with respect to respective ...

Publication date: 06-03-2014

METHOD FOR OPERATING A MEDICATION DISPENSER, AND MEDICATION DISPENSER

Number: US20140067114A1
Assignee: Evondos Oy

The present invention relates to a method for obtaining information from a medication package. In the method, an image of the medication package is provided, the image is analysed to determine positions and formats of patterns in the image, a layout, which has similar pattern formats in the same positions as the image, is selected from a set of layouts stored in the medication dispenser, the selected layout defining the type of information for each pattern in the image, and the information contained in at least one of the patterns of the image is interpreted by linking the content of the pattern to the type of information defined in the selected layout. The invention also relates to a medication dispenser.

2. The method according to claim 1, wherein the step of providing an image of the medication package comprises: capturing with a camera at least one image of the medication package, and combining the at least one image into a single image.
3. The method according to claim 2, wherein the method comprises moving the medication package across an imaging area of the camera between the capturing of the images.
4. The method according to claim 2, wherein the method comprises illuminating with at least one light source the medication package.
5. The method according to claim 4, wherein the method comprises changing the intensity and/or the direction and/or the wavelength of the lighting.
6. The method according to claim 4, wherein the method comprises changing the position of the camera.
7. The method according to claim 1, wherein the step of providing an image of the medication package comprises receiving the image from a server over a communications network.
8. The method according to claim 1, wherein, if an error occurs in the step of selecting a layout, the method comprises: sending the image to a server over a communications network, generating, at the server, a new layout based on the image, and sending the new layout to the medication dispenser over ...

Publication date: 20-03-2014

SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR DETERMINING DOCUMENT VALIDITY

Number: US20140079294A1
Assignee: KOFAX, INC.

In one embodiment, a method includes receiving an image of a document; performing optical character recognition (OCR) on the image; extracting an address of a sender of the document from the image based on the OCR; comparing the extracted address with content in a first database; identifying complementary textual information in a second database based on the address; and at least one of: extracting additional content from the image of the document; correcting one or more OCR errors in the document using the complementary textual information, and normalizing data from the document prior to determining a validity of the document using at least one of the complementary textual information and predefined business rules. At least one of the aforementioned operations is performed using a processor of a mobile device. Exemplary systems and computer program products are also disclosed.

1. A method, comprising: receiving an image of a document; performing optical character recognition (OCR) on the image; extracting an address of a sender of the document from the image based on the OCR; comparing the extracted address with content in a first database; identifying complementary textual information in a second database based on the address; and at least one of: extracting additional content from the image of the document; correcting one or more OCR errors in the document using the complementary textual information, and normalizing data from the document prior to determining a validity of the document using at least one of the complementary textual information and predefined business rules, wherein at least one of performing the OCR, extracting the address, comparing the extracted address, identifying the complementary textual information, extracting the additional content, correcting the one or more OCR errors and normalizing the data is performed using a processor of a mobile device.
2. The method as recited in claim 1, further comprising validating textual information in ...

Publication date: 10-04-2014

SYSTEMS FOR MOBILE IMAGE CAPTURE AND REMITTANCE PROCESSING

Number: US20140099001A1
Assignee: MITEK SYSTEMS, INC.

The present invention relates to automated document processing and more particularly, to methods and systems for document image capture and processing using mobile devices. In accordance with various embodiments, methods and systems for document image capture on a mobile communication device are provided such that the image is optimized and enhanced for data extraction from the document as depicted. These methods and systems may comprise capturing an image of a document using a mobile communication device; transmitting the image to a server; and processing the image to create a bi-tonal image of the document for data extraction. Additionally, these methods and systems may comprise capturing a first image of a document using the mobile communication device; automatically detecting the document within the image; geometrically correcting the image; binarizing the image; correcting the orientation of the image; correcting the size of the image; and outputting the resulting image of the document.

1. A computer implemented method for mobile image capture and remittance processing, where one or more processors are programmed to perform steps comprising: receiving a mobile image of a remittance coupon captured using a mobile device; detecting the remittance coupon within the mobile image of the remittance coupon; generating a document subimage that includes a portion of the mobile image that corresponds to the remittance coupon; geometrically correcting the mobile document image of the remittance coupon to generate a geometrically corrected image; processing the geometrically corrected image to generate a processed image; executing the one or more mobile image quality assurance tests on the processed image to assess the quality of the processed image; and executing one or more remittance processing steps on the processed image if the processed image passes the mobile image quality assurance tests.
2. The method of claim 1, wherein processing the geometrically corrected to ...

Publication date: 10-04-2014

Information processing apparatus, information processing method, and non-transitory computer readable medium

Number: US20140099038A1
Assignee: Fuji Xerox Co Ltd

An information processing apparatus includes a reading unit, a recognition unit, a table-of-contents analysis unit, a main-body analysis unit, and a creation unit. The reading unit reads a table of contents page and a main body page as images. The recognition unit performs character recognition on the images of the table of contents and main body pages. The table-of-contents analysis unit analyzes the image of the table of contents page, and acquires at least a heading item in accordance with a result of character recognition. The main-body analysis unit analyzes the image of the main body page, and associates an image including the heading item with the heading item in accordance with a result of character recognition. The creation unit creates electronic bookmarked information in which bookmark information for associating the heading item with the image of the main body page is added to electronic information of the read images.

Publication date: 06-01-2022

Document search method, document search system, program, and non-transitory computer readable storage medium

Number: US20220004570A1
Author: Shoko SAITO, Tatsuya Okano
Assignee: Semiconductor Energy Laboratory Co Ltd

A similar document is retrieved in units of blocks of a document. Highly accurate document search is performed. A specific text block is searched for in a plurality of text blocks created by dividing each of a plurality of search target documents. A first search text block, which is a part of a search document, is prepared; full-text search is performed by using at least some of the plurality of text blocks as a first target and using the first search text block as a search criterion to calculate first relevance of each text block included in the first target to the first search text block; a second target is determined from the first target depending on a level of the first relevance; first similarities of each sentence included in the first search text block to sentences included in the second target are calculated; and at least one text block similar to the first search text block is retrieved using the first similarities.
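The two-stage flow described here (coarse full-text relevance over text blocks, then sentence-level similarity on the shortlisted blocks) can be sketched in Python as below. The scoring functions are assumptions: shared-term counts stand in for full-text relevance and Jaccard overlap stands in for sentence similarity; the abstract does not specify either measure, and all names are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def sentences(block):
    """Split a block into sentences at ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", block.strip()) if s]

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def search_blocks(query_block, blocks, top_k=2):
    """Return the index of the block most similar to query_block."""
    q_terms = tokens(query_block)
    # Stage 1: rank all blocks by shared terms and keep the top_k
    # (the "second target" of the abstract).
    shortlist = sorted(
        range(len(blocks)),
        key=lambda i: len(q_terms & tokens(blocks[i])),
        reverse=True,
    )[:top_k]
    q_sents = [tokens(s) for s in sentences(query_block)]

    # Stage 2: average, over query sentences, of the best-matching
    # sentence similarity inside each shortlisted block.
    def score(i):
        b_sents = [tokens(s) for s in sentences(blocks[i])]
        if not q_sents or not b_sents:
            return 0.0
        return sum(max(jaccard(q, b) for b in b_sents) for q in q_sents) / len(q_sents)

    return max(shortlist, key=score)
```

The sketch assumes `blocks` is non-empty. Restricting the more expensive sentence comparison to the shortlist is the point of the two stages.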

Publication date: 06-01-2022

OPTICAL CHARACTER RECOGNITION (OCR) INDUCTION FOR MULTI-PAGE CHANGES

Number: US20220004755A1
Assignee: (not listed)

Provided are techniques for OCR induction for multi-page changes. A plurality of documents of a document type are processed to generate text area data for a text area in one or more documents of the plurality of documents, where the text area data includes coordinate locations of a zone for the text area based on expansion and direction of shift of the text area. A page flow model is trained using the plurality of documents and the text area data. In response to receiving a new document comprising the text area, a scanning script is received from the page flow model, where the page flow model identifies a new zone for the text area in the new document and determines how to adjust another zone for an element in the new document. The scanning script is used to scan the new document to generate digital text.

1. A computer-implemented method, comprising operations for: processing a plurality of documents of a document type to generate text area data for a text area in one or more documents of the plurality of documents, wherein the text area data includes coordinate locations of a zone for the text area based on expansion and direction of shift of the text area, and wherein the expansion is based on a font type and a font size of text values in the text area; training a page flow model using the plurality of documents and the text area data; and in response to receiving a new document comprising the text area, receiving a scanning script from the page flow model, wherein the page flow model identifies a new zone for the text area in the new document and determines how to adjust another zone for an element in the new document, and wherein the page flow model generates the scanning script to describe new coordinate locations of the new zone and the another zone in the new document; and using the scanning script to scan the new document to generate digital text.
2. The computer-implemented method of claim 1, further comprising operations for: generating a plurality of ...

Publication date: 01-01-2015

Paper sheet handling machine and paper sheet handling method

Number: US20150003718A1
Assignee: Glory Ltd

A paper sheet handling machine according to the present invention includes: an imaging unit configured to take an image of a paper sheet and generate a paper sheet image; an identification unit configured to identify a character of each digit position included in a serial number, from a serial number region of the paper sheet image, the identification unit further configured to, when a plurality of serial numbers are printed on the paper sheet, determine a serial number of the paper sheet based on character-identification results on the plurality of serial numbers.
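One way to picture the last step (determining a single serial number from several printed copies) is a per-digit vote across the character-identification results. The abstract does not say how the results are combined, so the voting rule below is an assumption, as are the function name and the `?` marker for an unreadable character.

```python
from collections import Counter

def combine_serials(reads, unknown="?"):
    """Combine several reads of the same serial number, digit by digit.
    Each read is a string of equal length; unreadable characters are
    marked with `unknown` and do not vote."""
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be non-empty and of equal length")
    combined = []
    for chars in zip(*reads):
        votes = Counter(c for c in chars if c != unknown)
        # Keep the most frequent character; stay unknown if nobody voted.
        combined.append(votes.most_common(1)[0][0] if votes else unknown)
    return "".join(combined)
```

With two partially soiled prints, `combine_serials(["AB1?3", "A?123"])` fills each gap from the other read. How to handle a tie between conflicting reads is left open here; a real machine might reject the note instead.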

Publication date: 07-01-2016

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM

Number: US20160004682A1
Assignee: (not listed)

There is provided an information processing apparatus. When an information processing apparatus receives a selection of a position under a situation where an image including a plurality of recognized cells is displayed on a display unit, the information processing apparatus displays an editing region for allowing a user to edit a text included in a recognized cell including the position, and a handle for changing the position of the recognized cell including the position for which the selection has been received.

1. An information processing apparatus comprising: a control unit configured to display an image including a plurality of recognized cells on a display unit; and a receiving unit configured to receive a selection of a position in the image, wherein, when the receiving unit receives a selection of a position, the control unit displays on the display unit an editing region for allowing a user to edit a text included in a recognized cell including the position that has received the selection, and a handle for changing the position of the recognized cell including the position that has received the selection.
2. The information processing apparatus according to claim 1, wherein a result of character recognition performed on the text is displayed on the editing region, and wherein, when the receiving unit receives an instruction of changing the result of the character recognition to another text on the editing region, the control unit deletes from the displayed image the text included in the recognized cell in which the position has been changed by using the handle and, instead, displays a text corresponding to the another text in the recognized cell in which the position has been changed by using the handle.
3. The information processing apparatus according to claim 2, wherein, when the receiving unit receives an instruction of changing the result of the character recognition of the text to another text on the editing region, the control unit ...

Publication date: 07-01-2016

Bulletin Board Data Mapping and Presentation

Number: US20160004705A1
Assignee: Bitvore Corporation

A computer-implemented method performed at a server system having one or more processors and memory, the method comprising receiving a set of curated documents comprising one or more documents identified as being relevant to a sector, analyzing the set of curated documents to determine one or more words and a count of each of the one or more words for all documents of the curated set of documents, further analyzing the set of curated documents, by analyzing one or more n-grams based on the one or more words, determining a first score based on a term frequency and a global document frequency of each of the one or more words of each of the one or more n-grams, determining a document vector based on averages of the first score, where the document vector comprises a perfect document for the sector, and storing the document vector in the data store.
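The scoring step in this abstract resembles TF-IDF averaged over the curated set. The sketch below illustrates that reading in Python and rests on assumptions: it scores single words only (the n-gram analysis is omitted), uses a smoothed inverse document frequency, and averages each term's score over all curated documents. The function name is hypothetical.

```python
import math
from collections import Counter

def centroid_vector(docs):
    """Average per-term TF-IDF score over a set of curated documents."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    sums = Counter()
    for toks in tokenized:
        if not toks:
            continue  # an empty document contributes nothing
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            idf = math.log((1 + n) / (1 + df[term]))  # smoothed, assumed form
            sums[term] += tf * idf
    return {term: total / n for term, total in sums.items()}
```

With this smoothing, a word that appears in every curated document scores zero, so the resulting vector emphasizes terms that distinguish documents within the sector.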

Publication date: 07-01-2021

AUTOMATIC DETECTION AND EXTRACTION OF WEB PAGE DATA BASED ON VISUAL LAYOUT

Number: US20210004431A1
Assignee: Microsoft Technology Licensing, LLC

A system and method for automatically detecting and extracting entity data from a web page is provided. The method may include detecting a pattern for an entity based on a visual layout of the web page. A region of the webpage corresponding to the pattern may be identified as including the entity data, where the entity data is in a semi-structured form. Within the region, properties associated with the entity may be detected, annotations for the properties may be determined, and a category for the entity may be identified, where the properties, annotations, and category may be used to construct a schema for a structured form of the entity data. A template may be generated based on the schema and applied to the web page to extract the entity data in the structured form.

1. A system to automatically detect entity data within a web page, the system comprising: at least one processor; and at least one memory including instructions which when executed by the at least one processor, causes the at least one processor to: detect a pattern for an entity based on a visual layout of the web page; identify a region of the web page corresponding to the pattern, the region including the entity data; within the region, detect a property associated with the entity; determine an annotation for the property; and identify a category for the entity based on the annotation.
2. The system of claim 1, wherein the entity data is in a semi-structured form within the web page.
3. The system of claim 2, wherein the instructions further cause the at least one processor to determine a schema for a structured form of the entity data based on the property, the annotation, and the category.
4. The system of claim 3, wherein the instructions further cause the at least one processor to generate a template for the web page based on the schema.
5. The system of claim 4, wherein the template is a visual layout based template.
6. The system of claim 4, wherein the template ...

Publication date: 07-01-2016

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM

Number: US20160005203A1
Assignee: (not listed)

In a case where a position of a recognized cell is shifted from a position of a ruled line of an actual cell, if the recognized cell is deleted, a part of the ruled line of the actual cell is deleted. According to an aspect of the present invention, straight lines are detected from regions around four sides constituting the recognized cell, and an inside of a region surrounded by the detected straight lines is deleted.

1. An information processing apparatus comprising: a detection unit that detects straight lines from regions around four sides that constitute a recognized cell; and a deletion unit that deletes color information of an inside of a region surrounded by the detected four straight lines.
2. The information processing apparatus according to claim 1, wherein the regions around the four sides constituting the recognized cell are regions enlarged in orthogonal directions to respective sides while the respective sides are set as references.
3. The information processing apparatus according to claim 1, wherein the straight line is an edge of a ruled line of an original cell corresponding to the recognized cell.
4. The information processing apparatus according to claim 3, wherein the detection unit includes an edge detection unit that detects edge pixels from the regions around the four sides, and a ruled line detection unit that detects the straight lines based on a number of duplications of lines passing through the detected respective edge pixels.
5. The information processing apparatus according to claim 4, wherein, in a case where a plurality of lines are detected from the region around one side, the ruled line detection unit detects an innermost line as the straight line among the plurality of lines while a center position of the recognized cell is set as a reference.
6. The information processing apparatus according to claim 1, wherein the recognized cell is a circumscribed rectangle of a white pixel block detected from a table region included in a scanned ...
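Claims 4 and 5 describe voting over lines that pass through edge pixels and then preferring the innermost line. A much simplified Python sketch of that idea follows. It is an assumption-laden illustration, not the claimed detector: it considers only horizontal lines (one vote per edge pixel for its own row, rather than a full Hough-style accumulation over angles), and the function and parameter names are hypothetical.

```python
from collections import Counter

def innermost_horizontal_line(edge_pixels, band, center_y, min_votes):
    """Pick the ruled-line row for one horizontal side of a recognized cell.

    edge_pixels: iterable of (x, y) edge-pixel coordinates
    band:        (y_min, y_max), the enlarged region around the side
    center_y:    y coordinate of the cell center, used as the reference
    min_votes:   minimum number of edge pixels a row needs to count as a line
    """
    y_min, y_max = band
    # Each edge pixel inside the band votes for the row it lies on.
    votes = Counter(y for _, y in edge_pixels if y_min <= y <= y_max)
    candidates = [y for y, n in votes.items() if n >= min_votes]
    if not candidates:
        return None
    # Several lines may be found (both edges of a thick ruled line);
    # keep the one closest to the cell center.
    return min(candidates, key=lambda y: abs(y - center_y))
```

Choosing the innermost line keeps the deleted region inside the ruled line, which matches the stated goal of not erasing part of the actual cell border.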

Publication date: 02-01-2020

TEXT ENTITY DETECTION AND RECOGNITION FROM IMAGES

Number: US20200004815A1
Assignee: Microsoft Technology Licensing, LLC

Named entity recognition can be performed on an image to classify any text in an image. A boundary that encompasses the classified entity may be predicted. Subsequently, upon request, optical character recognition (OCR) can be performed on just the region inside the boundary. The disclosed implementations conserve computer resources such as processing power and battery compared to performing OCR on the entire image.

1. A system, comprising: at least one computer readable device storing instructions; one or more hardware processors that are coupled to the at least computer readable device and that are configured to execute the instructions to cause the system to perform operations comprising: receiving an image comprising a plurality of entities; determining, using a neural network, a boundary of one of the plurality of entities of the image that comprises text; predicting a classification of the text of the one of the plurality of entities of the image; outputting the classification of the text; receiving a request to perform an action based upon the classification of the text; and performing the action in accordance with the request.
2. The system of claim 1, wherein the operations further comprise performing optical character recognition on only a region within the boundary.
3. The system of claim 1, wherein the optical character recognition is performed subsequent to the request.
4. The system of claim 1, wherein the action is selected from the group consisting of: making a telephone call, adding contact information, storing information to the computer readable device; searching the Internet, preparing an email message, navigating to a home address, preparing a text message, and opening a web browser to a web page.
5. The system of claim 1, wherein the operation further comprise visually indicating the boundary of the one of the plurality of entities of the image; or visually indicating the one ...

Publication date: 13-01-2022

IDENTIFICATION OF TABLE PARTITIONS IN DOCUMENTS WITH NEURAL NETWORKS USING GLOBAL DOCUMENT CONTEXT

Number: US20220012486A1
Author: Semenov Stanislav
Assignee: (not listed)

Aspects of the disclosure provide for mechanisms for identification of table partitions in documents using neural networks. A method of the disclosure includes obtaining a plurality of symbol sequences of a document having at least one table, determining a plurality of vectors representative of symbol sequences having at least one alphanumeric character or a table graphics element, processing the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors, determining an association between a first recalculated vector and a second recalculated vector, wherein the first recalculated vector is representative of an alphanumeric sequence and the second recalculated vector is associated with a table partition, and determining, based on the association between the first recalculated vector and the second recalculated vector, an association between the alphanumeric sequence and the table partition.

1. (canceled)
2. A method, comprising: obtaining a plurality of symbol sequences of a document, the document having one or more tables, wherein each of the plurality of symbol sequences of the document comprises at least one i) a table graphics element or ii) an alphanumeric sequence that includes one or more alphanumeric characters; determining, by a processing device, a plurality of vectors, wherein each vector of the plurality of vectors is representative of one of the plurality of symbol sequences; and determining, by the processing device applying one or more neural networks to the plurality of vectors, an association between a first alphanumeric sequence and a table partition of the one or more tables of the document.
3. The method of claim 2, wherein determining the association between the first alphanumeric sequence and the table partition comprises: obtaining, using the one or more neural networks, a plurality of recalculated vectors, wherein each of the plurality of recalculated vectors is based on the plurality of vectors; ...

07-01-2021 publication date

METHODS AND SYSTEMS FOR FINDING ELEMENTS IN OPTICAL CHARACTER RECOGNITION DOCUMENTS

Number: US20210004579A1

Embodiments for finding elements in optical character recognition (OCR) documents are provided. An indication of a selected portion of document is received. Salient pixels in the selected portion of the document are determined. Properties of the salient pixels in the selected portion of the document are identified. The properties of the salient pixels in the selected portion of the document are compared to properties of pixels in each of a plurality of portions of an OCR-converted version of the document. A cognitive analysis is utilized to select at least some of the plurality of portions of the OCR-converted version of the document as suspected matches to the selected portion of the document. 1. A method for finding elements in optical character recognition (OCR) documents comprising:receiving an indication of a selected portion of document;determining salient pixels in the selected portion of the document;identifying properties of the salient pixels in the selected portion of the document;comparing the properties of the salient pixels in the selected portion of the document to properties of pixels in each of a plurality of portions of an OCR-converted version of the document; andutilizing a cognitive analysis to select at least some of the plurality of portions of the OCR-converted version of the document as suspected matches to the selected portion of the document.2. The method of claim 1 , wherein said selection of the at least some of the plurality of portions of the OCR-converted version of the document as suspected matches to the selected portion of the document is performed utilizing a similarity metric.3. The method of claim 1 , wherein the identifying of the properties of the salient pixels in the selected portion of the document includes determining a relationship between the salient pixels in the selected portion of the document to other pixels in the document.4. The method of claim 3 , wherein the determining of the relationship between the salient ...

07-01-2021 publication date

Revealing Content Reuse Using Fine Analysis

Number: US20210004582A1
Assignee: Microsoft Technology Licensing LLC

Systems and methods for managing content provenance are provided. A network system accesses a document of a plurality of documents to be analyzed. The network system extracts text fragments from the document, including a first fragment and a second fragment. A determination is made whether each of the text fragments matches an entry in a hash table. Based on the first fragment not matching any entries in the hash table, the network system creates a new entry in the hash table, whereby the first fragment is used to generate a key in the hash table. Based on the second fragment matching an entry of the hash table, the network system associates the document with a key of the matching entry in the hash table, the associating comprising updating the hash table with an identifier of the document.
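
The hash-table bookkeeping described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the SHA-256 key, the whitespace and case normalization, and the names `fragment_key` and `index_document` are all assumptions made for the example.

```python
import hashlib


def fragment_key(fragment: str) -> str:
    """Derive a table key from a text fragment (SHA-256 is an assumed choice)."""
    # Assumed normalization: lowercase and collapse runs of whitespace.
    normalized = " ".join(fragment.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def index_document(table: dict, doc_id: str, fragments: list) -> None:
    """Record which documents contain each fragment."""
    for fragment in fragments:
        key = fragment_key(fragment)
        # An unseen fragment creates a new entry; a seen one only gains a document id.
        table.setdefault(key, set()).add(doc_id)


table = {}
index_document(table, "doc-A", ["The quick brown fox.", "Shared   paragraph here."])
index_document(table, "doc-B", ["shared paragraph here.", "Something else."])

# Entries associated with more than one document indicate reused content.
reused = {key: ids for key, ids in table.items() if len(ids) > 1}
```

Here `reused` holds a single entry, for the fragment that both hypothetical documents share.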

07-01-2021 publication date

Revealing Content Reuse Using Coarse Analysis

Number: US20210004583A1
Assignee:

Systems and methods for managing content provenance are provided. A network system accesses a plurality of documents. The plurality of documents is then hashed to identify one or more content features within each of the documents. In one embodiment, the hash is a MinHash. The network system compares the content features of each of the plurality of documents to determine a similarity score between each of the plurality of documents. In one embodiment, the similarity score is a Jaccard score. The network system then clusters the plurality of documents into one or more clusters based on the similarity score of each of the plurality of documents. In one embodiment, the clustering is performed using DBSCAN. DBSCAN can be iteratively performed with decreasing epsilon values to derive clusters of related but relatively dissimilar documents. The clustering information associated with the clusters is stored for use during runtime. 1. A method comprising: accessing, by a network system, a plurality of documents; hashing, by the network system, each of the plurality of documents to identify one or more content features; comparing, by a processor of the network system, the content features of the plurality of documents to determine a similarity score between each of the plurality of documents; clustering, by the network system, the plurality of documents into one or more clusters based on the similarity score of each of the plurality of documents; and storing clustering information associated with the one or more clusters in a data store. 2. The method of claim 1, wherein hashing each of the plurality of documents comprises performing a MinHash. 3. The method of claim 1, wherein the comparing the content features comprises determining Jaccard scores between each pair of documents of the plurality of documents, the Jaccard score indicating a ratio of overlapping content features. 4. The method of claim 3, further comprising generating a distance matrix between each of the ...
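
As a rough illustration of the MinHash and Jaccard steps named above, the sketch below uses only the Python standard library. The word-shingle features, the 64 salted BLAKE2b hash functions, and all function names are assumptions for the example; the clustering step (DBSCAN in the abstract) is not implemented here and would take `1 - similarity` as its distance.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    """Content features: overlapping k-word shingles (an assumed feature choice)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard score: shared features divided by all features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(features: set, num_hashes: int = 64) -> list:
    """For each salted hash function, keep the minimum hash over all features."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for f in features
        ))
    return signature


def estimate_similarity(sig_a: list, sig_b: list) -> float:
    """The fraction of matching signature positions estimates the Jaccard score."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


doc_a = shingles("the network system accesses a plurality of documents")
doc_b = shingles("the network system accesses a plurality of files")
exact = jaccard(doc_a, doc_b)
approx = estimate_similarity(minhash_signature(doc_a), minhash_signature(doc_b))
```

The estimate approaches the exact score as `num_hashes` grows. Signatures are much smaller than the feature sets, which is the usual reason to compare them instead of the sets themselves.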

07-01-2021 publication date

METHOD OF AUTOMATICALLY EXTRACTING INFORMATION OF A PREDEFINED TYPE FROM A DOCUMENT

Number: US20210004584A1
Assignee:

Method and system of automatically extracting information of a predefined type from a document is provided. The method comprises using an object detection algorithm to identify at least one segment of the document that is likely to comprise the information of the predefined type. The method further comprises building at least one bounding box corresponding to the at least one segment and if the bounding box is likely to comprise the information of the predefined type extracting the information comprised by the bounding box from the at least one bounding box. 1. A method comprising: identifying, by an object detection algorithm, at least one segment of a document that is likely to comprise information of a predefined type; building at least one bounding box corresponding to the at least one segment; identifying that the at least one bounding box likely comprises the information of the predefined type; and extracting the information from the at least one bounding box. 2. The method of wherein a character identification algorithm is used for extracting the information of the predefined type from the at least one bounding box. 3. The method of wherein the character identification algorithm is configured to utilize characteristics of the information of the predefined type in order to recognize the information of the predefined type. 4. The method of wherein the characteristics of the information of the predefined type comprise a number format and at least one of a comma or a decimal point. 5. The method of wherein the characteristics of the information of the predefined type are identified by a multilayer neural network. 6. The method of claim 5, wherein the multilayer neural network includes a first layer configured to differentiate between empty regions and non-empty regions of the document, the first layer of the neural network is further configured to identify basic patterns present on the document, and the neural network further includes a second layer configured to ...

07-01-2021 publication date

METHOD AND APPARATUS FOR DETERMINING (RAW) VIDEO MATERIALS FOR NEWS

Number: US20210004602A1
Author: Lu Daming, Tian Hao
Assignee:

The present disclosure discloses a method and apparatus for determining video material of news. The method for determining video material of news comprises: acquiring a weighted score value of a score of a keyword of a news text in a plurality of dimensions; filtering a keyword set of news based on the weighted score value of the score of the keyword; searching a pre-selected video using the keyword set of the news; and determining video material of the news based on the pre-selected video. The present disclosure improves the consistency between the video material of the news and the news text. 1. A method for determining video material of news , comprising:acquiring a weighted score value of a score of a keyword of a news text in a plurality of dimensions, the score of the keyword of the news text in the plurality of dimensions including: a score of the keyword determined based on a correlation between a word obtained by segmenting the news text and a news title;filtering a keyword set of news based on the weighted score value of the score of the keyword;searching a pre-selected video using the keyword set of the news; anddetermining video material of the news based on the pre-selected video.2. The method according to claim 1 , wherein the acquiring a weighted score value of a score of a keyword of a news text in a plurality of dimensions includes at least one of:acquiring the score of the keyword of the news text using an attention model of extracting the keyword, the acquiring the score of the keyword of the news text using an attention model of extracting the keyword including: acquiring, using the attention model of extracting the keyword, the score of the keyword determined based on the correlation between the word obtained by segmenting the news text and the news title;acquiring the score of the keyword of the news text using a TF-IDF; oracquiring the score of the keyword of the news text using a domain dictionary of a different granularity.3. The method ...
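
The weighted scoring and filtering of keywords can be sketched as follows. The dimension names, weights, scores and threshold are hypothetical values invented for the example; the source specifies neither them nor how the per-dimension scores are combined beyond their being weighted.

```python
def weighted_score(scores: dict, weights: dict) -> float:
    """Combine per-dimension keyword scores into one weighted value."""
    return sum(weights[dim] * scores.get(dim, 0.0) for dim in weights)


def filter_keywords(candidates: dict, weights: dict, min_score: float) -> list:
    """Keep keywords whose weighted score reaches min_score, best first."""
    scored = {kw: weighted_score(s, weights) for kw, s in candidates.items()}
    kept = [kw for kw, value in scored.items() if value >= min_score]
    return sorted(kept, key=lambda kw: -scored[kw])


# Hypothetical dimensions: title correlation (attention model), TF-IDF, domain dictionary.
weights = {"title_attention": 0.5, "tfidf": 0.3, "domain_dict": 0.2}
candidates = {
    "earthquake": {"title_attention": 0.9, "tfidf": 0.8, "domain_dict": 1.0},
    "rescue": {"title_attention": 0.6, "tfidf": 0.7, "domain_dict": 0.0},
    "said": {"title_attention": 0.1, "tfidf": 0.05, "domain_dict": 0.0},
}
keyword_set = filter_keywords(candidates, weights, min_score=0.5)
```

The resulting `keyword_set` would then serve as the query for the pre-selected video search described in the abstract.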

07-01-2021 publication date

TRAINING DIGITAL CONTENT CLASSIFICATION MODELS UTILIZING BATCHWISE WEIGHTED LOSS FUNCTIONS AND SCALED PADDING BASED ON SOURCE DENSITY

Number: US20210004670A1
Assignee:

Methods, systems, and non-transitory computer readable storage media are disclosed for training a machine-learning model utilizing batchwise weighted loss functions and scaled padding based on source density. For example, the disclosed systems can determine a density of words or phrases in digital content from a digital content source that indicate an affinity towards one or more content classes. In some embodiments, the disclosed systems can use the determined source density to split digital content from the source into segments and pad the segments with padding characters based on the source density. The disclosed systems can also generate document embeddings using the padded segments and then train the machine-learning model using the document embeddings. Furthermore, the disclosed system can use batchwise weighted cross entropy loss for applying different class weightings on a per-batch basis during training of the machine-learning model. 1. A non-transitory computer readable storage medium comprising instructions that , when executed by at least one processor , cause a computer system to:determine, for a digital content source of a plurality of digital content sources, a source density based on comparing words in text documents from the digital content source to a word corpus;divide a text document from the digital content source into a plurality of segments, wherein each segment of the plurality of segments has a length determined based on the source density;generate, from the plurality of segments, a plurality of padded segments by adding a set of padding characters to each segment of the plurality of segments for the text document from the digital content source, wherein each padded segment of the plurality of padded segments has a padded segment length; andtrain a machine-learning model to classify documents based on the plurality of padded segments.2. The non-transitory computer readable storage medium as recited in claim 1 , further comprising ...

04-01-2018 publication date

METHOD FOR RECOGNIZING TABLE, FLOWCHART AND TEXT IN DOCUMENT IMAGES

Number: US20180005029A1
Author: MING Wei
Assignee: KONICA MINOLTA LABORATORY U.S.A., INC.

A method for recognizing a binary document image as a table, pure text, or flowchart includes calculating a side profile of the image for each of the four sides, calculating a boundary removal size N corresponding to each side based on widths of lines or strokes closest to that side, and for each side, removing a boundary of size N from the document image, and re-calculating the side profile for each side after the removal. Then, based on a comparison of the side profiles and the re-calculated side profiles, the input document image is recognized as a table if all side profiles change from smooth to non-smooth, as pure text if the side profile changes are small, and as a flowchart if the original side profiles contain multiple sharp changes and wide flat regions and if the side profile changes significantly in the previously wide flat regions. 1. A method implemented in a data processing apparatus for recognizing an input document image as a table , pure text , or flowchart , the document image being a binary image where each pixel is a background pixel having a background pixel value or a content pixel having a content pixel value , the method comprising:(a) calculating a side profile of the image for each of a top, bottom, left and right sides, each side profile being, for each pixel position along that side of the document image, a distance from that side to a first content pixel along a direction perpendicular to that side;(b) calculating a boundary removal size N corresponding to each side based on line widths at a plurality of pixel positions along that side using content pixels closest to that side;(c) for each side, removing a boundary of the size N from the document image by setting N consecutive pixels, starting from the first content pixel and going in the direction perpendicular to that side, to the background pixel value, to generate a boundary-removed image for that side;(d) re-calculating a side profile for each of the top, bottom, left and right ...
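
Step (a) of the claim, the side profile, can be sketched in a few lines. The sketch assumes a binary image stored as a list of rows with 1 for content pixels and 0 for background. It also assumes that a row or column with no content pixel reports the full image extent, a case the excerpt does not cover.

```python
def side_profiles(image: list) -> dict:
    """Distance from each side to the first content pixel, per pixel position."""
    height, width = len(image), len(image[0])

    def first_content(pixels) -> int:
        for distance, value in enumerate(pixels):
            if value:
                return distance
        return len(pixels)  # assumed value when the line holds no content pixel

    columns = [[image[r][c] for r in range(height)] for c in range(width)]
    return {
        "top": [first_content(col) for col in columns],
        "bottom": [first_content(col[::-1]) for col in columns],
        "left": [first_content(row) for row in image],
        "right": [first_content(row[::-1]) for row in image],
    }


image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
profiles = side_profiles(image)
```

Comparing these profiles before and after the boundary removal of steps (b) to (d) is what drives the table, text or flowchart decision. The boundary removal itself is not sketched here.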

02-01-2020 publication date

COMMUNICATIONS SYSTEM

Number: US20200005031A1
Author: DeWitt Joshua Wesley
Assignee: COIN LION LLC

An electronic communications method includes receiving, by a computing device, first electronic information associated with generating a graphical feature in a graphical user interface. The electronic communications method further includes generating, by the computing device, the graphical feature. The electronic communications method further includes sending, by the computing device, the graphical feature to another computing device. The electronic communications method further includes receiving, by the computing device, second electronic information to classify the graphical feature as public information. The electronic communications method further includes sending, by the computing device, the graphical feature to a third computing device based on the classification of the graphical feature as public information. 1. An electronic communications method, comprising: receiving, by a computing device, first electronic information associated with generating a graphical feature in a graphical user interface; generating, by the computing device, the graphical feature; sending, by the computing device, the graphical feature to another computing device; receiving, by the computing device, second electronic information to classify the graphical feature as public information; and sending, by the computing device, the graphical feature to a third computing device based on the classification of the graphical feature as public information. 2. The electronic communications method, further comprising: receiving, by the computing device, third electronic information from the third computing device to change the graphical feature; and sending, by the computing device, the third electronic information to the other computing device, the third electronic information changes the graphical feature as displayed by the other computing device. 3. The electronic communications method of claim 1, where the graphical feature includes an aggregation of different types of electronic information. 4. ...

02-01-2020 publication date

CLASSIFYING DIGITAL DOCUMENTS IN MULTI-DOCUMENT TRANSACTIONS BASED ON EMBEDDED DATES

Number: US20200005032A1
Assignee:

A classifier receives a document from a multi-document transaction. The classifier analyzes the document to identify one or more embedded dates in the content of the document and context of one or more positions of the one or more embedded dates in the document. The classifier evaluates each of the one or more embedded dates based on the separate context of each of the one or more positions within the document and a relative age of the one or more embedded dates in view of temporal characteristics of multiple categories of documents of a transaction to select a particular category associated with the document from among the multiple categories. The classifier classifies the document within the transaction as a particular logical type identified by the particular category from among multiple logical types. 1. A method comprising: receiving, by a computer system, a document; analyzing, by the computer system, the document to identify one or more embedded dates in the content of the document and context of one or more positions of the one or more embedded dates in the document; evaluating, by the computer system, each of the one or more embedded dates based on the separate context of each of the one or more positions within the document and a relative age of the one or more embedded dates in view of a plurality of temporal characteristics of a plurality of categories of documents of a transaction to select a particular category associated with the document from among the plurality of categories; and classifying, by the computer system, the document within the transaction as a particular logical type identified by the particular category from among a plurality of logical types for the transaction. 2. The method according to claim 1, wherein analyzing, by the computer system, the document to identify one or more embedded dates in the content of the document and context of one or more positions of the one or more embedded dates in the document further ...

02-01-2020 publication date

Column Inferencer

Number: US20200005033A1
Assignee: KONICA MINOLTA LABORATORY U.S.A., INC.

A method for processing an electronic document (ED) to infer columns in the ED, where the ED comprises a plurality of characters. The method includes generating a mark-up version of the ED having text-layout attributes of the characters in the ED, where the characters are grouped into paragraphs based on the text-layout attributes, and each paragraph corresponds to a paragraph bounding box surrounding a corresponding paragraph, generating border pieces by initiating a pair of left scan and right scan from each paragraph bounding box to identify any adjacent paragraph bounding box, and generating, based at least on the border pieces, column borders for use in inferring the columns in the ED, where at least one column has a vertically aligned portion of the paragraphs. 1. A method for processing an electronic document (ED) to infer columns in the ED, wherein the ED comprises a plurality of characters, the method comprising: generating a mark-up version of the ED comprising text-layout attributes of the characters in the ED, wherein the characters are grouped into a plurality of paragraphs based on the text-layout attributes, and each of the plurality of paragraphs corresponds to a paragraph bounding box surrounding a corresponding paragraph; generating a plurality of border pieces by initiating a pair of left scan and right scan from each of the plurality of paragraph bounding boxes to identify any adjacent paragraph bounding box; and generating, based at least on the plurality of border pieces, a plurality of column borders for use in inferring the columns in the ED, wherein at least one column comprises a vertically aligned portion of the plurality of paragraphs. 2. The method of claim 1, wherein generating the plurality of column borders comprises: generating a sorted list of the plurality of border pieces based on respective locations of the plurality of border pieces; generating a plurality of potential column borders by initiating a pair of forward traversal ...

02-01-2020 publication date

SYSTEMS AND METHODS FOR IMAGE DATA PROCESSING

Number: US20200005034A1
Assignee: Capital One Services, LLC

Systems and methods for processing image data representing a document to remove deformations contained in the document are disclosed. A system may include one or more memory devices storing instructions and one or more processors configured to execute the instructions. The instructions may instruct the system to provide, to a machine learning system, a training dataset representing a plurality of documents containing a plurality of training deformations. The instructions may also instruct the system to use the machine learning system to process image data representing a target document containing a target document deformation. The machine learning system may generate restored image data representing the target document with the target document deformation removed. The instructions may further instruct the system to provide the restored image data to at least one of a graphical user interface, an image storage device, or a computer vision system. 1-20. (canceled) 21. A system, comprising: one or more memory devices storing instructions; and one or more processors configured to execute the instructions to perform operations comprising: transforming training image data to generate transformation image data representing images of training documents with training deformations; training a machine learning system, using the training image data and the generated transformation image data, to process image data representing a document containing a document deformation; providing, to the machine learning system, image data representing a target document including a target document deformation; and processing the image data representing the target document containing the target document deformation, using the machine learning system, to generate the target document with the target document deformation corrected. 22. The system of claim 21, wherein the transformation image data comprises a series of partially transformed images. 23. The system of claim 22, wherein: ...

02-01-2020 publication date

RANGE AND/OR POLARITY-BASED THRESHOLDING FOR IMPROVED DATA EXTRACTION

Number: US20200005035A1
Assignee:

Computerized techniques for improved binarization and extraction of information from digital image data are disclosed in accordance with various embodiments. The inventive concepts include rendering a digital image using a plurality of binarization thresholds to generate a plurality of binarized digital images, wherein at least some of the binarized digital images are generated using one or more binarization thresholds that are determined based on a priori knowledge regarding an object depicted in the digital image; identifying one or more connected components within the plurality of binarized digital images; and identifying one or more text regions within the digital image based on some or all of the connected components. Systems and computer program products are also disclosed. 1. A computer program product comprising a computer readable storage medium having embodied thereon computer readable program instructions configured to cause a processor, upon execution of the computer readable program instructions, to: render a digital image using a plurality of binarization thresholds to generate a plurality of binarized digital images, wherein at least some of the binarized digital images are generated using one or more binarization thresholds that are determined based on a priori knowledge regarding an object depicted in the digital image; identify one or more connected components within the plurality of binarized digital images; and identify one or more text regions within the digital image based on some or all of the connected components. 2. The computer program product as recited in claim 1, further comprising computer readable program instructions configured to cause the processor, upon execution of the computer readable program instructions, to: identify one or more connected components of black pixels within the plurality of binarized digital images; identify one or more connected components of white pixels within the plurality of binarized digital ...
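
The first two claimed steps, binarizing at several thresholds and finding connected components in each result, can be sketched as below. The threshold values, the "darker than threshold is foreground" polarity, and the 4-connectivity are assumptions for the example. How thresholds are derived from a priori knowledge of the object is not shown in the excerpt and is not modelled.

```python
from collections import deque


def binarize(gray: list, threshold: int) -> list:
    """Mark pixels darker than the threshold as foreground (assumed polarity)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary: list) -> list:
    """Group 4-connected foreground pixels using breadth-first search."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


# Toy grayscale image: one dark stroke (30) and one mid-gray stroke (120) on a light background.
gray = [
    [200, 200, 200, 200, 200],
    [200, 30, 200, 120, 200],
    [200, 30, 200, 120, 200],
    [200, 200, 200, 200, 200],
]
counts = {t: len(connected_components(binarize(gray, t))) for t in (50, 150)}
```

In this toy image the mid-gray stroke only becomes a component at the higher threshold, which shows why a single threshold can miss low-contrast text.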

03-01-2019 publication date

SYSTEMS AND METHODS FOR NATURAL LANGUAGE PROCESSING OF STRUCTURED DOCUMENTS

Number: US20190005029A1
Assignee:

Systems and methods for natural language processing of structured documents. In another embodiment, in an information processing apparatus comprising at least one computer processor, a method for processing a structured document may include: (1) receiving a document; (2) parsing the document into a plurality of components using a statistical parser; (3) extracting a plurality of entities from each component; (4) identifying a potential relationship between two of the plurality of entities; (5) generating a numeric representation for the potential relationship; (6) confirming the potential relationship with a logical regression model; and (7) generating and storing a unified structured file for the document. 1. A method for processing a structured document, comprising: in an information processing apparatus comprising at least one computer processor: receiving a document; parsing the document into a plurality of components using a statistical parser; extracting a plurality of entities from each component; identifying a potential relationship between two of the plurality of entities; generating a numeric representation for the potential relationship; confirming the potential relationship with a logical regression model; and generating and storing a unified structured file for the document. 2. The method of claim 1, wherein the statistical parser comprises a neural network. 3. The method of claim 1, wherein the plurality of components comprise at least one of a participating party, an article, a section, a subsection, and a subsubsection. 4. The method of claim 1, wherein the statistical parser parses the document based on a first vector of word embeddings and a second vector of orthographic properties of words in the document. 5. The method of claim 1, wherein the step of parsing the document into a plurality of components comprises identifying a relationship among the plurality of components. 6. The method of ...

07-01-2021 publication date

ANOMALY AND FRAUD DETECTION WITH FAKE EVENT DETECTION USING LINE ORIENTATION TESTING

Number: US20210004949A1
Assignee:

The present disclosure involves systems, software, and computer implemented methods for transaction auditing. One example method includes receiving a request to authenticate a document image. The image is preprocessed to prepare the image for line orientation analysis. The preprocessed image is analyzed to determine lines in the preprocessed image. The determined lines are automatically analyzed by performing line orientation test(s) on the determined lines to generate line orientation test result(s) for the preprocessed image. The line orientation test result(s) are evaluated to determine whether the image is authentic. In response to determining that at least one line orientation test result matches a predefined condition corresponding to an unauthentic document, a determination is made that the image is not authentic. In response to determining that none of the line orientation test results match any predefined condition corresponding to an unauthentic document, a determination is made that the image is authentic. 1. A computer-implemented method, comprising: receiving a request to authenticate an image of a document; preprocessing the image of the document to prepare the image of the document for line orientation analysis; automatically analyzing the preprocessed image to determine lines in the preprocessed image; automatically analyzing the determined lines including performing at least one line orientation test on the determined lines to generate at least one line orientation test result for the preprocessed image; and evaluating the at least one ... in response to determining that at least one line orientation test result matches a predefined condition corresponding to an unauthentic document, determining that the image of the document is not authentic; and in response to determining that none of the line orientation test results match any predefined condition corresponding to an unauthentic document, determining that the image of the document is authentic.

03-01-2019 publication date

METHOD AND SYSTEM FOR GENERATING PARSED DOCUMENT FROM DIGITAL DOCUMENT

Number: US20190005322A1
Assignee:

A method and system for generating a parsed document from a digital document. The method includes segmenting the digital document into at least one section; classifying the at least one section of the digital document into at least one of a class: text class, table class, figure class, noise class; identifying a reading order of the digital document; and processing each of the at least one section of the digital document. Furthermore, processing each of the at least one section of the digital document comprises extracting content from each of the at least one section based on the class; and structuring the extracted content based on the reading order to generate the parsed document. 1. A method of generating a parsed document from a digital document, wherein the method comprises: segmenting the digital document into at least one section; classifying the at least one section of the digital document into at least one of a class: text class, table class, figure class, noise class; identifying a reading order of the digital document; and processing each of the at least one section of the digital document, wherein the processing comprises: extracting content from each of the at least one section based on the class; and structuring the extracted content based on the reading order to generate the parsed document. 2. The method of claim 1, wherein the method further comprises determining an importance factor for each of the at least one section of the digital document. 3. The method of claim 1, wherein the identifying the reading order of the digital document comprises: identifying layout of at least one section of the digital document; and determining a sequential order of the at least one section based on the layout. 4. The method of claim 1, wherein extracting content from each of at least one section having a text class comprises: identifying one or more text blocks and text block features from the at least one section having text class; and extracting text and text ...

Подробнее
03-01-2019 дата публикации

Information processing apparatus for tracking processing

Номер: US20190005323A1
Автор: Mitsuo Kimura
Принадлежит: Canon Inc

An apparatus obtains first transformation information, such as a first transformation matrix, to be used for coordinate transformation between a coordinate system in an overall image prepared beforehand and a coordinate system in a first captured image, by comparing a feature point extracted from the overall image and a feature point extracted from the first captured image. In a case where the first transformation information is updated, the apparatus generates a partial image from the overall image based on an image-taking position of a just preceding image, and compares a feature point extracted from the partial image with a feature point extracted from a captured image to be used for updating of the first transformation information, and accordingly obtains transformation information for updating. The apparatus updates the first transformation information by using the obtained transformation information for updating. Thus, accuracy of tracking processing is improved.

Publication date: 03-01-2019

METHOD AND APPARATUS FOR SEPARATING TEXT AND FIGURES IN DOCUMENT IMAGES

Number: US20190005324A1
Assignee:

A method and apparatus for separating a text and figure of a document image are provided. The method of separating the text and the figure of the document image includes acquiring a document image, dividing the document image into a plurality of regions of interest, acquiring a feature vector by using a two-dimensional (2D) histogram by resizing the regions of interest and extracting a connection component of the regions of interest, acquiring a transformation vector of the feature vector by using a kernel, obtaining a cluster center of the transformation vector, and performing clustering on the cluster center to acquire a supercluster, and classifying the supercluster into one of a text class and a figure class, based on the number of superclusters.
1. A method of separating text and a figure of a document image, the method comprising: acquiring the document image; dividing the document image into a plurality of regions of interest; acquiring a feature vector by using a two-dimensional (2D) histogram obtained by resizing one of the regions of interest among the plurality of the regions of interest, and extracting a connection component of the resized region of interest; acquiring a transformation vector of the feature vector by using a kernel; acquiring a cluster center of the transformation vector; acquiring a supercluster by performing clustering on the cluster center; and classifying the supercluster into one of a text class and a figure class, based on the number of superclusters.
2. The method of claim 1, wherein the dividing of the document image into the plurality of regions of interest comprises: filling horizontal background pixels having a length equal to or less than a preset first threshold value, with a foreground color; filling vertical background pixels having a length equal to or less than a preset second threshold value, with the foreground color; performing a logical OR operation on a plurality of images including the horizontal background pixels filled ...
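The region-splitting step in claim 2 resembles run-length smoothing. The following is a minimal Python sketch, not the patent's implementation. It assumes a binary image stored as lists of 0 (background) and 1 (foreground), and it assumes that the truncated OR step combines the horizontally filled and vertically filled images. The names `fill_short_runs` and `smooth` are hypothetical.

```python
def fill_short_runs(line, threshold):
    """Fill background runs (0s) no longer than `threshold` with foreground (1).

    Simplification: runs touching the image border are treated like any other run.
    """
    out = list(line)
    i, n = 0, len(line)
    while i < n:
        if line[i] == 0:
            j = i
            while j < n and line[j] == 0:
                j += 1  # advance to the end of this background run
            if j - i <= threshold:
                for k in range(i, j):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out


def smooth(image, h_threshold, v_threshold):
    """OR together the horizontally filled and the vertically filled image."""
    horizontal = [fill_short_runs(row, h_threshold) for row in image]
    # Transpose, fill along columns, then transpose back.
    columns = [list(col) for col in zip(*image)]
    filled_columns = [fill_short_runs(col, v_threshold) for col in columns]
    vertical = [list(row) for row in zip(*filled_columns)]
    return [[h | v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]
```

Connected blobs in the smoothed image could then serve as candidate regions of interest; how the patent derives regions from the result is not visible in the truncated claim.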

Publication date: 03-01-2019

IDENTIFICATION OF EMPHASIZED TEXT IN ELECTRONIC DOCUMENTS

Number: US20190005325A1
Author: MING Wei
Assignee:

To identify emphasized text, bounding boxes are based on clusters resulting from horizontal compression and horizontal morphological dilation. The bounding boxes are processed to determine if any contain words or characters in bold. A bounding box is eliminated based on a comparison of its density and an average density across all bounding boxes. If its density is greater, text elements within the bounding box are evaluated to determine whether the text element is bold.
1. A method of identifying emphasized text, the method comprising: performing horizontal compression on an input image to generate a horizontally compressed image, the input image comprising lines of text, each line of text comprising a plurality of words or characters; performing horizontal morphological dilation on the compressed image to form a horizontally dilated image, the horizontally dilated image comprising clusters, each cluster corresponding to a different one of the lines of text; calculating a bounding box for each cluster, resulting in a plurality of bounding boxes; calculating a first average density, the first average density calculated across all the bounding boxes; for each of the bounding boxes, comparing the first average density to a density of the bounding box; and identifying a specific bounding box, from among the plurality of bounding boxes, as having a word or character in bold, the identifying based on the comparison of the first average density to the density of the specific bounding box.
2. The method of claim 1, wherein the bounding boxes, for calculating the first average density, are on the horizontally compressed image.
3. The method of claim 1, wherein: each bounding box includes upper and lower zones, at least one of which contains a fractional part of a word or a character, and none of the upper zones and none of the lower zones are used in the calculating of the first average density.
4. The method of claim 1, further comprising detecting an underline ...
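The density comparison in claim 1 can be illustrated with a short Python sketch. It is an illustration under assumptions rather than the patented method: the image is a binary grid of 0/1 values, boxes are `(top, left, bottom, right)` with exclusive bottom and right edges, and the "first average density" is taken to be the mean of the per-box densities. The names `box_density` and `bold_candidates` are hypothetical.

```python
def box_density(image, box):
    """Fraction of foreground (1) pixels inside a box."""
    top, left, bottom, right = box  # bottom and right are exclusive
    area = (bottom - top) * (right - left)
    ink = sum(image[r][c] for r in range(top, bottom) for c in range(left, right))
    return ink / area


def bold_candidates(image, boxes):
    """Keep boxes whose density exceeds the average density over all boxes.

    Assumption: the average is the mean of per-box densities.
    """
    densities = [box_density(image, b) for b in boxes]
    average = sum(densities) / len(densities)
    return [b for b, d in zip(boxes, densities) if d > average]
```

Boxes that survive this filter would still need the per-element evaluation that the abstract mentions before any text is labeled bold.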

Publication date: 03-01-2019

INFORMATION PROCESSING APPARATUS, PROGRAM, AND INFORMATION PROCESSING METHOD

Number: US20190005347A1
Author: Kishimoto Ryo
Assignee:

First coordinate transformation information between an entire image and a first captured image is calculated by a feature point comparing process. Second coordinate transformation information between the first captured image and a second captured image is calculated by a feature point tracing process, the second captured image being a captured image at a timing when the first coordinate transformation information is calculated. Third coordinate transformation information between an immediately previous captured image and a third captured image is calculated by a feature point tracing process. A data input area in the entire image is mapped on the third captured image based on the first to the third coordinate transformation information pieces. Updates of the first and the second coordinate transformation information pieces may be suppressed where a change amount exceeds a predetermined threshold.
1. An information processing apparatus comprising: a memory that stores instructions; and a processor that executes the instructions to perform: feature point comparison processing for comparing a feature point extracted from an entire image prepared in advance and a feature point extracted from a first captured image, and calculating first transformation information used for coordinate transformation between a coordinate system of the entire image and a coordinate system of the first captured image; feature point trace processing for tracing a position of a feature point in a second captured image when the first transformation information is calculated based on the feature point extracted from the first captured image, and calculating second transformation information used for coordinate transformation between the coordinate system of the first captured image and a coordinate system of the second captured image; feature point trace processing for tracing a position of a feature point in a third captured image based on a feature point extracted from a captured image immediately ...
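Mapping the data input area through the three transformations can be sketched as a chain of 3x3 matrices. The Python sketch below assumes that each piece of transformation information is a 3x3 homography and that the "immediately previous" image is the second captured image, so the three matrices compose directly. Both are assumptions for illustration, and the names `matmul`, `apply_transform` and `map_area` are hypothetical.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_transform(h, point):
    """Apply a 3x3 homography to an (x, y) point using homogeneous coordinates."""
    x, y = point
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (xh / w, yh / w)


def map_area(corners, first, second, third):
    """Map corners from the entire image into the third captured image.

    first:  entire image   -> first captured image
    second: first captured -> second captured image
    third:  second captured (assumed immediately previous) -> third captured image
    """
    total = matmul(third, matmul(second, first))
    return [apply_transform(total, p) for p in corners]
```

Composing the matrices once and reusing the product avoids transforming every corner three times per frame.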

Publication date: 03-01-2019

Relevance Management System

Number: US20190005436A1
Author: Alexander Mike
Assignee:

A relevance management system for summarizing relevance of a plurality of project description (PD) documents with respect to a proposal outline comprised of outline sections, the proposal outline associated with a request for proposal (RFP) document comprised of RFP segments; receiving an outline-to-RFP matrix; determining a PD-document-to-RFP-segment relevance for a PD document from the plurality of PD documents and the RFP segments using document similarity processing and a metric; producing a proposal writing plan comprising a first column that represents the outline sections, a second column representing the RFP segments, and a third column representing PD-document-to-RFP-segment relevance; and transmitting the proposal writing plan to an originator of the outline-to-RFP matrix.
1. A non-transitory computer readable medium including code that is executed by a computer system comprising one or more processors, a main memory, a secondary storage, a communications bus, one or more input or output devices, and a network interface to perform a method of summarizing relevance of a plurality of project description (PD) documents with respect to a proposal outline comprised of J outline sections, said proposal outline associated with a request for proposal (RFP) document that is comprised of M RFP segments, said plurality of PD documents stored in a data store, the method comprising: receiving using at least one of said input devices, said communications bus, or said network interface an outline-to-RFP matrix O_1 of dimension J by M, where each element O_1[j, m] of said matrix O_1 is an indicator of whether an outline section j is associated with an RFP segment m; receiving using at least one of said input devices, said communications bus, or said network interface said RFP document comprised of said M RFP segments; determining using said one or more processors a relevance, wherein said relevance comprises a PD-document-to-RFP- ...

Publication date: 05-01-2017

INFORMATION PROCESSING APPARATUS, SETTING CONDITION SPECIFICATION METHOD FOR IMAGE FORMING APPARATUS

Number: US20170006172A1
Author: OKAZAKI Yusuke
Assignee:

An information processing apparatus can be worn by a user of an image forming apparatus, and includes a camera, a visual line detecting portion, a target image recognizing portion, a positional condition determining portion, and a wireless communication portion. The target image recognizing portion, when a positional condition among setting conditions for image-formation-related processes is specified, recognizes an image of a specification target that is included in a visual field image photographed by the camera, the specification target being a target of specification of the positional condition. The positional condition determining portion determines a specified positional condition based on a position in the image of the specification target recognized by the target image recognizing portion, the position corresponding to the visual line direction detected by the visual line detecting portion. The wireless communication portion wirelessly transmits information of the specified positional condition to the image forming apparatus.
1. An information processing apparatus configured to be worn by a user of an image forming apparatus, the information processing apparatus comprising: a camera configured to photograph a visual field of the user; a visual line detecting portion configured to detect a visual line direction of the user; a target image recognizing portion configured to, when a positional condition among setting conditions for image-formation-related processes is specified, recognize an image of a specification target that is included in a visual field image photographed by the camera, the specification target being a target of specification of the positional condition; a positional condition determining portion configured to determine a specified positional condition based on a position in the image of the specification target recognized by the target image recognizing portion, the position corresponding to the visual line direction detected by the visual ...

Publication date: 20-01-2022

METHOD FOR PROCESSING A NOTE PAGE OF A NOTEBOOK, COMPUTER DEVICE AND STORAGE MEDIUM

Number: US20220019783A1
Author: CHENG Chao
Assignee:

The present disclosure relates to a method for processing a note page of a notebook, a computer device and a storage medium. The method includes: acquiring a note page identification of the note page selected from the original notebook; reading a note page configuration file corresponding to the note page identification; parsing the note page configuration file to obtain the handwritten contents of the note page corresponding to the note page identification; creating a new notebook based on the handwritten contents.
1. A method for processing a note page of a notebook, comprising: acquiring a note page identification of the note page selected from an original notebook; reading a note page configuration file corresponding to the note page identification; parsing the note page configuration file to obtain handwritten contents of the note page corresponding to the note page identification; and creating a new notebook based on the handwritten contents.
2. The method according to claim 1, wherein the acquiring a note page identification of the note page selected from an original notebook comprises: presenting thumbnails of note pages in response to a preset trigger operation for the original notebook; determining a thumbnail selected from the presented thumbnails by a touch operation; and acquiring the note page identification of the note page corresponding to the selected thumbnail.
3. The method according to claim 1, wherein the reading a note page configuration file corresponding to the note page identification comprises: reading a basic information file of the original notebook; searching a file identification array of the basic information file for a file identification corresponding to the note page identification; and reading the note page configuration file based on the file identification.
4. The method according to claim 1, wherein the method further comprises: creating a new resource table; importing a picture identification of a picture resource of the note page into ...

Publication date: 02-01-2020

USER FEEDBACK FOR REAL-TIME CHECKING AND IMPROVING QUALITY OF SCANNED IMAGE

Number: US20200007720A1
Author: ILIC Alexander
Assignee: ML Netherlands C.V.

A smartphone may be freely moved in three dimensions as it captures a stream of images of an object. Multiple image frames may be captured in different orientations and distances from the object and combined into a composite image representing an image of the object. The image frames may be formed into the composite image based on representing features of each image frame as a set of points in a three dimensional point cloud. Inconsistencies between the image frames may be adjusted when projecting respective points in the point cloud into the composite image. Quality of the image frames may be improved by processing the image frames to correct errors. Reflections and shadows may be detected and removed. Further, optical character recognition may be applied. As the scan progresses, a direction for capturing subsequent image frames is provided to a user as a real-time feedback.
1. A method of forming a composite image of a scene using a portable electronic device, the method comprising: capturing a stream of image frames of a scene with the portable electronic device; extracting one or more image features from image frames of the stream of image frames; associating sets of points with the one or more image features; determining correspondences between sets of points associated with the one or more image features from multiple image frames of the stream of image frames; sequentially incorporating image frames of the stream of image frames into a three dimensional point cloud based on the determined correspondences, wherein the image frames are incorporated into initial positions in the three dimensional point cloud; and adjusting the points in the point cloud based on a bundle adjustment for a plurality of the sets of points.
2. The method of claim 1, further comprising associating the sets of points with a common frame of reference.
3. The method of claim 2, wherein associating the sets of points with a common frame of reference comprises projecting the sets of points into ...

Publication date: 08-01-2015

APPARATUS AND METHOD FOR SCANNING AND DECODING INFORMATION IN AN IDENTIFIED LOCATION IN A DOCUMENT

Number: US20150009542A1
Author: Zhao Ming-Xi
Assignee:

An imaging scanner identifies first and second locations in a first and second captured image of a document, analyzes each character in the identified locations, and produces a first and second string, each including a character and a confidence value. The device determines that a first measurement of the confidence values in each of the first and second string is beyond a range of a first threshold. The device compares the confidence value for each character in the first string with a corresponding confidence value in the second string, selects a character from one of the first or second string with a higher confidence value; and produces a combined string including the selected characters and the confidence value associated with each selected character.
1. A method comprising: sequentially, in an imaging scanner, capturing two or more images of a single document; identifying, in the imaging scanner, a first location in a first captured image where information to be decoded is located, analyzing each character in the first location, and producing a first string including both a corresponding character and a first confidence value for each character in the first location; determining, in the imaging scanner, that a first measurement of the confidence values in the first string is beyond a range associated with a first confidence threshold; identifying, in the imaging scanner, a second location in a second captured image where information to be decoded is located, analyzing each character in the second location, and producing a second string including both a corresponding character and a second confidence value for each character in the second location; comparing, in the imaging scanner, the first confidence value for each character in an identified location in the first string with the second confidence value for the character in the same identified location in the second string; selecting, in the imaging scanner, a character from one of the first string or the second ...
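The per-character merge described in this abstract can be sketched in a few lines of Python. This is a hedged illustration, not the claimed method: each string is assumed to be a list of `(character, confidence)` pairs aligned by position, the "measurement" of a string is assumed to be its minimum confidence, and the threshold value 0.9 is arbitrary. The names `measure`, `combine` and `decode` are hypothetical.

```python
def measure(string):
    """Summarize a string's confidence; using the minimum is an assumed choice."""
    return min(conf for _, conf in string)


def combine(first, second):
    """Pick, position by position, the character with the higher confidence."""
    if len(first) != len(second):
        raise ValueError("strings must cover the same character positions")
    return [a if a[1] >= b[1] else b for a, b in zip(first, second)]


def decode(first, second, threshold=0.9):
    """Return a string that meets the threshold, or else the merged string."""
    if measure(first) >= threshold:
        return first
    if measure(second) >= threshold:
        return second
    return combine(first, second)
```

Ties go to the first string here; the source does not say how equal confidence values are resolved.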

Publication date: 14-01-2016

DATA PROCESSING DEVICE AND SCRIPT MODEL CONSTRUCTION METHOD

Number: US20160012040A1
Author: Hamada Shinichiro
Assignee:

According to an embodiment, a data processing device includes an extractor, a generator, and a constructor. The extractor is configured to extract, from a document having been subjected to predicate argument structure analysis and anaphora resolution, an element sequence including elements each being a combination of predicate having a shared argument and case type information of the shared argument, together with the shared argument. The generator is configured to produce case example data expressed by a feature vector for each attention element which is one of the elements. The feature vector includes feature value(s) about a sub-sequence having the attention element and feature value(s) about a sequence of the shared argument corresponding to the sub-sequence. The constructor is configured to construct a script model for estimating the elements each following antecedent context by performing machine learning based on a discriminative model using the case example data.
1. A data processing device comprising: an extractor configured to extract, from a document having been subjected to predicate argument structure analysis and anaphora resolution, an element sequence in which a plurality of elements are arranged in order of appearances of predicates in the document, the elements each being a combination of the predicate having a shared argument and case type information indicating a type of a case of the shared argument, together with the shared argument; a case example generator configured to produce case example data expressed by a feature vector for each attention element, the attention element being one of the elements included in the element sequence, the feature vector including at least one of one or more feature values about a sub-sequence having the attention element as a last element of the sub-sequence in the element sequence and one or more feature values about a sequence of the shared argument corresponding to the sub-sequence; and a model constructor ...

Publication date: 11-01-2018

SYSTEM AND METHOD FOR MATCHING TRANSACTION ELECTRONIC DOCUMENTS TO EVIDENCING ELECTRONIC DOCUMENTS

Number: US20180011846A1
Author: GUZMAN Noam, SAFT Isaac
Assignee: Vatbox, Ltd.

A system and method for matching a second electronic document to a first electronic document, the first electronic document including at least partially unstructured data of a transaction. The method includes: analyzing the at least partially unstructured data to determine at least one transaction parameter; creating a template for the first electronic document, wherein the template is a structured dataset including the determined at least one transaction parameter; determining, based on the template, a portion of the first electronic document requiring evidence; searching, based on the template, for a second electronic document, wherein the second electronic document indicates of the evidence-requiring portion; and associating the second electronic document with the first electronic document.
1. A method for matching a second electronic document to a first electronic document, the first electronic document including at least partially unstructured data of a transaction, comprising: analyzing the at least partially unstructured data to determine at least one transaction parameter; creating a template for the first electronic document, wherein the template is a structured dataset including the determined at least one transaction parameter; determining, based on the template, a portion of the first electronic document requiring evidence; searching, based on the template, for a second electronic document, wherein the second electronic document indicates of the evidence-requiring portion; and associating the second electronic document with the first electronic document.
2. The method of claim 1, wherein determining the at least one transaction parameter further comprises: identifying, in the first electronic document, at least one key field and at least one value; creating, based on the first electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value; and analyzing the created dataset, wherein the at least one ...

Publication date: 14-01-2016

CHARACTER RECOGNITION METHOD AND CHARACTER RECOGNITION SYSTEM

Number: US20160012288A1
Assignee: Glory Ltd.

A character recognition method for recognizing a character printed over a background pattern on a valuable medium includes acquiring a character image captured by capturing the character printed on the valuable medium; evaluating a degree of similarity between the character image and each template image, the template image having been obtained beforehand by capturing each character that had a possibility of having been printed on the valuable medium and of which background area is evenly filled; and determining that the character corresponding to the template image showing the highest degree of similarity at the evaluating is the character included in the character image.
1. A character recognition method for recognizing a character printed over a background pattern on a valuable medium, comprising: acquiring a character image captured by capturing the character printed on the valuable medium; evaluating a degree of similarity between the character image and each template image, the template image having been obtained beforehand by capturing each character that had a possibility of appearing on the valuable medium and of which background has been filled evenly; and determining that the character corresponding to the template image showing the highest degree of similarity at the evaluating is the character included in the character image.
2. The character recognition method according to claim 1, wherein the template image is an image having been obtained beforehand by separating the background area and the character by using a character-background separation threshold value determined from intensity distribution of an image having been obtained beforehand by capturing the character printed on a valuable medium, and of which all pixel values in the background area have been filled and replaced with a predetermined pixel value determined based on distribution of the pixel values of the pixels included in the background area.
3. The character recognition method ...
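Selecting the template with the highest degree of similarity can be sketched as follows in Python. The similarity measure used here, the fraction of pixels that agree between two equally sized binary images, is an assumption chosen for brevity; the source does not state which measure the patent uses. The names `similarity` and `recognize` and the tiny 3x3 templates are hypothetical.

```python
def similarity(image, template):
    """Fraction of pixel positions where image and template agree (assumed measure)."""
    total = sum(len(row) for row in image)
    same = sum(1
               for image_row, template_row in zip(image, template)
               for a, b in zip(image_row, template_row)
               if a == b)
    return same / total


def recognize(image, templates):
    """Return the character whose template is most similar to the image."""
    return max(templates, key=lambda ch: similarity(image, templates[ch]))


# Hypothetical 3x3 binary templates with an evenly filled (all-zero) background.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
```

A grayscale pipeline would more likely use a correlation-based score, but the selection step (take the maximum over all templates) stays the same.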

Publication date: 10-01-2019

METHOD FOR INFORMATION ASSOCIATION, ELECTRONIC BOOKMARK, AND SYSTEM FOR INFORMATION ASSOCIATION

Number: US20190012539A1
Author: HUANG Jiawei, Liu Chang
Assignee:

Disclosed are a method for information association, a system for information association, and an electronic bookmark. The method may include the follows. An electronic bookmark receives annotation information input for a book reading page covered by the electronic bookmark. The electronic bookmark acquires page information of the covered book reading page, and associates the annotation information with the page information.
1. A method for information association, comprising: receiving, by an electronic bookmark, annotation information input for a book reading page covered by the electronic bookmark; acquiring page information of the covered book reading page; and associating the annotation information with the page information.
2. The method for information association of claim 1, wherein the acquiring page information of the covered book reading page comprises: acquiring, by receiving a page number of the book reading page input by a user, the page information of the book reading page corresponding to the page number; or acquiring, by sensing a sensing unit preset on the book reading page, the page number of the book reading page, and acquiring the page information of the book reading page according to the page number; or acquiring, through an input device electrically coupled to the electronic bookmark, the page number of the book reading page, and acquiring the page information of the book reading page according to the page number.
3. The method for information association of claim 2, wherein the receiving annotation information input for the book reading page comprises: receiving a touch input operation; acquiring position information of the touch input operation on the electronic bookmark, wherein when the electronic bookmark covers the book reading page, positions of the electronic bookmark are corresponding to positions of the book reading page, respectively; and acquiring content information input through the touch input operation, and associating the content ...

Publication date: 10-01-2019

Mobile terminal, image processing method, and computer-readable recording medium

Number: US20190012560A1
Assignee: PFU Ltd

A mobile terminal includes a memory, and a processor coupled to the memory, wherein the processor is configured to execute first acquiring a frame obtained through photographing, second acquiring document image data of a document from the frame, first determining whether a form partial feature in a registered form and a document partial feature at a position corresponding to a position of the partial feature match, the document partial feature being in the document, and third acquiring a frame obtained through re-photographing when it is determined that the form partial feature and the document partial feature do not match.

Publication date: 14-01-2021

Systems and Methods For Automatic Data Extraction From Document Images

Number: US20210012102A1
Assignee:

Described systems and methods allow the automatic extraction of structured information from images of structured text documents such as invoices and receipts. Some embodiments employ optical character recognition (OCR) technology to extract individual text tokens (e.g., words) and token bounding boxes from a document image. A feature vector of each text token comprises a first part determined according to a character content of the text token, and a second part determined according to an image content of the token's bounding box. A neural network classifier produces a label indicative of a type of information (e.g. “billing address”, “due date”, etc.) carried by each text token. In some embodiments, documents are linearized by ordering text tokens in a sequence according to a reading order of a natural language (e.g., English, Arabic) in which the respective document is formulated. Token feature vectors are fed to the classifier in the order indicated by the token sequence.
1. A method comprising employing at least one hardware processor of a computer system to: receive a text token extracted from a document image, the text token comprising a sequence of characters, the document image comprising an encoding of an image of a structured paper document, the structured paper document partitioned into a plurality of fields and having a plurality of text tokens distributed among the plurality of fields, each field of the plurality of fields having a distinct field type characterizing a distinct category of information represented by text tokens located within the each field; receive a token box indicator comprising an indicator of a polygon enclosing a region of the document image, the region containing an image of the text token; determine a text feature vector characterizing the text token as a whole, the text feature vector determined according to the character sequence; determine an image feature vector characterizing the image of the text token as a whole, the image ...
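The linearization step, ordering OCR tokens by the reading order of the document's language, can be sketched as below. This Python sketch rests on assumptions not found in the source: tokens are dictionaries with `text`, `x` and `y` keys (top-left corner of the bounding box), tokens whose `y` lies within `line_tolerance` pixels of a line's first token belong to that line, and right-to-left scripts simply reverse the order within each line. The name `reading_order` is hypothetical.

```python
def reading_order(tokens, line_tolerance=5, right_to_left=False):
    """Order tokens top to bottom, then along each line in the script's direction."""
    lines = []
    for token in sorted(tokens, key=lambda t: t["y"]):
        # Start a new line unless the token sits close to the current line's first token.
        if lines and abs(token["y"] - lines[-1][0]["y"]) <= line_tolerance:
            lines[-1].append(token)
        else:
            lines.append([token])
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda t: t["x"], reverse=right_to_left))
    return ordered
```

The resulting sequence is the order in which per-token feature vectors would be fed to a sequence classifier; multi-column layouts would need a more careful grouping than this single pass.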

Publication date: 14-01-2021

SYSTEMS AND METHODS FOR INFORMATION EXTRACTION FROM TEXT DOCUMENTS WITH SPATIAL CONTEXT

Number: US20210012103A1
Assignee: American International Group, Inc.

Performing information extraction from an electronic document is disclosed. A method comprises: receiving a semi-structured input document; retrieving an entity model that provides one or more domain variable definitions for one or more domain variables, wherein the entity model and the input document correspond to a common domain; determining that the input document includes an entity that satisfies a first domain variable definition corresponding to a first domain variable; retrieving a relational model that provides, for the first domain variable, one or more relational definitions comprising spatial restrictions for one or more values corresponding to the first domain variable; extracting one or more data elements from the input document that satisfy the one or more relational definitions; and generating an information graph having a structured data format, wherein the one or more data elements extracted from the input document correspond to the first domain variable in the structured data format.
1. A method for information extraction from an electronic document, the method comprising: receiving, by a processor, an input document, wherein the input document is a semi-structured document; retrieving, by the processor, an entity model from an entity model storage, wherein the entity model provides one or more domain variable definitions for one or more domain variables, wherein the entity model and the input document correspond to a common domain; determining, by the processor, that the input document includes an entity that satisfies a first domain variable definition corresponding to a first domain variable; retrieving, by the processor, a relational model from a relational model storage, wherein the relational model provides, for the first domain variable, one or more relational definitions for one or more values corresponding to the first domain variable, wherein the one or more relational definitions for the one or more values corresponding to the first domain ...

Publication date: 14-01-2021

IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND RECORDING MEDIUM

Number: US20210012104A1
Assignee: NEC Corporation

An image processing device includes: an identifying unit that identifies a plurality of character strings that are candidates for a recording character string among a plurality of character strings acquired by recognizing characters included in a document image; an output unit that outputs a checking screen that represents positions of the plurality of character strings; and a feature quantity extracting unit that extracts a feature quantity of a character string corresponding to a position identified by a user on the checking screen as a feature quantity of the recording character string.
1. An image processing device comprising: a memory configured to store instructions; a processor configured to execute the instructions to: identify a plurality of character strings that are candidates for a recording character string among a plurality of character strings acquired by recognizing characters included in a document image; cause a checking screen to be output, the checking screen representing positions of the plurality of character strings; and extract a feature quantity of a character string corresponding to a position identified by a user on the checking screen as a feature quantity of the recording character string.
2. The image processing device according to claim 1, wherein the checking screen represents positions of the plurality of character strings by displaying the document image to which a display for identifying the plurality of character strings is added.
3. The image processing device according to claim 2, wherein extracting the feature quantity comprises extracting the feature quantity of the character string corresponding to the position identified by the user on the document image to which the display for identifying the plurality of character strings is added.
4. The image processing device according to claim 1, wherein the processor is configured to execute the instructions to: extract, by using the feature quantity of the recording character ...

Publication date: 14-01-2021

CONTEXTUAL INFORMATION INSERTION IN CONTEXT WITH CONTENT

Number: US20210012390A1
Assignee:

Context associated with content may be received. An additional content may be generated for insertion into the content. The additional content can be created to be within the context associated with the content and based on a likely responsiveness of the user to the additional content, the additional content referring to an item. The additional content preserves continuity and/or semantics in the context of the content. 1. A computer-implemented method comprising: receiving context associated with content for presenting to a user; creating an additional content to insert into the content, the additional content created to be inserted within the context associated with the content and based on a likely responsiveness of the user to the additional content, the additional content referring to an item, wherein the additional content preserves continuity in the context of the content. 2. The computer-implemented method of claim 1, wherein the item includes at least a product. 3. The computer-implemented method of claim 1, wherein the item includes at least a service. 4. The computer-implemented method of claim 1, wherein the additional content is created based at least on the context associated with the content and a characteristic associated with the user. 5. The computer-implemented method of claim 1, wherein the content includes at least an audio content, and the method further comprises causing a change in cadence of reading the audio content during a period of time the additional content is read. 6. The computer-implemented method of claim 1, wherein the content includes at least a visual content, and the method further comprises causing the additional content to be highlighted from rest of the content in presenting the visual content. 7. The computer-implemented method of claim 1, further comprising identifying a location in the content to insert the additional content, the location identified based at least on the likely responsiveness of ...

Publication date: 12-01-2017

IMAGE PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM

Number: US20170013163A1
Author: Yamada Masahiro
Assignee: FUJI XEROX CO., LTD.

In an information processing device, a reading unit reads in a filled-in document. A recognition unit recognizes a standard document type and an information structure of the document. A storage unit stores content filled in on the document. If the read-in document is not a most recent version, a comparing unit compares the information structure of the most recent version of a preregistered standard document type corresponding to the recognized standard document type to the recognized information structure. A generation unit generates a differential document of the modified information structure, according to a result of the comparison. An output unit outputs the generated differential document. If a filled-in differential document is read in, a merge unit merges filled-in content on the differential document with filled-in content on the stored document, and the storage unit replaces the already-stored content with the merged content. 1. An information processing device comprising: a reading unit that reads in a filled-in document; a recognition unit that recognizes a standard document type and an information structure of the read-in document; a storage unit that stores content filled in on the read-in document; a comparison unit that, if the read-in document is not a most recent version, compares an information structure of the most recent version of a preregistered standard document type corresponding to the recognized standard document type to the recognized information structure; a generation unit that generates a differential document corresponding to a modification between the information structure of the most recent version and the recognized information structure, according to a result of the comparison; an output unit that outputs the generated differential document; and a merge unit that, if the differential document filled in by a user is read in, merges filled-in content on the differential document with the filled-in content on the read-in document stored in ...

Publication date: 09-01-2020

SYSTEM FOR REAL-TIME EXPRESSION OF SEMANTIC MIND MAP, AND OPERATION METHOD THEREFOR

Number: US20200012722A1
Author: GENG YIBING

Disclosed is a system for real-time expression of a semantic mind map and its operation method therefor. The system includes an association matrix and a focus associated operation module, the association matrix is connected to the focus associated operation module. The association matrix includes a start node, a direct associated module, an indirect associated module, a weakly associated module, a superscript module, or the like. The focus associated operation module includes a focused node and focus associated nodes, or the like. When the present disclosure is applied to a search engine including a cross-database search engine, a search result service interface is in real time expanded, thus being used to help a user better identify and discover relevant documents of interest. 1. A system for real-time expression of a semantic mind map, comprising: an association matrix, connected to a focus associated operation module, wherein the association matrix comprises a start node, a direct associated module, an indirect associated module, a weakly associated module, and a superscript module, wherein the start node, connected to the direct associated module, and configured to mark a first left column of the association matrix on a display interface, the start node is 1 to M featured text words, an initial value is featured user query words, and M is a natural number; the direct associated module, connected to the indirect associated module, and configured to mark a node in subsequent right columns of the start node in the association matrix which has a text in-sentence co-occurrence relationship with featured text words; the indirect associated module, connected to the weakly associated module, and configured to mark a node in subsequent right columns of a direct associated node in the association matrix which has a text in-sentence co-occurrence relationship with a featured text word of a previous direct associated node but not previously marked; the weakly associated module ...

Publication date: 11-01-2018

METHOD AND DEVICE FOR PROVIDING A TRUSTED ENVIRONMENT FOR EXECUTING AN ANALOGUE-DIGITAL SIGNATURE

Number: US20180013563A1
Assignee:

The invention relates to the field of providing a trusted environment for executing an analogue-digital signature. The claimed document-signing device in the form of a stylus includes a protective compartment, in which the following are disposed: a microcontroller with a programme code; a memory with a secret digital signature key; and additionally inertial sensors, which are connected to the microcontroller; a lens; and a camera, which is also connected to the microcontroller. A wireless interface is used in order to communicate with a computer. The inertial sensors serve to verify the handwritten signature of the user, while the lens and camera serve to carry out a comparison with the text of an electronic document uploaded via the wireless interface. In this way it is ensured that verified information enters the trusted environment of the stylus. 1-14. (canceled) 16. The method of claim 15, wherein the second comparing is made by comparing of the stylus movement data with the digitized handwritten signature video on a common timeline. 17. The method of claim 15, wherein the second comparing renders the positive outcome if the stylus movement data and the digitized handwritten signature video match with an error being within a predetermined error margin. 18. The method according to claim 15, wherein the first comparing is performed by overlaying the digitized text of the electronic document over image frame data, taking into account respective locations in the document image window of the digitized text of the electronic document and of a text in the image frame data. 19. The method according to claim 15, wherein requesting to display the image of the electronic document on the computer screen and controlling of the camera and the lens for capturing the image at the computer screen is made by the same computer-executable program code. 20. The method according to claim 19, further comprising, at the microcontroller, identifying ...

Publication date: 14-01-2016

Electronic Document Generation System, Electronic Document Generation Apparatus, and Recording Medium

Number: US20160014299A1
Author: SAKA Masaaki
Assignee:

An electronic document generation apparatus extracts a processing target area including a row area from a scanned image of an original document and detects the dimensions of the row area, the row area being an area of either a whole or partial range of a row of character string arranged in one direction in the scanned image. The apparatus determines an arrangement-direction character size on the basis of the dimensions of the row area, and sends out image data of the processing target area and an instruction to perform OCR processing on the processing target area to an external apparatus. The apparatus then receives a processing result of the OCR processing from the external apparatus, and arranges a character string of the processing result in the electronic document on the basis of the arrangement-direction character size to generate an electronic document. 1. An electronic document generation system comprising: a first apparatus configured to generate an electronic document on the basis of a scanned image of an original document; and a second apparatus configured to execute optical character recognition processing on the scanned image upon a request received from the first apparatus and send out a processing result of the optical character recognition processing to the first apparatus, the first apparatus including: an extraction unit configured to extract a processing target area from the scanned image, the processing target area including a row area that is an area of a whole or partial range of a row of character string arranged in one direction in the scanned image; a detection unit configured to detect dimensions of the row area; a determination unit configured to determine an arrangement-direction character size on the basis of the dimensions of the row area, the arrangement-direction character size being a character size of characters in the processing target area and being a character size in an arrangement direction of the row of character string; an ...

Publication date: 03-02-2022

AUTOMATED DOCUMENT TAGGING IN A DIGITAL MANAGEMENT PLATFORM

Number: US20220035990A1
Assignee:

An auto-tagging engine receives a training set of data comprising documents including a set of tagged fields with each tagged field corresponding to a portion of the document. The auto-tagging engine trains a machine learned model using the training set of data. The trained machine learned model, when applied to a target document in a document management environment, identifies portions of the target document each corresponding to fields of the target document. For each field of the target document, the auto-tagging engine identifies text of the target document associated with the identified portions of the target document corresponding to fields. Natural language processing is performed on the identified text in order to identify field types for the fields. The target document is automatically modified to include a tag identifying the portion of the target document corresponding to each field and identifying a field type of the field. 1. A method for automatically tagging fields of a target document, comprising: accessing a training set of data comprising documents each with a set of tagged fields within the document, each tagged field corresponding to a portion of the document; training a machine learned model using the training set of data, the machine learned model configured to identify, for each of one or more fields within a document, a portion within the document corresponding to the field; applying the machine learned model to the target document to identify portions of the target document corresponding to the fields of the target document; and identifying text of the target document associated with the identified portion of the target document corresponding to the field; performing natural language processing on the identified text to identify a field type of the field; and automatically modifying viewable content of the target document to include a tag identifying the portion of the target document corresponding to the field and the identified field ...

Publication date: 03-02-2022

AUTOMATED DOCUMENT HIGHLIGHTING IN A DIGITAL MANAGEMENT PLATFORM

Number: US20220035993A1
Assignee:

A highlighting engine modifies a target document by identifying and highlighting a set of text passages. The highlighting engine receives a training set of data including documents that each include a set of highlighted text passages. The highlighting engine trains a machine learned model using the training set of data. The trained machine learned model, when applied to subsequent identified candidate sets of text passages within the target document, identifies the set of text passages to highlight. The highlighting engine modifies the target document with the highlighted set of text passages and provides the modified target document for display via an interface. The highlighted set of text passages enable a user to quickly read and understand the target document. 1. A method of automatically highlighting text passages within a target document comprising: accessing a training set of data comprising documents each with a set of highlighted text passages; training a machine learned model using the training set of data, the machine learned model configured to identify a set of text passages of a document to highlight; identifying a first candidate set of text passages of the target document by: converting text of the target document into word vectors, generating a matrix representation of the target document, each row of the matrix representation corresponding to a text passage of the target document, the values of each row corresponding to the word vectors, and identifying a threshold portion of the text passages of the target document that are most relevant to the target document using an unweighted graph generated based on the generated matrix representation; identifying a second candidate set of text passages within the target document based on characteristics of text of the second candidate set of text passages; identifying a third candidate set of text passages within the target document based on feedback from a user; identifying a target set of text passages ...

Publication date: 03-02-2022

Document information extraction for computer manipulation

Number: US20220036063A1
Assignee: Intuit Inc

Systems and apparatuses are disclosed for extracting information from document images. An example method includes segmenting a document image into multiple segments and determining formatting information for each segment. Determining formatting information for a segment includes determining one or more features of the segment and comparing the one or more features of the segment to one or more clusters of features associated with different document types. The formatting information for the segment is based on the comparison. The method also includes, for each segment, storing the formatting information in a data structure associated with the segment. The method further includes, for each segment including text to be identified during information extraction, applying OCR to the segment to generate machine-encoded text and storing the machine-encoded text in the associated data structure.

Publication date: 19-01-2017

CHARACTER SEGMENTING APPARATUS, CHARACTER RECOGNITION APPARATUS, AND CHARACTER SEGMENTING METHOD

Number: US20170017836A1
Author: Nakamura Hiroshi
Assignee:

A character segmenting apparatus may include a character segmenting position detecting unit to detect a segmenting position of characters. The character segmenting position detecting unit may include an area setting unit to set up an area for detecting the segmenting position; a projection creating unit configured to create a projection of pixel values, with respect to pixels arranged in a character placement direction, at least in a specified area set up by the area setting unit; a binarizing threshold value obtaining unit to calculate a moving average on the basis of minimum pixel values of the projection, in order to specify the moving average as a binarizing threshold value for the specified area; and a position detecting unit to calculate a character segmenting position on the basis of the binarizing threshold value. 1. A character segmenting apparatus for segmenting each character out of a character string, by processing an image datum obtained by way of imaging the character string positioned on a recording medium, the character segmenting apparatus comprising: a character segmenting position detecting unit configured to detect a segmenting position of characters constituting the character string; wherein the character segmenting position detecting unit comprises: an area setting unit configured to set up an area for detecting the segmenting position of characters; a projection creating unit configured to create a projection of pixel values, with respect to pixels arranged in a character placement direction in which the characters are placed, at least in a specified area set up by the area setting unit; a binarizing threshold value obtaining unit configured to calculate a moving average on the basis of minimum pixel values of the projection, in order to specify the moving average as a binarizing threshold value for the specified area; and a position detecting unit configured to calculate a character segmenting position on the basis of the binarizing threshold ...
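
The projection-and-threshold idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes dark characters on a light background, takes the per-column minimum pixel value as the projection, and uses a centered moving average of that projection as a local binarizing threshold. The function names and the window size are hypothetical, and the window is assumed to be wider than any single character.

```python
def min_projection(image):
    """Per-column minimum pixel value of a grayscale image given as a list of rows."""
    return [min(column) for column in zip(*image)]


def moving_average(values, window):
    """Centered moving average; the window is clipped at both ends of the sequence."""
    half = window // 2
    averaged = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged


def segment_columns(image, window=5):
    """Return (start, end) column spans whose projection is darker than the local threshold."""
    projection = min_projection(image)
    threshold = moving_average(projection, window)
    spans, start = [], None
    for x, (value, limit) in enumerate(zip(projection, threshold)):
        is_character = value < limit  # darker than the local average
        if is_character and start is None:
            start = x
        elif not is_character and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(projection)))
    return spans
```

Each returned span is half-open, so a span `(2, 4)` covers columns 2 and 3. A character wider than the window would have interior columns equal to the local average and would be split, which is why the window assumption matters.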

Publication date: 19-01-2017

HANDHELD DEVICE FOR CAPTURING TEXT FROM BOTH A DOCUMENT PRINTED ON PAPER AND A DOCUMENT DISPLAYED ON A DYNAMIC DISPLAY DEVICE

Number: US20170017837A1
Assignee:

A device for capturing rendered text is described. The device incorporates one or more visual sensors that receive visual information as a part of capturing rendered text. The visual sensors are collectively capable of capturing both text that is permanently printed on a page, and text that is displayed transitorily on a dynamic device. The device further incorporates a visual information disposition subsystem for disposing of visual information received by the visual sensors. The device further incorporates a package that bears the visual sensors and the visual information disposition subsystem, and is suitable to be held in a human hand. 1-20. (canceled) 21. A computer-implemented method comprising: receiving an image of at least a portion of a rendered document; determining word lengths of words from the image of the at least a portion of the rendered document; generating a code based on the determined word lengths of the words from the image and a sequence of the determined word lengths that corresponds to a sequence of words from the image; searching an index of electronic documents using the generated code, wherein the index of electronic documents includes data indexing the electronic documents based on word lengths of words included in the electronic documents; and identifying, based on the generated code, an electronic document indexed in the index and that corresponds to the rendered document. 22. The method of claim 21, wherein the image of the at least a portion of the rendered document is captured at a handheld user device that includes an integrated visual sensor and an integrated visual display. 23. The method of claim 22, further comprising providing the handheld user device with access to the identified electronic document that corresponds to the rendered document. 24. The method of claim 21, wherein determining word lengths of words from the image is done without resolving individual characters of the words. 25. The method of claim 21, wherein ...
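
Claim 21 describes looking up a document from a sequence of word lengths alone. The sketch below shows one way such an index could work; it is an illustration under assumptions, not the method of the patent. The code format (lengths joined by hyphens), the n-gram size, and all function names are hypothetical, and the word lengths are assumed to have been measured from the image already.

```python
from collections import defaultdict


def code_from_lengths(lengths):
    """Encode a sequence of word lengths as a string key, e.g. [3, 5] -> '3-5'."""
    return "-".join(str(n) for n in lengths)


def build_index(documents, n=4):
    """Index every run of n consecutive word lengths in each document's text."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lengths = [len(word) for word in text.split()]
        for i in range(len(lengths) - n + 1):
            index[code_from_lengths(lengths[i:i + n])].add(doc_id)
    return index


def lookup(index, captured_lengths):
    """Return the ids of documents containing the captured word-length sequence."""
    return set(index.get(code_from_lengths(captured_lengths), set()))
```

For example, a capture whose four words measure 3, 5, 5 and 3 characters would match any indexed document containing "the quick brown fox". Short sequences collide often, so a practical system would presumably use longer runs or combine several codes.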

Publication date: 18-01-2018

SYSTEM AND METHOD FOR MONITORING ELECTRONIC DOCUMENTS

Number: US20180018312A1
Author: GUZMAN Noam, SAFT Isaac
Assignee: Vatbox, Ltd.

A system and method for monitoring electronic documents. The method includes analyzing a first electronic document to determine at least one transaction parameter, wherein the first electronic document includes at least partially unstructured data; creating a template for the first electronic document, wherein the created template is a structured dataset including the determined at least one transaction parameter; and comparing data of the created template to data associated with a plurality of second electronic documents to identify at least one abnormality in the first electronic document. 1. A method for monitoring electronic documents, comprising: analyzing a first electronic document to determine at least one transaction parameter, wherein the first electronic document includes at least partially unstructured data; creating a template for the first electronic document, wherein the created template is a structured dataset including the determined at least one transaction parameter; and comparing data of the created template to data associated with a plurality of second electronic documents to identify at least one abnormality in the first electronic document. 2. The method of claim 1, wherein determining the at least one transaction parameter further comprises: identifying, in the first electronic document, at least one key field and at least one value; creating, based on the first electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value; and analyzing the created dataset, wherein the at least one transaction parameter is determined based on the analysis. 3. The method of claim 2, wherein identifying the at least one key field and the at least one value further comprises: analyzing the first electronic document to determine data in the first electronic document; and extracting, based on a predetermined list of key fields, at least a portion of the determined data, wherein the at least a portion of the ...

Publication date: 18-01-2018

SYSTEM AND METHOD FOR IMPROVED ANALYSIS OF TRAVEL-INDICATING UNSTRUCTURED ELECTRONIC DOCUMENTS

Number: US20180018338A1
Author: GUZMAN Noam, SAFT Isaac
Assignee: Vatbox, Ltd.

A system and method for refund analysis of travel-indicating unstructured electronic documents. The method includes determining, based on data of a first electronic document, a mileage value-added tax (VAT) refund amount, wherein the first electronic document indicates at least one travel transaction; analyzing at least one second electronic document to determine at least one transaction parameter of each second electronic document, wherein each second electronic document includes at least partially unstructured data; creating a template for each of the at least one second electronic document, wherein each template is a structured dataset including the at least one transaction parameter determined for the respective electronic document; determining, based on the created at least one template, a fuel VAT refund amount; and determining, based on the mileage VAT refund amount and the fuel VAT refund amount, an entitled VAT refund amount. 1. A method for improved analysis of travel-indicating unstructured electronic documents, comprising: determining, based on data of a first electronic document, a mileage value-added tax (VAT) refund amount, wherein the first electronic document indicates at least one travel transaction; analyzing at least one second electronic document to determine at least one transaction parameter of each second electronic document, wherein each second electronic document includes at least partially unstructured data; creating a template for each of the at least one second electronic document, wherein each template is a structured dataset including the at least one transaction parameter determined for the respective electronic document; determining, based on the created at least one template, a fuel VAT refund amount; and determining, based on the mileage VAT refund amount and the fuel VAT refund amount, an entitled VAT refund amount. 2. The method of claim 1, wherein determining the at least one transaction parameter for an electronic document further ...

Publication date: 18-01-2018

METHOD, APPARATUS, SYSTEM, AND STORAGE MEDIUM FOR DETECTING INFORMATION CARD IN IMAGE

Number: US20180018512A1
Author: LI Jilin, NI Hui, WANG Chengjie
Assignee:

Method, apparatus, system, and storage medium for detecting an information card in an image are provided. The method includes performing a line detection to obtain two endpoints of a line segment corresponding to each of four sides of the information card; generating a linear equation of the side; obtaining coordinates of four intersection points of the four sides of the information card; mapping the coordinates of the four intersection points to four corners of a rectangular box of the information card, to obtain a perspective transformation matrix; performing perspective transformation on image content encircled by four straight lines represented by the four linear equations to provide transformed image content; forming a gradient template according to a layout of information content on the information card; and using the gradient template to match with the transformed image content and determining whether the image content is a correct information card. 1. A method for detecting an information card in an image, comprising: performing a line detection in an information card image, to obtain two endpoints of a line segment corresponding to each of four sides of the information card; generating, according to the two endpoints of the line segment corresponding to each side, a linear equation corresponding to the side; obtaining coordinates of four intersection points according to the linear equations corresponding to the four sides of the information card; mapping the coordinates of the four intersection points to four corners of a rectangular box of the information card, to obtain a perspective transformation matrix; performing perspective transformation on image content encircled by four straight lines represented by the four linear equations according to the perspective transformation matrix to provide transformed image content; forming a gradient template according to a layout of information content on the information card; and using the gradient template to match ...
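
The first geometric steps of this claim (a linear equation per side from two endpoints, then corner points as intersections of adjacent sides) can be sketched as follows. This is a minimal illustration, assuming each side is written in the form a·x + b·y = c; the function names are hypothetical, and the later perspective-transformation and template-matching steps are not covered.

```python
def line_through(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y = c through points p and q."""
    (x1, y1), (x2, y2) = p, q
    a = y2 - y1
    b = x1 - x2
    c = a * x1 + b * y1
    return a, b, c


def intersection(line1, line2):
    """Intersection point of two lines via Cramer's rule, or None if they are parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y
```

Because the lines are extended beyond the detected segments, the corners are recovered even when the detected endpoints stop short of the card's actual corners.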

Publication date: 17-01-2019

AUTOMATIC CLAIM-WRITING DEVICE

Number: US20190018825A1
Author: Tsai Hong-Shin
Assignee: INTEGRAL SEARCH INTERNATIONAL LIMITED

The invention provides an automatic claim-writing device which automatically prompts an organization word block in a claim to be generated or automatically generates an organization word block in the claim to be generated, the automatic claim-writing device comprising: an organization word block type determining module, an interrelationship sub-type determining module, a statistic module and a generating module, the organization word block type determining module determining an organization word block type of existing organization word blocks, the interrelationship sub-type determining module determining an interrelationship sub-type of existing organization word blocks which belongs to the interrelationship type, the generating module automatically prompting or automatically generating the organization word block in a claim to be generated. 1. An automatic claim-writing device which automatically prompts an organization word block in a claim to be generated or automatically generates an organization word block in the claim to be generated, the automatic claim-writing device comprising: an organization word block type determining module which determines an organization word block type of existing organization word blocks according to part-of-speech of words belonging to the existing organization word blocks, wherein the existing organization word blocks are read from a plurality of existing claims, and the existing organization word block type is one selected from a group comprising a subject element noun type, an interrelationship type and an object element noun type; an interrelationship sub-type determining module which determines an interrelationship sub-type of existing organization word blocks which belongs to the interrelationship type, the existing organization word block being read from a plurality of existing claims; a statistic module which statistically counts the occurrence times of organization word position information which belongs to the existing ...

Publication date: 17-01-2019

SIMULATING IMAGE CAPTURE

Number: US20190019021A1
Assignee:

The present disclosure relates to simulating the capture of images. In some embodiments, a document and a camera are simulated using a three-dimensional modeling engine. In certain embodiments, a plurality of images are captured of the simulated document from a perspective of the simulated camera, each of the plurality of images being captured under a different set of simulated circumstances within the three-dimensional modeling engine. In some embodiments, a model is trained based at least on the plurality of images which determines at least a first technique for adjusting a set of parameters in a separate image to prepare the separate image for optical character recognition (OCR). 1. A computer-implemented method for simulating the capture of images, comprising: simulating a document and a camera using a three-dimensional modeling engine; capturing a plurality of images of the simulated document from a perspective of the simulated camera, each of the plurality of images being captured under a different set of simulated circumstances within the three-dimensional modeling engine; training a model based at least on the plurality of images, wherein the trained model determines at least a first technique for adjusting a set of parameters in a separate image to prepare the separate image for optical character recognition (OCR). 2. The computer-implemented method of claim 1, wherein the simulated circumstances include at least one of: lighting; background; and camera pose. 3. The computer-implemented method of claim 2, wherein the camera pose includes yaw, pitch, roll, and height. 4. The computer-implemented method of claim 1, further comprising: determining, based on the trained model, whether a quality of the separate image can be improved to an acceptable level for the OCR, wherein the quality of the separate image is based on one or more of the set of parameters. 5. The computer-implemented method of claim 4, wherein determining whether the ...

Publication date: 17-01-2019

Character/graphics recognition device, character/graphics recognition method, and character/graphics recognition program

Number: US20190019049A1

The controller applies a lighting pattern to the illumination unit and controls a timing to capture the image by the imaging unit, a lighting pattern being a combination of turning on and off of the plurality of illumination lamps.

Publication date: 16-01-2020

Connecting a Printed Document to Related Digital Content

Number: US20200019575A1
Assignee:

Aspects described herein relate to a computer device connecting a physical printed document to relevant contextual digital content through a visual capture of one or more sequential ordering systems in the printed document. When the computer device detects when a screen icon is aligned with a printed icon on the printed document, the computer device analyzes visual input from its scanning device to identify both the document and the page/section number being scanned and subsequently delivers relevant, contextual digital content to provide an extended experience of the printed document through supplemental textual, visual, and/or audio content presented on the computer device.
1. A method for obtaining digital content associated with printed content of a document by a computer device, the method comprising: capturing, through a scanning device, a visual image of the printed content, wherein the visual image includes a page indicium and a printed icon; generating, by the computer device, a screen icon on a display device of the computer device; detecting when a screen icon aligns with the printed icon; in response to the detecting, locating the page indicium, wherein the page indicium is located at a predetermined position relative to the printed icon; translating the page indicium to a page number, wherein the page number has a format readable by the computer device; accessing, from at least one data structure, digital content associated with the printed content at the page number; and displaying, on the display device, the digital content associated with the printed content.
2. The method of claim 1, wherein the printed icon comprises a unique character and the method further comprising: identifying, from the unique character, the document from a plurality of documents.
3. The method of further comprising: transforming the screen icon to match the unique character of the printed icon.
4. The method of claim 1, wherein the accessing further comprises: obtaining a content ...

Publication date: 21-01-2021

Correction Techniques of Overlapping Digital Glyphs

Number: US20210019365A1
Author: Arora Aman, Mangla Pooja
Assignee: Adobe Inc.

A digital glyph overlap correction system implemented as part of a computing device is described. The system is configured to improve detection and correction of overlaps of digital glyphs by detecting an overlap of digital glyphs within a digital document, determining a glyph property causing the overlap, determining a change to the parameter of the glyph property that causes the overlap, generating a correction for the overlap based on the change to the parameter, and rendering the digital document as having the correction. The digital glyph overlap correction system corrects or facilitates correction of the overlap in an efficient and seamless manner, thereby improving the aesthetic appeal of content within the digital document.
1. In a digital content generating environment, a digital glyph overlap correction method, the method implemented by a computing device, the method comprising: detecting, by the computing device, an overlap of digital glyphs in a digital document; determining, by the computing device, which glyph property of a plurality of glyph properties cause the overlap of the digital glyphs in the digital document; determining, by the computing device, a change to a parameter of the determined glyph property that corrects the overlap; correcting, by the computing device, the overlap of the digital glyphs by making the change to the parameter of the determined glyph property; and rendering, by the computing device, the digital document as having the correction in a user interface.
2. The method as described in claim 1, wherein the detecting the overlap of digital glyphs includes: selecting a seek-overlap feature for the digital document; and applying the seek-overlap feature to the digital document.
3. The method as described in claim 1, further comprising: generating, automatically, an indication of the overlap of digital glyphs in a user interface of the computing device.
4. The method as described in claim 1, wherein detecting the overlap of digital ...

Publication date: 16-01-2020

Providing Nutritional Information From Recipe Images

Number: US20200019597A1
Author: Leeser Kenneth
Assignee:

Systems and methods for providing nutritional information from recipe images are provided. The system comprises one or more of: a database of recipe disambiguation data, a processor, a database of nutritional information, a memory, an input/output for communicating with external databases and with user devices, and computer-readable instructions for carrying out the inventive methods. The present invention improves on current art for providing nutritional information from recipe images. The present invention solves technical problems that exist in assessing nutritional information of a recipe, by providing for user input of recipe content from any image, optical character recognition of recipe content from any image, and disambiguation of language used in recipes.
1. A method, stored in non-transitory computer-readable media, for a system providing nutritional information from recipe images, by acquiring images with a system image acquisition method module, assessing images with a system image assessment method module, and assessing nutritional information with a system nutritional assessment method module, the method comprising: the system acquires an image, and displays the image to a user device used by a user; then the system carries out first a marking step, in which the system accepts some marking of an area of the image; then the system performs OCR on the relevant area, generating a plurality of ingredients as recipe information; then the system performs proofing on the recipe information; then the system presents the recipe information to the user device; and then the system accepts edits to the recipe information from the user device.
2. The method of claim 1, in which the system conducts conversions to change the amounts of each ingredient in the plurality of ingredients between one or more systems of measurement.
3. The method of claim 1, in which the system disambiguates amounts which are relative terms.
4. The method of claim 1, in which the user ...

Publication date: 16-01-2020

VECTORIZATION OF DOCUMENTS

Number: US20200019618A1
Assignee:

Embodiments of the invention include methods, systems and computer program products for document vectorization. Aspects include receiving, by a processor, a plurality of documents each having a plurality of words. The processor utilizing a vector embeddings engine generates a vector to represent each of the plurality of words in the plurality of documents. An image representation for each document in the plurality of documents is created and a word probability for each of the plurality of words in the plurality of documents is generated. A position for each word probability is determined in the image based on the vector associated with each word and a compression operation on the images is performed to produce a compact representation for the plurality of documents.
1. A computer-implemented method for document vectorization, the method comprising: receiving, by a processor, a plurality of documents each having a plurality of words; generating, by the processor utilizing a vector embeddings engine, a vector to represent each of the plurality of words in the plurality of documents; creating an image representation for each document in the plurality of documents; generating a word probability for each of the plurality of words in the plurality of documents; determining a position for each word probability in the image based on the vector associated with each word; and performing a compression operation on the images to produce a compact representation for the plurality of documents.
2. The computer-implemented method of further comprising removing stop words from each of the plurality of documents prior to producing the plurality of vectors.
3. The computer-implemented method of claim 1, wherein each vector corresponds to an encoded representation of a word within the plurality of documents.
4. The computer-implemented method of claim 1, wherein the compression operation is performed by a convolutional auto-encoder.
5. The computer-implemented method of claim 1, wherein ...

Publication date: 21-01-2021

Systems and methods for extracting data from an image

Number: US20210019511A1
Assignee: SAP SE

Embodiments of the present disclosure pertain to systems and methods for extracting data from an image. In one embodiment, a method of extracting data from an image comprises receiving, from an optical character recognition (OCR) system, OCR text in response to sending an image to the OCR system. The OCR text comprises a plurality of lines of text. Each line of text is classified as either a line item or not a line item using a machine learning algorithm, and a plurality of data fields are extracted from each line of text classified as a line item.
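The classify-then-extract flow can be sketched as follows. Note the substitution: the abstract names a machine learning algorithm, whereas this sketch uses a hand-written regular expression as a stand-in classifier. The line layout (description, quantity, unit price, total) and all names are assumptions made for illustration.

```python
import re

# Hypothetical layout of a line item: description, quantity, unit price, total.
# A regular expression stands in here for the machine learning classifier.
LINE_ITEM = re.compile(
    r"^(?P<description>.+?)\s+(?P<qty>\d+)\s+"
    r"(?P<unit_price>\d+\.\d{2})\s+(?P<total>\d+\.\d{2})$"
)


def is_line_item(line: str) -> bool:
    """Classify one line of OCR text as a line item or not."""
    return LINE_ITEM.match(line.strip()) is not None


def extract_line_items(ocr_text: str) -> list[dict]:
    """Extract data fields from every line classified as a line item."""
    items = []
    for line in ocr_text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            items.append({
                "description": match["description"],
                "qty": int(match["qty"]),
                "unit_price": float(match["unit_price"]),
                "total": float(match["total"]),
            })
    return items


sample_ocr_text = "ACME Store\nWidget large 2 3.50 7.00\nTotal 7.00"
line_items = extract_line_items(sample_ocr_text)
```

In a learned variant, `is_line_item` would be replaced by a trained model's prediction while the per-line loop stays the same.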

Publication date: 21-01-2021

SYSTEMS AND METHODS FOR OBTAINING PRODUCT INFORMATION IN REAL-TIME

Number: US20210019514A1
Author: MOGHTADAI Mehran
Assignee: The Toronto-Dominion Bank

A processor-implemented method is disclosed. The method includes: receiving, from a first client device, a signal representing image data depicting at least one first document containing a product specification for a first product; performing text recognition on the image data to identify text in the at least one first document; determining at least one first value associated with the first product based on the recognized text; identifying a second product based on determining that product specification for the second product satisfies one or more predetermined criteria relating to the at least one first value; determining at least one second value associated with the second product; generating first display data including a graphical representation of the at least one second value; and transmitting, to the first client device via a communications module, a signal representing the first display data.
1. A computing system, comprising: a communications module communicable with an external network; a memory; and a processor coupled to the communications module and the memory, the processor being configured to: receive, from a first client device, a signal representing image data depicting at least one first document containing a product specification for a first product; perform text recognition on the image data to identify text in the at least one first document; determine at least one first value associated with the first product based on the recognized text; identify a second product based on determining that product specification for the second product satisfies one or more predetermined criteria relating to the at least one first value; determine at least one second value associated with the second product; generate first display data including a graphical representation of the at least one second value; and transmit, to the first client device via the communications module, a signal representing the first display data.
2. The computing system ...

Publication date: 21-01-2021

Enhanced Item Validation and Image Evaluation System

Number: US20210019515A1
Assignee:

Systems for item validation and image evaluation are provided. In some examples, a system may receive an instrument and associated data. The instrument may be received and at least one of a bill pay profile and a user profile may be retrieved. The bill pay profile and user profile may each include a plurality of previously processed instruments that have been determined to be valid and/or authentic. The instrument may be compared to the plurality of previously processed instruments to determine whether one or more elements of the instrument being evaluated match one or more corresponding elements of the plurality of previously processed instruments. Matching or non-matching elements may be identified. In some examples, one or more user interfaces may be generated displaying the instruments and including any highlighting or enhancements identifying matching or non-matching elements.
1. A system, comprising: a user device; an image processing computing device; a processor; a communication interface communicatively coupled to the processor; and memory storing computer-readable instructions that, when executed by the processor, cause the computing device to: identify, from instrument data extracted from a received image of a first instrument and data associated with the image of the first instrument, an identifier, wherein the image is received from the image processing computing device; retrieve, based on a match between the identifier and a predefined criterion, a payment profile associated with a provider of the first instrument, the payment profile including images of a plurality of previously processed instruments; compare the received image of the first instrument to the images of the plurality of previously processed instruments in the payment profile; identify, based on the comparing, at least one of: an element of the first instrument that matches a corresponding element from at least one instrument of the plurality of previously processed ...

Publication date: 16-01-2020

MULTI-MODAL ELECTRONIC DOCUMENT CLASSIFICATION

Number: US20200019769A1
Author: Bali Adam, LEIBOVITZ Guy
Assignee:

A method comprising operating at least one hardware processor for: receiving, as input, a plurality of electronic documents, training a machine learning classifier based, at least in part, on a training set comprising: (i) labels associated with the electronic documents, (ii) raw text from each of said plurality of electronic documents, and (iii) a rasterized version of each of said plurality of electronic documents, and applying said machine learning classifier to classify one or more new electronic documents.
1. A method comprising: receiving, at a computer, an electronic document on which to train a machine learning classifier; applying, by the computer, a first neural network to raw text extracted from the electronic document to determine a textual data representation of the electronic document; applying, by the computer, a second neural network to a raster image extracted from the electronic document to determine a visual data representation of the electronic document; generating, by the computer, a fusion representation based on the textual data representation and the visual data representation of the electronic document; and applying, by the computer, the machine learning classifier based on the fusion representation to classify one or more new electronic documents.
2. The method of claim 1, wherein the generating is further based on a label associated with the electronic document, the label denoting a document category.
3. The method of claim 1, wherein the applying the first neural network further comprises: generating, by the computer, the textual data representation of said extracted text as a fixed length vector.
4. The method of claim 1, wherein the generating further comprises: generating, by the computer, the fusion representation based on a correlation between the textual data representation and the visual data representation, the textual data representation, and the visual data representation.
5. The method of claim 1, wherein the first neural ...

Publication date: 16-01-2020

AUTOMATIC NOTE REFINEMENT, DATA CAPTURE, AND EXPORT

Number: US20200019771A1
Assignee:

A system for capturing, using a digital camera, a Scrum board note and processing the image of the note to extract textual content of the note for use by a collaboration and project management application program. The processing of the image includes removing image imperfections and formatting the image such that it is optimized for scanning, thereby reducing spelling or scanning errors.
1. A system for capturing textual content of notecards, the system comprising: a processor; a shared storage device in communication with the processor; a memory in electronic communication with the processor, the memory comprising software instructions, which, when executed, cause the processor to: receive a digital image that comprises a notecard; store the digital image in the shared storage device; process the image to improve the textual content present in the digital image, where the textual content relates to one or more aspects of a software development cycle; and process the image to extract the textual content and store it as a text file.
2. The system of claim 1, the system further comprising software instructions that cause the processor to transmit the text file to a collaboration and project management software application.
3. The system of claim 1, wherein the digital image received by the processor is transmitted via email.
4. The system of claim 1, wherein processing the digital image to improve the textual content present in the digital image comprises: aligning the image such that text content found in the image is oriented primarily parallel with regard to a boundary of the image; converting the image into a black and white format; and storing the converted image in a storage format that differs from that of its original storage format.
5. The system of claim 4, further comprising processing the converted image to remove image imperfections.
6. The system of claim 5, wherein the image imperfections comprise shadows, reflections, glare ...

Publication date: 21-01-2021

Item Validation and Image Evaluation System

Number: US20210019517A1
Assignee:

Systems for item validation and image evaluation are provided. In some examples, a system may receive an instrument and associated data. The instrument may be received and a user profile may be retrieved. The user profile may include a plurality of previously processed instruments that have been determined to be valid and/or authentic. The instrument may be compared to the plurality of previously processed instruments to determine whether one or more elements of the instrument being evaluated match one or more corresponding elements of the plurality of previously processed instruments. Matching or non-matching elements may be identified. In some examples, one or more user interfaces may be generated displaying the instruments and including any highlighting or enhancements identifying matching or non-matching elements.
1. A computing platform, comprising: at least one processor; a communication interface communicatively coupled to the at least one processor; and receive an image of a first instrument and associated data from an image processing computing device; extract user data from the received image of the first instrument and associated data; retrieve a user profile associated with a user of the first instrument, the user profile including images of a plurality of previously processed instruments; compare the received image of the first instrument to the images of the plurality of previously processed instruments in the user profile; identify, based on the comparing, at least one of: an element of the first instrument that matches a corresponding element from at least one instrument of the plurality of previously processed instruments and an element of the first instrument that does not match a corresponding element from at least one instrument of the plurality of previously processed instruments; generate a first user interface; and generate a selectable option to add the first instrument to the user profile, wherein the generated first user ...

Publication date: 21-01-2021

Enterprise Profile Management and Control System

Number: US20210019518A1
Assignee:

Systems for profile management and control are provided. A system may receive an instrument or image of an instrument. In some examples, data may be extracted from the instrument or image of the instrument and a document profile may be retrieved based on the extracted data. Images within the document profile may be evaluated to identify a type of document for each document. In some examples, a total number of documents of each type may be determined or identified. The total number of documents may be compared to a threshold. If the total number of documents is below the threshold, the documents or images in the profile may be maintained. If the total number of documents is at or above the threshold, in some examples, each document may be further evaluated to determine or identify documents or document images for deletion. In some arrangements, the profile may be refreshed and documents or images identified for deletion may be deleted.
1. A computing platform, comprising: at least one processor; a communication interface communicatively coupled to the at least one processor; and memory storing computer-readable instructions that, when executed by ...: retrieve, from a user profile database, a document profile associated with a user and including images of a plurality of checks associated with the user, each check of the plurality of checks having a check type; determine a number of checks of a first check type in the document profile; compare the number of checks of the first check type to a threshold number; responsive to determining that the number of checks of the first check type is below the threshold number, store the images in the document profile; responsive to determining that the number of checks of the first check type is above the threshold number: further evaluate each check of the first check type to identify one or more checks for deletion; and refresh the document profile including deleting the identified one or more checks for deletion.
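The threshold comparison described in this entry can be sketched in a few lines. The abstract does not say how checks are chosen for deletion, so the oldest-first rule below is an assumption, as are the class and function names. The sketch follows the abstract's "at or above the threshold" wording.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CheckImage:
    """Hypothetical record for one stored check image in a document profile."""
    check_id: str
    check_type: str
    processed_on: date


def checks_to_delete(profile: list[CheckImage], check_type: str,
                     threshold: int) -> list[CheckImage]:
    """Return the checks of one type to delete when the count reaches threshold."""
    same_type = [c for c in profile if c.check_type == check_type]
    if len(same_type) < threshold:
        return []  # below the threshold: every image is kept
    # Assumption: delete the oldest checks until the count is below the threshold.
    same_type.sort(key=lambda c: c.processed_on)
    excess = len(same_type) - threshold + 1
    return same_type[:excess]


profile = [
    CheckImage("c1", "personal", date(2020, 1, 5)),
    CheckImage("c2", "personal", date(2020, 3, 9)),
    CheckImage("c3", "personal", date(2020, 2, 1)),
    CheckImage("c4", "business", date(2020, 2, 2)),
]
stale = checks_to_delete(profile, "personal", threshold=3)
```

Refreshing the profile then amounts to removing the returned records from storage.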

Publication date: 17-04-2014

DEVICES, SYSTEMS AND METHODS FOR TRANSCRIPTION SUGGESTIONS AND COMPLETIONS

Number: US20140105502A1
Author: Jensen Lee Samuel
Assignee: Ancestry.com Operations Inc.

Methods, devices and systems are described for transcribing text from artifacts to electronic files. A computer system is provided, wherein the computer system comprises a computer-readable storage device. An image of the artifact is received wherein text is present on the artifact. A first portion of the text is analyzed. Characters representing the first portion of the text are identified at a first confidence level equal to or greater than a threshold confidence level. The characters representing the first portion of the text are stored. A second portion of the text appearing on the artifact is analyzed. A plurality of candidates to represent the second portion of the text are identified at a second confidence level below the threshold confidence level. Finally, the plurality of candidates is presented to a user for selection.
1. (canceled)
2. A method for transcribing text from an artifact to an electronic file, the method comprising: providing a computer system, wherein the computer system comprises a computer-readable storage device; receiving, at the computer system, an image, wherein text is present on the image; analyzing, at the computer system, a first portion of the text; identifying, at the computer system, at a first confidence level equal to or greater than a threshold confidence level, characters representing the first portion of the text; storing, at the computer-readable storage device, the characters representing the first portion of the text; analyzing, at the computer system, a second portion of the text; identifying, at the computer system, at a second confidence level below the threshold confidence level, a plurality of candidates to represent the second portion of the text, wherein at least one candidate of the plurality of candidates to represent the second portion of text is identified based on characters within the record; and presenting, at the computer system, the plurality of candidates to a user for selection.
3. The method of claim 2, ...
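The confidence-threshold branching described in this entry can be sketched as below. The function name, the return shape, and the idea that each portion arrives as a mapping from candidate text to confidence are assumptions for illustration; the recognizer that produces those scores is out of scope.

```python
def transcribe_portion(candidates: dict[str, float], threshold: float):
    """Decide what to do with one portion of text.

    candidates maps each candidate transcription to its confidence score.
    Returns ("stored", text) when the best candidate meets the threshold,
    otherwise ("ask_user", ranked_candidates) for the user to choose from.
    """
    best_text, best_conf = max(candidates.items(), key=lambda kv: kv[1])
    if best_conf >= threshold:
        return ("stored", best_text)
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ("ask_user", ranked)


first = transcribe_portion({"Smith": 0.95, "Smyth": 0.40}, threshold=0.8)
second = transcribe_portion({"Jensen": 0.55, "Jenson": 0.50, "Janson": 0.20},
                            threshold=0.8)
```

Here `first` would be stored directly, while `second` yields a ranked list of suggestions to present for selection.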

Publication date: 16-01-2020

METHODS AND SYSTEMS FOR ELECTRONIC CREDENTIAL MANAGEMENT

Number: US20200020439A1
Assignee: VISCARE TECHNOLOGIES, LLC

In one embodiment, a computer-implemented method includes receiving, from a computing device of a clinician, an image of a credential issued to the clinician, where the credential pertains to healthcare. The method also includes extracting information from the image of the credential, where the information includes an expiration date of the credential. The method also includes storing the image of the credential and the expiration date of the credential in a database, receiving, from the computing device of the clinician, a selection to join a home health agency, and transmitting, to a computing device of the home health agency, a notification pertaining to the credential issued to the clinician, where the notification indicates whether the credential is active based on the expiration date.
1. A computer-implemented method comprising: receiving, from a computing device of a clinician, an image of a credential issued to the clinician, wherein the credential pertains to healthcare; extracting information from the image of the credential, wherein the information comprises an expiration date of the credential; storing the image of the credential and the expiration date of the credential in a database; receiving, from the computing device of the clinician, a selection to join a home health agency; and transmitting, to a computing device of the home health agency, a notification pertaining to the credential issued to the clinician, wherein the notification indicates whether the credential is active based on the expiration date.
2. The computer-implemented method of claim 1, wherein extracting the information further comprises: performing object character recognition on the image of the credential to extract the expiration date of the credential.
3. The computer-implemented method of claim 1, wherein extracting the information further comprises: inputting the image of the credential into a machine learning model trained to extract the expiration date from the image of the ...
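The final step, a notification that indicates whether the credential is active based on its expiration date, reduces to a date comparison. In this sketch the field names are hypothetical, and treating the credential as active through the expiration date itself is an assumption the entry does not settle.

```python
from datetime import date


def is_credential_active(expiration: date, today: date) -> bool:
    """Assumption: a credential stays active through its expiration date."""
    return today <= expiration


def build_notification(clinician: str, credential: str,
                       expiration: date, today: date) -> dict:
    """Assemble a hypothetical notification payload for the home health agency."""
    return {
        "clinician": clinician,
        "credential": credential,
        "expiration_date": expiration.isoformat(),
        "active": is_credential_active(expiration, today),
    }


notice = build_notification("Clinician A", "RN license",
                            expiration=date(2025, 6, 30),
                            today=date(2025, 6, 30))
```

The expiration date passed in would come from the extraction step (OCR or a trained model) applied to the credential image.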

Publication date: 21-01-2021

READING ORDER SYSTEM FOR IMPROVING ACCESSIBILITY OF ELECTRONIC CONTENT

Number: US20210020159A1
Assignee: Microsoft Technology Licensing, LLC

A reading order extrapolation and management system and process for facilitating auditory comprehension of electronic documents. As an example, a user may access contents of an electronic document via an application and request a speech-synthesized recitation of any media in the electronic document. The application may make use of a reading order that has been specifically generated and improved by reference to eye tracking data from users reading the document. A reading order can be assigned to a document and implemented when, for example, a screen reader is engaged for use with the document. Such systems can be of great benefit to users with visual impairments and/or distracted users seeking a meaningful audio presentation of textual content.
1. A system comprising: a processor; and present, via a first client system, a first electronic content item; identify a first plurality of content portions of the first electronic content item, each content portion associated with a different region of the first electronic content item as presented; receive, from the first client system, first eye gaze data generated during the presentation of the first electronic content item, the first eye gaze data including a first distribution of gaze points; detect a first series of fixation clusters in the first eye gaze data, each fixation cluster comprising an aggregation of gaze points within the first distribution of gaze points that occur closely in time and space; identify which region of the first electronic item as presented corresponds to each fixation cluster and assigning each fixation cluster a content portion of the first plurality of content portions associated with that region; produce and store a first user attention sequence identifying each content portion according to an order in which the aggregation of gaze points for each matching fixation cluster of the first series was generated; calculate a first reading order for the first electronic ...
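A fixation cluster is described as an aggregation of gaze points that occur closely in time and space. One simple way to illustrate that idea is a greedy pass over time-ordered points, shown below. This is a sketch of a generic clustering heuristic, not the method claimed in the entry; the thresholds, the `(time_s, x, y)` tuple layout, and the function name are all assumptions.

```python
from math import hypot

GazePoint = tuple[float, float, float]  # (time_s, x_px, y_px), assumed layout


def fixation_clusters(points: list[GazePoint], max_gap_s: float = 0.1,
                      max_dist_px: float = 40.0,
                      min_points: int = 3) -> list[list[GazePoint]]:
    """Group time-ordered gaze points that stay close in time and space."""
    clusters: list[list[GazePoint]] = []
    current: list[GazePoint] = []
    for p in sorted(points):  # tuples sort by time first
        if current:
            cx = sum(q[1] for q in current) / len(current)
            cy = sum(q[2] for q in current) / len(current)
            near = hypot(p[1] - cx, p[2] - cy) <= max_dist_px
            soon = p[0] - current[-1][0] <= max_gap_s
            if not (near and soon):
                # Close the running group; keep it only if it is large enough.
                if len(current) >= min_points:
                    clusters.append(current)
                current = []
        current.append(p)
    if len(current) >= min_points:
        clusters.append(current)
    return clusters


gaze = [
    (0.00, 100, 100), (0.05, 102, 101), (0.10, 99, 98),
    (0.50, 400, 300), (0.55, 402, 301), (0.60, 401, 299),
    (0.65, 800, 50),
]
clusters = fixation_clusters(gaze)
```

Because the clusters come out in time order, mapping each one to the content region under it would yield the attention sequence the claim refers to.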

Publication date: 28-01-2016

PROCESSING IMAGE TO IDENTIFY OBJECT FOR INSERTION INTO DOCUMENT

Number: US20160026613A1
Author: VOGEL MATTHEW
Assignee:

An image is processed to identify an object for insertion into a document. The image is captured or retrieved from a data store. The image is processed to identify the object associated with a document type, within a portion of the image. The object types include a chart, a table, a shape, among others. The portion of the image is converted into the object. A control is provided to export the object into the document associated with the document type.
1. A method executed on a computing device to process an image to identify an object for insertion into a document, the method comprising: capturing the image; processing the image to identify the object associated with a document type, within a portion of the image; converting the portion into the object; and providing a control to export the object into the document associated with the document type.
2. The method of claim 1, further comprising: processing the image through an image identification module that includes augmented optical character recognition (OCR) to identify text based data as the object in a tabular format within the portion.
3. The method of claim 1, further comprising: identifying a table as the object.
4. The method of claim 3, further comprising: presenting the control with an icon of the document including one from a set of: a spreadsheet document, a text based document, and a presentation document.
5. The method of claim 3, further comprising: including an operation in the control to export the table into a spreadsheet document as the document in response to an activation of the control.
6. The method of claim 3, further comprising: generating a chart from the table; and presenting another control that includes an operation to export the chart and the table into a spreadsheet document as the document in response to an activation of the other control.
7. The method of claim 3, further comprising: presenting another control to export the table as a chart; detecting an activation of the other control; ...

Publication date: 25-01-2018

System and Method of Document and Signature Management

Number: US20180024807A1
Assignee:

A system and method for document review and signature management that generally allows users to amend or alter a document image digitally and to sign, initial, or otherwise verify the amendments or alterations as they would a paper document, but digitally, remotely, and in real time. The system and method generally allow users to selectively open, close, scroll, shrink, expand, highlight, or otherwise manipulate a digital document image without digitally writing upon it by interacting with some document locations, while interaction with other document locations will selectively result in digital writing, all of which being reflected substantially in real time on one or more selected secondary devices also capable of interacting with the document image in a manner similar to the primary device, with any manipulation or writing inputted on a secondary device reflected substantially in real time on the primary device.
1. A method of document and signature management comprising the steps of: a. providing a primary user with a primary device; b. providing one or more secondary users with one or more secondary devices, wherein said primary device and said one or more secondary devices each include a computer processor, an operator interface, a screen display, an input interface, and a communication means, and wherein said primary device and said one or more secondary devices are each connected to the other through said communication means; and i. selectively displays a selected document image on said screen displays of said primary device and said one or more secondary devices; ii. permits a manipulation by said primary user of said document image on said primary device by inputting commands using said input interface of said primary device; iii. selectively communicates said manipulation of said document image by said primary user to said one or more secondary devices and selectively replicates said manipulation of said document image displayed by said one or more ...

Publication date: 25-01-2018

SYSTEM AND METHOD FOR REPORTING BASED ON ELECTRONIC DOCUMENTS

Number: US20180024983A1
Author: GUZMAN Noam, SAFT Isaac
Assignee: Vatbox, Ltd.

A system and method for reporting based on a first electronic document and at least one second electronic document. The method includes analyzing the first electronic document to determine at least one transaction parameter for each of at least one expense item, the first electronic document indicating the at least one expense item, wherein the first electronic document includes at least partially unstructured data; creating at least one template for the first electronic document, wherein each first electronic document template is a structured dataset including the determined at least one transaction parameter; retrieving, based on the at least one first electronic document template, the at least one second electronic document; and generating a report when the at least one second electronic document matches the at least one expense item, wherein the report indicates the at least one expense item and includes the at least one second electronic document. 1. A method for reporting based on a first electronic document and at least one second electronic document , comprising:analyzing the first electronic document to determine at least one transaction parameter for each of at least one expense item, the first electronic document indicating the at least one expense item, wherein the first electronic document includes at least partially unstructured data;creating at least one template for the first electronic document, wherein each first electronic document template is a structured dataset including the determined at least one transaction parameter;retrieving, based on the at least one first electronic document template, the at least one second electronic document; andgenerating a report when the at least one second electronic document matches the at least one expense item, wherein the report indicates the at least one expense item and includes the at least one second electronic document.2. 
The method of claim 1 , wherein determining the at least one transaction parameter ...

Publication date: 25-01-2018

SYSTEM AND METHOD FOR OBTAINING REISSUES OF ELECTRONIC DOCUMENTS LACKING REQUIRED DATA

Number: US20180024984A1
Author: GUZMAN Noam, SAFT Isaac
Assignee: Vatbox, Ltd.

A system and method for obtaining a reissue of an electronic document lacking required data. The method includes creating a template for the electronic document, wherein the template is a structured dataset including at least one transaction parameter determined based on the at least partially unstructured data; querying at least one data source for at least one requirement based on the template; determining, based on the template and the at least one requirement, whether the electronic document lacks at least a portion of the required data; retrieving completion data when it is determined that the electronic document lacks at least a portion of the required data; generating a reissue request electronic document including the electronic document and indicates a request to reissue the electronic document with respect to the completion data; and sending the reissue request electronic document to a reissuer server. 1. A method for obtaining a reissue of an electronic document lacking required data , the electronic document including at least partially unstructured data , comprising:creating a template for the electronic document, wherein the template is a structured dataset including at least one transaction parameter determined based on the at least partially unstructured data;querying at least one data source for at least one requirement based on the template, wherein the at least one requirement defines required data for the electronic document;determining, based on the template and the at least one requirement, whether the electronic document lacks at least a portion of the required data;retrieving completion data when it is determined that the electronic document lacks at least a portion of the required data, wherein the completion data completes the required data;generating a reissue request electronic document, wherein the reissue request electronic document includes the electronic document and indicates a request to reissue the electronic document with respect 
...

Publication date: 25-01-2018

METHOD FOR GENERATING SEARCH INDEX AND SERVER UTILIZING THE SAME

Number: US20180024988A1
Author: LEE Hsin-Chen
Assignee: AVISION INC.

A method for generating a search index, applicable for a database system having a first database and a second database, includes the following steps: receiving an access instruction corresponding to a first document, analyzing the first document to obtain a plurality of key character strings, writing the first document into the first database or the second database based on the access instruction and generating address information corresponding to the first document accordingly, and generating a search index corresponding to the first document based on the address information and the key character strings. 1. A method for generating a search index, applicable for a database system having a first database and a second database, wherein the method comprises: receiving an access instruction corresponding to a first document; analyzing the first document to obtain a plurality of first key character strings corresponding to the first document; storing the first document into the first database or/and the second database according to the access instruction and generating a first address information corresponding to the first document; and generating a first search index corresponding to the first document according to the first address information and the plurality of first key character strings. 2. The method according to claim 1, wherein the step of analyzing the first document to obtain a plurality of first key character strings corresponding to the first document comprises: when the first document is a text file, capturing contents of the text file to obtain the plurality of first key character strings; and when the first document is not the text file, performing an image recognition to the first document to generate the contents of the first document and to obtain the plurality of first key character strings accordingly. 3. The method according to claim 2, wherein the image recognition is Pattern Recognition. 4. The method according to claim 1, further comprising: receiving ...
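
The indexing flow in this abstract (store a document in one of two databases, record its address, index its key character strings against that address) can be illustrated with a minimal Python sketch. This is not the patented implementation: the class `TwoDatabaseIndexer`, the helper `extract_key_strings`, the in-memory lists standing in for databases, and the rule that a key string is any word of four or more characters are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def extract_key_strings(text, min_len=4):
    """Hypothetical keyword rule: unique lowercase words of at least min_len characters."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({w for w in words if len(w) >= min_len})


class TwoDatabaseIndexer:
    """Toy stand-in for a system with a first and a second database."""

    def __init__(self):
        # Each "database" is just a list; an address is (database name, position).
        self.databases = {"first": [], "second": []}
        self.index = defaultdict(set)  # key string -> set of addresses

    def store(self, text, target):
        """Write the document to the requested database and index its key strings."""
        database = self.databases[target]
        database.append(text)
        address = (target, len(database) - 1)
        for key in extract_key_strings(text):
            self.index[key].add(address)
        return address

    def search(self, key):
        """Return the sorted addresses of all documents containing the key string."""
        return sorted(self.index.get(key.lower(), set()))

    def fetch(self, address):
        """Load a document back from its address."""
        name, position = address
        return self.databases[name][position]
```

Because the index stores addresses and not document bodies, a single lookup can return hits from both databases without scanning either one.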

Publication date: 25-01-2018

SYSTEMS AND METHODS FOR ASSESSING STANDARDS FOR MOBILE IMAGE QUALITY

Number: US20180025223A1
Assignee:

Methods and systems are provided for defining and determining a formal and verifiable mobile document image quality and usability (MDIQU) standard, or Standard for short. The Standard ensures that a mobile image can be used in an appropriate mobile document processing application, for example an application for mobile check deposit. In order to quantify the usability, the Standard establishes 5 quality and usability grades. A mobile image capture device can capture images. A mobile device can receive information associated with one or more image quality assurance (IQA) criteria, evaluate the images to select an image satisfying an image quality criteria based on the received information, and, in response to the image satisfying the image quality score, send the selected image to determine a set of image quality assurance (IQA) scores. 1. A non-transitory computer-readable medium comprising instructions which, when executed by a processor, cause the processor to: capture images using a mobile image capture device; receive information associated with one or more image quality assurance (IQA) criteria; evaluate the images to select an image satisfying an image quality criteria based on the received information; and in response to the image satisfying the image quality score, send the selected image to determine a set of image quality assurance (IQA) scores. 2. The non-transitory computer-readable medium of claim 1, wherein to satisfy an image quality criteria, content is at least extractable from the selected image based on the received information. 3. The non-transitory computer-readable medium of claim 1, wherein the processor is further to determine a false selection in response to an IQA score associated with the selected image of the set of images not satisfying the predetermined standard of image quality. 4. The non-transitory computer-readable medium of claim 3, wherein the processor is further to calculate a false selection rate of the ...

Publication date: 25-01-2018

SYSTEM AND METHOD FOR IDENTIFYING UNCLAIMED ELECTRONIC DOCUMENTS

Number: US20180025224A1
Author: GUZMAN Noam, SAFT Isaac
Assignee: Vatbox, Ltd.

A system and method for identifying unclaimed electronic documents among at least one electronic document, each electronic document including at least partially unstructured data. The method includes: analyzing each electronic document to determine at least one transaction parameter of the electronic document; creating a template for each electronic document, wherein each template is a structured dataset including the at least one transaction parameter determined for the electronic document; and determining whether each electronic document is unclaimed, wherein determining whether an electronic document is unclaimed further comprises comparing at least a portion of the template created for the electronic document to identifying data of a plurality of previous reclaims. 1. A method for identifying unclaimed electronic documents among at least one electronic document , each electronic document including at least partially unstructured data , comprising:analyzing each electronic document to determine at least one transaction parameter of the electronic document;creating a template for each electronic document, wherein each template is a structured dataset including the at least one transaction parameter determined for the electronic document; anddetermining whether each electronic document is unclaimed, wherein determining whether an electronic document is unclaimed further comprises comparing at least a portion of the template created for the electronic document to identifying data of a plurality of previous reclaims.2. The method of claim 1 , wherein determining the at least one transaction parameter of each electronic document further comprises:identifying, in the electronic document, at least one key field and at least one value;creating, based on the electronic document, a dataset, wherein the created dataset includes the at least one key field and the at least one value; andanalyzing the created dataset, wherein the at least one transaction parameter is ...

Publication date: 25-01-2018

SYSTEM AND METHOD FOR GENERATING CONSOLIDATED DATA FOR ELECTRONIC DOCUMENTS

Number: US20180025225A1
Author: GUZMAN Noam, SAFT Isaac
Assignee: Vatbox, Ltd.

A system and method generating consolidated data based on electronic documents. The method includes analyzing a first electronic document to determine at least one transaction parameter, the first electronic document indicating a transaction including at least one expense, wherein the first electronic document includes at least partially unstructured data; creating a template for the first electronic document, wherein the template is a structured dataset including the determined at least one transaction parameter; retrieving, based on the template, a second electronic document, wherein the second electronic document indicates evidence of the transaction; determining at least one deductible expense of the at least one expense based on at least one deduction rule, the template, and the second electronic document; and generating consolidation metadata based on the determined at least one deductible expense. 1. A method for generating consolidated data based on electronic documents , comprising:analyzing a first electronic document to determine at least one transaction parameter, the first electronic document indicating a transaction including at least one expense, wherein the first electronic document includes at least partially unstructured data;creating a template for the first electronic document, wherein the template is a structured dataset including the determined at least one transaction parameter;retrieving, based on the template, a second electronic document, wherein the second electronic document indicates evidence of the transaction;determining at least one deductible expense of the at least one expense based on at least one deduction rule, the template, and the second electronic document; andgenerating consolidation metadata based on the determined at least one deductible expense.2. 
The method of claim 1 , wherein determining the at least one transaction parameter further comprises:identifying, in the first electronic document, at least one key field and at ...

Publication date: 25-01-2018

LIVE DOCUMENT DETECTION IN A CAPTURED VIDEO STREAM

Number: US20180025251A1
Assignee:

The present disclosure is directed toward systems and methods to quickly and accurately identify boundaries of a displayed document in a live camera image feed, and provide a document boundary indicator within the live camera image feed. For example, systems and methods described herein utilize different display document detection processes in parallel to generate and provide a document boundary indicator that accurately corresponds with a displayed document within a live camera image feed. Thus, a user of the mobile computing device can easily see whether the document identification system has correctly identified the displayed document within the camera viewfinder feed. 1. A method comprising:receiving an image feed comprising a displayed document;analyzing, by at least one processor, a first image frame from the image feed using a first process to determine a boundary of the displayed document;providing, for presentation to a user, the first image frame with a document boundary indicator corresponding to the boundary of the displayed document;analyzing, based on the boundary of the displayed document in the first image frame, a second image frame from the image feed using a second process to determine an updated boundary of the displayed document; andproviding, for presentation to the user, the second image frame with an updated document boundary indicator corresponding to the updated boundary of the displayed document.2. The method as recited in claim 1 , wherein receiving the image feed comprises receiving the image feed from a camera associated with a mobile computing device.3. The method as recited in claim 1 , wherein analyzing the first image frame using the first process comprises analyzing the first image frame using a robust detection process.4. 
The method as recited in claim 3 , wherein analyzing the first image frame using the robust detection process comprises:generating an edge map comprising edges identified within the first image frame;identifying, ...

Publication date: 25-01-2018

Capturing Product Details of Purchases

Number: US20180025314A1
Assignee: MetaBrite, Inc.

Systems, methods and computer-readable media are disclosed for capturing purchase information regarding purchased items of a consumer. Upon receiving an image of a receipt (a receipt image) regarding a list of purchased items, receipt text is generated. The receipt text is processed to identify the purchased items in the receipt image. Accordingly, an iteration is begun to iterate through the item blocks of the receipt text. An item block corresponds to a discrete item in the receipt text. The processing comprises extracting textual elements from the item block and matching the textual elements to a known product. Upon matching the textual elements to a known product, the consumer inventory associated with the consumer is updated with regard to the purchase of the known product. 1. A computer implemented method for capturing information of purchased items to a consumer inventory of a consumer , the method comprising each of the following as implemented on a computing device:receiving information describing a first set of purchased items from a consumer and updating a consumer inventory corresponding to the consumer with the received information, and for each purchased item of the first set of purchased items, indicating that the item is a non-specific item in the consumer inventory corresponding to the consumer;receiving a receipt image from the consumer of a purchase receipt regarding at least some of the first set of purchased items;identifying a first item block from the receipt image;matching subject matter of the first item block to a product item of a set of known product items;updating a consumer inventory of the consumer with regard to the purchase of the product item.2. 
The method of claim 1 , wherein the method further comprises repeatedly identifying a next item block of the receipt text and processing the next item block until at least a subset of the item blocks of the receipt text are processed claim 1 , wherein processing a next item block comprises: ...

Publication date: 10-02-2022

SUPPORT APPARATUS, GENERATION APPARATUS, ANALYSIS APPARATUS, SUPPORT METHOD, GENERATION METHOD, ANALYSIS METHOD, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM

Number: US20220043847A1
Assignee:

A support apparatus includes a generation apparatus and an analysis apparatus. The generation apparatus executes (a-1) to (a-5) with I=1 to n, and generates pieces of process information. The generation apparatus extracts material words from a document i in (a-1), extracts a treatment word i from the document i in (a-2), extracts a synthesis condition i from the document i in (a-3), extracts a characteristic value i related to a target material from the document i in (a-4), and associates the material words, the treatment word i, the synthesis condition i, and the characteristic value i with each other to generate process information i in (a-5). The analysis apparatus includes a combiner that generates composite process information including a common part common to the pieces of process information and different parts different among the pieces of process information, and an outputter that outputs the composite process information. 1. A support apparatus comprising:a generation apparatus; andan analysis apparatus, whereinthe generation apparatus executes (a-1) to (a-4) with i=1 to n, and generates process information 1 to process information n which are pieces of process information, the i being a natural number, the n being a natural number greater than or equal to 2,in the (a-1), a material word extractor included in the generation apparatus extracts material words from a document i, the material words including a starting material word indicating a starting material and a target material word indicating a target material,in the (a-2), a treatment word extractor included in the generation apparatus extracts a treatment word i from the document i, the treatment word i indicating a treatment i of generating the target material from the starting material,in the (a-3), a condition extractor included in the generation apparatus extracts a synthesis condition i from the document i, the synthesis condition i being a condition i of the treatment i,in the (a-4), a ...

Publication date: 10-02-2022

DOCUMENT PROCESSING PROGRAM AND INFORMATION PROCESSING APPARATUS

Number: US20220043849A1
Assignee:

A document processing program and an information processing apparatus that present a contract status of an organization based on the contents of contract documents. The document processing program including instructions that causes the information processing apparatus to: accept a condition for analyzing a contract document by an acceptance unit; extract a contract document by an analysis target extraction unit, wherein the contract document containing extraction information matching the condition accepted by the acceptance unit from a contract document database which includes a plurality of contract documents and in which information indicating a contract status of the plurality of contract documents is extracted as extraction information; analyze the contract document extracted by the analysis target extraction unit based on the condition accepted by the acceptance unit, by an analysis unit; and display and output an analysis result of the analysis unit by the output unit. 1. A non-transitory computer-readable medium storing a program including instructions that , when executed by a processor , causes an information processing apparatus connected to a document processing apparatus through a communication interface , to:accept a condition for analyzing a contract document by an acceptance unit;extract a contract document by an analysis target extraction unit, wherein the contract document containing information that matches to the condition accepted by the acceptance unit is extracted from a contract document database which contains a plurality of contract documents and information indicating a contract status of the plurality of contract documents;analyze the contract document extracted by the analysis target extraction unit based on the condition accepted by the acceptance unit, by an analysis unit; anddisplay and output an analysis result of the analysis unit by the output unit.2. 
A non-transitory computer-readable medium storing a program including instructions ...

Publication date: 10-02-2022

SYSTEM AND METHOD FOR ASSOCIATION OF DATA ELEMENTS WITHIN A DOCUMENT

Number: US20220043858A1
Assignee:

A system for association of data elements within a document is disclosed. An input data receiving subsystem receives an input data source of the document. A feature generation subsystem obtains one or more lists of personal data, generates one or more personal data features representing a relationship between one or more personal data elements. An affinity computation subsystem assesses each of the one or more personal data features, computes affinity score between the one or more personal data elements, generates one or more affinities. A personal data relationship identification subsystem assigns the one or more personal data elements to corresponding one or more identification stages, derives a set of identities corresponding to the one or more personal data elements. An identity filtration subsystem receives the one or more affinities and the set of identities, determines a validation of the set of identities, filters out the set of identities. 1. A system for association of data elements within a document comprising: an input data receiving subsystem configured to receive an input data source of the document in one or more formats; a feature generation subsystem operatively coupled to the input data receiving subsystem, wherein the feature extraction subsystem is configured to: obtain one or more lists of personal data extracted from the input data source upon scanning the input data source of the document using a data source scanning technique; and generate one or more personal data features representing a relationship between one or more personal data elements obtained from the one or more lists of the personal data; assess each of the one or more personal data features generated from the one or more personal data elements at a predetermined time interval based on consideration of one or more levels of affinity; compute an affinity score between the one or more personal data elements using at least one type of affinity function upon assessment of each ...

Publication date: 10-02-2022

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT

Number: US20220044012A1
Assignee: RICOH COMPANY, LTD.

An information processing apparatus includes a processor configured to perform operations including extracting information indicative of an issuer of a form from form image data, and obtaining fixed information corresponding to the information indicative of the issuer of the form by referring to a storage storing the fixed information associated with the issuer of the form, and outputting, to a terminal apparatus, screen data including the information indicative of the issuer of the form and the obtained fixed information. 1. An information processing apparatus comprising a processor configured to perform operations including: extracting information indicative of an issuer of a form from form image data, and obtaining fixed information corresponding to the information indicative of the issuer of the form by referring to a storage storing the fixed information associated with the issuer of the form; and outputting, to a terminal apparatus, screen data including the information indicative of the issuer of the form and the obtained fixed information. 2. The information processing apparatus according to claim 1, wherein in a case where the fixed information displayed on the terminal apparatus is edited, the fixed information stored in the storage is updated. 3. The information processing apparatus according to claim 2, wherein in response to determining that the fixed information corresponding to the information indicative of the issuer of the form is not obtained, a character string extracted from the form image data is registered to the storage as the fixed information corresponding to the information indicative of the issuer of the form. 4. The information processing apparatus according to claim 1, wherein the processor is further configured to perform operations including: identifying a character string determined to be an extraction reference point, based on a relationship in position with a particular character string, from among a plurality of ...

Publication date: 10-02-2022

ENHANCING ELECTRONIC DOCUMENTS FOR CHARACTER RECOGNITION

Number: US20220044013A1
Assignee:

Techniques for desirably translating a document image to an editable electronic textual document are presented. Utilizing respective applications, a document processing management component (DPMC) can convert the document image to a grayscale document image, remove noise from such image, rotate such image to reduce or eliminate any skewing of such image, and perform character recognition on the rotated grayscale document image to extract the textual information from such document to generate an electronic textual document. DPMC can associate a document identifier with the electronic textual document, and such document and document identifier can be stored in a data store. When such document is related to a device or other item, a code or textual string can be associated with the device or item, wherein a communication device can scan the code or textual string. In response, DPMC can retrieve such document, or information relating thereto, from the data store. 1. A method , comprising:in response to determining an amount of skew of textual information presented in an image of a document, rotating, by a system comprising a processor, the image of the document, based on a rotation parameter, to reduce the amount of the skew of the textual information to generate a rotated image of the document, in accordance with a defined document processing criterion relating to skew reduction; andperforming, by the system, character recognition on the rotated image of the document to determine characters of the textual information to generate an electronic textual document comprising the characters of the textual information.2. The method of claim 1 , further comprising:receiving, by the system, a captured image of the document, wherein the textual information presented in the captured image of the document is at a first angle with respect to a defined axis, wherein the textual information of the document presented in the image of the document is at the first angle, wherein the ...
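
As a rough illustration of the first two preprocessing steps named in this abstract (grayscale conversion and noise removal), here is a minimal pure-Python sketch. It is not the DPMC described in the patent: the function names are hypothetical, images are assumed to be nested lists of RGB tuples, the luminance weights are the common ITU-R BT.601 coefficients, and a 3x3 median filter is swapped in as one simple denoising choice. Skew correction and character recognition are not covered here.

```python
from statistics import median_low


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def median_denoise(gray):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Neighbourhoods are clipped at the image border, so corner pixels use
    4 values and edge pixels use 6. median_low keeps the result an
    existing pixel value even when the window has an even size.
    """
    height, width = len(gray), len(gray[0])
    result = [row[:] for row in gray]
    for y in range(height):
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            result[y][x] = median_low(window)
    return result
```

A median filter suits scanned documents because it removes isolated speckles while keeping character edges sharper than an averaging filter would.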

Publication date: 10-02-2022

Template-Based Key-Value Extraction for Inferring OCR Key Values Within Form Images

Number: US20220044058A1
Assignee:

The system has a form analysis module that receives an image of a form into which values have been filled for the possible fields of information on the form, such as first name, address, age, and the like. By using a library of form templates, a form analysis module allows both flexibility of form processing and simplicity for the user. That is, the techniques used by the form analysis module allow the processing of any form image for which the library has a form template example. The form image need not precisely match any form template, but rather may be scaled or shifted relative to a corresponding template. Additionally, the user need only provide the form image itself, without providing any additional exemplars, metadata for training, or the like. 1. A computer-implemented method for location of text of key-value pairs on a form image, the computer-implemented method comprising: obtaining, at one or more processors, a form image originating from a client device; identifying, by the one or more processors using a neural network, regions of text within the form image; selecting, by the one or more processors, a form template from a database storing a plurality of form templates according to the identified regions of text; computing, by the one or more processors, a spatial mapping between the selected form template and the obtained form image; computing, by the one or more processors, spatial relationships between key regions and value regions of the selected form template; determining, by the one or more processors, for each key of a plurality of keys of the selected form template, using the computed spatial mappings and spatial relationships, an estimated region on the form image for the value corresponding to the key; and extracting, by the one or more processors, from the estimated regions on the form image based on overlap of the identified text regions and the estimated regions, text corresponding to values of the plurality of keys of the selected form template. 2 ...
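
The mapping and overlap steps of claim 1 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the claimed method: the spatial mapping is reduced to an independent scale and shift per axis fitted from two matched anchor points, boxes are `(x0, y0, x1, y1)` tuples, text regions are assumed to arrive already recognized as `(box, text)` pairs, and all function names are hypothetical.

```python
def fit_axis(t0, t1, f0, f1):
    """Fit form = scale * template + offset on one axis from two matched coordinates."""
    scale = (f1 - f0) / (t1 - t0)
    return scale, f0 - scale * t0


def map_box(box, sx, ox, sy, oy):
    """Project a template box into form-image coordinates."""
    x0, y0, x1, y1 = box
    return (sx * x0 + ox, sy * y0 + oy, sx * x1 + ox, sy * y1 + oy)


def overlap_area(a, b):
    """Area of the intersection of two boxes, or 0 if they do not intersect."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def extract_values(template_values, text_regions, anchors):
    """Map each template value region onto the form and pick the best-overlapping text.

    template_values: {key: value box in template coordinates}
    text_regions:    [(box in form coordinates, recognized text), ...]
    anchors:         two (template_point, form_point) pairs used to fit the mapping
    """
    (t_a, f_a), (t_b, f_b) = anchors
    sx, ox = fit_axis(t_a[0], t_b[0], f_a[0], f_b[0])
    sy, oy = fit_axis(t_a[1], t_b[1], f_a[1], f_b[1])
    values = {}
    for key, box in template_values.items():
        estimated = map_box(box, sx, ox, sy, oy)
        best = max(text_regions, key=lambda r: overlap_area(estimated, r[0]), default=None)
        if best is not None and overlap_area(estimated, best[0]) > 0:
            values[key] = best[1]
    return values
```

Keys whose estimated region overlaps no detected text are simply left out of the result, which keeps a blank field from being filled with text from a neighbouring one.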

Publication date: 10-02-2022

INTELLIGENT DELIVERY SYSTEM

Number: US20220044194A1
Author: Walsh Dale
Assignee: RICOH COMPANY, LTD.

Intelligent tools are provided to enable a mailcenter in a campus to provide mail service to mail service customers and perform processing of mail, in an automated manner as much as possible. Such processing may include scanning and obtaining mail information from a mailpiece, and then sending such mail information to the mail service customer (as specified addressee of the mailpiece) by electronic notification via a corresponding electronic address. 1. An intelligent delivery system comprising: one or more processors, one or more memories storing instruction which, when processed by the one or more processors, cause: receiving a digital image scanned from an address-bearing face of a piece of mail, determining whether the piece of mail corresponds to transactional mail based on at least one of: a logo included in the digital image, a barcode placed by a sender of the piece of mail included in the digital image, or addressee information indicating an addressee of the piece of mail extracted from the digital image, in response to determining that the piece of mail corresponds to transactional mail, generating a metadata indicating at least one of: a transaction mail indicator or a department code, and processing the piece of mail according to a workflow corresponding to the metadata. 2. The intelligent delivery system as claimed in claim 1, further comprising: a customer database that stores electronic addresses for plural customers; wherein the one or more memories store additional instructions which, when processed by the one or more processors, cause: in response to determining that the piece of mail corresponds to correspondence mail, selecting an electronic address from the customer database for the addressee, and transmitting a notification to the selected electronic address. 3. The intelligent delivery system as claimed in claim 1, wherein the one or more memories store additional instructions which, when processed by the one or more ...

Publication date: 10-02-2022

DYNAMIC CATEGORIZATION OF IT SERVICE TICKETS USING NATURAL LANGUAGE DESCRIPTION

Number: US20220044254A1
Assignee:

An embodiment for dynamic categorization of information technology (IT) service tickets is provided. The embodiment may include logging an IT service ticket when text is entered into a description field. The embodiment may also include creating a filtered description field by processing the text entered into the description field. The embodiment may further include computing a set of exponential weights and assigning the set of exponential weights to each word in the filtered description field. The embodiment may also include multiplying the set of exponential weights by the word's TF-IDF score to determine an IT service ticket category for placement of the IT service ticket into the IT service ticket category. The embodiment may further include generating features for machine learning, utilizing the generated features to build a supervised machine learning model, and evaluating the supervised machine learning model through analyzation of data from historical IT service tickets.

1. A computer-based method of categorizing information technology (IT) service tickets, the method comprising: logging an IT service ticket when text is entered into a description field; creating a filtered description field by removing triggers from the text entered into the description field; computing a set of exponential weights based on the text in the filtered description field; assigning the set of exponential weights to each word in the filtered description field; multiplying the set of exponential weights by a TF-IDF score associated with each word; and determining an IT service ticket category based on a result generated by the multiplying.
2. The method of claim 1, wherein the IT service ticket may be logged via a medium selected from a group consisting of phone call, email, chat, text message, walk-in, web services, mobile app, and direct input.
3. The method of claim 1, wherein triggers are characters selected from a ...
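The steps of claim 1 can be sketched in a few lines. The listing does not say how the exponential weights are computed or how a category is chosen from the products, so this sketch makes several loudly labeled assumptions: "triggers" are taken to be any non-alphanumeric characters, the weight of a word decays exponentially with its first position in the description, and a category is picked by summing weighted scores over a hypothetical keyword map.

```python
import math
import re
from collections import Counter


def filter_description(text):
    """Assumption: 'triggers' are non-alphanumeric characters, removed here."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def tf_idf(tokens, corpus):
    """TF-IDF per word; corpus is a list of token sets from historical tickets."""
    counts = Counter(tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[word] = (count / len(tokens)) * idf
    return scores


def exponential_weights(tokens, decay=0.5):
    """Assumption: weight = exp(-decay * first position of the word)."""
    weights = {}
    for position, word in enumerate(tokens):
        weights.setdefault(word, math.exp(-decay * position))
    return weights


def categorize(description, corpus, category_keywords, decay=0.5):
    """Multiply weights by TF-IDF and pick the best-scoring category."""
    tokens = filter_description(description)
    if not tokens:
        return None
    scores = tf_idf(tokens, corpus)
    weights = exponential_weights(tokens, decay)
    weighted = {word: weights[word] * scores[word] for word in scores}
    totals = {
        category: sum(weighted.get(keyword, 0.0) for keyword in keywords)
        for category, keywords in category_keywords.items()
    }
    best = max(totals, key=totals.get)
    return best if totals[best] > 0 else None
```

The supervised model mentioned in the abstract is out of scope here; the keyword map only stands in for whatever decision rule the patent actually uses.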

Publication date: 24-01-2019

SYSTEM AND METHOD FOR IDENTIFYING CANDIDATES FOR BACK-OF-BOOK INDEX

Number: US20190026270A1
Assignee:

A method, computer program product, and computer system for analyzing one or more existing book indexes to build a statistical model of term-to-text pairs. A document may be analyzed, wherein the document may include at least a portion of an instruction in a book. A term in the document may be identified. Whether the term is a candidate for an index of the book may be identified based upon, at least in part, the statistical model of term-to-text pairs.

1. A computer-implemented method comprising: analyzing, by a computing device, one or more existing book indexes to build a statistical model of term-to-text pairs; analyzing a document, wherein the document includes at least a portion of an instruction in a book; identifying a term in the document; and identifying whether the term is a candidate for an index of the book based upon, at least in part, the statistical model of term-to-text pairs, wherein the statistical model includes one or more positive examples and one or more negative examples based upon, at least in part, a location of the term in the document and presence of the term in the one or more existing book indexes.
2. (canceled)
3. The computer-implemented method of further comprising determining a location of the term in the document.
4. (canceled)
5. The computer-implemented method of wherein the location of the ingredient in the document includes a title of the instruction.
6. The computer-implemented method of wherein the location of the term in the document includes a preamble of the instruction.
7. The computer-implemented method of wherein the term includes at least one of an ingredient, a cooking method, and a cooking style.
8. A computer program product residing on a non-transitory computer readable storage medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform operations comprising: analyzing one or more existing book indexes to build a statistical ...
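One very simple way to read the "positive and negative examples" in claim 1 is as counts per term location. The sketch below is an assumption-heavy simplification, not the patented model: it ignores the term text itself, uses only hypothetical location labels such as `"title"`, `"preamble"`, and `"body"`, and estimates the probability that a term at a given location appears in an existing index using Laplace smoothing.

```python
from collections import defaultdict


def build_model(examples):
    """Estimate P(term is indexed | location) from labeled examples.

    examples: iterable of (location, in_index) pairs, where in_index is True
    when the term appears in an existing book index (a positive example)
    and False otherwise (a negative example).
    """
    counts = defaultdict(lambda: [0, 0])  # location -> [positives, negatives]
    for location, in_index in examples:
        counts[location][0 if in_index else 1] += 1
    # Laplace smoothing keeps rarely seen locations away from 0 or 1.
    return {
        location: (pos + 1) / (pos + neg + 2)
        for location, (pos, neg) in counts.items()
    }


def is_candidate(model, location, threshold=0.5):
    """A term is a candidate if its location's estimate reaches the threshold."""
    return model.get(location, 0.0) >= threshold
```

A fuller model would also condition on the term itself (for instance ingredient versus cooking method, as in claim 7), which this sketch deliberately leaves out.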

Publication date: 24-01-2019

DATA PROCESSING APPARATUS AND METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Number: US20190026549A1
Author: Okamura Atsushi
Assignee:

In accordance with an embodiment, a data processing apparatus includes a memory configured to store a determination table, a camera configured to image a commodity package, and a processor. The processor is configured to extract an area having a predetermined shape from the imaged commodity package, recognize characters in the extracted area, determine whether the recognized characters meet a criterion described in the determination table, and output the recognized characters to a display or a printer connected to the data processing apparatus when the recognized characters meet the criterion.

1. A data processing apparatus, comprising: a memory configured to store a determination table; a camera configured to image a commodity package; and a processor configured to: extract an area having a predetermined shape from the imaged commodity package; recognize characters in the extracted area; determine whether the recognized characters meet a criterion described in the determination table; and when the recognized characters meet the criterion, output the recognized characters to a display or a printer connected to the data processing apparatus.
2. The data processing apparatus according to claim 1, wherein the predetermined shape is a rectangle.
3. The data processing apparatus according to claim 2, wherein the processor is configured to recognize only the characters arranged in parallel with a side of the extracted area.
4. The data processing apparatus according to claim 1, wherein the processor is configured to detect a predetermined pattern in the extracted area and recognize only the characters arranged within a predetermined range from the detected pattern.
5. The data processing apparatus according to claim 4, wherein the processor is configured to output the detected pattern together with the recognized characters.
6. The data processing apparatus according to claim 1, further comprising: a scanner configured to scan a code printed on the commodity package, ...
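The "determination table" check in claim 1 can be illustrated with a small lookup of patterns. This is only a sketch under stated assumptions: the table contents, the criterion names, and the use of regular expressions are hypothetical, and the area extraction and character recognition steps are assumed to have already produced plain strings.

```python
import re

# Hypothetical determination table: criterion name -> pattern the text must match.
DETERMINATION_TABLE = [
    ("expiry_date", re.compile(r"\d{4}[./-]\d{2}[./-]\d{2}")),
    ("lot_number", re.compile(r"LOT\s?[A-Z0-9]+")),
]


def matching_criterion(recognized):
    """Return the name of the first criterion the whole string satisfies."""
    text = recognized.strip()
    for name, pattern in DETERMINATION_TABLE:
        if pattern.fullmatch(text):
            return name
    return None


def select_for_output(recognized_strings):
    """Keep only strings that meet a criterion, paired with the criterion name."""
    selected = []
    for text in recognized_strings:
        name = matching_criterion(text)
        if name is not None:
            selected.append((name, text))
    return selected
```

In the apparatus described, the selected strings would then be sent to a display or printer; here they are simply returned as a list.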
