Total found: 6149. Displayed: 199.
Publication date: 02-06-2004

Information retrieval

Number: GB0002395806A

An information retrieval apparatus for searching a set of information items and displaying the results of the search, the information items each having a set of characterising information features. The apparatus comprises a search processor operable to search the information items in accordance with a user defined characterising information feature and to identify information items having characterising information features corresponding to that user defined characterising information feature. A mapping processor is operable to generate data representative of a map of information items from a set of information items identified in the search. The map provides the identified information items with respect to positions in an array in accordance with a mutual similarity of the information items, similar information items mapping to similar positions in the array. The apparatus includes a graphical user interface for displaying a representation of at least some of the identified items, and ...

Publication date: 29-10-2014

Resolving similar entities from a database

Number: GB0002513472A

A plurality of record sets where each record set includes one or more common attribute values is retrieved from the database. An exemplar record set associated with a first entity record set is selected and a classifier 255 determines a probability that the record set includes records associated with the first entity. The determined probability is then compared to a threshold to assess whether the record set includes records associated with the first entity. The classifier is preferably a random forest classifier and the record sets are preferably financial transaction records processed for a financial institution by a merchant.

Publication date: 29-11-2017

Systems and methods for electronic document review

Number: GB0002550533A

Systems and methods enable convenient and accurate searching, filtering, reviewing, and classification of electronic documents without the loss of metadata. A communication data source file is parsed into conversation-specific files that include message content and metadata. The message content and metadata are displayed on a computing device operated by a reviewer. To streamline the review process, the reviewer can filter display of the message content according to various metadata categories as well as search conversation-specific files using the metadata categories.

Publication date: 08-11-2006

Method and apparatus for categorising a text document

Number: GB0000619040D0

Publication date: 15-08-2007

PROCEDURE AND DEVICE FOR SIMILARITY SEARCH AND GROUPING

Number: AT0000366964T

Publication date: 07-10-2021

INCREMENTAL LEARNING OF POINTWISE MUTUAL INFORMATION (PMI) WORD-VECTOR EMBEDDING FOR TEXT/LANGUAGE MODELING

Number: AU2019200085B2

Techniques and systems for online dictionary extension of word vectors are described that are configured to provide online extension of existing word vector dictionaries and thus overcome the failures of conventional techniques. In one example, a dictionary extension system is employed by a computing system to extend a word vector dictionary to incorporate a new word in an online manner. Co-occurrence information is estimated for the new word with respect to the words in the existing dictionary. This is done by estimating co-occurrence information with respect to a large word set based on the existing dictionary and sparse co-occurrence information for the new word. The estimated co-occurrence information is utilized to estimate a new word vector associated with the new word by projecting the estimated co-occurrence information onto the existing word vector dictionary. An extended dictionary is created incorporating the original dictionary and the estimated new word vector. Computing System ...

Publication date: 20-02-2003

Information classification and retrieval using concept lattices

Number: AU2003900520A0

Publication date: 28-05-2020

Entity resolution from documents

Number: AU2014253497B2
Assignee: Spruson & Ferguson

The present subject matter relates to entity resolution, and in particular to providing an entity resolution from documents. The method comprises obtaining (202) the plurality of documents from at least one data source. The plurality of documents is blocked (204) into at least one bucket based on textual similarity and inter-document references among the plurality of documents. Further, within each bucket, a merged document for each entity may be created (206) based on an iterative match-merge technique. The iterative match-merge technique identifies, from the plurality of documents, at least one matching pair of documents and merges the at least one matching pair of documents to create the merged document for each entity. The merged documents may be merged to generate (208) a resolved entity-document for each entity based on a graph clustering technique. ...

Publication date: 29-06-2017

Weighted subsymbolic data encoding

Number: AU2015360472A1

Described herein is a method and system of geometrically encoding data including partitioning data into a plurality of semantic classes based on a dissimilarity metric, generating a subspace formed by first and second data elements, the first and second data elements being included in first and second numbers of partitioned semantic classes, encoding the first data element with respect to the second data element such that the generated subspace formed by the first data element and the second data element is orthogonal, computing a weight distribution of the first data element with respect to the second data element, the weight distribution being performed for each of the first number of semantic classes and the second number of semantic classes, and determining a dominant semantic class corresponding to an ordered sequence of the first data element and the second data element, the dominant semantic class having a maximum weight distribution.

Publication date: 17-08-2017

SYSTEM AND ENGINE FOR SEEDED CLUSTERING OF NEWS EVENTS

Number: AU2017200585A1
Assignee: AJ PARK

The present invention provides a seeded news event clustering and retrieval system configured to first create a candidate data set of documents, second create a set of initial clusters based on nearness or duplicate similarity status, and third create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label or "seed" component and generates sub-topic-focused clusters based on algorithm. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to similarity of evidence derived from two distinct sources, one, relying on a digital signature based on the unstructured text in the document, the other based on the presence of named entity tags that have been assigned to the document by an event or named entity tagger such as the Thomson Reuters ...

Publication date: 02-08-2016

PROVIDING A CLASSIFICATION SUGGESTION FOR CONCEPTS

Number: CA0002773220C
Assignee: FTI CONSULTING, INC.

A system (11) and method (40) for providing a classification (82) suggestion for concepts (13) is provided. A corpus of concepts (13), including reference concepts (14d) each associated with a classification (82) and uncoded concepts (14c), is maintained. A cluster (93) of uncoded concepts (14c) and reference concepts (14d) is provided. A neighborhood (70) of reference concepts (14d) in the cluster (93) is determined for at least one of the uncoded concepts (14c). A classification (82) of the neighborhood (70) is determined using a classifier. The classification (82) of the neighborhood is suggested as a classification for the at least one uncoded concept (14c).
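
The neighborhood-based suggestion described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the clustering step is skipped, concepts are assumed to be plain numeric vectors, cosine similarity stands in for the unspecified similarity measure, and a majority vote stands in for the unspecified classifier. The names `cosine` and `suggest_classification` and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_classification(uncoded, references, k=3):
    """Suggest a label for an uncoded concept from its k most similar reference concepts.

    `references` is a list of (vector, label) pairs. The "neighborhood" is
    assumed to be the k nearest references by cosine similarity, and the
    "classifier" is assumed to be a simple majority vote over their labels.
    """
    neighborhood = sorted(
        references, key=lambda ref: cosine(uncoded, ref[0]), reverse=True
    )[:k]
    votes = Counter(label for _, label in neighborhood)
    return votes.most_common(1)[0][0]
```

Calling `suggest_classification(vector, references)` returns the label held by most of the nearest reference concepts; a distance-weighted vote would be an equally plausible reading of "determined using a classifier".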

Publication date: 10-02-2011

PROVIDING A CLASSIFICATION SUGGESTION FOR CONCEPTS

Number: CA0002773220A1

A system (11) and method (40) for providing a classification (82) suggestion for concepts (13) is provided. A corpus of concepts (13), including reference concepts (14d) each associated with a classification (82) and uncoded concepts (14c), is maintained. A cluster (93) of uncoded concepts (14c) and reference concepts (14d) is provided. A neighborhood (70) of reference concepts (14d) in the cluster (93) is determined for at least one of the uncoded concepts (14c). A classification (82) of the neighborhood (70) is determined using a classifier. The classification (82) of the neighborhood is suggested as a classification for the at least one uncoded concept (14c).

Publication date: 15-09-2015

DISPLAYING RELATIONSHIPS BETWEEN ELECTRONICALLY STORED INFORMATION TO PROVIDE CLASSIFICATION SUGGESTIONS VIA NEAREST NEIGHBOR

Number: CA0002773219C
Assignee: FTI CONSULTING, INC.

A system (11) and method (40) for providing reference documents (14b) as a suggestion for classifying uncoded documents (14a) is provided. Reference electronically stored information items (14b) and a set of uncoded electronically stored information items (14a) are designated. Each of the reference information items is previously classified. At least one uncoded electronically stored information item (14a) is compared with the reference electronically stored information items (14b). One or more of the reference electronically stored information items (14b) similar to the at least one uncoded electronically stored information item (14a) are identified. Relationships are depicted between the at least one uncoded electronically stored information item (14a) and the similar reference electronically stored information items (14b) for classifying the at least one uncoded electronically stored information item (14a).

Publication date: 03-02-2015

DISPLAYING RELATIONSHIPS BETWEEN CONCEPTS TO PROVIDE CLASSIFICATION SUGGESTIONS VIA NEAREST NEIGHBOR

Number: CA0002773319C
Assignee: FTI CONSULTING, INC.

A system (11) and method (50) for displaying relationships between concepts (14c, 14d) to provide classification suggestions via nearest neighbor is provided. Reference concepts (14d) previously classified and a set of uncoded concepts (14c) are provided. At least one uncoded concept (14c) is compared with the reference concepts (14d). One or more of the reference concepts (14d) that are similar to the at least one uncoded concept (14c) are identified. Relationships between the at least one uncoded concept (14c) and the similar reference concept (14d) are depicted on a display for classifying the at least one uncoded concept (14c).

Publication date: 18-10-2016

SYSTEM AND METHOD FOR GROUPING MULTIPLE STREAMS OF DATA

Number: CA0002777506C

A document clustering system and method of assigning a document to a cluster of documents containing related content are provided. Each cluster is associated with a cluster summary describing the content of the documents in the cluster. The method comprises: determining, at a document clustering system, whether the document should be grouped with one or more previously created cluster summaries, the previously created cluster summaries being stored in a memory in a B-tree data structure; and if it is determined that the document should not be grouped with the one or more previously created cluster summaries, then creating, at a document clustering system, a cluster summary based on the content of the document and storing the created cluster summary in the B-tree data structure.

Publication date: 07-01-2010

INFORMATION PROCESSING WITH INTEGRATED SEMANTIC CONTEXTS

Number: CA0002729716A1

A system and method for generating a frame of reference for a plurality of information, the plurality of information containing text data obtained by a user through interaction with one or more information sources, including receiving selected information for analysis, the information including text data and identifying a plurality of logical units of the text data. A plurality of individual textual portions in each of the logical units is identified, and the number of logical units associated with each individual textual portion is calculated for use in identifying a pattern. Based on the pattern, a measure of importance is calculated and patterns are selected based on the measure of importance satisfying a predefined importance threshold. A plurality of information context definitions is generated based on the selected patterns and generated information context definitions are assigned as context definitions of a semantic context associated with the frame of reference. The plurality of ...

Publication date: 15-06-2017

AUTOMATICALLY CLASSIFYING AND ENRICHING IMPORTED DATA RECORDS TO ENSURE DATA INTEGRITY AND CONSISTENCY

Number: CA0003007723A1

Certain example embodiments described herein relate to techniques for managing "bad" or "imperfect" data being imported into a database system. That is, certain example embodiments provide a lifecycle technology solution that helps receive data from a variety of different data sources of a variety of known and/or unknown formats, standardize it, fit it to a known taxonomy through model-assisted classification, store it to a database in a manner that is consistent with the taxonomy, and allow it to be queried for a variety of different usages. Some or all of the disclosed technology concerning auto-classification, enrichment, clustering model and model stacks, and/or the like, may be used in these and/or other regards.

Publication date: 10-05-2007

METHODS FOR CHARACTERIZING CONTENT ITEM GROUPS

Number: CA0002628946A1

Published without an Abstract ...

Publication date: 08-05-2014

SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR WELLBORE EVENT MODELING USING RIMLIER DATA

Number: CA0002889382A1

A data mining and analysis system which analyzes clusters of outlier data (i.e., rimliers) to detect and/or predict downhole events.

Publication date: 12-03-2018

SYSTEM AND METHOD FOR TEMPORAL IDENTIFICATION OF LATENT COMMUNITIES USING ELECTRONIC CONTENT

Number: CA0002941604A1

Various embodiments are described herein for a system and method for determining a community of users with similar temporal behaviour from a plurality of users that generate electronic content during a time period by, for example, accessing the electronic content from a data store using a processing unit; determining at least one transient topic from the accessed electronic content for the time period using a topic extractor; determining contributions of the users to the identified at least one transient topic using a user community detector; determining the community of users as the users that have similar temporal contributions to the at least one identified transient topic using the user community detector; and providing a recommendation based on a determined user community.

Publication date: 18-10-2016

METHODS AND SYSTEMS FOR ANNOTATING ELECTRONIC DOCUMENTS

Number: CA0002809037C

A computer-implemented method of annotating an electronic document may include receiving annotation information corresponding to a first electronic document file and creating annotation metadata that is associated with the annotation information. The method may further include storing the annotation information and associated annotation metadata in an annotation file that is separate from the first electronic document file, and anchoring the annotation information to a target electronic document file at an anchor location corresponding to the annotation metadata. The annotation metadata may be generated by assigning a target offset value to individual neighboring tokens defining an annotation neighborhood, wherein the target offset values correspond to positions of the neighboring tokens with respect to an annotation location within the first electronic document file. The annotation metadata may also comprise topographic patterns that are compared between source and target documents to ...

Publication date: 08-12-2020

SYSTEMS AND METHODS FOR COLLECTING, CLASSIFYING, ORGANIZING AND POPULATING INFORMATION ON ELECTRONIC FORMS

Number: CA0002889996C
Assignee: FHOOSH, INC.

Systems and methods for collecting, classifying, transmitting and updating personal information for completion and submission or supplementation of electronic forms or databases on any type of mobile or other computing device are provided. Information relating to a user is obtained from one or more sources through electronic means, and the information is then organized and securely stored in a database using field mapping and other techniques to classify the information into specific categories. The information that is obtained and organized may include contact information, financial information, health information and historical information. The organized information may then be accessed by the user to automatically and instantaneously populate or supplement an electronic document, form or web-based application without requiring the user to manually enter the information. The system automatically detects and stores updates to information and builds a database of forms and electronic documents ...

Publication date: 31-08-2018

Method and device for determining abnormal comment text

Number: CN0108470065A

Publication date: 30-09-2019

Number: KR1020190110428A

Publication date: 13-01-2022

EMBEDDING MULTI-MODAL TIME SERIES AND TEXT DATA

Number: US20220012274A1

Methods and systems of training and using a neural network model include training a time series embedding model and a text embedding model with unsupervised clustering to translate time series and text, respectively, to a shared latent space. The time series embedding model and the text embedding model are further trained using semi-supervised clustering that samples training data pairs of time series information and associated text for annotation.

Publication date: 11-02-2014

Computer-implemented system and method for generating a display of document clusters

Number: US0008650190B2
Author: Dan Gallivan
Assignee: FTI Technology LLC

A computer-implemented system and method for generating a display of document clusters is described. Clusters of documents are presented in a multi-dimensional concept space. At least one document is selected from a collection of documents to be clustered. An angle theta of the document relative to a common origin of the multi-dimensional concept space is computed. The selected document is compared with each of the clusters. An angle sigma from the common origin is determined for each cluster. A difference between the angle theta for the document and the angle sigma for the cluster is determined. The difference is compared to the variance, and a new cluster is created when the difference exceeds the variance for all the clusters.
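
The angle comparison in this abstract can be sketched roughly in Python. The abstract does not say how the angles are measured or how the variance is obtained, so the sketch assumes that both theta and sigma are angles to a fixed reference axis through the common origin, that each cluster is represented by a fixed center vector, and that the variance is a single tolerance in radians. The names `angle_to_axis` and `assign_document` are hypothetical.

```python
import math


def angle_to_axis(vec, axis):
    """Angle in radians between a non-zero vector and a non-zero reference axis."""
    dot = sum(v * a for v, a in zip(vec, axis))
    norm = math.sqrt(sum(v * v for v in vec)) * math.sqrt(sum(a * a for a in axis))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def assign_document(doc_vec, clusters, axis, variance):
    """Place a document in the closest cluster by angle, or open a new cluster.

    Each cluster is a dict with a "center" vector and a "members" list.
    A new cluster is created when |theta - sigma| exceeds `variance`
    for every existing cluster. Returns the index of the chosen cluster.
    """
    theta = angle_to_axis(doc_vec, axis)
    best_index, best_diff = None, None
    for index, cluster in enumerate(clusters):
        sigma = angle_to_axis(cluster["center"], axis)
        diff = abs(theta - sigma)
        if diff <= variance and (best_diff is None or diff < best_diff):
            best_index, best_diff = index, diff
    if best_index is None:
        clusters.append({"center": list(doc_vec), "members": [list(doc_vec)]})
        return len(clusters) - 1
    clusters[best_index]["members"].append(list(doc_vec))
    return best_index
```

The sketch keeps each cluster center fixed at its first document for simplicity; recomputing the center as members arrive would be a natural refinement.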

Publication date: 23-01-2003

Document clustering device, document searching system, and FAQ preparing system

Number: US2003018629A1

The present invention provides a document clustering device that evaluates the similarity between each document and all the documents, including that document itself. Based on this evaluation result, the document clustering device divides the documents into non-hierarchical clusters. The present invention thereby realizes a document searching system that speedily retrieves a document satisfying a search condition from an accumulation of an enormous number of documents, and a FAQ preparing system that automatically prepares FAQs from an accumulation of an enormous number of sample questions and sample answers.

Publication date: 01-07-2010

METHOD AND APPARATUS FOR USING A DISCRIMINATIVE CLASSIFIER FOR PROCESSING A QUERY

Number: US20100169244A1

A method and apparatus for using a classifier for processing a query are disclosed. For example, the method receives a query from a user, and processes the query to locate one or more documents in accordance with a search engine having a discriminative classifier, wherein the discriminative classifier is trained with a plurality of artificial query examples. The method then presents a result of the processing to the user.

Publication date: 10-08-2017

MEASURING ACCURACY OF SEMANTIC GRAPHS WITH EXOGENOUS DATASETS

Number: US20170228435A1

Provided is a process including: obtaining a semantic similarity graph having nodes corresponding to documents in an analyzed corpus and edges indicating semantic similarity between pairs of the documents; for at least a plurality of nodes in the graph, evaluating accuracy of the edges based on neighboring nodes and an external corpus by performing operations including: identifying the neighboring nodes based on adjacency to the respective node in the graph; selecting documents from an external corpus based on references in the selected documents to entities mentioned in the documents of the neighboring nodes; and determining how semantically similar the respective node is to the selected documents.

Publication date: 10-07-2003

Document categorization engine

Number: US2003130993A1

Automatic classification is applied in two stages: classification and ranking. In the first stage, a categorization engine classifies incoming documents to topics. A document may be classified to a single topic or multiple topics or no topics. For each topic, a raw score is generated for a document and that raw score is used to determine whether the document should be at least preliminarily classified to the topic. In the second stage, for each document assigned to a topic (i.e., for each document-topic association) the categorization engine generates confidence scores expressing how confident the algorithm is in this assignment. The confidence score of the assigned document is compared to the topic's (configurable) threshold. If the confidence score is higher than this configurable threshold, the document is placed in the topic's Published list. If not, the document is placed in the topic's Proposed list, where it awaits approval by a knowledge management expert. By modifying a topic's ...
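
The two-stage flow described in this abstract can be sketched in Python under stated assumptions. How the raw and confidence scores are computed is not given, so they are taken as inputs; a single raw-score cutoff shared by all topics is also an assumption. The names `categorize`, `raw_cutoff`, and `topic_thresholds` are hypothetical.

```python
def categorize(doc_scores, raw_cutoff, topic_thresholds):
    """Route one document to each topic's Published or Proposed list.

    `doc_scores` maps topic -> (raw_score, confidence_score).
    Stage 1: keep only topics whose raw score reaches `raw_cutoff`.
    Stage 2: compare the confidence score with the topic's own threshold;
    strictly higher goes to Published, anything else goes to Proposed.
    Returns (published_topics, proposed_topics).
    """
    published, proposed = [], []
    for topic, (raw_score, confidence) in doc_scores.items():
        if raw_score < raw_cutoff:
            continue  # not even preliminarily classified to this topic
        if confidence > topic_thresholds[topic]:
            published.append(topic)
        else:
            proposed.append(topic)
    return published, proposed
```

A document can land in several topics, in one, or in none, which matches the abstract's statement that classification may be to a single topic, multiple topics, or no topics.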

Publication date: 08-03-2022

Semantic matching system and method

Number: US0011269943B2
Assignee: JANZZ LTD

A computer-based system and method for determining similarity between at least two heterogeneous unstructured data records and for optimizing processing performance. A plurality of occupational data records is generated and, for each of the occupational data records, a respective vector is created to represent the occupational data record. Each of the vectors is sliced into a plurality of chunks. Thereafter, semantic matching of the chunks occurs in parallel, to compare at least one occupational data record to at least one other occupational data record simultaneously and substantially in real time. Thereafter, values representing similarities between at least two of the occupational data records are output.

Publication date: 22-08-2017

Context labels for data clusters

Number: US0009740773B2
Assignee: QUALCOMM Incorporated

Systems and methods for applying and using context labels for data clusters are provided herein. A method described herein for managing a context model associated with a mobile device includes obtaining first data points associated with a first data stream assigned to one or more first data sources; assigning ones of the first data points to respective clusters of a set of clusters such that each cluster is respectively assigned ones of the first data points that exhibit a threshold amount of similarity and are associated with times within a threshold amount of time of each other; compiling statistical features and inferences corresponding to the first data stream or one or more other data streams assigned to respective other data sources; assigning context labels to each of the set of clusters based on the statistical features and inferences.

Publication date: 18-08-2011

SYSTEM AND METHOD FOR IDENTIFYING FRESH INFORMATION IN A DOCUMENT SET

Number: US20110202528A1

A method of identifying a fresh document in a document set is provided. The method may include obtaining a query document that is included in a document set comprising a plurality of documents. The method may also include grouping the plurality of documents into a plurality of fine clusters based on a textual similarity between the plurality of documents. The method may also include identifying a target fine cluster within the plurality of fine clusters, the target fine cluster including the query document. The method may also include ordering the documents included in the target fine cluster by time to identify the fresh document. The method may also include generating a query response that includes the fresh document.
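
The steps listed in this abstract can be sketched in Python. The abstract names neither the similarity measure nor the clustering algorithm, so the sketch assumes word-level Jaccard similarity, a greedy single-pass grouping against the first document of each cluster, and that the "fresh" document is the most recent one in the target cluster. The names `jaccard`, `fine_clusters`, and `fresh_document`, and the `id`, `text`, and `time` fields, are hypothetical.

```python
def jaccard(text_a, text_b):
    """Word-level Jaccard similarity between two texts."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def fine_clusters(docs, threshold=0.5):
    """Greedily group documents whose text is similar to a cluster's first document."""
    clusters = []
    for doc in docs:
        for cluster in clusters:
            if jaccard(doc["text"], cluster[0]["text"]) >= threshold:
                cluster.append(doc)
                break
        else:
            clusters.append([doc])  # no similar cluster found: start a new one
    return clusters


def fresh_document(query_id, docs, threshold=0.5):
    """Return the most recent document in the fine cluster containing the query document."""
    for cluster in fine_clusters(docs, threshold):
        if any(doc["id"] == query_id for doc in cluster):
            ordered = sorted(cluster, key=lambda doc: doc["time"])
            return ordered[-1]
    raise KeyError(query_id)
```

Sorting the whole cluster mirrors the abstract's "ordering the documents ... by time"; when only the newest item is needed, a single pass with `max` would do the same job.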

Publication date: 28-11-2019

SYSTEMS AND METHODS FOR AUTO DISCOVERY OF FILTERS AND PROCESSING ELECTRONIC ACTIVITIES USING THE SAME

Number: US2019361929A1

The present disclosure relates to systems and methods for filtering electronic activities. Exemplary implementations may include ingesting a first electronic activity; identifying an associated entity; and selecting a first filtering model based on the entity, the first filtering model trained to indicate whether to restrict further processing of ingested electronic activities. The method may further include generating a plurality of structured data tags for the first electronic activity; applying the selected first filtering model to the plurality of structured data tags for the first electronic activity to determine whether the first electronic activity satisfies a first restriction condition; and responsive to the first electronic activity satisfying the first restriction condition, restricting the first electronic activity from further processing; or responsive to the first electronic activity not satisfying the first restriction condition, further processing, by the one or more processors ...

Publication date: 20-12-2018

PREDICTIVE MODEL CLUSTERING

Number: US20180365249A1

Performing data clustering in a model property vector space. Input data is received comprising a plurality of data instances in a data vector space. A model property vector specification is defined for a model vector. Information is identified from the input data, and a model property vector is created in the model property vector space for each of the plurality of data instances. A target number of clusters is identified and used to perform a data clustering procedure. An output is generated comprising a plurality of data segments and one or more clustering rules. For each data cluster, a predictive model is constructed for each data segment of the plurality of data segments.

Publication date: 23-09-2014

System and method for association extraction for surf-shopping

Number: US0008843497B2

The present disclosure is directed to a computer system and method performed by a selectively programmed data processor for providing data to a Web page such that items are presented to the user in a way that imitates a real world shopping experience. Various aspects of the disclosed technology also relate to systems and methods for calculating product or category associations using associative relation extraction. Additional aspects of the disclosed technology relate to automatic topic discovery, and event and category matching.

Publication date: 26-12-2019

System, Device, and Method of Automatic Construction of Digital Advertisements

Number: US20190392487A1

System, device, and method of automatic construction of digital advertisements. An Artificial Intelligence (AI) unit is configured to receive as input: digital copies of past advertisements, and data of their performance results; as well as brand guidelines and a creative brief for automatic generation of a new advertisement. The AI unit generates a set of advertisement elements, such as logo, headline, a sub-headline, call-to-action, legal content, and an image; based on analysis of the input and detection that these particular advertisement elements correspond to previous performance results that are beyond a pre-defined threshold. An automatic advertisement generation unit generates a new advertisement by digitally placing the set of advertisement elements onto a canvas. Optionally, the system automatically generates on-the-fly in real-time a user-tailored advertisement, that is based on analysis of past performance of advertisements that were shown by the same advertiser to this particular ...

Publication date: 30-12-2004

Method and platform for term extraction from large collection of documents

Number: US2004267709A1

A method and platform for statistically extracting terms from large sets of documents is described. An importance vector is determined for each document in the set of documents based on importance values for words in each document. A binary document classification tree is formed by clustering the documents into clusters of similar documents based on the importance vector for each document. An infrastructure is built for the set of documents by generalizing the binary document classification tree. The document clusters are determined by dividing the generalized tree of the infrastructure into two parts and cutting away the upper part. Statistically significant individual key words are extracted from the clusters of similar documents. Key words are treated as seeds and terms are extracted by starting from the seeds and extending to their left or right contexts.

Publication date: 03-12-2019

Methods and apparatus for ranking documents

Number: US0010496652B1
Assignee: Google LLC

Methods and apparatus are described for scoring documents in response, in part, to parameters related to the document, source, and/or cluster score. Methods and apparatus are also described for scoring a cluster in response, in part, to parameters related to documents within the cluster and/or sources corresponding to the documents within the cluster. In one embodiment, the invention may detect at least one document within the cluster; analyze a parameter corresponding to the document; and compute a cluster score based, in part, on the parameter, wherein the cluster score corresponds with at least one document within the cluster.

Publication date: 21-03-2019

TEXT CLASSIFICATION METHOD AND APPARATUS

Number: US20190087490A1

A text classification apparatus determines a word vector corresponding to a keyword according to a word vector model, and determines a potential extended word of the keyword based on the word vector. Then, when an extension rule input is received, and an adding instruction is detected, the apparatus adds the potential extended word to the keyword library, and adds the extension rule to a matching rule library. The apparatus determines a first probability that a text belongs to each of multiple preset classes, wherein the determination is based on the keyword library and the matching rule library. The apparatus determines a class to which the text belongs from the multiple preset classes.

Publication date: 19-11-2009

METHOD FOR STABLE AND LINEAR UNSUPERVISED CLASSIFICATION UPON THE COMMAND ON OBJECTS

Number: US2009287723A1

A method of linear unsupervised classification allowing a database composed of objects and of descriptors to be structured, which is stable on the order of the objects, comprises an initial step for transformation of the qualitative, quantitative or textual data into presence-absence binary data. A structural threshold α_s, a function of the n² agreements between the objects to be classified, is determined; this threshold defines an optimization criterion adapted to the data. The descriptors are used as structuring and construction generators of a partition or set of classes. A class generated by a descriptor and a partition (40, 41, 42) are progressively merged. For an optimization criterion involving a function f(C_ii, C_i'i') = Min(C_ii, C_i'i'), sums of Minimum functions are linearized.

Publication date: 02-07-2020

DOCUMENT CLASSIFICATION USING ATTENTION NETWORKS

Number: US20200210526A1

A system comprising at least one hardware processor; and a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to: receive, as input, a plurality of electronic documents, apply a trained machine learning classifier to automatically classify at least some of said plurality of electronic documents, wherein said machine learning classifier comprises two or more attention layers, and wherein at least one of the attention layers comprises an adjustable parameter which controls a distribution of attention weights assigned by said attention layer.

Publication date: 01-10-2020

SYSTEM AND METHOD FOR LANGUAGE-INDEPENDENT CONTEXTUAL EMBEDDING

Number: US20200311345A1

Disclosed is a system for language-independent contextual embedding of entities in a document. The document comprises sentences. The system comprises a database and a processing arrangement. The processing arrangement comprises a tokenizer module for tokenizing sentences to obtain tokens, an encoder module for determining character coordinates corresponding to the tokens, wherein the character coordinates corresponding to the tokens occur in a multi-dimensional hierarchical space. The system comprises a transmutation module for processing the character coordinates to generate contextual embeddings thereof in the multi-dimensional hierarchical space and a prediction module for memorizing sequential information pertaining to the contextual embeddings of the character coordinates.

Publication date: 24-09-2020

SYSTEMS AND METHODS FOR AUTO DISCOVERY OF FILTERS AND PROCESSING ELECTRONIC ACTIVITIES USING THE SAME

Number: US20200302116A1
Assignee: People.ai, Inc.

The present disclosure relates to systems and methods for filtering electronic activities. Exemplary implementations may include ingesting a first electronic activity; identifying an associated entity; and selecting a first filtering model based on the entity, the first filtering model trained to indicate whether to restrict further processing of ingested electronic activities. The method may further include generating a plurality of structured data tags for the first electronic activity; applying the selected first filtering model to the plurality of structured data tags for the first electronic activity to determine whether the first electronic activity satisfies a first restriction condition; and responsive to the first electronic activity satisfying the first restriction condition, restricting the first electronic activity from further processing; or responsive to the first electronic activity not satisfying the first restriction condition, further processing, by the one or more processors ...

Publication date: 24-09-2020

KEY VALUE EXTRACTION FROM DOCUMENTS

Number: US20200302219A1

Systems, methods, and computer-executable instructions for extracting key value data. Optical character recognition (OCR) text of a document is received. The y-coordinates of characters are adjusted to a common y-coordinate. The rows of OCR text are tokenized into tokens based on a distance between characters. The tokens are ordered based on the x,y coordinates of the characters. The document is clustered into a cluster based on the ordered tokens and ordered tokens from other documents. Keys for the cluster are determined from the first set of documents. Each key is a token from a first set of documents. A value is assigned to each key based on the tokens for the document, and values are assigned to each key for the other documents. The values for the document and the values for the other documents are stored in an output document.
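
The row alignment and distance-based tokenization steps described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Char` record, the `layout` helper, and the `y_tolerance` and `max_gap` thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Char:
    """One OCR character with its position (hypothetical record layout)."""
    text: str
    x: float      # left edge of the character box
    y: float      # vertical position reported by OCR
    width: float  # width of the character box


def layout(text, x0, y, width=5.0, pitch=6.0):
    """Hypothetical helper: place the characters of `text` on a fixed pitch."""
    return [Char(c, x0 + i * pitch, y, width) for i, c in enumerate(text)]


def group_rows(chars, y_tolerance=3.0):
    """Treat characters whose y differs by at most y_tolerance as one row."""
    rows = []  # list of (reference_y, members)
    for ch in sorted(chars, key=lambda c: c.y):
        if rows and abs(ch.y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append(ch)
        else:
            rows.append((ch.y, [ch]))
    # Within each row, order characters left to right.
    return [sorted(members, key=lambda c: c.x) for _, members in rows]


def tokenize_row(row, max_gap=4.0):
    """Split a row into tokens wherever the horizontal gap exceeds max_gap."""
    tokens, current = [], row[0].text
    for prev, ch in zip(row, row[1:]):
        gap = ch.x - (prev.x + prev.width)
        if gap > max_gap:
            tokens.append(current)
            current = ch.text
        else:
            current += ch.text
    tokens.append(current)
    return tokens


# Slight vertical jitter between the two words, as OCR output often has.
chars = layout("Total", 0.0, 100.0) + layout("42", 60.0, 101.5)
rows = group_rows(chars)
print([tokenize_row(row) for row in rows])
```

Clustering documents by their ordered tokens and choosing keys per cluster would build on this output; those steps are left out because the abstract does not specify how they are performed.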

Publication date: 25-10-2007

Multi-directional and auto-adaptive relevance and search system and methods thereof

Number: US2007250500A1
Author: ISMALON EMIL

The multi-directional and auto-adaptive relevance and search methods hereof are capable of clustering information and users in ways that allow for higher quality search results to be provided to all the users of the system. As part of the operation of the search engine, both information pages and users are clustered in meaningful ways using multi-layer association graphs. Specifically, a multi-directional approach is used to allow the transfer of information from the users to the information pages in addition to the traditional transfer of data from the information pages to the user. The clustering is performed with respect to the identification of clusters of a plurality of users, which enables the information pages to be clustered dynamically, providing additional refinements beyond user profiles. Furthermore, the system is configured to provide personalized advice by presenting additional search phrases tailored to the searching user.

Publication date: 03-12-2020

SEMANTIC ANALYSIS-BASED PLUG-IN APPLICATION RECIPE GENERATION

Number: US20200379735A1
Assignee: Oracle International Corporation

Techniques for semantic analysis-based generation of plug-in application recipes (PIAR's) are disclosed. Responsive to receiving a data item that specifies (a) a desired genus of actions and/or (b) a desired genus of triggers, a PIAR management application performs semantic analysis on the data item to identify one or more candidate PIAR's. The candidate PIAR(s) is/are identified based at least in part on mapping of actions and/or triggers to the desired genus of actions and/or the desired genus of triggers. The mapping is based at least in part on metadata, associated with profiles for plug-in applications, corresponding to actions and/or triggers. The PIAR management application stores, for each plug-in application, a corresponding profile to define the plug-in application for use in one or more future PIAR's. Based on user input approving a particular PIAR in the one or more candidate PIAR's, the PIAR management application executes the particular PIAR.

Publication date: 18-05-2017

METHOD AND DEVICE FOR MINING AN INFORMATION TEMPLATE

Number: US20170140026A1
Assignee: Xiaomi Inc.

Methods and devices for mining an information template are provided. A method may include forming a modeling information set comprising a plurality of modeling information items. The method may further include creating a plurality of encrypted information items by encrypting respective numerical information items included in the plurality of modeling information items. The method may further include clustering the plurality of encrypted information items to create at least one information template. According to the present disclosure, an information template may be mined through analysis of a plurality of modeling information items, and numerical information items included in the modeling information items may be encrypted during the template mining process, which may prevent users' private information from being disclosed by the mined template, so that a more secure method for mining an information template may be provided.

Publication date: 25-09-2012

Classification method and apparatus

Number: US0008276067B2

A method for building a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, and said method comprising the following steps: representing each of said plurality of documents by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; representing the classification of said already classified documents into classes by separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises ...

Publication date: 02-07-2019

Defect record classification

Number: US0010339170B2

An approach to classify different defect records by mapping plain language phrases to a taxonomy. The approach includes a method that includes receiving, by at least one computing device, a defect record associated with a defect. The method further includes receiving, by the at least one computing device, a plain language phrase or word. The method further includes mapping, by the at least one computing device, the plain language phrase or word to a taxonomy. The method further includes classifying, by the at least one computing device, how the defect was at least one of detected and resolved using the taxonomy.

Publication date: 15-09-2015

Generating data clusters

Number: US0009135658B2

Techniques are disclosed for prioritizing a plurality of clusters. Prioritizing clusters may generally include identifying a scoring strategy for prioritizing the plurality of clusters. Each cluster is generated from a seed and stores a collection of data retrieved using the seed. For each cluster, elements of the collection of data stored by the cluster are evaluated according to the scoring strategy and a score is assigned to the cluster based on the evaluation. The clusters may be ranked according to the respective scores assigned to the plurality of clusters. The collection of data stored by each cluster may include financial data evaluated by the scoring strategy for a risk of fraud. The score assigned to each cluster may correspond to an amount at risk.
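
As a rough illustration of the scoring and ranking described above, the Python sketch below applies a pluggable scoring strategy to each cluster and sorts by score. The data layout, the `rank_clusters` and `amount_at_risk` names, and the `suspicious` flag are assumptions made for the example, not details from the patent.

```python
def rank_clusters(clusters, scoring_strategy):
    """Score every cluster with the strategy; return (seed, score), highest first."""
    scored = [(seed, scoring_strategy(items)) for seed, items in clusters.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def amount_at_risk(items):
    """Hypothetical strategy: total amount across items flagged as suspicious."""
    return sum(item["amount"] for item in items if item["suspicious"])


# Each cluster is keyed by the seed it was generated from (illustrative data).
clusters = {
    "account-17": [
        {"amount": 120.0, "suspicious": True},
        {"amount": 80.0, "suspicious": False},
    ],
    "account-42": [
        {"amount": 500.0, "suspicious": True},
        {"amount": 250.0, "suspicious": True},
    ],
}

ranking = rank_clusters(clusters, amount_at_risk)
print(ranking)
```

Passing the strategy as a function keeps the ranking code independent of how risk is measured, which matches the abstract's separation of "identifying a scoring strategy" from evaluating the clusters.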

Publication date: 25-08-2015

Method for improving the responsiveness of a client device

Number: US0009116892B2

A method for improving the responsiveness of a client application by providing that application with a local database which is a replicated subset of a database held on a remote server.

Publication date: 28-08-2012

Method and apparatus for document clustering and document sketching

Number: US0008255397B2
Assignee: Ebrary, GOLLAPUDI SREENIVAS

A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, that automatically classifies new documents into the right clusters, and that may change the number or parameters of clusters under various circumstances. A second embodiment of the invention provides a technique for comparing two documents, in which a fingerprint or sketch of each document is computed. In particular, this embodiment of the invention uses a specific algorithm to compute the document's fingerprint. One embodiment uses a sentence in the document as a logical delimiter or window from which significant words are extracted and, thereafter, a hash is computed of all pair-wise permutations. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and the inverse document frequency.
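
The sentence-window fingerprint in the second embodiment can be sketched in Python as follows. This is only an approximation under stated assumptions: the abstract does not name the hash, the number of words kept per sentence, or the comparison measure, so SHA-1, `top_k=3`, and Jaccard similarity are illustrative choices, and the inverse document frequency is computed over sentences of the same document as a stand-in for a corpus-level statistic.

```python
import hashlib
import math
import re
from collections import Counter
from itertools import permutations


def split_sentences(text):
    """Split text into sentences on ., ! and ? (a deliberately simple rule)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(sentence):
    """Lower-case the sentence and keep alphabetic words only."""
    return re.findall(r"[a-z]+", sentence.lower())


def sketch(text, top_k=3):
    """Return a set of hashes over ordered pairs of each sentence's top words."""
    sents = [split_words(s) for s in split_sentences(text)]
    n = len(sents)
    # Assumption: "document frequency" counts sentences containing the word.
    df = Counter(w for s in sents for w in set(s))
    hashes = set()
    for s in sents:
        tf = Counter(s)
        weight = {w: tf[w] * math.log(1 + n / df[w]) for w in tf}
        # Highest weight first; ties broken alphabetically for determinism.
        top = sorted(weight, key=lambda w: (-weight[w], w))[:top_k]
        for a, b in permutations(top, 2):
            digest = hashlib.sha1(f"{a} {b}".encode()).hexdigest()
            hashes.add(digest[:16])
    return hashes


def similarity(a, b):
    """Jaccard similarity of two sketches (an assumed comparison measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc = "Clustering groups similar documents. Hashing builds a compact sketch."
other = "Bananas ripen quickly in summer."
print(similarity(sketch(doc), sketch(doc)), similarity(sketch(doc), sketch(other)))
```

Two documents that share sentences with the same significant words will share pair hashes, so the overlap of their sketches gives a cheap similarity estimate without comparing full texts.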

Publication date: 22-10-2020

IDENTIFYING CORRELATED ROLES USING A SYSTEM DRIVEN BY A NEURAL NETWORK

Number: US20200334599A1

A device receives a request associated with standardizing organization-specific roles within an organization, where the request includes data that identifies titles for the organization-specific roles. The device converts the data to vectors that represent semantic meanings of the titles. The device sets a configuration of a data model by assigning weighted values to title-class identifiers that are used to associate titles, of a standardized set of titles, to a hierarchy of role classifications. The device uses the data model to determine scores that indicate likelihoods of the titles mapping to the title-class identifiers. The device identifies, based on scores, a subset of title-class identifiers that associate particular titles, of the standardized set of titles, and particular role classifications. The subset of title-class identifiers is stored in association with information relating to the particular titles. The device performs an action based on the information relating to the ...

Publication date: 30-03-2017

SMART EMAIL ATTACHMENT SAVER

Number: US20170091250A1

In an approach to save-to location selection, a computing device accesses a metadata file comprising a data table. The computing device successively checks the data table for entries that match a series of features of a file to be saved. If the computing device finds one or more matches, the computing device determines an associated save-to location. If the computing device does not find a match and has exhausted all of the series of features, the computing device determines a default save-to location. The computing device receives a user selection based on or overriding the determination. The computing device updates the data table with information concerning each of the features of the file and information concerning the user selection.
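
The lookup-then-update flow in this abstract can be sketched with a plain dictionary standing in for the data table. The feature tuples, paths, and function names below are hypothetical, and the sketch returns the first match rather than weighing several matches, which the abstract leaves open.

```python
def choose_save_location(table, features, default):
    """Check features in priority order; return the first matching location."""
    for feature in features:
        if feature in table:
            return table[feature]
    # All features exhausted without a match: fall back to the default.
    return default


def record_choice(table, features, chosen):
    """Remember the user's final choice for every feature of the saved file."""
    for feature in features:
        table[feature] = chosen


# Hypothetical table mapping (feature name, value) pairs to save-to locations.
table = {("sender", "alice@example.com"): "/docs/alice"}
features = [("extension", ".pdf"), ("sender", "alice@example.com")]

suggested = choose_save_location(table, features, default="/downloads")
record_choice(table, features, suggested)
print(suggested, table[("extension", ".pdf")])
```

After `record_choice` runs, a later PDF from a different sender would already match on its extension, which is how the table gradually learns the user's habits.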

Publication date: 30-05-2017

Systems and/or methods for automatically classifying and enriching data records imported from big data and/or other sources to help ensure data integrity and consistency

Number: US0009665628B1
Assignee: Xeeva, Inc.

Techniques relating to managing “bad” or “imperfect” data being imported into a database system are described herein. As an example, a lifecycle technology solution helps receive data from a variety of different data sources of a variety of known and/or unknown formats, standardize it, fit it to a known taxonomy through model-assisted classification, store it to a database in a manner that is consistent with the taxonomy, and allow it to be queried for a variety of different usages. Some or all of the disclosed technology concerning auto-classification, enrichment, clustering model and model stacks, and/or the like, may be used in these and/or other regards.

Publication date: 27-10-2022

SYSTEMS AND METHODS FOR GENERATING A FILTERED DATA SET

Number: US20220345543A1
Assignee: People.ai, Inc.

The present disclosure relates to generating a filtered data set. Data from a plurality of systems of record of a plurality of data source providers may be accessed. A master data set generated using the data accessed from the plurality of systems of record may be maintained. Restriction policies including one or more rules for restricting sharing of data may be maintained. A filtered data set may be generated for a data source provider responsive to an application of restriction policies of other data source providers to the master data set. The filtered data set may be provisioned.

Publication date: 06-12-2022

Detecting extraneous topic information using artificial intelligence models

Number: US0011521601B2
Assignee: Invoca, Inc.

Systems and methods for improving machine learning systems used to model topics on a plurality of calls are described herein. In an embodiment, a server computer receives a plurality of digitally stored call transcripts that have been prepared from digitally recorded voice calls. The server computer uses a topic model of an artificial intelligence machine learning system, the topic model modeling words of a call as a function of one or more word distributions for each topic of a plurality of topics, to generate an output of the topic model which identifies the plurality of topics represented in the plurality of call transcripts. The server computer computes, for a particular topic of the plurality of topics, a first value representing a vocabulary of the particular topic and a second value representing a consistency of the particular topic in two or more call transcripts of the plurality of call transcripts which include the particular topic. Based, at least in part, on one or more of the first ...

Publication date: 27-06-2023

Computer-implemented method and device for processing data

Number: US0011687725B2
Assignee: Robert Bosch GmbH

A computer-implemented method for processing text data including a multitude of text modules. In the method, a representation of the text is provided, and a model is used which predicts a classification for a respective text module of the text as a function of the representation of the text. The provision of the representation of the text includes the provision of a total word vector for a respective text module of the text. The total word vector is formed from at least two, preferably multiple word vectors, and a respective word vector being weighted as a function of properties of the respective text module.
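
To make the "total word vector" concrete, the Python sketch below combines several word vectors for one text module as a weighted average. The abstract does not say how the weights are derived from the text module's properties or whether the sum is normalized, so the weights are simply passed in and the division by their total is an assumption.

```python
def total_word_vector(word_vectors, weights):
    """Weighted average of equally sized word vectors (normalization assumed)."""
    if not word_vectors or len(word_vectors) != len(weights):
        raise ValueError("need one weight per word vector")
    total_weight = sum(weights)
    dim = len(word_vectors[0])
    return [
        sum(w * vec[i] for vec, w in zip(word_vectors, weights)) / total_weight
        for i in range(dim)
    ]


# Two hypothetical embeddings of the same text module, e.g. from two models;
# the weights stand in for values computed from the module's properties.
combined = total_word_vector([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
print(combined)
```

A classifier would then take `combined` as the representation of that text module; the model itself is outside the scope of this sketch.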

Publication date: 19-03-2024

Aggregation functions for nodes in ontological frameworks in representation learning for massive petroleum network systems

Number: US0011934440B2
Author: Marko Maucec
Assignee: Saudi Arabian Oil Company

Systems and methods include a method for aggregating source data to form ontological frameworks. Aggregation functions are defined for ontological frameworks modeling categories of components of a facility. Each aggregation function defines a target component selected from a Things category, an Events category, and a Methods category. Defining the target component includes aggregating information from one or more components selected from one or more of the Things category, the Events category, and the Methods category. Source data is received in real-time from disparate sources and in disparate formats. The source data provides information about the components of the facility and external systems with which the facility interacts. Using the aggregation functions, the source data is aggregated to form the ontological frameworks. Each ontological framework models a component of the Things category, a component of the Events category, or a component of the Methods category.

Publication date: 16-03-2018

Method and device for extracting a data template

Number: RU2647628C2
Assignee: Xiaomi Inc. (CN)

The invention relates to data template extraction. The technical result is improved data processing accuracy. A method for extracting a data template includes: obtaining a modeling data set, the modeling data set consisting of an array of modeling data; correspondingly encrypting the numerical data included in the array of modeling data to obtain an array of encrypted data; clustering the array of encrypted data to obtain at least one data template; confirming a coverage ratio for each data template; and determining an actual data template from the at least one data template according to the confirmation results. 3 independent and 20 dependent claims, 4 figures.

Publication date: 18-11-2021

METHOD FOR AUTOMATIC CLASSIFICATION OF FORMALIZED ELECTRONIC GRAPHIC AND TEXT DOCUMENTS IN AN ELECTRONIC DOCUMENT MANAGEMENT SYSTEM WITH AUTOMATIC FORMATION OF ELECTRONIC CASE FILES

Number: RU2759887C1

The invention relates to computer engineering. The technical result is the automation of classifying formalized electronic text and graphic documents in an electronic document management system by the areas of information responsibility of officials, so that the documents can be reported to the decision maker and distributed into electronic case files. In the method, the document type is determined from information about the document's characteristics and its recognized attributes; a data tuple is obtained from the determined areas of information responsibility (AIR), a priori information about the structure of the organization, and the document type, and is assigned to the document's "resolution" attribute; from the determined AIR and the identified document type, the article of the departmental List (hereinafter, the List) of documents with their retention periods to which the executed document may be assigned is determined; from the determined attributes and unique keywords, the contour of the electronic ... system is determined

Publication date: 10-02-2016

METHOD FOR AUTOMATIC CLUSTERING OF OBJECTS

Number: RU2014130519A

A method for automatic clustering of objects, which consists in forming samples in the form of initial clusters from an initial set of objects to be classified, characterized in that the initial set is formed by identifying each object to be classified by its parameter, which specifies the coordinate of the object in the initial set, and is treated as a training sample formed according to an exponential distribution law; the cluster data obtained at the training stage are stored in the corresponding elements of a memory block, which are used for the subsequent sequential accumulation of measurement information in them; at the training stage, a model of cluster K with N elements is also determined that satisfies the minimum of the risk R(α) of forming the cluster model for the regularization coefficient α, the model being defined by the center of the set of objects of the cluster, where z is the coordinate of an object of cluster K obtained at the training stage; for each cluster K with radius r, ...

Publication date: 08-01-2003

Information storage and retrieval

Number: GB0000227658D0

Publication date: 07-04-2021

Methods of clustering computational event logs

Number: GB2513885B

Publication date: 05-06-2002

Theme-based system and method for classifying patent documents

Number: GB0002369698A

A classification system having a controller a document storage memory, and a document input is used to classify documents. The controller is programmed to generate a theme score from a plurality of source documents in a plurality of pre-classified source documents. A theme score is also generated for the unclassified document. The unclassified document theme score and the theme scores for the various classes are compared and the unclassified document is classified into the classification having the nearest theme score. Manually identified mis-classified documents may be used to improve the classification system. Different sections of a patent document (eg. abstract, description, claims) may be given different weights for classification.

Publication date: 30-04-2014

Resolving similar entities from a database

Number: GB0201404499D0

Publication date: 26-08-2020

Clustering facets on a two-dimensional facet cube for text mining

Number: GB0202010822D0

Publication date: 14-02-2018

Determining reply content for a reply to an electronic communication

Number: GB0002552905A

Methods and apparatus related to determining reply content for a reply to an electronic communication. Some implementations are directed generally toward analyzing a corpus of electronic communications to determine relationships between one or more original message features of "original" messages of electronic communications and reply content that is included in "reply" messages of those electronic communications. Some implementations are directed generally toward providing reply text to include in a reply to a communication based on determined relationships between one or more message features of the communication and the reply text.

Publication date: 10-10-2012

Display of hypertext documents grouped according to their affinity

Number: GB0201215208D0

Publication date: 01-04-2020

Topic clustering and Event Detection

Number: GB0202002192D0

Publication date: 25-09-2019

Systems and methods for analysing information content

Number: GB0201911459D0

Publication date: 03-06-2021

A METHOD OF TEXT MINING IN RANKING OF WEB PAGES USING MACHINE LEARNING

Number: AU2021100441A4

A Web browser is a software application for accessing information from the World Wide Web; it displays Web Pages based on text keyword inputs from the Search Engine. The ranking of the Web Pages will vary from one Search Engine to another. When a user enters a query in the Search Engine, it should return relevant Web Pages related to the query by ranking the Web Pages. There are several factors affecting the ranking of the Web Pages, such as content of pages, Title tags, URL structures, Meta Description Tags, XML Sitemap, Social Network Contents, Blog Forum Contents, and Videos. Text Mining is an Artificial Intelligence technique for Text analytics that transforms the unstructured Text in documents and databases into normalized structured data. Existing Search Engines such as Google, Yahoo, Ask, and Bing produce numerous links for web pages which may or may not be relevant for the search query ...

Publication date: 24-09-2020

METHOD AND SYSTEM FOR REDUCING INCIDENT ALERTS

Number: AU2020200629A1
Assignee: Murray Trento & Associates Pty Ltd

A system and method for reducing incident alerts for an enterprise environment are described. In one embodiment, a method of reducing incident alerts for an enterprise environment includes receiving a plurality of historical incident alerts associated with previous incidents associated with nodes within an enterprise environment. The method includes extracting from a first subset of the historical incident alerts a plurality of rules to generate a rule knowledge base and analyzing a second subset of the historical incident alerts against the plurality of rules to identify candidate incident alerts as potential dead-end tickets. The method also includes providing feedback on the candidate incident alerts to confirm or deny that the alert is a dead-end ticket. Based on the feedback, a prescriptive avoidance rule set is generated to identify an incident alert as a dead-end ticket and eliminate the dead-end tickets from submitted incident alerts.

Publication date: 02-10-2014

Data clustering

Number: AU2014201505A1

In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster. Further, cluster metascores may be generated based on various cluster scores associated with a cluster. Clusters may be ranked based on cluster metascores. Various embodiments may enable an analyst to discover various insights related to data clusters, and may be applicable to various tasks including, for example, financial fraud detection.

Publication date: 06-02-2014

Information classification program, information classification method, and information processing apparatus

Number: AU2013201018A1

An information classification program (110) includes the following: acquiring multiple posted information items (111), each of the multiple posted information items (111) including at least either of a text information item (112) and an image information item (113); generating text information items (112) including multiple text items in such a manner that image information items (113) are removed from the multiple posted information items (111), and individually classifying the text items included in the text information items (112) into first categories; generating image information items (113) including multiple images in such a manner that text information items (112) are removed from the multiple posted information items (111), and individually classifying the images included in the image information items (113) into second categories; associating the classified text items (114) and the classified images (115) with each other on the basis of the first and second categories ...

Publication date: 16-06-2016

WEIGHTED SUBSYMBOLIC DATA ENCODING

Number: CA0002970168A1

Described herein is a method and system of geometrically encoding data including partitioning data into a plurality of semantic classes based on a dissimilarity metric, generating a subspace formed by first and second data elements, the first and second data elements being included in first and second numbers of partitioned semantic classes, encoding the first data element with respect to the second data element such that the generated subspace formed by the first data element and the second data element is orthogonal, computing a weight distribution of the first data element with respect to the second data element, the weight distribution being performed for each of the first number of semantic classes and the second number of semantic classes, and determining a dominant semantic class corresponding to an ordered sequence of the first data element and the second data element, the dominant semantic class having a maximum weight distribution.

Publication date: 06-05-2014

DETERMINING RELEVANT INFORMATION FOR DOMAINS OF INTEREST

Number: CA0002716062C
Assignee: ATIGEO LLC

Techniques are described for determining and using relevant information related to domains of interest. In at least some situations, the techniques include automatically analyzing documents, terms and other information related to a domain of interest in order to automatically determine information about relevant themes within the domain and/or about which documents have contents that are relevant to such themes. Such automatically determined information related to a domain may then be used in various ways, including to assist users in specifying themes of interest and/or in obtaining documents and/or document fragments with contents that are relevant to specified themes. In addition, information about how the automatically determined information is used by users may be tracked and used as feedback for learning improved determinations of relevant themes and relevant documents within the domain, such as by using automated machine learning techniques.

Publication date: 08-09-2020

ELECTRONIC DOCUMENT CLASSIFICATION

Number: CA0002704344C

An electronic document classification system disclosed herein classifies electronic documents. The classification of the documents may involve analyzing the document and the information attached to the document to generate a set of classification data and comparing the classification data with one or more classification rules to generate a set of classifying data. The system attaches the set of classifying data to the electronic document and displays the electronic document based on the set of classifying data. The classification data may also be used to prioritize the electronic documents and to assign a retention period to the electronic documents. The system is further adapted to receive user feedback regarding the classification of the electronic document and to update the classification rules.

Publication date: 24-05-2007

INFORMATION EXPLORATION SYSTEMS AND METHODS

Number: CA0002629999A1
Assignee:

Disclosed information exploration system and method embodiments operate on a document set to determine a document cluster hierarchy. An exclusionary phrase index is determined for each cluster, and representative phrases are selected from the indexes. The selection process may enforce pathwise uniqueness and balanced sub-cluster representation. The representative phrases may be used as cluster labels in an interactive information exploration interface.

Publication date: 17-09-2015

EXTRACTING DATA FROM COMMUNICATIONS RELATED TO DOCUMENTS

Number: CA0002892630A1
Assignee:

The disclosed embodiments provide a system that processes data. During operation, the system obtains a communication associated with a document and extracts data associated with the document from the communication. Next, the system uses the extracted data from the communication and document data from the document to build a context associated with the document. The system then uses the context to facilitate use of the document by a user associated with the communication.

Publication date: 08-05-2014

SYSTEMS AND METHODS FOR COLLECTING, CLASSIFYING, ORGANIZING AND POPULATING INFORMATION ON ELECTRONIC FORMS

Number: CA0002889996A1
Assignee:

Systems and methods for collecting, classifying, transmitting and updating personal information for completion and submission or supplementation of electronic forms or databases on any type of mobile or other computing device are provided. Information relating to a user is obtained from one or more sources through electronic means, and the information is then organized and securely stored in a database using field mapping and other techniques to classify the information into specific categories. The information that is obtained and organized may include contact information, financial information, health information and historical information. The organized information may then be accessed by the user to automatically and instantaneously populate or supplement an electronic document, form or web-based application without requiring the user to manually enter the information. The system automatically detects and stores updates to information and builds a database of forms and electronic documents ...

Publication date: 19-12-2016

APPARATUS AND METHOD FOR SINGLE PASS ENTROPY DETECTION ON DATA TRANSFER

Number: CA0002933370A1
Assignee:

Embodiments of the present invention include a memory unit and a processor coupled to a memory unit. The processor is operable to group a plurality of subsets of data from an input data stream and compute a first hash value corresponding to a first grouped subset of data. Additionally, the processor is operable to detect a match between the first hash value and a second hash value stored in a hash table. Furthermore, the processor is also configured to monitor a hash value match frequency for the input data stream in which the processor is operable to increment a counter value responsive to a detection of the match and determine an entropy level for the input data stream based on the counter value relative to a frequent hash value match threshold. The processor can generate an instruction to either initialize performance of a data compression operation when the counter value meets or exceeds the frequent hash value match threshold or refrain from the performance of the data compression ...
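
The counting scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `should_compress`, the fixed 8-byte grouping, the use of CRC32 as the hash, and the threshold value are all hypothetical choices made for the example.

```python
import zlib


def should_compress(data: bytes, chunk_size: int = 8, match_threshold: int = 16) -> bool:
    """Single pass over the data; True when repeated chunk hashes suggest low entropy.

    chunk_size and match_threshold are illustrative values, not taken from the source.
    """
    seen = set()   # stands in for the hash table of previously seen hash values
    matches = 0    # counter incremented each time a hash match is detected
    for start in range(0, len(data) - chunk_size + 1, chunk_size):
        h = zlib.crc32(data[start:start + chunk_size])  # hash of one grouped subset
        if h in seen:
            matches += 1
            if matches >= match_threshold:
                return True   # frequent matches: low entropy, compression is worthwhile
        else:
            seen.add(h)
    return False              # too few matches: treat as high entropy, skip compression


repetitive = b"abcdefgh" * 100   # the same 8-byte group repeated many times
short_unique = bytes(range(64))  # only 8 groups, so the threshold cannot be reached
```

With these inputs, `should_compress(repetitive)` reports that compression is worthwhile, while `should_compress(short_unique)` does not. Returning as soon as the threshold is met keeps the check to a single pass and avoids hashing the rest of the stream.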

Publication date: 16-02-2021

APPARATUS AND METHOD FOR SINGLE PASS ENTROPY DETECTION ON DATA TRANSFER

Number: CA2933370C
Assignee: HGST NETHERLANDS BV, HGST NETHERLANDS B.V.

Embodiments of the present invention include a memory unit and a processor coupled to a memory unit. The processor is operable to group a plurality of subsets of data from an input data stream and compute a first hash value corresponding to a first grouped subset of data. Additionally, the processor is operable to detect a match between the first hash value and a second hash value stored in a hash table. Furthermore, the processor is also configured to monitor a hash value match frequency for the input data stream in which the processor is operable to increment a counter value responsive to a detection of the match and determine an entropy level for the input data stream based on the counter value relative to a frequent hash value match threshold. The processor can generate an instruction to either initialize performance of a data compression operation when the counter value meets or exceeds the frequent hash value match threshold or refrain from the performance of the data compression ...

Publication date: 27-08-2020

Method and apparatus for searching information

Number: KR0102148691B1
Inventor:
Assignee:

Publication date: 16-10-2014

A computer system for presenting a topic, characteristics and an attribution of the topic to a plurality of users via a predetermined network and method therefor

Number: TW0201439797A
Assignee:

A computer system interconnected to a community of users having a data processor input module programmed to receive communications from said users including one or more inputs regarding food recipes and store said inputs in accessible memory. A data processor determining module programmed to access stored data and to apply a data interpretative algorithm to said data to unify and organize disparate data inputs into a cohesive database relating to recipes. Also, a search entry module connected to the recipe database to permit access to the database to support a search algorithm applied to the database.

Publication date: 05-06-2018

System for processing data received from various data sources

Number: US0009990424B2

Techniques are provided for processing and categorizing data received from data sources. The processing and categorizing of the received data comprises: determining whether the digital data can be associated with one or more categories by determining whether a first match between one or more image characteristics of the one or more categories and one or more image characteristics of the digital data is found; in response to determining that the first match is found: associating the one or more categories with the digital data; determining, based at least in part on the one or more categories, one or more applications that are to be used to process the digital data; in response to determining the one or more applications that are to be used to process the digital data, initiating the one or more applications to process the digital data.

Publication date: 17-07-2018

Systems and methods for implementing an encrypted search index

Number: US0010025951B2
Assignee: salesforce.com, inc., SALESFORCE COM INC

An encrypted search index is disclosed. For instance, an exemplary system may include a search index stored on disk with customer information stored therein, the search index files having a term dictionary or a term index type file having internal structure which allows a portion of the individual search index file to be updated, encrypted, and/or decrypted without affecting the internal structure of the individual search index file; a file input/output (IO) layer to encrypt the customer information being written into the individual search index file and to decrypt the customer information being read from the individual search index file; and a query interface to execute the operation against the customer information stored in the memory in its decrypted form.

Publication date: 10-01-2013

SEMIOTIC INDEXING OF DIGITAL RESOURCES

Number: US20130013603A1
Assignee: NAMESFORLIFE, LLC

A method of classifying a plurality of documents. The method includes steps of providing a first set of classification terms and a second set of classification terms, the second set of classification terms being different from the first set of classification terms; generating a first frequency array of a number of occurrences of each term from the first set of classification terms in each document; generating a second frequency array of a number of occurrences of each term from the second set of classification terms in each document; generating a first similarity matrix from the first frequency array; generating a second similarity matrix from the second frequency array; determining an entrywise combination of the first similarity matrix and the second similarity matrix; and clustering the plurality of documents based on the result of the entrywise combination.
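
The steps listed in this abstract can be sketched as below. The abstract does not say which similarity measure, entrywise combination, or clustering method is used, so this sketch assumes cosine similarity, an entrywise (Hadamard) product, and a simple threshold-based grouping; all function names and the sample documents are hypothetical.

```python
import math


def frequency_array(docs, terms):
    """Count occurrences of each classification term in each document."""
    return [[doc.lower().split().count(t) for t in terms] for doc in docs]


def cosine_matrix(freq):
    """Document-by-document cosine similarity (an assumed similarity measure)."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    return [[cos(u, v) for v in freq] for u in freq]


def entrywise_product(a, b):
    """Entrywise combination of two similarity matrices (product assumed)."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cluster(sim, threshold=0.5):
    """Group documents linked by combined similarity >= threshold (assumed method)."""
    labels = [-1] * len(sim)
    next_label = 0
    for i in range(len(sim)):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(len(sim)):
                if labels[k] == -1 and sim[j][k] >= threshold:
                    labels[k] = next_label
                    stack.append(k)
        next_label += 1
    return labels


docs = ["cell membrane protein cell", "protein cell membrane",
        "galaxy star orbit", "star galaxy galaxy"]
first_terms = ["cell", "galaxy"]    # first set of classification terms
second_terms = ["protein", "star"]  # second, different set of terms
combined = entrywise_product(cosine_matrix(frequency_array(docs, first_terms)),
                             cosine_matrix(frequency_array(docs, second_terms)))
labels = cluster(combined)
```

Here the two biology documents share one label and the two astronomy documents share another. Multiplying the matrices entry by entry means a pair of documents scores highly only when both term sets agree that they are similar.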

Publication date: 28-08-2008

Classification-Based Method and Apparatus for String Selectivity Estimation

Number: US2008208856A1
Inventor: LIM LIPYEOW, WANG MIN
Assignee:

Histogram construction and selectivity estimation for string and substring match queries in databases of data having strings associated with attributes. The histogram construction counts string-attribute pairs in the documents, and outputs string-attribute-count triples sorted by count. The collection is partitioned into buckets. A synopsis is generated for the partition, having an average selectivity or count of the string-attribute-count triples in the partition and summary information representing the set of string-attribute pairs belonging to the bucket. Subsequent queries, both for exact and substring matches, use the synopsis to estimate the selectivity of buckets.

Publication date: 24-11-2015

Identifying potential duplicates of a document in a document corpus

Number: US0009195714B1

According to aspects of the disclosed subject matter, a method for identifying a set of documents from a document corpus that are potential duplicates of a source document is provided. A source document is obtained. A list of queries corresponding to the source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results set is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents is stored or displayed.
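
The position-based scoring in this abstract can be sketched as follows. The abstract does not give the scoring formula or the selection criteria, so this example assumes a reciprocal-rank score summed across result sets and a simple minimum-score cutoff; the function name and document identifiers are hypothetical.

```python
from collections import defaultdict


def score_candidates(result_sets, min_score=1.0):
    """Score documents by their ordinal position in each ordered result set.

    Assumed scoring: each appearance adds 1 / position, so earlier positions
    count more. Documents scoring below min_score are dropped.
    """
    scores = defaultdict(float)
    for results in result_sets:
        for position, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / position
    selected = [(doc_id, s) for doc_id, s in scores.items() if s >= min_score]
    # Highest score first; ties broken by document id for a stable order.
    return sorted(selected, key=lambda pair: (-pair[1], pair[0]))


# One ordered result set per query derived from the source document.
result_sets = [["d1", "d2", "d3"], ["d2", "d1"], ["d1", "d4"]]
candidates = score_candidates(result_sets)
```

In this sample, `d1` and `d2` appear near the top of several result sets and pass the cutoff, while `d3` and `d4` appear once at a low position and are dropped.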

Publication date: 28-06-2012

KEYWORDS EXTRACTION AND ENRICHMENT VIA CATEGORIZATION SYSTEMS

Number: US20120166441A1
Assignee: MICROSOFT CORPORATION

Techniques for determining a set of keywords associated with a document are provided. A document is received that may be classified into a taxonomy that includes a plurality of categories. A categorization ranking is determined for each category for the received document. A set of categories of the taxonomy having highest categorization rankings is determined for the received document. Documents representing the set of categories having highest categorization rankings are combined together into a cumulative representative text that includes a plurality of terms. A cumulative term corpus importance score is determined for each term in the cumulative representative text. The cumulative term corpus importance score for a particular term indicates an importance of the particular term in a context of the cumulative representative text. A set of terms of the cumulative representative text having highest cumulative term corpus importance scores is selected to be keywords for the received document.

Publication date: 02-06-2015

System and method of merging text analysis results

Number: US0009047347B2
Assignee: SAP SE, SAP AG

A system and method of merging text analysis results. The system uses a set of three corrected, weakened Jaccard factors to determine whether the respective results of multiple text analysis operations are equal, subtypes of each other or associated with each other, in order to merge the results.

Publication date: 06-05-2010

SYSTEM AND METHOD FOR GEOGRAPHICALLY ORGANIZING AND CLASSIFYING BUSINESSES ON THE WORLD-WIDE WEB

Number: US20100114904A1
Assignee: AOL INC.

A method and search engine for classifying a source publishing a document on a portion of a network, includes steps of electronically receiving a document, based on the document, determining a source which published the document, and assigning a code to the document based on whether data associated with the document published by the source matches with data contained in a database. An intelligent geographic- and business topic-specific resource discovery system facilitates local commerce on the World-Wide Web and also reduces search time by accurately isolating information for end-users. Distinguishing and classifying business pages on the Web by business categories using Standard Industrial Classification (SIC) codes is achieved through an automatic iterative process.

Publication date: 25-10-2011

Method for automated document selection

Number: US0008045188B2
Assignee: Xerox Corporation, XEROX CORP, XEROX CORPORATION

Provided is a method for the automated selection of sample documents or pages from a large collection, and more particularly an application of the method in a proof presentment environment where the method is employed for selection and review of representative or extreme pages from a large document, such as one scheduled for printing. The method characterizes pages or documents in a multi-dimensional vector space based upon a set of characteristics, and then uses clustering techniques to group the pages, enabling the selection of typical pages from the groups, outlier pages from extremes lying outside of the groups, or both typical and outlier pages.

Publication date: 16-11-2021

Data insight discovery using a clustering technique

Number: US0011176187B2

Disclosed aspects relate to data insight discovery using a clustering technique. A set of data may be compressed based on a set of proximity values with respect to a set of predictors to assemble a set of sub-clusters. A set of subgroups may be established by merging a plurality of individual sub-clusters of the set of sub-clusters using a tightness factor. A subset of the subgroups may be selected based on a selection criterion. A set of insight data which indicates a profile of the subset of the set of subgroups with respect to the set of data may be compiled for the subset of the set of subgroups.

Publication date: 11-10-2018

TRAINING QUESTION DATASET GENERATION FROM QUERY DATA

Number: US20180293508A1
Assignee:

A training query generation system is usable to generate fully formed training questions from prior search queries. The system may include a training dataset generation service that is configured to receive a message containing an indicator of a topic over a network, access a template database containing a plurality of question templates and associated topic indicators, each question template including one or more argument values indicative of a role, and select those question templates from the template database that match the received topic indicator. The training dataset generation service also is configured to select entities from an entity database that map to the argument values of the selected question templates and generate a plurality of fully formed questions, each generated fully formed question comprising a character string containing the question template and at least one of the selected agents.

Publication date: 01-03-2012

System and method for providing search query refinements

Number: US20120054216A1
Inventor: Paul Haahr, Steven Baker
Assignee: Google LLC

A system and method for providing search query refinements are presented. A stored query and a stored document are associated as a logical pairing. A weight is assigned to the logical pairing. The search query is issued and a set of search documents is produced. At least one search document is matched to at least one stored document. The stored query and the assigned weight associated with the matching at least one stored document are retrieved. At least one cluster is formed based on the stored query and the assigned weight associated with the matching at least one stored document. The stored query associated with the matching at least one stored document is scored for the at least one cluster relative to at least one other cluster. At least one such scored search query is suggested as a set of query refinements.

Publication date: 29-03-2012

System and method to extract models from semi-structured documents

Number: US20120078969A1
Assignee: International Business Machines Corp

Systems and associated methods for automated and semi-automated building of domain models for documents are described. Embodiments provide an approach to discover an information model by mining documentation about a particular domain captured in the documents. Embodiments classify the documents into one or more types corresponding to concepts using indicative words, identify candidate model elements (concepts) for document types, identify relationships both within and across document types, and consolidate and learn a global model for the domain.

Publication date: 19-04-2012

Hiring Decisions Through Validation Of Job Seeker Information

Number: US20120095933A1
Inventor: David Goldberg
Assignee: Individual

The present methods and systems relate to means for job seekers to provide more detailed information to prospective employers to aid in job hiring decisions. The means comprises validation of job history, education, and skills information provided by the job seeker, which can comprise third party support and feedback. The means further comprises a display of information that comprises the relative strength of the validation, as well as possible means of further validation for the prospective employer to use. The means further comprises a skills assessment that is taken by the job seeker at a time and place of the seeker's convenience, but that can be validated at the place of employment. The means also provides the job seeker with information about how they could improve their employment prospects.

Publication date: 19-04-2012

Collapsed gibbs sampler for sparse topic models and discrete matrix factorization

Number: US20120095952A1
Assignee: Xerox Corp

In an inference system for organizing a corpus of objects, feature representations are generated comprising distributions over a set of features corresponding to the objects. A topic model defining a set of topics is inferred by performing latent Dirichlet allocation (LDA) with an Indian Buffet Process (IBP) compound Dirichlet prior probability distribution. The inference is performed using a collapsed Gibbs sampling algorithm by iteratively sampling (1) topic allocation variables of the LDA and (2) binary activation variables of the IBP compound Dirichlet prior. In some embodiments the inference is configured such that each inferred topic model is a clean topic model with topics defined as distributions over sub-sets of the set of features selected by the prior. In some embodiments the inference is configured such that the inferred topic model associates a focused sub-set of the set of topics to each object of the training corpus.

Publication date: 26-04-2012

Message thread searching

Number: US20120102037A1
Inventor: Mehmet Kivanc Ozonat
Assignee: Hewlett Packard Development Co LP

In one general aspect, a set of representations of message thread contents is decomposed into clusters of representations of message thread contents determined to be similar. Similarly, a set of representations of message thread titles is decomposed into clusters of representations of message thread titles determined to be similar, where the act of decomposing the set of representations of message thread titles is influenced by the act of decomposing the set of representations of message thread contents. In another general aspect, a search query is received and compared to representations of clusters of message threads (e.g., a cluster of representations of message thread titles). Based on this comparison, a particular cluster of message threads then is identified as matching the search query.

Publication date: 26-04-2012

System and method for providing topic cluster based updates

Number: US20120102121A1
Assignee: Yahoo Inc until 2017

The present invention is directed towards a method and system for providing a recommendation set. The method and system includes determining various topic clusters from single topic clusters. The method and system further includes identifying various topic clusters for an identified single topic cluster and providing recommendations from the various topic clusters via web updates.

Publication date: 28-06-2012

Continuous content refinement of topics of user interest

Number: US20120167010A1
Assignee: Yahoo Inc until 2017

Techniques are disclosed for a user interface that provides active assistance to discover, investigate, refine and save multiple topics of interest, i.e., a topic incubator where user interests are discovered, quickly developed to maturity and preserved. Each topic may have an independent topic interface with independently suggested topics. In each topic interface, users may control topic development by selecting saved topics, dynamically suggested topics and manually entered topics. Suggested topics may be based on saved interests, related interests and/or browsed content. Suggested topics may differ between topic interfaces and may change with topics. Suggested topics may be continuously refined or updated based on existing topics, changed topics, selected suggested topics and selected content. Users control treatment of selected topics individually or as refinements of (combinations with) other topics to create compound topics. Users replace or refine existing topics in existing topic interfaces or branch off topics into additional topic interfaces.

Publication date: 16-08-2012

Real-time data mining

Number: US20120209852A1
Assignee: International Business Machines Corp

A significant recent trend in the internet and mobile telephony has been the dominance of user-generated content. As such, advances in mobile technology have permitted users to upload content onto the internet, whereby sites provide an easily accessible and manageable medium for users to share their thoughts and form a portal for media-rich exchanges. It has been found that much of what is exchanged by users in such settings is context-sensitive, ranging from users' moods and opinions, to communication about users' plans. Broadly contemplated herein, in accordance with at least one embodiment of the invention, is the employment of data mining in information repository settings to efficiently classify an information stream in real-time and thereby discern user intent.

Publication date: 11-10-2012

Document clustering system, document clustering method, and recording medium

Number: US20120259855A1
Assignee: NEC Corp

In the provided document clustering system (100), a concept tree structure accumulation unit (11) stores a concept tree structure that represents a hierarchical relationship among concepts represented by each of a plurality of words. For any two words, a concept similarity computation unit (12) obtains a concept similarity, which is an index indicating how close the concepts represented by the two words are. Using concept similarities for words that appear in two documents in a document set, an inter-document similarity computation unit (13) obtains an inter-document similarity, which indicates how similar the two documents are semantically. A clustering unit (14) uses inter-document similarities to cluster the documents in the document set.

Publication date: 29-11-2012

Information processing device, information processing method, and computer-readable recording medium

Number: US20120303611A1
Assignee: NEC Corp

The information processing device 1 processes document collections having tags permitting semantic class identification appended to each document and comprises a search unit 2, which creates multiple semantic class units containing one, two, or more semantic classes based on a taxonomy that identifies relationships between semantic classes, and a frequency calculation unit 3, which, for each of the semantic class units, identifies documents that match that semantic class unit in the document collections and, for these matching documents, calculates a first frequency that represents the frequency of occurrence in a designated document collection and a second frequency that represents the frequency of occurrence in non-designated document collections. Once the calculations have been performed, the search unit 2 identifies any of the semantic class units based on the first frequency and the second frequency of the matching documents.

Publication date: 10-01-2013

Classification method and apparatus

Number: US20130013537A1
Inventor: Harry Urbschat, Pal Rujan
Assignee: BDGB Enterprise Software SARL

A method and system for building a classification model for classifying documents comprising: representing each of a plurality of documents by a vector of n dimensions, said n dimensions forming a vector space; and representing the classification of already classified documents into classes by separating said vector space into a plurality of subspaces by one or more hyperplanes.

Publication date: 04-04-2013

System and method for tracking patent ownership change

Number: US20130086043A1
Inventor: Steven W. Lundberg
Assignee: Black Hills IP Holdings LLC

A computer-implemented method and system are provided for automatically tracking change in ownership status of patents listed in a database at a patent registry. The method comprises receiving input from a user identifying one or more patents of interest to be tracked and, based on the input received, automatically performing a search of the registry database to identify changes in ownership status for any one or more of the patents of interest. The search results are formatted and transmitted to a user. The automatic search may be conducted on a regular basis, thereby notifying a user of any intervening changes in ownership.

Publication date: 04-04-2013

Patent mapping

Number: US20130086045A1
Inventor: Steven W. Lundberg
Assignee: Black Hills IP Holdings LLC

System and method permit patent mapping. A method may comprise maintaining a database of patent portfolios and a database of patents with each patent stored in the database of patents associated with one or more patent portfolios stored in the database of patent portfolios. A target subject matter to be mapped is identified and a search query associated with the target subject matter is received. Search results are generated to define a first patent portfolio in the database with the search results including one or more patent claims associated with the search query. The one or more patent claims are mapped to a patent concept.

Publication date: 11-04-2013

Electronic discovery system

Number: US20130091136A1
Assignee: Bank of America Corp

Embodiments of the invention relate to systems, methods, and computer program products for improved electronic discovery and custodian management. Embodiments herein disclosed provide for an enterprise wide e-discovery system that provides for data to be identified, located, retrieved, preserved, searched, reviewed and produced in an efficient and cost-effective manner across the entire enterprise system. In addition, by structuring management of e-discovery based on case/matter, custodian and data and providing for linkage between the same, further efficiencies are realized in terms of identifying, locating and retrieving data and leveraging results of previous e-discoveries with current requests.

Publication date: 09-05-2013

Online learning collaboration system and method

Number: US20130117368A1
Assignee: EPALS Inc

Systems and methods for associating cohorts in a social networking environment are provided. At least two users of a social network can be associated in a cohort based at least in part on a determined relationship between the at least two users. This relationship can be different from relationships defined by the users. An analysis of one or more parameters of the at least two users in the cohort can be performed based at least in part on the cohort, wherein the one or more parameters are unrelated to association of the at least two users in the cohort. The parameters can be used to longitudinally measure parameters of the at least two users that may have been impacted by association in the cohort. At least a portion of the analysis can be communicated.

Publication date: 16-05-2013

Recommendations in a computing advice facility

Number: US20130124449A1
Assignee: eBay Inc

According to various embodiments, a ratings matrix including matrix values is generated, each row of the ratings matrix identifying one of a plurality of users, each column of the ratings matrix identifying one of a plurality of items, and each of the matrix values corresponding to a known affinity rating describing a degree of affinity associated with one of the users and one of the items. The ratings matrix may include a missing entry representing an unknown affinity rating. According to various embodiments, a revised ratings matrix is generated by factoring the ratings matrix into a user matrix and an item matrix, the revised ratings matrix being the product of the user matrix and the item matrix and including at least one entry representing a predicted affinity rating in place of the missing entry.
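
The factorization described in this abstract can be sketched as below. The abstract does not specify how the factoring is performed, so this example assumes stochastic gradient descent with light regularization; the function names, the rank, the learning rate, and the sample ratings are hypothetical, and `None` marks an unknown affinity rating.

```python
import random


def factorize(ratings, rank=2, epochs=2000, lr=0.01, reg=0.01, seed=0):
    """Factor a ratings matrix into a user matrix and an item matrix.

    Only known entries (not None) contribute to the updates.
    """
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    users = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    items = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    known = [(u, i, r) for u, row in enumerate(ratings)
             for i, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        for u, i, r in known:
            pred = sum(a * b for a, b in zip(users[u], items[i]))
            err = r - pred
            for k in range(rank):
                pu, qi = users[u][k], items[i][k]
                users[u][k] += lr * (err * qi - reg * pu)
                items[i][k] += lr * (err * pu - reg * qi)
    return users, items


def revised_matrix(users, items):
    """Product of the user and item matrices; fills in the missing entries."""
    return [[sum(a * b for a, b in zip(u, v)) for v in items] for u in users]


def rmse_known(ratings, predicted):
    """Root-mean-square error over the known ratings only."""
    errs = [(r - predicted[u][i]) ** 2 for u, row in enumerate(ratings)
            for i, r in enumerate(row) if r is not None]
    return (sum(errs) / len(errs)) ** 0.5


# Rows are users, columns are items; None is a missing entry.
ratings = [[5, 3, None], [4, None, 1], [1, 1, 5]]
users, items = factorize(ratings)
revised = revised_matrix(users, items)
```

After training, `revised[0][2]` and `revised[1][1]` hold predicted affinity ratings in place of the missing entries, and `rmse_known(ratings, revised)` gives a rough check of how well the product reproduces the known ratings.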

Publication date: 04-07-2013

System and method for geographically organizing and classifying businesses on the world-wide web

Number: US20130173629A1
Inventor: Ajaipal Singh Virdy
Assignee: Facebook Inc

A method and search engine for classifying a source publishing a document on a portion of a network, includes steps of electronically receiving a document, based on the document, determining a source which published the document, and assigning a code to the document based on whether data associated with the document published by the source matches with data contained in a database. An intelligent geographic- and business topic-specific resource discovery system facilitates local commerce on the World-Wide Web and also reduces search time by accurately isolating information for end-users. Distinguishing and classifying business pages on the Web by business categories using Standard Industrial Classification (SIC) codes is achieved through an automatic iterative process.

Publication date: 15-08-2013

Apparatus for clustering a plurality of documents

Number: US20130212106A1
Inventor: Takeshi Inagaki
Assignee: International Business Machines Corp

According to an aspect, there are provided an apparatus, a program for causing a computer to function as such an apparatus, and a method, wherein the apparatus includes a selection section for selecting a plurality of sample documents from a plurality of documents and a first parameter generation section for analyzing the plurality of sample documents to generate an initial parameter matrix expressing a probability that each of a plurality of words included in the plurality of sample documents is included in each of a plurality of topics. The apparatus also includes a second parameter generation section for analyzing the plurality of documents by using each value included in the initial parameter matrix as an initial value to generate a parameter matrix expressing a probability that each of a plurality of words included in the plurality of documents is included in each of a plurality of topics.

Publication date: 15-08-2013

System and Method for Association Extraction for Surf-Shopping

Number: US20130212110A1
Assignee: LinkShare Corp

The present disclosure is directed to a computer system and method performed by a selectively programmed data processor for providing data to a Web page such that items are presented to the user in a way that imitates a real world shopping experience. Various aspects of the disclosed technology also relate to systems and methods for calculating product or category associations using associative relation extraction. Additional aspects of the disclosed technology relate to automatic topic discovery, and event and category matching.

Publication date: 22-08-2013

Multi-Concept Latent Semantic Analysis Queries

Number: US20130218554A1
Author: Paul A. Jakubik
Assignee: Individual

A method includes accessing text, identifying a plurality of terms from the text, determining a plurality of term vectors associated with the identified plurality of terms, and clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors. The method further includes creating a first pseudo-document according to the first cluster, creating a second pseudo-document according to the second cluster, identifying a first set of terms associated with the first cluster using latent semantic analysis (LSA) of the first pseudo-document, identifying a second set of terms associated with the second cluster using LSA of the second pseudo-document, and combining the first and second sets of terms into a list of output terms.

Publication date: 19-09-2013

Population clustering through density based merging

Number: US20130246434A1
Assignee: Leland Stanford Junior University

A method and/or system for analyzing data using population clustering through density based merging.

Publication date: 09-01-2014

Multilabel classification by a hierarchy

Number: US20140012849A1
Assignee: Individual

A technique of extracting hierarchies for multilabel classification. The technique can process a plurality of labels related to a plurality of documents, using a clustering process, to cluster the labels into a plurality of clusterings representing a plurality of classes. The technique classifies the documents and predicts a plurality of performance characteristics, respectively, for the plurality of clusterings. The technique selects at least one of the clusterings using information from the performance characteristics and adds the selected clustering into a resulting hierarchy.

Publication date: 06-01-2022

SYSTEMS AND METHODS FOR FILTERING ELECTRONIC ACTIVITIES BY PARSING CURRENT AND HISTORICAL ELECTRONIC ACTIVITIES

Number: US20220006873A1
Assignee: People.ai, Inc.

The present disclosure relates to systems and methods for filtering electronic activities. The method includes identifying an electronic activity. The method includes parsing the electronic activity to identify one or more electronic accounts in the electronic activity. The method includes determining, responsive to parsing the electronic activity, that the electronic activity is associated with an electronic account of the one or more electronic accounts. The method includes selecting, based on the electronic account, one or more filtering policies associated with the data source provider to apply to the electronic activity. The method includes determining, by applying the selected one or more filtering policies to the electronic activity, to restrict the electronic activity from further processing based on the electronic activity satisfying at least one of the selected one or more filtering policies. The method includes restricting the electronic activity from further processing.

1. A method comprising:
identifying, by one or more processors, an electronic activity associated with a data source provider;
parsing, by the one or more processors, the electronic activity to identify one or more electronic accounts in the electronic activity;
determining, by the one or more processors, responsive to parsing the electronic activity, that the electronic activity is associated with an electronic account of the one or more electronic accounts, the electronic account corresponding to the data source provider;
selecting, by the one or more processors based on the electronic account, one or more filtering policies associated with the data source provider to apply to the electronic activity, the selected one or more filtering policies including at least one of i) a keyword policy configured to restrict electronic activities including a predetermined keyword; ii) a regex pattern policy configured to restrict electronic activities including one or more character strings that match ...

Publication date: 05-01-2017

Natural language interpretation of hierarchical data

Number: US20170004206A1
Assignee: International Business Machines Corp

A computer-implemented method includes receiving a search label and accessing a hierarchical data source comprising a plurality of nodes. One node may be a context node. The method further includes determining a similarity score between the search label and a node label of each node, determining a contextual score between the context node and each node, combining, for each node, the similarity score with the contextual score to yield a combined score, and returning a result. The result may be based on ordering the plurality of nodes according to each node's combined score. A corresponding computer program product and computer system are also disclosed.
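The scoring scheme in this abstract can be sketched as follows. The abstract does not say how either score is computed, so this sketch assumes string similarity from `difflib.SequenceMatcher` for the similarity score, an inverse tree distance for the contextual score, and a weighted sum to combine them. The `PARENT` hierarchy, the node names, and the `weight` parameter are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical hierarchical data source, stored as child -> parent.
PARENT = {
    "root": None,
    "sales": "root",
    "sales_region": "sales",
    "sales_total": "sales",
    "staff": "root",
    "staff_region": "staff",
}

def path_to_root(node):
    """Return the node followed by all of its ancestors."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def tree_distance(a, b):
    """Number of edges between two nodes via their nearest common ancestor."""
    depth_in_b = {n: d for d, n in enumerate(path_to_root(b))}
    for d, n in enumerate(path_to_root(a)):
        if n in depth_in_b:
            return d + depth_in_b[n]
    raise ValueError("nodes are not in the same tree")

def rank_nodes(search_label, context_node, weight=0.5):
    """Order nodes by a combined similarity and contextual score (assumed weighted sum)."""
    scored = []
    for node in PARENT:
        similarity = SequenceMatcher(None, search_label.lower(), node.lower()).ratio()
        contextual = 1.0 / (1.0 + tree_distance(context_node, node))
        scored.append((weight * similarity + (1 - weight) * contextual, node))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [node for _, node in scored]

ranking = rank_nodes("region", context_node="staff")
```

With `staff` as the context node, `staff_region` and `sales_region` match the label "region" equally well, so the contextual score decides the order between them.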

Publication date: 27-01-2022

TECHNIQUES FOR INFORMATION RANKING AND RETRIEVAL

Number: US20220027400A1
Author: Ralhan Dushyant
Assignee: State Street Corporation

Systems, methods, apparatuses, and computer-readable media for information ranking and retrieval are described. In one embodiment, an apparatus may include a processor and a memory storing instructions which when executed by the processor cause the processor to access an ingested document, generate a converted document from the ingested document based on a conversion configuration, the converted document comprising at least one paragraph, and generate an index based on the converted document and an index configuration. Other embodiments are described.

1-20. (canceled)

21. An apparatus, comprising:
a processor; and
a memory storing instructions which when executed by the processor cause the processor to:
access an ingested document;
generate a converted document from the ingested document based on a conversion configuration, the converted document comprising at least one paragraph;
generate an index, according to an index configuration, based on the converted document;
retrieve and rank, via a rank model, additional ingested documents, the rank model comprising a machine learning model;
save feedback associated with ranking the additional ingested documents; and
retrain the rank model based on the feedback.

22. The apparatus of claim 21, further comprising instructions to cause the processor to generate an operating environment for information rank and retrieval processes to retrieve and rank via the rank model.

23. The apparatus of claim 22, further comprising instructions to cause the processor to generate, in the operating environment via a user interface (UI) template image and based on environment configurations, containers for application specific client UIs for the information rank and retrieval processes.

24. The apparatus of claim 23, wherein the UI template image comprises a Docker template image, the containers comprise Docker containers, and the environment configurations comprise application environment configurations.

25. ...

Publication date: 09-01-2020

Unstructured data clustering of information technology service delivery actions

Number: US20200012728A1
Assignee: International Business Machines Corp

Systems, methods, and computer program products relating to clustering unstructured data. A set of unstructured documents is tokenized to produce a plurality of tokens. A frequency at which terms appear in the plurality of tokens is analyzed, to generate a vocabulary of terms. A vocabulary indices matrix is generated based on the generated vocabulary of terms. The matrix relates to the set of unstructured documents. A plurality of rows in the vocabulary indices matrix are matched to generate a plurality of clusters for the set of unstructured documents.
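The pipeline in this abstract (tokenize, build a vocabulary from term frequencies, build a vocabulary indices matrix, match rows) can be sketched in a few lines. The matching rule is an assumption: two documents fall into the same cluster when their rows of vocabulary indices are identical. The function names, the `min_count` threshold, and the sample documents are hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(document):
    """Split a document into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", document.lower())

def build_vocabulary(token_lists, min_count=2):
    """Keep terms that appear at least min_count times across all tokens."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    terms = sorted(t for t, c in counts.items() if c >= min_count)
    return {term: index for index, term in enumerate(terms)}

def indices_matrix(token_lists, vocabulary):
    """One row per document: the sorted vocabulary indices it contains."""
    return [tuple(sorted({vocabulary[t] for t in tokens if t in vocabulary}))
            for tokens in token_lists]

def cluster_by_matching_rows(rows):
    """Group document ids whose rows match exactly (assumed matching rule)."""
    clusters = defaultdict(list)
    for doc_id, row in enumerate(rows):
        clusters[row].append(doc_id)
    return list(clusters.values())

# Hypothetical service-delivery tickets.
DOCS = [
    "Restart server db01",
    "restart server db02",
    "Reset password for user",
    "reset password for admin",
]
token_lists = [tokenize(d) for d in DOCS]
vocabulary = build_vocabulary(token_lists)
rows = indices_matrix(token_lists, vocabulary)
clusters = cluster_by_matching_rows(rows)
```

Rare tokens such as host names drop out of the vocabulary, so tickets that differ only in those tokens end up with matching rows.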

Publication date: 03-02-2022

REAL-TIME ADAPTIVE OPERATIONS PERFORMANCE MANAGEMENT SYSTEM

Number: US20220036264A1
Assignee:

Embodiments are directed to managing operations. If Operations events are provided, event clusters may be associated with one or more Operations events, such that the Operations events may be associated with the event clusters based on characteristics of the Operations events. Metrics including resolution metrics, root cause analysis, notes, and other remediation information may be associated with the event clusters. Then a modeling engine may be employed to train models based on the Operations events, the event clusters, and the resolution metrics, such that the trained model may be trained to correlate and predict the resolution metrics from real-time Operations events. If real-time Operations events are provided, the trained models may be employed to predict the resolution metrics that are associated with the real-time Operations events. If model performance degrades beyond accuracy requirements, new observations may be added to the training set and the model re-trained.

1. A method for managing operations for organizations over a network using one or more network computers that include one or more processors that perform actions, comprising:
employing a plurality of provided Operations events to perform further actions, including:
providing, by the one or more processors, one or more event clusters that are associated with one or more Operations events;
associating, by the one or more processors, one or more resolution metrics with the one or more event clusters;
employing, by the one or more processors, a modeling engine to train one or more models based on the one or more Operations events, the one or more event clusters, and the one or more resolution metrics, wherein the one or more trained models are stored in a datastore; and
retrieving, by the one or more processors, the one or more trained models, from the datastore, that are used to identify the one or more resolution metrics that are associated with one or more real-time Operations events. ...

Publication date: 03-02-2022

SYSTEMS AND METHODS FOR IDENTIFYING A SEQUENCE OF EVENTS AND PARTICIPANTS FOR RECORD OBJECTS

Number: US20220038548A1
Assignee: People.ai, Inc.

Methods, systems, and storage media for identifying a sequence of events and participants for record objects are disclosed. Exemplary implementations may: access record objects of a system of record; identify a subset of record objects associated with a group entity and having a first record object status; identify one or more electronic activities linked to the record objects; determine an event-participant pattern based on the electronic activities linked to the record object; identify electronic activities linked with a second record object; determine that a first event is performed by a first participant type and a second event is not yet performed by a second participant type; generate a content item identifying an action to trigger a performance of the second event; and transmit the content item to a device of a participant of at least one electronic activity linked with the second record object.

1. A method, comprising:
accessing, by one or more processors, a plurality of record objects of a system of record of a data source provider, each record object of the plurality of record objects corresponding to a respective group entity, each record object comprising one or more object field-value pairs and linked to one or more electronic activities;
identifying, by the one or more processors, a subset of record objects of the plurality of record objects associated with a group entity and having a first record object status;
identifying, by the one or more processors, for each record object of the subset of record objects, one or more electronic activities linked to the record object, each electronic activity identifying one or more participants and corresponding to at least one event;
determining, by the one or more processors, for each record object of the subset of record objects, an event-participant pattern based on the electronic activities linked to the record object, the event-participant pattern including at least a first event performed by a first participant ...

Publication date: 21-01-2021

EFFICIENT STORAGE AND RETRIEVAL OF TEXTUAL DATA

Number: US20210019340A1
Author: Gupta Vipul
Assignee: Microsoft Technology Licensing, LLC

A method and system for efficient storage of data entries containing textual data is disclosed. The method may include accessing a plurality of data entries in a dataset, arranging the plurality of data entries in the dataset in a lexical order, placing a predetermined number of the plurality of data entries in each of a plurality of subblocks, performing data compression on the plurality of data entries in each of the plurality of subblocks to reduce redundancy in the plurality of data entries and create compressed data entries, placing one or more subblocks in each of a plurality of page blocks, and storing each of the plurality of page blocks in a storage device to provide efficient searching and improved functionality for the dataset.

1. A data processing system comprising:
a processor; and
a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the data processing system to perform functions of:
accessing a plurality of data entries in a dataset;
arranging the plurality of data entries in the dataset in a lexical order;
placing a predetermined number of the plurality of data entries in each of a plurality of subblocks;
performing data compression on the plurality of data entries in each of the plurality of subblocks to reduce redundancy in the plurality of data entries and create compressed data entries;
placing one or more subblocks in each of a plurality of page blocks; and
storing each of the plurality of page blocks in a storage device to provide efficient searching and improved functionality for the dataset.

2. The data processing system of claim 1, wherein the plurality of subblocks are in a lexical order.

3. The data processing system of claim 2, wherein the plurality of page blocks are in a lexical order.

4. The data processing system of claim 1, wherein performing data compression on the plurality of data entries in each of the plurality of subblocks to ...
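The storage layout described in this entry (lexical ordering, fixed-size subblocks, per-subblock compression, page blocks) can be sketched as below. The compression scheme is an assumption: the text does not name one, and front coding (storing each entry as a shared-prefix length plus a suffix) is used here only because it exploits the redundancy that lexical ordering creates. The function names and block sizes are hypothetical.

```python
import os

def front_encode(entries):
    """Compress sorted entries as (shared prefix length, remaining suffix) pairs."""
    encoded, previous = [], ""
    for entry in entries:
        shared = len(os.path.commonprefix([previous, entry]))
        encoded.append((shared, entry[shared:]))
        previous = entry
    return encoded

def front_decode(encoded):
    """Rebuild the original entries of one subblock."""
    entries, previous = [], ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        entries.append(previous)
    return entries

def build_page_blocks(entries, per_subblock=3, subblocks_per_page=2):
    """Sort entries, pack them into compressed subblocks, then into page blocks."""
    ordered = sorted(entries)
    subblocks = [front_encode(ordered[i:i + per_subblock])
                 for i in range(0, len(ordered), per_subblock)]
    return [subblocks[i:i + subblocks_per_page]
            for i in range(0, len(subblocks), subblocks_per_page)]

def read_all(pages):
    """Decode every subblock of every page back into entries."""
    return [entry for page in pages for subblock in page
            for entry in front_decode(subblock)]

# Hypothetical dataset of textual entries.
ENTRIES = ["care", "dog", "card", "car", "dot", "cart", "do", "carton"]
pages = build_page_blocks(ENTRIES)
```

Each subblock restarts its prefix chain from an empty string, so any subblock can be decoded without reading its neighbours, which is what makes a per-block search practical.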

Publication date: 10-02-2022

GENERATING ENTITY RELATION SUGGESTIONS WITHIN A CORPUS

Number: US20220043848A1
Assignee:

Aspects of the invention include a computer-implemented method for entity relation type detection. The method includes detecting a plurality of candidate co-occurring entities from one or more documents. A first set of co-occurring entities and a second set of co-occurring entities from the plurality of co-occurring entities is grouped based on a synonymity of a first set of entity types associated with the first set of co-occurring entities and a second set of entity types associated with the second set of co-occurring entities. A synonymity of a first set of intervening tokens associated with the first set of co-occurring entities and a second set of intervening tokens associated with the second set of co-occurring entities is detected. A relation entity type label is generated based on a conflation of two or more tokens of the first set of intervening tokens.

1. A computer-implemented method comprising:
detecting, by a processor, a plurality of candidate co-occurring entities from one or more documents;
grouping, by the processor, a first set of co-occurring entities and a second set of co-occurring entities from the plurality of co-occurring entities based on a synonymity of a first set of entity types associated with the first set of co-occurring entities and a second set of entity types associated with the second set of co-occurring entities;
detecting, by the processor, a synonymity of a first set of intervening tokens associated with the first set of co-occurring entities and a second set of intervening tokens associated with the second set of co-occurring entities; and
generating, by the processor, a relation entity type label based on a conflation of two or more tokens of the first set of intervening tokens.

2. The computer-implemented method of claim 1, wherein the synonymity between the first set of co-occurring entities and the second set of co-occurring entities is determined as follows:
mapping the first set of co-occurring entities to a first word vector and mapping ...

Publication date: 10-02-2022

Cluster analysis method, cluster analysis system, and cluster analysis program

Number: US20220043851A1
Assignee: Aixs Inc

A server 4 executes a similarity calculation step (S2) of calculating similarity between content of one document and content of another document, a cluster classification step (S3) of generating a network in which a document is set as a node based on calculated similarity and similar nodes are connected by an edge, and performing classification based on similar documents, a first index calculation step (S4) of calculating a first index indicating centrality of a document in the network, a second index calculation step (S5) of calculating a second index that is different from the first index in the network and indicates importance of a document, and a display data generation step (S6) of generating, regarding a document, first display data indicating the network by an expression of a size of an object of a node according to the first index, an expression of a gauge having a shape corresponding to a shape of the object according to the second index and a length of the gauge, an expression according to a type of the cluster, and an expression according to magnitude of similarity between documents.

Publication date: 10-02-2022

SYSTEM AND/OR METHOD FOR GENERATING CLEAN RECORDS FROM IMPERFECT DATA USING MODEL STACK(S) INCLUDING CLASSIFICATION MODEL(S) AND CONFIDENCE MODEL(S)

Number: US20220044126A1
Assignee:

Techniques relating to managing “bad” or “imperfect” data being imported into a database system are described herein. A lifecycle technology solution helps receive data from a variety of different data sources of a variety of known and/or unknown formats, standardize it, fit it to a known taxonomy through model-assisted classification, store it to a database in a manner that is consistent with the taxonomy, and allow it to be queried for a variety of different usages. Auto-classification, enrichment, clustering model and model stacks, and/or other disclosed techniques, may be used in these and/or other regards.

1. an input interface configured to receive documents comprising data entries, at least some of the data entries having associated features represented directly in the documents;
a non-transitory computer readable storage medium comprising a data warehouse configured to store curated and classified data elements;
a model registry storing at least one model stack, each model stack including at least one classification model and at least one confidence model that is separate from the at least one classification model in the respective model stack, each classification model being configured to generate proposed classifications in accordance with a multi-element taxonomy, with each element in the multi-element taxonomy denoting a respective category, each confidence model being configured to make a trust or do not trust decision relative to a proposed classification generated by the at least one classification model in the associated model stack; and
inspect documents received via the input interface to identify data entries and associated features located in the inspected documents;
identify one or more model stacks from the model registry for execution on the identified data entries;
execute the at least one classification model in the identified one or more model stacks to generate proposed classifications for the identified data entries;
execute the at ...

Publication date: 10-02-2022

SYSTEM AND METHOD FOR PEER GROUP DETECTION, VISUALIZATION AND ANALYSIS IN IDENTITY MANAGEMENT ARTIFICIAL INTELLIGENCE SYSTEMS USING CLUSTER BASED ANALYSIS OF NETWORK IDENTITY GRAPHS

Number: US20220046086A1
Assignee:

Systems and methods for graph based artificial intelligence systems for identity management systems are disclosed. Embodiments of the identity management systems disclosed herein may utilize a network graph approach to peer grouping of identities of a distributed networked enterprise computing environment. Specifically, in certain embodiments, data on the identities and the respective entitlements assigned to each identity as utilized in an enterprise computer environment may be obtained by an identity management system. A network identity graph may be constructed using the identity and entitlement data. The identity graph can then be clustered into peer groups of identities. The peer groups of identities may be used by the identity management system and users thereof in risk assessment or other identity management tasks.

1. An identity management system, comprising:
a memory;
a processor;
a non-transitory, computer-readable storage medium including computer instructions for:
presenting a peer group interface;
presenting a peer group determined from an identity graph through the peer group interface, wherein the identity graph was created from identity management data, the identity management data utilized in identity management in a distributed enterprise computing environment and comprising data on a first set of identities and a first set of entitlements associated with the first set of identities, wherein the identity graph includes:
a node for each of the first set of identities, and
an edge between a first node and a second node for each first identity and second identity that share at least one entitlement of the first set of entitlements, wherein the first node and the second node respectively represent the first identity and the second identity and where the edge has a weight based on the at least one shared entitlement between the first identity and the second identity; and
wherein the peer group was determined by:
pruning a first set of edges of the identity ...

Publication date: 01-02-2018

Tool for mining chat sessions

Number: US20180032533A1
Assignee: Bank of America Corp

A method comprises counting, in a transcript of a chat session between a first user and a second user, for each theme of a plurality of themes, a number of occurrences of each keyword of a plurality of keywords assigned to that theme. The method further comprises identifying one or more themes of the chat session based on the number of occurrences of each keyword, counting the number of occurrences of a word of a first set of words and a word of a second set of words in the transcript, and assigning the transcript into a first group or a second group based on the one or more identified themes and the number of occurrences of first words and second words.
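The counting steps in this abstract can be sketched as follows. The abstract leaves the decision rules open, so two assumptions are made here: a theme is identified when its keywords occur at least `min_hits` times, and the transcript goes to the first group when a theme was identified and first-set words are at least as frequent as second-set words. The keyword sets, word sets, and names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword assignments per theme.
THEMES = {
    "billing": {"invoice", "refund", "charge"},
    "login": {"password", "locked", "login"},
}
# Hypothetical first and second sets of words.
FIRST_WORDS = {"thanks", "great", "resolved"}
SECOND_WORDS = {"angry", "cancel", "unacceptable"}

def mine_transcript(transcript, min_hits=2):
    """Return the identified themes and the group assigned to the transcript."""
    counts = Counter(re.findall(r"[a-z']+", transcript.lower()))
    # Count keyword occurrences for each theme.
    theme_hits = {theme: sum(counts[k] for k in keywords)
                  for theme, keywords in THEMES.items()}
    themes = sorted(t for t, hits in theme_hits.items() if hits >= min_hits)
    first = sum(counts[w] for w in FIRST_WORDS)
    second = sum(counts[w] for w in SECOND_WORDS)
    # Assumed grouping rule, see the note above.
    group = "first" if themes and first >= second else "second"
    return themes, group

themes, group = mine_transcript(
    "I was charged twice, please refund the charge. Thanks, great help!"
)
```

Matching is on exact tokens, so "charged" does not count toward the keyword "charge"; a production tool would likely stem or lemmatize first.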

Publication date: 04-02-2021

Method and device for identifying a user interest, and computer-readable storage medium

Number: US20210034819A1
Assignee: Ping An Technology Shenzhen Co Ltd

Disclosed is a method for identifying a user interest, including: obtaining training samples and test samples, the training samples being obtained by manually labeling after the corresponding topic models have been trained based on text data; extracting characteristics of the training samples and of the test samples, and computing optimal model parameters of a logistic regression model by an iterative algorithm based on the characteristics of the training samples; evaluating the logistic regression model based on the characteristics of the test samples and the area under the ROC curve (AUC) to train and obtain a first theme classifier; determining a theme to which the text data belongs using the first theme classifier, computing a score of the theme to which the text data belongs based on the logistic regression model, and computing a confidence score of the user being interested in the theme according to a second preset algorithm. Further disclosed are a device for identifying a user interest and a computer-readable storage medium.

Publication date: 30-01-2020

Filtering electronic messages

Number: US20200036666A1
Assignee: Slice Technologies Inc

Improved systems and methods for automatically discovering and filtering electronic messages. These systems and methods improve the operation of computer apparatus to achieve dramatic reductions in processing resources, data storage resources, network resources, and filter production times compared to conventional approaches. In some examples, improvements result from configuring computer apparatus to perform a unique sequence of specific electronic message processing rules in a network communications environment. In this regard, these examples are able to automatically learn the structures and semantics of machine generated electronic message headers, accelerating the ability to support new message sources and new markets. These examples provide a purchase related electronic message discovery and filtering service that is able to identify and filter purchase related electronic messages with high accuracy across a wide variety of electronic message formats.

Publication date: 11-02-2016

System and Method for Storing Data in Clusters Located Remotely from Each Other

Number: US20160042056A1
Assignee: Avere Systems Inc

A system for storing data includes a plurality of clusters located remotely from each other in which the data is stored. Each cluster has a token server that controls access to the data with only one token server responsible for any piece of data. Each cluster has a plurality of Cache appliances. Each cluster has at least one backend file server in which the data is stored. The system includes a communication network through which the servers and appliances communicate with each other. A Cache Appliance cluster in which data is stored in back-end servers within each of a plurality of clusters located remotely from each other. A method for storing data.

Publication date: 01-05-2014

Computer-Implemented System And Method For Clustering Documents Based On Scored Concepts

Number: US20140122495A1
Assignee: FTI TECHNOLOGY LLC

A computer-implemented system and method for clustering documents based on scored concepts is provided. A set of documents is obtained and concepts are extracted from the documents. A score is calculated for each concept. The score is determined as a function of summation of a frequency of occurrence, concept weight, structural weight, and corpus weight. The documents in the set are clustered based on the scores. A vector is formed for each document based on the concepts in that document and the scores associated with the concepts. A similarity is determined between each document and each of the other documents based on the formed vectors. Those documents that are sufficiently distinct from the other documents are identified as seed documents for separate document clusters. Each of the remaining documents is grouped into one of the clusters most similar to that remaining document.
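The seed-based clustering in this abstract can be sketched as below. Several details are assumptions: the score is taken as a plain sum of the four factors, similarity is cosine similarity over the concept-score vectors, and a document is "sufficiently distinct" when its similarity to every existing seed is below a hypothetical `threshold`. The document ids and scores are illustrative only.

```python
import math

def concept_score(frequency, concept_weight, structural_weight, corpus_weight):
    """Score as a summation of the four factors (plain sum assumed)."""
    return frequency + concept_weight + structural_weight + corpus_weight

def cosine(a, b):
    """Cosine similarity between two sparse concept-score vectors."""
    dot = sum(value * b.get(concept, 0.0) for concept, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_by_seeds(vectors, threshold=0.5):
    """Pick seed documents, then assign every document to its most similar seed."""
    seeds = []
    for doc_id, vector in vectors.items():
        # A document distinct from all current seeds starts a new cluster.
        if all(cosine(vector, vectors[s]) < threshold for s in seeds):
            seeds.append(doc_id)
    return {doc_id: max(seeds, key=lambda s: cosine(vector, vectors[s]))
            for doc_id, vector in vectors.items()}

# Hypothetical per-document vectors of concept scores.
VECTORS = {
    "d1": {"merger": concept_score(2, 0.5, 0.25, 0.25), "audit": 1.0},
    "d2": {"merger": 2.5, "audit": 1.5},
    "d3": {"patent": 4.0},
    "d4": {"patent": 3.0, "license": 1.0},
}
assignment = cluster_by_seeds(VECTORS)
```

Seed selection here depends on the order in which documents are visited; the abstract does not say how ties or ordering are handled.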

Publication date: 01-05-2014

Systems and methods for secure storage of user information in a user profile

Number: US20140122508A1
Assignee: Fhoosh Inc

Systems and methods are provided for securely storing information of a user in a user profile to prevent access to the information and minimize the amount of information disclosed during a security breach. Information pertaining to a user is obtained from one or more sources and organized into a user profile and securely stored in a database. The user profile may be stored remotely in a cloud-based system at a remote encrypted server, with portions of the profile stored in separate locations with separate encryption to minimize the risk of unauthorized access to one portion of the information. The fields of data in the user profile may also be separately encrypted with separate encryption keys and separately stored in separate databases to minimize the amount of information which could be disclosed by the unauthorized access to a single encryption key or a single database.

Publication date: 18-02-2021

Computer implemented method and a computer system for document clustering and text mining

Number: US20210049206A1
Assignee: Individual

A computer implemented method for document clustering comprises receiving one or more documents via one or more input means, arranging the one or more documents into a term-document matrix using term frequency-inverse document frequency, removing and stemming one or more common clutter/stop words from the one or more documents, extracting one or more features from the one or more documents using non-negative matrix factorization (NMF) and k-means, determining one or more vectors based on the one or more features, implementing k-means clustering, thereby iterating over the one or more documents and the one or more features, and clustering the one or more documents based on similarity between the extracted one or more features and each of the one or more documents.
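Only the first stage of this pipeline is sketched here: stop-word removal followed by a term-document matrix weighted by term frequency-inverse document frequency. Stemming, NMF, and k-means are left out. The weighting formula (relative term frequency times `log(N / df)`), the stop-word list, and the names are assumptions, since the abstract does not fix them.

```python
import math
import re

# Hypothetical stop-word list.
STOP_WORDS = {"the", "a", "of", "and", "is", "on"}

def tfidf_matrix(documents):
    """Return (terms, matrix) with one row per term and one column per document."""
    token_lists = [[t for t in re.findall(r"[a-z]+", d.lower()) if t not in STOP_WORDS]
                   for d in documents]
    terms = sorted({t for tokens in token_lists for t in tokens})
    n_docs = len(documents)
    matrix = []
    for term in terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for tokens in token_lists if term in tokens)
        idf = math.log(n_docs / df)
        matrix.append([tokens.count(term) / len(tokens) * idf if tokens else 0.0
                       for tokens in token_lists])
    return terms, matrix

terms, matrix = tfidf_matrix([
    "the cat sat on the mat",
    "the dog sat on the log",
])
```

A term that appears in every document, such as "sat" here, gets an inverse document frequency of zero, so it contributes nothing to later feature extraction.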

Publication date: 08-05-2014

Method for inputting instruction and portable electronic device and computer readable recording medium

Number: US20140125607A1
Assignee: Wistron Corp

A method for inputting instruction, a portable electronic device and a computer readable recording medium are provided. The method includes detecting taps applied on a touch screen, and determining whether tap positions of the taps belong to the same group. The method also includes dividing the tap positions of the taps into groups if the tap positions of the taps do not belong to the same group, generating group flags according to the groups, and sorting the group flags according to a tap order of the taps, so as to generate a group flag sequence. In addition, the method further includes generating an operating instruction according to the group flag sequence.
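The tap-grouping idea in this abstract can be sketched with one-dimensional positions. The grouping rule is an assumption: tap positions are sorted, and a new group starts wherever two neighbouring positions are more than `gap` pixels apart. Group flags are numbered from left to right, and the instruction table is hypothetical.

```python
def group_flag_sequence(tap_xs, gap=100):
    """Map taps (x positions in tap order) to a sequence of group flags."""
    ordered = sorted(set(tap_xs))
    flag_of, flag = {}, 0
    for previous, current in zip([None] + ordered, ordered):
        # A large jump between neighbouring positions starts a new group.
        if previous is not None and current - previous > gap:
            flag += 1
        flag_of[current] = flag
    # Emit flags in the order the taps were made.
    return [flag_of[x] for x in tap_xs]

# Hypothetical mapping from group flag sequences to operating instructions.
INSTRUCTIONS = {
    (0, 1, 0): "unlock",
    (1, 0): "open_camera",
}

def instruction_for(tap_xs, gap=100):
    """Return the operating instruction for a tap sequence, or None if unmapped."""
    return INSTRUCTIONS.get(tuple(group_flag_sequence(tap_xs, gap)))

sequence = group_flag_sequence([50, 400, 60])
```

Because the flags depend on relative rather than absolute positions, the same left-right-left pattern produces the same sequence anywhere on the screen.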

More details
Publication date: 25-02-2021

METHOD FOR MULTI-MODAL RETRIEVAL AND CLUSTERING USING DEEP CCA AND ACTIVE PAIRWISE QUERIES

Number: US20210056127A1
Assignee:

A method for embedding learning and clustering for paired multi-modal data using deep canonical correlation analysis and active learning with pairwise queries is presented. The method includes collecting time-series data from a plurality of sensors, training, in an unsupervised manner, a cross-modal retrieval system by using the time-series data and relevant comment texts, depending on a modality of a query, retrieving the relevant comment texts from a time-series segment of the time-series data, the relevant comment texts used as human-readable explanations of a query segment, retrieving relevant time-series segments given a sentence or a set of keywords such that the relevant time-series segments match the sentence or set of keywords, and retrieving the relevant time-series segments given the time-series segment and the sentence or set of keywords such that a first subset of attributes match the set of keywords and a second subset of attributes resembles the time-series segment.

1. A computer-implemented method executed on a processor for embedding learning and clustering for paired multi-modal data using deep canonical correlation analysis (CCA) and active learning with pairwise queries, the method comprising: collecting time-series data from a plurality of sensors; training, in an unsupervised manner, a cross-modal retrieval system by using the time-series data and relevant comment texts; retrieving the relevant comment texts from a time-series segment of the time-series data, the relevant comment texts used as human-readable explanations of a query segment; retrieving relevant time-series segments given a sentence or a set of keywords such that the relevant time-series segments match the sentence or set of keywords; and retrieving the relevant time-series segments given the time-series segment and the sentence or set of keywords such that a first subset of attributes match the set of keywords and a second subset of attributes resembles the time-series ...

More details
Publication date: 25-02-2021

ENTITY SEARCH METHOD, RELATED DEVICE, AND COMPUTER STORAGE MEDIUM

Number: US20210056130A1
Assignee:

An entity search method, a related device, and a computer storage medium are provided. The method includes: determining a first classifier and a second classifier that are included in query information; determining, based on a first entity library and the first classifier and the second classifier that are included in the query information, s correlations corresponding to each of w candidate entities, wherein information about the candidate entity includes a third classifier and a fourth classifier, and the correlation indicates a correlation between the classifiers in the query information and the classifiers in the candidate entity; and determining, based on the s correlations corresponding to each of the w candidate entities, information about a target entity corresponding to the query information, wherein the target entity is an entity in the w candidate entities.

1. An entity search method applied to a terminal device, the method comprising: determining a first classifier and a second classifier that are comprised in query information, wherein the first classifier is a word that is in the query information and that represents a type of a query result, and the second classifier is a word in the query information other than the first classifier; determining, based on a first entity library and the first classifier and the second classifier that are comprised in the query information, s correlations corresponding to each of w candidate entities, wherein the first entity library comprises information about each of the w candidate entities, the information about each candidate entity comprises a third classifier and a fourth classifier, the third classifier and the first classifier belong to a same classification, the fourth classifier and the second classifier belong to a same classification, each of the s correlations indicates a correlation between a classifier in the query information and a classifier in the candidate entity, both w and s are positive integers, ...

More details
Publication date: 25-02-2021

EXPANDED CONCEPT MATCHING

Number: US20210056171A1
Assignee:

Methods, systems, and computer program products for expanded concept matching are provided. Aspects include receiving an ontology, determining a set of target concepts, building a cache from the ontology, the cache comprising a set of expressions extracted from the ontology for each target concept in the set of target concepts, receiving a document, determining a first segment of text in the document based on the set of target concepts, and annotating the first segment of text by comparing the set of target concepts to the set of expressions in the cache.

1. A computer-implemented method for expanded concept matching, the method comprising: receiving an ontology; determining a set of target concepts; building a cache from the ontology, the cache comprising a set of expressions extracted from the ontology for each target concept in the set of target concepts; receiving a document; determining a first segment of text in the document based on the set of target concepts; and annotating the first segment of text by comparing the set of target concepts to the set of expressions in the cache.
2. The computer-implemented method of claim 1, wherein building the cache comprises: analyzing the ontology to determine a set of targets based on the set of target concepts, wherein each target in the set of targets comprises one or more associated modifiers; and determining a concept unique identifier for each target in the set of targets.
3. The computer-implemented method of claim 2, wherein building the cache further comprises: determining at least one synonym for a target in the set of targets.
4. The computer-implemented method of claim 2, wherein building the cache further comprises: determining at least one antonym for a target in the set of targets.
5. The computer-implemented method of claim 2, wherein building the cache further comprises: determining at least one colloquial variant for a target in the set of targets.
6. The computer-implemented method of claim 1, wherein annotating ...

More details
Publication date: 13-02-2020

VERIFYING TEXTUAL CLAIMS WITH A DOCUMENT CORPUS

Number: US20200050621A1
Author: Malon Christopher
Assignee:

A system verifies textual claims using a document corpus. The system includes a memory for storing program code and a processor device for running the code to retrieve documents from the corpus based on Term Frequency Inverse Document Frequency (TFIDF) similarity to a set of textual claims. The processor extracts named entities and capitalized phrases from the textual claims. The processor retrieves documents from the corpus with titles matching any of the extracted named entities and capitalized phrases. The processor extracts premise sentences from the retrieved documents. The processor classifies the premise sentences together with sources of the premise sentences against the textual claims to obtain classifications from among possible classifications including a supported, an unverified, or a contradicted classification. The processor aggregates the classifications over the premise sentences to selectively output, for each textual claim, an overall decision of the supported classification, the unverified classification, or the contradicted classification.

1. A system for verifying textual claims using a document corpus, comprising: a memory for storing program code; and retrieve documents from the document corpus based on Term Frequency Inverse Document Frequency (TFIDF) similarity to a set of textual claims; extract named entities and capitalized phrases from the textual claims; retrieve documents from the document corpus with titles matching any of the extracted named entities and capitalized phrases; extract premise sentences from the retrieved documents; classify the premise sentences together with sources of the premise sentences against the textual claims to obtain classifications from among possible classifications including a supported classification, an unverified classification, or a contradicted classification; and aggregate the classifications over the premise sentences to selectively output, for each of the textual claims, an ...

More details
Publication date: 13-02-2020

Contextual Image Presentation

Number: US20200050622A1
Assignee:

There are provided contextual image presentation systems and methods. Such a system includes a hardware processor and a system memory having stored therein a contextual image generator including a data mapping module and a data visualization module. The contextual image generator receives social media data describing social media posts, determines a geographical location corresponding to at least some of the social media posts, and identifies a subject category corresponding respectively to each of the social media posts. In addition, the contextual image generator groups the social media posts into social media collections based on at least one of the subject category and the geographical location corresponding to each social media post. The contextual image generator further generates a contextual image that visually associates at least one of the social media collections with the subject category and/or the geographical location used to group that social media collection.

1-20. (canceled)
21. A contextual image presentation system comprising: a hardware processor and a system memory having stored therein a contextual image generator, wherein the hardware processor is configured to execute the contextual image generator to: receive social media data describing social media posts; identify a plurality of subject categories each corresponding respectively to at least one of the social media posts; group the social media posts into social media collections based on a respective one of the plurality of subject categories corresponding to each of the social media posts; generate a contextual image that visually associates at least one of the social media collections with the respective one of the plurality of subject categories used to group the at least one of the social media collections, wherein the contextual image labels the respective one of the plurality of subject categories; and display the contextual image to a system user.
22. The contextual image ...

More details
Publication date: 23-02-2017

Contextual Image Presentation

Number: US20170053017A1
Assignee: Disney Enterprises Inc

There are provided contextual image presentation systems and methods. Such a system includes a hardware processor and a system memory having stored therein a contextual image generator including a data mapping module and a data visualization module. The contextual image generator receives social media data describing social media posts, determines a geographical location corresponding to at least some of the social media posts, and identifies a subject category corresponding respectively to each of the social media posts. In addition, the contextual image generator groups the social media posts into social media collections based on at least one of the subject category and the geographical location corresponding to each social media post. The contextual image generator further generates a contextual image that visually associates at least one of the social media collections with the subject category and/or the geographical location used to group that social media collection.

More details
Publication date: 10-03-2022

NATURAL LANGUAGE PROCESSING OF UNSTRUCTURED DATA

Number: US20220075939A1
Assignee:

A computer system for processing unstructured data, the computer system comprising a computer processor, a computer memory operatively coupled to the computer processor and the computer memory having disposed within it computer program instructions that, when executed by the processor, cause the computer system to carry out the steps of receiving unstructured data input from a client device, analyzing the unstructured data for features that satisfy logical segment criteria by using natural language processing (NLP), and partitioning the unstructured data into logical segments based on satisfaction of the logical segment criteria.

1. A method, in a data processing system comprising a processor and a memory, for processing unstructured data, the method comprising: receiving, by the data processing system, the unstructured data input from a client device; analyzing, by the data processing system, the unstructured data for features that satisfy logical segment criteria by using natural language processing (NLP); and partitioning, by the data processing system, the unstructured data into logical segments based on satisfaction of the logical segment criteria, wherein the satisfaction of the logical segment criteria includes comparing scores respectively assigned to text fragments within the logical segments to the logical segment criteria, and the unstructured data is partitioned into the logical segments in accordance with the scores.
2. The method of wherein the unstructured data comprise text includes topics and/or content.
3. The method of wherein the analyzing the unstructured data for features further comprises using the NLP to identify text that satisfy the logical segment criteria.
4. The method of wherein the unstructured data includes compliance obligations.
5. The method of wherein the logical segment criteria include features associated with a plurality of industries or companies.
6. The method of wherein the logical segment criteria include features associated with ...

More details
Publication date: 15-05-2014

Computer-Implemented System And Method For Visual Document Classification

Number: US20140136539A1
Author: William C. Knight
Assignee: FTI Consulting Inc

A computer-implemented system and method for visual document classification are provided. One or more uncoded documents, each associated with a visual representation, are obtained. Reference documents, each associated with a classification code and a visual representation of that classification code, are obtained. At least one of the uncoded documents is compared to the reference documents and the reference documents similar to the uncoded document are identified based on the comparison. A suggestion for assigning one of the classification codes to the uncoded document based on the classification codes of the similar reference documents is provided, including displaying the visual representation of the suggested classification code placed on a portion of the visual representation associated with the at least one uncoded document. An acceptance of the suggested classification code is received and a size of the displayed visual representation of the accepted classification code is increased.

More details
Publication date: 01-03-2018

Computer-implemented system and method for cluster spine group arrangement

Number: US20180061099A1
Author: Lynne Marie Evans
Assignee: FTI Consulting Technology LLC

A computer-implemented system and method for cluster spine group arrangement is provided. A set of spine groups each having one or more spines of clusters and at least one singleton cluster is obtained. Unique spine groups are identified within the set and placed in a center of a display. At least a portion of the remaining spine groups in the set are placed to extend radially from the unique spine groups in the display center.

More details
Publication date: 04-03-2021

UTILIZING SOURCE CONTEXT AND CLASSIFICATION IN A COPY OPERATION

Number: US20210064448A1
Assignee:

Aspects of the present invention disclose a method, computer program product, and system for augmenting a copy operation with contextual information from a source. The method includes one or more processors determining that a user provides input selecting content, within a document, and input adding the selected content to a clipboard. The method further includes one or more processors determining a classification of the selected content based on analyzing the selected content and the document utilizing Natural Language Processing (NLP). The method further includes one or more processors extracting information associated with the selected content from the document based on the determined classification. The method further includes one or more processors updating the clipboard with the selected content and the extracted information.

1. A method comprising: determining, by one or more processors, that a user provides input selecting content, within a document, and input adding the selected content to a clipboard; identifying, by one or more processors, metadata, included within the document, that is associated with the selected content; identifying, by one or more processors, author preference information for the selected content based on the identified metadata, the author preference information including usage and attribution instructions that correspond to the selected content, as defined by an author associated with the selected content; extracting, by one or more processors, information associated with the selected content from the document based on the author preference information; and updating, by one or more processors, the clipboard with the selected content and the extracted information.
2. The method of claim 1, further comprising: in response to receiving a request from the user to initiate a paste operation to a target location, pasting, by one or more processors, the selected content and the extracted information from the clipboard to the target location.
3. ( ...

More details
Publication date: 04-03-2021

SYSTEM FOR IDENTIFYING DUPLICATE PARTIES USING ENTITY RESOLUTION

Number: US20210064705A1
Assignee:

An entity resolution system performs a method of resolving one or more candidate entities based on a data set. The entity resolution system has a rules-based module, a machine learning module, a narrative module, and an evaluation module. The rules-based module compares the first entity features to the second entity features and determines whether a rule identifies a relationship between the first entity and the second entity. The machine learning module rates a similarity of the first entity features and the second entity features. The narrative module generates a narrative output based on one or more of the rules-based module and the machine learning module, the narrative output stating an identified relationship between the first entity and the second entity. The evaluation module determines one or more metrics to apply feedback to the system.

1. A computer-implemented method for performing entity resolution in a data processing system comprising a processing device and a memory comprising instructions which are executed by the processing device, the method comprising: receiving a data set comprising first entity features describing a first entity and second entity features describing a second entity; performing, by the processing device, rules-based matching using the first entity features and the second entity features to attempt to identify a relationship between the first entity and the second entity based on one or more stored rules; performing, by the processing device, a machine learning matching using the first entity features and the second entity features to attempt to identify a relationship between the first entity and the second entity based on one or more machine learning algorithms; generating, by the processing device, a narrative output based on one or more of the rules-based matching and the machine learning matching, the narrative output stating an identified relationship between the first entity and the second entity; and providing the narrative ...

More details
Publication date: 04-03-2021

EVENT DETECTION BASED ON TEXT STREAMS

Number: US20210064813A1
Assignee:

A text stream source is accessed that includes a plurality of text content items. Unique word groupings are determined for the plurality of text content items. A burst detection algorithm is executed to determine word groupings that are currently bursting and that started within a specified time period. Based on the word groupings, an issue is determined based on identifying a set of texts forming at least one clique.

1. A device comprising: one or more processors; a memory in communication with the one or more processors, the memory having computer-readable instructions stored thereupon which, when executed by the one or more processors, cause the device to perform operations comprising: accessing a text stream source comprising a plurality of entries; determining, for the plurality of entries, unique k-skip-n-grams; executing a burst detection algorithm to determine a burst level, burst start time, and burst end time of the k-skip-n-grams; based on the burst level, burst start time, and burst end time, identifying k-skip-n-grams that are currently bursting and that started within a specified time period; based on a graph of the identified k-skip-n-grams, identifying cliques where a set of texts have all words included therein; applying the burst detection algorithm over times at which feedback has occurred; and based on determined burst detections, identifying an issue where there is no overlap of cliques.
2. The device of claim 1, wherein the cliques are identified by performing constrained z-clique finding and constrained clique percolation on the graph of the identified k-skip-n-grams.
3. The device of claim 1, further comprising computer-readable instructions stored thereupon which, when executed by the one or more processors, cause the device to perform operations comprising: aggregating the determined unique k-skip-n-grams prior to executing the burst detection algorithm to determine the burst level, burst start time, burst end ...
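
Claim 1 relies on unique k-skip-n-grams as its word groupings. The Python sketch below shows one common definition (n tokens kept in order, with at most k tokens skipped in total) and counts how many entries contain each gram. The whitespace tokenizer and the helper names are assumptions; the burst detection and clique steps are not specified in the text above and are not attempted here.

```python
from collections import Counter
from itertools import combinations


def k_skip_n_grams(tokens, n, k):
    """Return the set of n-token tuples, in order, skipping at most k tokens in total."""
    grams = set()
    for start in range(len(tokens) - n + 1):
        # The last token of a gram may sit at most n - 1 + k positions after the first.
        window_end = min(start + n + k, len(tokens))
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.add((tokens[start],) + tuple(tokens[i] for i in rest))
    return grams


def gram_entry_counts(entries, n=2, k=2):
    """Count, for each k-skip-n-gram, the number of entries that contain it."""
    counts = Counter()
    for entry in entries:
        # Each entry contributes a gram at most once because a set is passed in.
        counts.update(k_skip_n_grams(entry.lower().split(), n, k))
    return counts


grams = k_skip_n_grams("insurgents killed in ongoing fighting".split(), n=2, k=2)
counts = gram_entry_counts(["login page is down", "login is down again"])
```

Skip-grams let "login page is down" and "login is down again" share the grouping `("login", "down")` even though the words are not adjacent, which is why they suit noisy text streams better than plain n-grams.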

More details
Publication date: 17-03-2022

INFORMATION PROCESSING SYSTEM AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Number: US20220083576A1
Author: SAKATA Yui
Assignee: FUJIFILM Business Innovation Corp.

An information processing system includes a processor configured to receive entry of an attribute given to a document as a search key for searching for a document, and search for the document by using a representative value of a group including plural first attributes having similar meanings as elements thereof in a case where the received search key includes any one of the first attributes, a second attribute being set as the representative value of the group.

1. An information processing system comprising a processor configured to: receive entry of an attribute given to a document as a search key for searching for a document, and search for the document by using a representative value of a group including a plurality of first attributes having similar meanings as elements thereof in a case where the received search key includes any one of the first attributes, a second attribute being set as the representative value of the group.
2. The information processing system according to claim 1, wherein the processor is configured to search for a document given the second attribute, which is the representative value of the group including the first attributes as the elements thereof.
3. The information processing system according to claim 2, wherein the second attribute is given on a directory basis, which is used for hierarchical document management.
4. An information processing system comprising: a processor configured to receive entry of an attribute given to a document as a search key for searching for a document, and search for the document by using all elements of a group including a plurality of first attributes having similar meanings as elements thereof in a case where the received search key includes any one of the first attributes, a second attribute being set as the representative value of the group.
5. The information processing system according to claim 4, wherein the processor is configured to search for a document given any one ...
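
The two search variants in claims 1 and 4 can be illustrated with a small Python sketch. Everything concrete here (the `GROUPS` and `DOCUMENTS` tables, the attribute names, and both function names) is hypothetical; the claims only state that a search key belonging to a group of similar attributes is resolved either to the group's representative value or to all of the group's elements.

```python
# Hypothetical group table: representative value -> attributes with similar meaning.
GROUPS = {
    "invoice": {"bill", "statement of charges", "invoice"},
}

# Hypothetical document store: document name -> attributes given to the document.
DOCUMENTS = {
    "2021-03.pdf": {"invoice"},
    "notes.txt": {"memo"},
    "old-scan.pdf": {"bill"},
}


def search_by_representative(key, documents=DOCUMENTS, groups=GROUPS):
    """Claim 1 style: replace a grouped key with the group's representative value."""
    target = key
    for representative, members in groups.items():
        if key in members:
            target = representative
            break
    return [name for name, attrs in documents.items() if target in attrs]


def search_by_all_elements(key, documents=DOCUMENTS, groups=GROUPS):
    """Claim 4 style: expand a grouped key to every element of its group."""
    targets = {key}
    for members in groups.values():
        if key in members:
            targets = set(members)
            break
    return [name for name, attrs in documents.items() if attrs & targets]


by_representative = search_by_representative("bill")
by_all_elements = search_by_all_elements("bill")
```

Searching for "bill" through the representative value finds only documents tagged "invoice", whereas expanding to all group elements also finds the document that was tagged "bill" directly.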

More details
Publication date: 17-03-2022

Method and system for performing summarization of text

Number: US20220083579A1
Assignee: L&T Technology Services Ltd

In an embodiment, a method of performing summarization of text is disclosed. The method may include receiving an input text including a plurality of paragraphs and a user-query including one or more tokens. The method may further include segregating the input text into the plurality of paragraphs, creating a plurality of paragraph-vectors representative of the plurality of paragraphs, and clustering the plurality of paragraph-vectors to generate one or more clusters of paragraph-vectors. The method may further include determining a relevant cluster of paragraph-vectors from the one or more clusters of paragraph-vectors, based on a degree of similarity of each cluster of paragraph-vectors with the user-query. The relevant cluster of paragraph-vectors is representative of a set of relevant paragraphs from the input text. The set of relevant paragraphs corresponding to the relevant cluster of paragraph-vectors may be outputted.

More details
Publication date: 17-03-2022

TEXT CLASSIFICATION DEVICE, TEXT CLASSIFICATION METHOD, AND TEXT CLASSIFICATION PROGRAM

Number: US20220083581A1
Assignee:

A text classification device includes an important word extraction portion that extracts important words from analysis target text data, a distributed representation creation portion that creates distributed representations of words from related document data, a keyword candidate creation portion that extracts words near the important words as synonyms in the distributed representations of the words, a clustering portion that clusters the distributed representations of the important words and synonyms and creates a term cluster, and a viewpoint word creation portion that extracts a hypernym that is a word having a generalized concept of a term in the term cluster using a knowledge base in which relationships between terms are accumulated and creates a viewpoint dictionary in which a viewpoint word selected from the hypernyms is set as a headword and the terms included in the term cluster are set as keywords for the headword.

1. A text classification device that classifies texts included in a text log, the device comprising: an important word extraction portion that extracts important words from analysis target text data; a distributed representation creation portion that creates distributed representations of words from related document data; a keyword candidate creation portion that extracts words located near the important word in the distributed representations of words as synonyms; a clustering portion that executes clustering to the distributed representations of the important words and the synonyms to create a term cluster; and a viewpoint word creation portion that extracts a hypernym that is a word having a generalized concept of a term included in the term cluster by using a knowledge base in which relationships between terms are accumulated, and creates a viewpoint dictionary in which a viewpoint word selected from the hypernyms is set as a headword and the terms included in the term cluster are set as keywords for the headword.
2. The text classification ...

More details
Publication date: 12-03-2015

Defect record classification

Number: US20150073773A1
Assignee: International Business Machines Corp

An approach to classify different defect records by mapping plain language phrases to a taxonomy. The approach includes a method that includes receiving, by at least one computing device, a defect record associated with a defect. The method further includes receiving, by the at least one computing device, a plain language phrase or word. The method further includes mapping, by the at least one computing device, the plain language phrase or word to a taxonomy. The method further includes classifying, by the at least one computing device, how the defect was at least one of detected and resolved using the taxonomy.

More details
Publication date: 11-03-2021

LOGICAL DOCUMENT STRUCTURE IDENTIFICATION

Number: US20210073257A1
Author: Newey Neville
Assignee:

To support the task of analyzing unstructured text, computer-implemented statistical natural language processing-based techniques for automatically identifying the unstructured text's logical structure are disclosed. The techniques can be used to automatically convert unstructured text into structured text. In a possible implementation of the present invention, the structured text is well-formed and schema-validated eXtensible Markup Language (XML) formatted text. Instead of relying solely on rules to convert unstructured text to structured text, disclosed techniques use statistical natural language processing techniques to recognize section boundaries in unstructured text.

1. A computer-implemented method, comprising: using a trained sequence labeling model to tag tokens in first text; based on tags assigned to tokens in the first text by the trained sequence labeling model, identifying section boundaries in the first text; based on the section boundaries identified in the first text, identifying sections in the first text; and storing second text comprising and identifying the sections.
2. The computer-implemented method of claim 1, further comprising: based on styles of text, of the first text, that start the sections identified in the first text, identifying hierarchical relationships between the sections; and storing the second text identifying the hierarchical relationships between the sections.
3. The computer-implemented method of claim 1, further comprising: determining a first style of a text, of the first text, that starts a first section of the sections; determining if the first style is a new style or a previously detected style among one or more sections of the sections processed so far; if the first style is the new style, then creating a new hierarchical level for the new style and assigning the first section to the new hierarchical level, or if the first style is the previously detected style, then assigning the first section to a hierarchical level ...

More details
Publication date: 24-03-2022

CLUSTERING OF LOG MESSAGES

Number: US20220092102A1
Assignee: LogsHero Ltd.

A computer implemented method of creating a clustering model used for clustering a plurality of log messages comprising using one or more processors for receiving a plurality of training log messages, performing the following for each of the plurality of training log messages: calculating a string distance between a textual content of the respective training log message and a representative string pattern of each of the plurality of clusters, associating the respective training log message with a respective one of the plurality of clusters in case the string distance is within a predefined threshold and adding a new cluster to the plurality of clusters for associating respective training log message in case the string distance exceeds the predefined threshold, and outputting the clustering model. 1. A computer implemented method of detecting at least one anomaly within a plurality of non-training log messages during run-time, by using a clustering model targeting a predefined source of the run-time messages, comprising: using at least one processor for: calculating a string distance between a textual content of each of a plurality of non-training log messages, in run-time, and a representative string pattern of each of a plurality of clusters of a clustering model targeting a predetermined certain source of messages; detecting at least one log message of the plurality of non-training log messages for which the string distance to the representative string pattern of each of the plurality of clusters exceeds a predefined threshold; and generating an alert indicative of the at least one detected log message as at least one suspected anomaly. 2. The computer implemented method of claim 1, further comprising: analyzing the detected log message to detect at least one of an anomaly type and at least one characteristic of the detected message. 3. The computer implemented method of claim 2, wherein said analyzing is a statistical analysis. 4. The computer implemented method of ...
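
The threshold-based clustering and anomaly detection described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the distance measure (one minus the `difflib.SequenceMatcher` ratio), the choice of the first message as a cluster's representative pattern, the threshold of 0.3, and the function names are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def string_distance(a: str, b: str) -> float:
    """Distance in [0, 1]; 0.0 means the strings are identical (assumed metric)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def build_clusters(training_messages, threshold=0.3):
    """Return one representative string per cluster (hypothetical model format)."""
    representatives = []
    for message in training_messages:
        # The message joins an existing cluster if any representative is close enough.
        if any(string_distance(message, rep) <= threshold for rep in representatives):
            continue
        # Otherwise it becomes the representative of a new cluster.
        representatives.append(message)
    return representatives


def find_anomalies(representatives, runtime_messages, threshold=0.3):
    """Flag run-time messages that are farther than the threshold from every cluster."""
    return [
        message
        for message in runtime_messages
        if all(string_distance(message, rep) > threshold for rep in representatives)
    ]


model = build_clusters(
    ["user 1 logged in", "user 2 logged in", "disk usage at 91 percent"]
)
suspects = find_anomalies(
    model, ["user 7 logged in", "kernel panic: unable to mount root fs"]
)
```

In this sketch the two login messages fall into one cluster, so only the kernel panic line is reported as a suspected anomaly.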

Publication date: 16-03-2017

System and method for prior art analysis

Number: US20170075929A1
Author: Steven W. Lundberg
Assignee: Black Hills IP Holdings LLC

The present inventive subject matter relates to prior art analysis. Various embodiments of the present inventive subject matter include systems and methods for analyzing prior art in a patent portfolio and annuity management system. In an example embodiment, a method comprises maintaining a patent matter database. The database includes data about the patent matters including for at least one patent matter a claim set or statement of invention and a priority date for the claim set or statement of invention. A database of prior art documents is maintained including data about the prior art documents. The data may include for at least one prior art document a priority date or publication date of the document. A keyword analysis is performed on a given patent matter and associated prior art documents to identify keywords occurring uniquely in the first patent matter as potential claim elements differentiating the patent matter over the disclosures contained in the one or more prior art documents.

Publication date: 26-03-2015

Keyword extraction apparatus and method

Number: US20150088491A1
Assignee: Toshiba Corp

According to one embodiment, a keyword extraction apparatus includes a separation unit, a generation unit, a calculation unit, a first update unit, and a second update unit. The separation unit separates a first annotation from each of a plurality of documents. The generation unit generates one or more document clusters by calculating a score of keywords and performing clustering on documents having a correlation value higher than a threshold. The calculation unit calculates a characteristic quantity in accordance with a type of a second annotation. The first update unit updates the score of the keyword to which the second annotation is added, based on the characteristic quantity. The second update unit updates the one or more document clusters in accordance with the updated score to obtain an updated document cluster.

Publication date: 31-03-2022

METHOD AND DEVICE FOR ADJUSTING AND IMPLEMENTING TOPIC DETECTION PROCESSES

Number: US20220100965A1
Assignee: AT&T Intellectual Property I, L.P.

Aspects of the subject disclosure may include, for example, applying a topic detection process to documents to obtain automatically detected topics and groups of automatically detected words, comparing the automatically detected topics with manually determined topics to determine actual purity metrics, determining an error metric based on a measure of deviation between ideal purity metrics and the actual purity metrics, and adjusting a parameter of the topic detection process according to the error metric resulting in an adjusted topic detection process. Other embodiments are disclosed. 1. A device comprising: a processing system including a processor; and a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations, comprising: obtaining a first set of topics relating to a document and a first set of word lists, each of the first set of word lists comprising one or more words and corresponding to a respective one of the first set of topics, wherein each topic of the first set of topics is characterized by a probability distribution over a set of words associated with the document, wherein each word of the first set of word lists is in the document, and wherein the obtaining comprises an automated topic detection process using a topic detection parameter; comparing, for each topic of the first set of topics, a corresponding first word list of the first set of word lists with a second word list of a second set of word lists, each of the second set of word lists comprising one or more words and corresponding to a respective one of a second set of topics, the comparing performed according to similarities between respective words of the first word list and respective words of the second word list to determine actual purity metrics, wherein the second set of topics and the second set of word lists are determined in a non-automated process; determining an error metric based on a measure of deviation between ...

Publication date: 12-03-2020

SYSTEM AND METHOD FOR DYNAMIC TREND CLUSTERING

Number: US20200081975A1
Assignee:

A method includes extracting a keyword and a slot from a natural language input, where the slot includes information. The method includes determining whether the keyword corresponds to one of a plurality of formation groups. In response to determining that the keyword corresponds to a specific formation group, the method includes updating metadata of the specific formation group with the information of the slot. In response to determining that the keyword does not correspond to any of the formation groups, the method includes determining whether the keyword corresponds to one of a plurality of clusters. In response to determining that the keyword corresponds to a specific cluster, the method includes updating the specific cluster with the information of the slot. In response to determining that the keyword does not correspond to any of the clusters, the method includes creating an additional formation group that includes the keyword and the slot. 1. A method for trend clustering comprising: extracting a keyword and a slot from a natural language input, the slot including information; determining whether the keyword corresponds to one of a plurality of formation groups; in response to determining that the keyword corresponds to a specific formation group of the plurality of formation groups, updating metadata of the specific formation group with the information of the slot; in response to determining that the keyword does not correspond to any of the plurality of formation groups, determining whether the keyword corresponds to one of a plurality of clusters; in response to determining that the keyword corresponds to a specific cluster of the plurality of clusters, updating the specific cluster with the information of the slot; and in response to determining that the keyword does not correspond to any of the plurality of clusters, creating an additional formation group that includes the keyword and the slot. 2. The method of claim 1, further comprising: receiving, from a ...

Publication date: 12-03-2020

SYSTEM AND METHOD FOR DYNAMIC CLUSTER PERSONALIZATION

Number: US20200082811A1
Assignee:

A system and method for dynamic cluster personalization is provided. A method of dynamic cluster personalization comprises acquiring information from a user, creating a usage log based on the acquired user information including language information and generating user features based on the usage log. The method further comprises determining a clustering feature from the user features, creating a user cluster based on the clustering feature, determining a personalization feature within the user cluster from the user features, generating a personalization for the user cluster based on the personalization feature and applying the personalization to the users in the user cluster. 1. A method comprising: acquiring information from a user including language information; creating a usage log based on the acquired user information; generating user features based on the usage log; determining a clustering feature from the user features; creating a user cluster based on the clustering feature, a user cluster comprising a plurality of users; determining a personalization feature within the user cluster from the user features; generating a personalization for the user cluster based on the personalization feature; and applying the personalization to the users in the user cluster. 2. The method of claim 1, wherein the user information includes user requests, user behavior, user feedback, electronic device information or user profile information. 3. The method of claim 1, further comprising: receiving a plurality of additional user information including language information; storing the plurality of additional user information including language information in the usage log; determining an individual personalization based on the plurality of additional user information including language information; and applying the individual personalization to the user. 4. The method of claim 3, further comprising: determining if the individual personalization is related to the ...

Publication date: 30-03-2017

Method for identifying clusters of fluorescence-activated cell sorting data points

Number: US20170091199A1

A method and/or system for analyzing data using population clustering through density based merging.

Publication date: 30-03-2017

Confidence score-based smart email attachment saver

Number: US20170093767A1
Assignee: International Business Machines Corp

In an approach to save-to location selection, a computing device accesses a metadata file comprising a data table. The computing device checks the data table for entries that match one or more features of a file to be saved, wherein each match is associated with a save-to location. The computing device computes confidence scores for each save-to location based on a predefined weight associated with each feature. The computing device produces a list of recommended save-to locations based on the confidence scores. The computing device receives a user selection based on or overriding the recommendations. The computing device updates the data table with information concerning each of the features of the file and the user selection.
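
The scoring step in this abstract can be illustrated with a short sketch. It assumes, purely for the example, that the data table is a list of `(feature, value, location)` rows and that a location's confidence score is the sum of the predefined weights of its matching features; the table layout, the weights, and the function name are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def recommend_locations(table, file_features, weights):
    """Rank save-to locations by the summed weights of the features they match."""
    scores = defaultdict(float)
    for feature, value, location in table:
        # A table row counts as a match when the file has the same feature value.
        if file_features.get(feature) == value:
            scores[location] += weights.get(feature, 0.0)
    # Highest confidence first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical table built from earlier user selections.
table = [
    ("sender", "alice@example.com", "/work/alice"),
    ("extension", ".pdf", "/docs/pdf"),
    ("extension", ".pdf", "/work/alice"),
]
weights = {"sender": 0.5, "extension": 0.25}
attachment = {"sender": "alice@example.com", "extension": ".pdf"}

recommendations = recommend_locations(table, attachment, weights)
```

After the user accepts or overrides a recommendation, a new row per feature would be appended to `table`, which is how the abstract's final update step could feed later recommendations.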

Publication date: 19-03-2020

SYSTEMS AND METHODS FOR DETERMINING THE SHAREABILITY OF VALUES OF NODE PROFILES

Number: US20200089681A1
Assignee: People.ai, Inc.

The present disclosure relates to determining the shareability of values of node profiles. Record objects and electronic activities of a system of record corresponding to a data source provider may be accessed. Each record object may correspond to a record object type and have one or more object field-value pairs. Node profiles may be maintained. Values of fields corresponding to a predetermined type of field including fewer than a predetermined threshold number of data source providers may be identified. A restriction tag used to restrict populating other node profiles may be generated. Provision of the value with a second data source provider may be restricted. 1. A method comprising: accessing, by one or more processors, at least one of i) a plurality of electronic activities transmitted or received via electronic accounts associated with one or more data source providers or ii) a plurality of record objects of one or more systems of record associated with the one or more data source providers; maintaining, by the one or more processors, a plurality of node profiles, each node profile of the plurality of node profiles including one or more field-value pairs, each field-value pair including a value determined from data of one or more of the plurality of electronic activities or the plurality of record objects; identifying, by the one or more processors, for a first node profile, a value of a field corresponding to a predetermined type of field that is included in one or more electronic activities or record objects accessed from a set of first data source providers including fewer than a predetermined threshold number of data source providers; generating, by the one or more processors, a restriction tag used by the one or more processors to restrict populating other node profiles of the plurality of node profiles or systems of record with the value of the node field-value pair of the first node profile; and restricting, by the one or more processors, provision of the ...

Publication date: 19-03-2020

SYSTEMS AND METHODS FOR MATCHING ELECTRONIC ACTIVITIES TO RECORD OBJECTS USING FEEDBACK BASED MATCH POLICIES

Number: US20200089682A1
Assignee: People.ai, Inc.

Systems and methods for matching electronic activities to record objects using feedback based match policies can include accessing a plurality of electronic activities and record objects. The systems and methods can include identifying candidate record objects by applying a matching model. The systems and methods can include selecting a record object based on a match score. The systems and methods can include configuring the matching model in a first configuration responsive to a first feedback type or configuring the matching model in a second configuration responsive to a second feedback type. 1. A method, comprising: accessing, by one or more processors, a plurality of electronic activities transmitted or received via electronic accounts of one or more data source providers; accessing, by the one or more processors, a plurality of record objects of one or more systems of record, each record object of the plurality of record objects corresponding to a record object type and comprising one or more object fields having one or more object field values, the systems of record corresponding to the one or more data source providers; identifying, by the one or more processors, responsive to applying a matching model for identifying candidate record objects, a plurality of candidate record objects with which to match an electronic activity of the plurality of electronic activities, the matching model used to generate a respective match score for each candidate record object of the plurality of candidate record objects; selecting, by the one or more processors, a first record object from the plurality of candidate record objects based on a first match score for the first record object; configuring, by the one or more processors, responsive to a first feedback type regarding the selection of the first record object, the matching model in a first configuration to generate a second match score between the electronic activity and the first record object greater than the first match ...

Publication date: 19-03-2020

Determining and discerning items with multiple meanings

Number: US20200089770A1
Author: Oded Shmueli
Assignee: International Business Machines Corp

A method, system, and non-transitory computer readable medium for determining and discerning items with multiple meanings in a sequence of items, including producing a distributed representation for each item of the sequence of items including a word vector and a context vector, partitioning the sequence of items into classes, for an item using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector, and producing a new sequence of items by modifying the distributed representation in the producing by replacing each occurrence of an item depending on the cosine distance calculated by the calculating.
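
The cosine distance step mentioned in this abstract is a standard computation, sketched below in plain Python. The toy two-dimensional vectors, the class names, and the `nearest_class` helper are assumptions for illustration only; the abstract does not say how the class representative vectors are obtained or how the distance drives the replacement.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v); both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def nearest_class(word_vector, class_representatives):
    """Pick the class whose representative vector is closest to the word vector."""
    return min(
        class_representatives,
        key=lambda name: cosine_distance(word_vector, class_representatives[name]),
    )


# Hypothetical class representatives for two senses of the word "bank".
representatives = {"finance": [1.0, 0.0], "river": [0.0, 1.0]}
sense = nearest_class([0.9, 0.1], representatives)
```

An occurrence assigned to a class in this way could then be rewritten as a sense-tagged item (for example `bank_finance`), which is one plausible reading of "replacing each occurrence of an item depending on the cosine distance".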

Publication date: 07-04-2016

Reuse Of Documentation Components When Migrating Into A Content Management System

Number: US20160098399A1
Assignee: Red Hat Inc

A method relates to receiving, by a processing device, a document comprising a first topic to be imported into a content management system, calculating a first signature of the first topic in view of content associated with the first topic, determining whether the first topic of the document is substantially similar to at least one of a plurality of topics stored in the content management system by comparing the first signature with a respective signature of the plurality of topics stored in the content management system, and in response to a determination that the first topic of the document is not substantially similar to any of the plurality of topics stored in the content management system, adding the first topic and the content associated with the first topic to the content management system.
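
The import flow in this abstract (compute a signature, compare it with stored signatures, add the topic only if nothing is substantially similar) can be sketched as follows. The abstract does not specify the signature scheme, so this example assumes a set of three-word shingles compared by Jaccard similarity with a threshold of 0.8; the in-memory `store` dictionary and all function names are hypothetical.

```python
def signature(text, k=3):
    """Set of k-word shingles over the lowercased text (assumed signature scheme)."""
    words = text.lower().split()
    if len(words) < k:
        return frozenset([tuple(words)])
    return frozenset(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def jaccard(a, b):
    """Share of shingles common to both signatures, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def import_topic(store, title, content, threshold=0.8):
    """Return the title of a reusable stored topic, or add the new topic and return its title."""
    new_signature = signature(content)
    for existing_title, existing_signature in store.items():
        if jaccard(new_signature, existing_signature) >= threshold:
            return existing_title  # substantially similar: reuse instead of importing
    store[title] = new_signature
    return title


store = {}
first = import_topic(store, "Install", "Run the installer and follow the prompts on screen")
second = import_topic(store, "Setup", "run the installer and follow the prompts on screen")
third = import_topic(store, "Uninstall", "Remove the package with the system package manager")
```

Here the second topic differs from the first only in capitalization, so it is mapped to the stored "Install" topic, while the third is added as new content.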

Publication date: 12-05-2022

Computerized assessment of articles with similar content and highlighting of distinctions therebetween

Number: US20220147553A1
Assignee: International Business Machines Corp

A computer receives a list of reference topics from a topic database and a set of articles related to said reference topics. The computer generates article n-grams and compares them to the reference topics using NLP to determine a primary theme for each article that corresponds to one of the reference topics. The computer collects articles with common primary themes into at least one article group and determines an article comparison value between articles in the article group. Responsive to determining that an article comparison value is below a predetermined similarity threshold, the computer determines a distinguishing feature associated with one of the compared articles that contributed to the article comparison value. The computer assigns articles having the distinguishing feature into a secondary group based, at least in part, on the distinguishing feature.

Publication date: 12-05-2022

UNSUPERVISED INDUCTION OF USER INTENTS FROM CONVERSATIONAL CUSTOMER SERVICE CORPORA

Number: US20220147707A1
Assignee:

A methodology and system are presented for inducing user intent in a corpus and storing this intent in an intent library. To accurately detect intent, the corpus is first cleaned of nonsensical words and symbols and then syntactically analyzed to extract words and dependencies between them, which are then semantically analyzed to select keywords that are indicative of intent, and map the keywords to ordered broad semantic categories of the types of action, modifier and object. Keywords are then converted into embedding vectors whose dimensions are reduced and clustered according to category and order. Relations are calculated for the clusters across the semantic categories and intent is then calculated with the help of intent templates and word dictionaries. 1. A system for updating an intent library, the system comprising: a syntactic parser arranged to process a sequence of word tokens and control characters of at least one sentence in a corpus and produce words and dependencies between the words; a semantic analyzer arranged to process the words and dependencies between the words for extracting a set of keywords and arranged to map the set of keywords to action (A), modifier (M) and object (O) semantic categories and create ordered AMO triplets; an embeddings processor arranged to convert the set of extracted keywords in the ordered AMO triplets into keyword embedding vectors; a clustering processor arranged to cluster the reduced dimension keyword embedding vectors, where each keyword cluster contains semantically similar keywords; and an intent calculator arranged to calculate cluster relations, and store the intent clusters and the intents the clusters represent to the intent library. 2. The system of claim 1, further comprising a pre-processor arranged to eliminate words and marks that have no linguistic value from a corpus, and arranged to create the sequence of word tokens and pairs of sentence boundary control characters, where the corpus ...

Publication date: 12-05-2022

DYNAMIC WORD CORRELATED TOPIC MACHINE LEARNING MODEL

Number: US20220147716A1
Assignee:

A system implements a dynamic word correlated topic model (DWCTM) to model an evolution of topic popularity, word embedding, and topic correlation within a set of documents, or other dataset, that spans a period of time. For example, the DWCTM receives the set of documents and a quantity of topics for modeling. The DWCTM processes the set, computing, for each topic, various distributions to capture a popularity, word embedding, and correlation with other topics across the period of time. In other examples, a dataset of user listening sessions comprised of media content items is provided for modeling by the DWCTM. Media content metadata (e.g., artist or genre) of the media content items, similar to words of a document, can be modeled by the DWCTM. 1. A method for dynamic topic modeling with word correlation, the method comprising: receiving, as input to a dynamic word correlated topic model (DWCTM), a set of documents each comprised of a plurality of words and having associated timestamps, wherein the timestamps of the set of documents span a period of time; identifying, as input to the DWCTM, a quantity of topics for modeling; providing the set of documents as input to the DWCTM for modeling according to the quantity of topics identified; for each topic, modeling via the DWCTM: a document-topic distribution across the period of time to yield a popularity of each of the topics across the period of time; a topic-word distribution across the period of time that captures a correlation among the plurality of words to yield a word embedding; and a series of covariance matrices to yield a correlation of each topic with other topics across the period of time; and providing, as output of the DWCTM: the popularity of each of the topics across the period of time; the word embedding across the period of time; and the correlation of each topic with other topics across the period of time. 2. The method of claim 1, further comprising: the document-topic distribution at given time ...

Publication date: 12-04-2018

Repairing Data Through Domain Knowledge

Number: US20180101561A1
Assignee: Microsoft Technology Licensing LLC

Correcting data in a dataset. A set of data tokens from a tabular data store are grouped into a plurality of different clusters based on similarity of tokens. A reference cluster is selected from among the plurality of different clusters such that the plurality of clusters includes a reference cluster and one or more other clusters. One or more tokens in the one or more other clusters are transformed. Transforming tokens is performed based on a cost of transforming tokens. The effect on the reference cluster of adding the transformed tokens to the reference cluster is determined. Using this information, a correction for a token in the dataset is identified. The data store is updated to correct the token.

Publication date: 26-03-2020

SYSTEMS AND METHODS FOR DETECTING EVENTS BASED ON UPDATES TO NODE PROFILES FROM ELECTRONIC ACTIVITIES

Number: US20200097474A1
Assignee: People.ai, Inc.

The present disclosure relates to methods, systems, and storage media for detecting events based on updates to node profiles from electronic activities. Exemplary implementations may access an electronic activity transmitted or received via an electronic account associated with a data source provider; generate a plurality of activity field-value pairs; maintain a plurality of node profiles; identify a first state of a first node profile of the plurality of node profiles; update the first node profile using the electronic activity; identify a second state of the first node profile subsequent to updating the first node profile using the electronic activity; detect a state change of the first node profile based on the first state and the second state; determine that the state change satisfies an event condition; and store an association between the first node profile and an event type corresponding to the event condition. 1-20. (canceled) 21. A method, comprising: maintaining, by one or more processors, in one or more data structures, a plurality of node profiles, each node profile of the plurality of node profiles including one or more node field-value pairs, each node field-value pair including a node value associated with a node field; updating, by the one or more processors, responsive to a first electronic activity of a plurality of electronic activities, a first node profile of the plurality of node profiles based on the first electronic activity by adding or updating a first node field-value pair corresponding to a company name field of the first node profile; determining, by the one or more processors, responsive to determining that the first node field-value pair corresponds to the company name field of the first node profile, a job change event based on an event detection policy for identifying events based on updates to a value of the one or more node field-value pairs corresponding to predetermined fields of node profiles of the plurality of node profiles; ...

Publication date: 26-03-2020

INTENT CLASSIFICATION SYSTEM

Number: US20200097496A1
Assignee:

A data processing system analyzes a corpus of conversation data collected at an interactive conversation service to train an intent classification model. The intent classification model generates vectors based on the corpus of conversation data. A set of intents is selected and an intent seed input for each intent of the set of intents is input into the model to generate an intent vector corresponding to each intent. Vectors based on user inputs are generated and compared to the intent vectors to determine the intent. 1. A method for intent classification, comprising: generating, using a word embedding function, an intent classification model including a plurality of input vectors corresponding to a corpus of unclassified conversation data received at an interactive conversation agent; receiving a set of intent categories and at least one intent seed input for each intent category in the set of intent categories; generating, using the word embedding function and the intent classification model, at least one intent vector corresponding to each intent category in the set of intent categories, wherein the intent vector generated for each intent category is based at least in part on the at least one intent seed input corresponding to the respective intent category; receiving a conversation input comprising a text string at an instance of the interactive conversation agent; generating, using the word embedding function and the intent classification model, a conversation input vector based on the conversation input; calculating similarity scores between the conversation input vector and each intent vector corresponding to each intent category; and identifying an intent category of the set of intent categories corresponding to the intent vector having a highest similarity score with the conversation input vector. 2. The method of claim 1, wherein generating the intent classification model further comprises: selecting a user input from the corpus of conversation data as a context ...

Publication date: 26-03-2020

Enhanced Knowledge Delivery and Attainment Using a Question Answering System

Number: US20200097598A1
Assignee: International Business Machines Corp

A mechanism is provided in a data processing system for presentation delivery. The mechanism delivers presentation content to a group of users and receives a plurality of questions concerning the presentation content from the group of users. The mechanism stores the plurality of questions in a question history database and clusters the plurality of questions in the question history database into one or more question clusters. The mechanism determines a topic for each of the one or more question clusters to form one or more question topics and generates feedback for updating the presentation content based on the one or more question topics.

Publication date: 02-06-2022

Barrage generation method and apparatus and computer-readable storage medium

Number: US20220168641A1
Assignee: Tencent Technology Shenzhen Co Ltd

A barrage generation method is provided to a computing device. The method includes: obtaining a target game video; determining barrage trigger information according to video content of the target game video; determining a barrage type of a barrage according to the barrage trigger information, and determining a trigger time of the barrage according to the barrage type; and obtaining, according to the barrage trigger information, a barrage text corresponding to the barrage, and triggering the barrage text to generate the barrage at the corresponding trigger time.

Publication date: 13-04-2017

System and method for cross-cloud identification of topics

Number: US20170103077A1
Author: Roy Sheinfeld
Assignee: HARMONIE R&D Ltd

A system and method for identifying topics in unstructured data. The method includes obtaining unstructured data from at least one data source, wherein the obtained unstructured data includes at least one unstructured data object; determining, based on the obtained unstructured data, at least one set of topic identification rules; identifying, based on the at least one unstructured data object and the at least one set of topic identification rules, at least one candidate topic of the unstructured data; and analyzing the unstructured data with respect to the at least one candidate topic to determine at least one representative topic from among the at least one candidate topic, wherein each of the at least one representative topic indicates a context of at least a portion of the unstructured data.

Publication date: 08-04-2021

UNSUPERVISED INDUCTION OF USER INTENTS FROM CONVERSATIONAL CUSTOMER SERVICE CORPORA

Number: US20210103634A1
Assignee:

A methodology and system are presented for inducing user intent in a corpus and storing this intent in an intent library. To accurately detect intent, the corpus is first cleaned of nonsensical words and symbols and then syntactically analyzed to extract words and dependencies between them, which are then semantically analyzed to select keywords that are indicative of intent, and map the keywords to ordered broad semantic categories of the types of action, modifier and object. Keywords are then converted into embedding vectors whose dimensions are reduced and clustered according to category and order. Relations are calculated for the clusters across the semantic categories and intent is then calculated with the help of intent templates and word dictionaries. 1. A system for updating an intent library, the system comprising: a syntactic parser arranged to process a sequence of word tokens and control characters of at least one sentence in a corpus and produce words and dependencies between the words; a semantic analyzer arranged to process the words and dependencies between the words for extracting a set of keywords and arranged to map the keywords to action (A), modifier (M) and object (O) semantic categories and create ordered AMO triplets; an embeddings processor arranged to convert the extracted keywords in the ordered AMO triplets into keyword embedding vectors and reduce the dimensions of the keyword embedding vectors in each of the action, modifier and object semantic category and in each order of the AMO triplets; a clustering processor arranged to cluster the reduced dimension keyword embedding vectors, where each keyword cluster contains semantically similar keywords, and which keywords in a cluster express a single intent; and an intent calculator arranged to calculate cluster relations, create intent templates, fill empty positions in the intent templates, and store the intent clusters and the intents the clusters represent to the intent library. 2. The system ...

Publication date: 08-04-2021

Detection of a topic

Number: US20210103698A1
Author: Aleksanteri VUORISTO
Assignee: Telia Co AB

The present invention relates to a method for performing a detection of a topic of a message introduced in a real-time customer service messaging platform. In the method, a message comprising at least one word from which the topic is definable is received; a topic is extracted from the received message; the database is queried to establish whether the topic is determinable from a number of messages received chronologically earlier than the received message; and an indication is generated to an operator of the real-time customer service messaging platform in accordance with a detection result obtained through the inquiry to the database. Some aspects of the present invention relate to a network node, to a computer program product and to a system.

Publication date: 08-04-2021

Data analysis using natural language processing to obtain insights relevant to an organization

Number: US20210103865A1
Assignee: Saama Technologies Inc

Methods and apparatuses for generating insights for improving an organization from unstructured and structured data. Natural language processing is employed to process the aggregated data from various data sources to create topics and the features that impact the topics. These topics are then used to generate recommendations to improve customer satisfaction with the organization.

Publication date: 02-06-2022

CLUSTERING USING NATURAL LANGUAGE PROCESSING

Number: US20220171800A1
Assignee: ORACLE INTERNATIONAL CORPORATION

In one aspect, a system receives a request to cluster a set of log records. Responsive to receiving the request, the system identifies at least one dictionary that defines a set of tokens and corresponding token weights and generates, based at least in part on the set of tokens and corresponding token weights, a set of clusters such that each cluster in the set of clusters represents a unique combination of two or more tokens from the dictionary and groups a subset of log records mapped to the unique combination of two or more tokens. The system may then perform one or more automated actions based on at least one cluster in the set of clusters.

1. A method comprising: receiving a request to cluster a set of records; responsive to receiving the request to cluster the set of records, identifying at least one dictionary that is associated with a set of one or more tokens and at least one of a set of one or more token weights or a set of one or more rules; generating, based at least in part on the set of one or more tokens and at least one of the set of one or more token weights or the set of one or more rules associated with the dictionary, a set of one or more clusters, wherein each cluster in the set of one or more clusters represents a unique subset of one or more tokens associated with the dictionary and groups, from the set of records, a subset of one or more records mapped to the unique subset of one or more tokens associated with the dictionary; and performing at least one action based on at least one cluster in the set of one or more clusters.

2. The method of claim 1, wherein a token weight for a given token is generated, at least in part, on a sentiment associated with a corresponding token, wherein a negative sentiment increases a weight given to the token.

3. The method of claim 1, wherein the at least one dictionary includes a domain-specific dictionary generated for a particular domain, wherein a token weight associated ...
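The grouping step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the whitespace tokenization, the `top_n` cut-off, and the sample weights are all assumptions made for illustration. Each record is mapped to the combination of its highest-weighted dictionary tokens, and records sharing a combination form one cluster.

```python
from collections import defaultdict


def cluster_records(records, token_weights, top_n=2):
    """Group records by the combination of their top-weighted dictionary tokens.

    token_weights is a hypothetical dictionary {token: weight}; top_n is an
    assumed limit on how many tokens form a cluster key.
    """
    clusters = defaultdict(list)
    for record in records:
        words = set(record.lower().split())  # naive tokenization (assumption)
        matched = [t for t in token_weights if t in words]
        # Highest weight first; ties broken alphabetically for determinism.
        matched.sort(key=lambda t: (-token_weights[t], t))
        key = tuple(sorted(matched[:top_n]))
        clusters[key].append(record)
    return dict(clusters)


# Hypothetical dictionary and log records.
token_weights = {"error": 3.0, "disk": 2.5, "timeout": 2.0, "info": 0.1}
records = [
    "ERROR disk full on node1",
    "error disk quota exceeded",
    "Timeout error contacting db",
    "info service started",
]

for key, members in cluster_records(records, token_weights).items():
    print(key, len(members))
```

Records that match no dictionary token end up under the empty key `()`, which a caller could treat as an "unclustered" bucket.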

Publication date: 21-04-2016

Systems and Methods for Social Media Data Mining

Number: US20160110429A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Systems and methods are provided to collect, analyze and report social media aggregated from a plurality of social media websites. Social media is retrieved from social media websites, analyzed for sentiment, and categorized by topic and user demographics. The data is then archived in a data warehouse and various interfaces are provided to query and generate reports on the archived data. In some embodiments, the system further recognizes alert conditions and sends alerts to interested users. In some embodiments, the system further recognizes situations where users can be influenced to view a company or its products in a more favorable light, and automatically posts responsive social media to one or more social media websites.

Publication date: 20-04-2017

System, method, and recording medium for determining and discerning items with multiple meanings

Number: US20170109344A1
Author: Oded Shmueli
Assignee: International Business Machines Corp

A method, system, and non-transitory computer readable medium for determining and discerning items with multiple meanings in a sequence of items, including: producing a distributed representation for each item of the sequence of items, including a word vector and a context vector; partitioning the sequence of items into classes; for an item, using a representative word vector of each class, calculating a cosine distance between the word vector of said item and the class representative vector; and producing a new sequence of items by modifying the distributed representation, replacing each occurrence of an item depending on the calculated cosine distance.
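The cosine-distance step mentioned in this abstract can be sketched as follows. The function names, the two-dimensional vectors, and the class labels are hypothetical; the abstract does not say how class representatives are chosen, so the sketch simply takes them as given.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def nearest_class(word_vector, class_representatives):
    """Pick the class whose representative vector is closest to word_vector."""
    return min(
        class_representatives,
        key=lambda name: cosine_distance(word_vector, class_representatives[name]),
    )


# Hypothetical class representatives for two senses of the item "bank".
representatives = {"bank_finance": [1.0, 0.0], "bank_river": [0.0, 1.0]}
print(nearest_class([0.9, 0.1], representatives))  # prints bank_finance
```

An occurrence of the ambiguous item could then be replaced by the label of its nearest class, which is one plausible reading of "replacing each occurrence of an item depending on the cosine distance".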

Publication date: 29-04-2021

COMPUTER-IMPLEMENTED METHOD AND DEVICE FOR PROCESSING DATA

Number: US20210124877A1
Assignee:

A computer-implemented method for processing text data including a multitude of text modules. In the method, a representation of the text is provided, and a model is used which predicts a classification for a respective text module of the text as a function of the representation of the text. The provision of the representation of the text includes the provision of a total word vector for a respective text module of the text. The total word vector is formed from at least two, preferably multiple word vectors, and a respective word vector being weighted as a function of properties of the respective text module.

1-13. (canceled)

14. A computer-implemented method for processing text data including a multitude of text modules, for automatically extracting pieces of information from the text data and/or in models for creating databases, the method comprising the following steps: providing a representation of the text data; and using a model to predict a classification for each respective text module of the text data as a function of the representation of the text data, the providing of the representation of the text data including providing a total word vector for each respective text module of the text data, the total word vector being formed from at least two word vectors, and each respective word vector of the at least two word vectors being weighted as a function of properties of the respective text module.

15. The method as recited in claim 14, wherein the databases are structured knowledge databases or knowledge graphs.

16. The method as recited in claim 14, further comprising: calculating a weight for each respective word vector.

17. The method as recited in claim 14, wherein a weight for each respective word vector is also calculated as a function of the respective word vector.

18. The method as recited in claim 14, wherein a first property of the properties of the respective text module represents a relative frequency of the respective text module in the text data ...
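A weighted total word vector of the kind described here might look like the sketch below. The combination rule (a normalized weighted sum) and the frequency-based weighting in `weights_from_frequency` are assumptions for illustration only; the visible text says the weights depend on properties such as relative frequency but does not give the formula.

```python
def total_word_vector(word_vectors, weights):
    """Combine at least two word vectors into one normalized weighted sum."""
    if len(word_vectors) < 2:
        raise ValueError("at least two word vectors are required")
    if len(word_vectors) != len(weights):
        raise ValueError("one weight per word vector is required")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(word_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, word_vectors)) / total_weight
        for i in range(dim)
    ]


def weights_from_frequency(relative_frequency):
    """Hypothetical rule: frequent text modules lean on the first vector type,
    rare ones on the second. This is an assumption, not the claimed formula."""
    return [relative_frequency, 1.0 - relative_frequency]


# Two hypothetical embeddings of the same text module.
vectors = [[1.0, 0.0], [0.0, 1.0]]
print(total_word_vector(vectors, weights_from_frequency(0.25)))
```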

Publication date: 02-04-2020

SYSTEMS AND METHODS FOR DETERMINING A COMPLETION SCORE OF A RECORD OBJECT FROM ELECTRONIC ACTIVITIES

Number: US20200104301A1
Assignee:

The present disclosure relates to a method for determining a completion score for a record object based on electronic activities. The method includes accessing record objects, each of which corresponds to a record object type and includes object fields having object field-values. The method includes selecting one of the record objects. The method includes identifying electronic activities transmitted or received associated with the record object. Each of the electronic activities has a timestamp indicating a receipt time or transmission time of the respective electronic activity. The method includes determining a participant of each of the electronic activities. The method includes determining a completion score indicating a likelihood of completing an event associated with the record object based on the timestamp of each of the electronic activities and the participant of each of the electronic activities. The method includes storing an association between the record object and the completion score.

1. A method comprising: accessing, by one or more processors, a plurality of record objects of one or more systems of record, each record object of the plurality of record objects corresponding to a record object type and comprising one or more object fields having one or more object field-values; selecting, by the one or more processors, a first record object of the plurality of record objects; identifying, by the one or more processors, a plurality of electronic activities transmitted or received via electronic accounts and associated with the first record object, each of the plurality of electronic activities having a timestamp indicating a receipt time or transmission time of the respective electronic activity; determining, by the one or more processors and responsive to parsing the plurality of electronic activities, at least one participant of each of the plurality of electronic activities; determining, by the one or more processors, a completion score indicating a ...

Publication date: 02-04-2020

SYSTEMS AND METHODS FOR MERGING TENANT SHADOW SYSTEMS OF RECORD INTO A MASTER SYSTEM OF RECORD

Number: US20200104302A1
Assignee:

The present disclosure is related to systems and methods of merging tenant shadow systems of record into a master system of record. First tenant record objects of a first tenant system of record can be accessed. A master record object for a master system of record can be generated using the corresponding first tenant record object. A second tenant record object of a second tenant system of record can be accessed. Whether the second tenant record object is to be merged into the corresponding master record object can be determined. When determined to merge, the second tenant record object can be merged into the corresponding master record object. When determined to not merge, a new master record can be generated.

1. A method, comprising: accessing, by one or more processors, a plurality of first tenant record objects of a first tenant system of record, each first tenant record object of the plurality of first tenant record objects comprising one or more object field-value pairs, the first tenant system of record associated with a first data source provider of a plurality of data source providers; generating, by the one or more processors, for each first tenant record object of the first tenant system of record, a master record object for a plurality of master record objects using the first tenant record object, the master record object including one or more object field-value pairs of the corresponding first tenant record object; accessing, by one or more processors, a plurality of second tenant record objects of a second tenant system of record, each second tenant record object of the second plurality of tenant record objects comprising one or more object field-value pairs, the second tenant system of record associated with a second data source provider of the plurality of data source providers; determining, by the one or more processors, whether to match a second tenant record object of the plurality of second tenant record object with a corresponding first master ...

Publication date: 02-04-2020

Systems and methods of generating an engagement profile

Number: US20200104303A1
Assignee: People AI Inc

The present disclosure relates to systems and methods for determining an engagement profile of a participant by associating electronic activities to a profile. It may generate the engagement profile based on analysis of the electronic activity level. An example implementation may contain the following steps. The system may access for a first record object a plurality of electronic activities linked with the first record object. The system may identify for a participant from the plurality of electronic activities a set of electronic activities including the participant. The system may determine an engagement profile of the participant based on a first number of electronic activities of the set of electronic activities sent by the participant, a second number of the set of electronic activities received by the participant and a temporal distribution of the set of electronic activities. The system may store the engagement profile in one or more data structures.
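The three inputs named in this abstract (activities sent, activities received, and their temporal distribution) can be computed with a short sketch. The `Activity` record, its field names, and the choice of calendar months as the time buckets are assumptions for illustration; the abstract does not specify a data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Activity:
    """Hypothetical electronic activity, e.g. one email."""
    sender: str
    recipients: tuple
    sent_on: date


def engagement_profile(participant, activities):
    """Summarize how a participant takes part in a set of activities."""
    involved = [
        a for a in activities
        if a.sender == participant or participant in a.recipients
    ]
    sent = sum(1 for a in involved if a.sender == participant)
    received = sum(1 for a in involved if participant in a.recipients)
    # Temporal distribution, bucketed by month (an assumed granularity).
    per_month = Counter(a.sent_on.strftime("%Y-%m") for a in involved)
    return {"sent": sent, "received": received, "per_month": dict(per_month)}


# Hypothetical activities linked with one record object.
activities = [
    Activity("ann", ("bob",), date(2020, 1, 5)),
    Activity("bob", ("ann", "cy"), date(2020, 1, 20)),
    Activity("ann", ("cy",), date(2020, 2, 1)),
    Activity("cy", ("bob",), date(2020, 2, 3)),
]
print(engagement_profile("ann", activities))
```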

Publication date: 11-04-2019

Author disambiguation and publication assignment

Number: US20190108179A1
Assignee: ResearchGate GmbH

Described herein are computer-implemented systems and methods for automatically disambiguating author names for a plurality of publications so as to create clusters of author name mentions that are with high probability associated with a single author identity for each cluster. Also described are systems and methods for assigning the clusters to respective unique author identities, automatically and/or based on human input (e.g., as received from authors, co-authors, or administrative curators).

Publication date: 02-04-2020

SEMANTIC MATCHING SYSTEM AND METHOD

Number: US20200104315A1
Assignee:

A computer-based system and method for determining similarity between at least two heterogenous unstructured data records and for optimizing processing performance. A plurality of occupational data records is generated and, for each of the occupational data records, a respective vector is created to represent the occupational data record. Each of the vectors is sliced into a plurality of chunks. Thereafter, semantic matching of the chunks occurs in parallel, to compare at least one occupational data record to at least one other occupational data record simultaneously and substantially in real time. Thereafter, values representing similarities between at least two of the occupational data records are output.

1. A computer-based method for determining similarity between at least two heterogenous unstructured data records and for optimizing processing performance, the method comprising: generating, by at least one processor that is configured by executing code stored on non-transitory processor readable media, a plurality of occupational data records; creating, by the at least one processor, for each of the occupational data records, a respective vector to represent the occupational data record; slicing, by the at least one processor, each of the vectors into a plurality of chunks; performing, by the at least one processor, semantic matching for each of the chunks in parallel to compare at least one occupational data record to at least one other occupational data record simultaneously and substantially in real time; and outputting, by the at least one processor, values representing similarities between at least two of the occupational data records.

2. The method of claim 1, wherein each of the vectors has magnitude and direction.

3. The method of claim 1, further comprising creating an n-dimensional non-orthogonal unit vector space.

4. The method of claim 3, wherein the n-dimensional non-orthogonal unit vector space is created by calculating dot products between unit ...
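The slice-then-match idea can be illustrated with a minimal sketch, assuming that "semantic matching" of a chunk pair is a partial dot product and that the partial results are summed. That choice, the chunk size, and the use of a thread pool are assumptions; the source does not state how chunks are compared or scheduled.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_vector(vector, chunk_size):
    """Split a vector into consecutive chunks of at most chunk_size items."""
    return [vector[i:i + chunk_size] for i in range(0, len(vector), chunk_size)]


def chunk_dot(pair):
    """Partial dot product of one pair of aligned chunks."""
    a, b = pair
    return sum(x * y for x, y in zip(a, b))


def parallel_similarity(u, v, chunk_size=2):
    """Dot product of u and v, computed chunk by chunk in a thread pool."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(slice_vector(u, chunk_size), slice_vector(v, chunk_size)))
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(chunk_dot, pairs))
    return sum(partials)


# Hypothetical unit vectors for two occupational data records.
record_a = [0.6, 0.8, 0.0, 0.0]
record_b = [0.6, 0.0, 0.8, 0.0]
print(parallel_similarity(record_a, record_b))
```

For unit vectors the summed dot product equals the cosine similarity. In CPython, threads mainly show the structure of the computation; a real speed-up for pure-Python arithmetic would need processes or a vectorized library.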

Publication date: 11-04-2019

Hierarchical Classification of Transaction Data

Number: US20190108593A1
Assignee: Yodlee Inc

Methods, systems and computer program products implementing hierarchical classification techniques are disclosed. A hierarchical classification system receives training data including labeled transaction records. The system determines tag sequences from the training data. The system clusters the tag sequences into clusters. The system determines a cluster-level classifier that is trained to predict a cluster for an input transaction record. The system determines a respective cluster-specific classifier for each cluster. The system trains the cluster-specific classifier to predict a label of entity of interest for an input transaction record, given a particular cluster. Upon receiving a test transaction record, the system first applies the cluster-level classifier to determine a particular cluster for the test transaction record, and then determines a label of entity of interest of the test transaction record by applying a cluster-specific classifier of that particular cluster.

Publication date: 30-04-2015

Text mining system, text mining method, and program

Number: US20150120735A1
Assignee: NEC Corp

The present invention is a text mining system comprising a synonym cluster acquiring section configured to acquire synonym clusters from texts in text data to be analyzed, the synonym clusters each being a collection of synonymous texts, an implication relationship acquiring section configured to acquire implication relationships among the synonym clusters, and an implication graph generating section configured to generate an implication graph including vertices of synonym clusters and directed edges each indicating a direction from an implied synonym cluster to an implying synonym cluster from the implication relationships among the synonym clusters.

Publication date: 09-06-2022

METHOD, APPARATUS, AND COMPUTER PROGRAM PRODUCT FOR CLASSIFICATION AND TAGGING OF TEXTUAL DATA

Number: US20220179895A1
Author: PENDAR Nick
Assignee:

Provided herein are systems, methods and computer readable media for classification and tagging of textual data. An example method may include accessing a corpus comprising a plurality of documents, each document having one or more labels indicative of services offered by a merchant, generating a query based on extracted features and the documents, generating a precision score for at least a portion of the generated query and selecting a subset of the generated queries based on an assigned precision score satisfying a precision score threshold, the selected subset of the generated queries configured to provide an indication of one or more labels to be applied to machine readable text. A second example method, utilized for tagging machine readable text with unknown labels, may include assigning a label to textual portions of the machine readable text based on results of the application of the queries.

1-76. (canceled)

77. A method for identifying one or more services based on machine readable text comprising: accessing a corpus comprising a plurality of documents, each of one or more documents of the corpus having one or more labels indicative of one or more services offered by a merchant; generating, using a processor, at least one query based on one or more extracted features and the one or more documents; generating a recall score for at least a portion of the generated at least one query, wherein the recall score is calculated based on a number of true positives returned by the query divided by a total number of elements that belong to a positive class; and selecting a subset of the generated at least one query based on an assigned recall score satisfying a recall score threshold, wherein the selected subset of the generated at least one query are configured to provide an indication of one or more labels to be applied to machine readable text.

7819. The method according to claim , wherein generating the query further comprises: generating an array of feature index ...
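Claim 77 defines the recall score as the number of true positives returned by a query divided by the total number of elements in the positive class. A minimal sketch of that formula and of threshold-based query selection follows; the document identifiers, query names, and the 0.5 threshold are hypothetical.

```python
def recall(returned_ids, positive_ids):
    """True positives returned by a query / all elements of the positive class."""
    positive_ids = set(positive_ids)
    if not positive_ids:
        raise ValueError("recall is undefined when the positive class is empty")
    true_positives = len(set(returned_ids) & positive_ids)
    return true_positives / len(positive_ids)


def select_queries(query_results, positive_ids, threshold):
    """Keep the queries whose recall score satisfies the threshold."""
    return [
        query
        for query, returned in query_results.items()
        if recall(returned, positive_ids) >= threshold
    ]


# Hypothetical corpus: documents 1-4 carry the label "massage".
positives = {1, 2, 3, 4}
query_results = {"massage therapy": [1, 2, 3, 9], "spa": [1, 8]}
print(select_queries(query_results, positives, threshold=0.5))
# prints ['massage therapy']
```

Here "massage therapy" returns three of the four positive documents (recall 0.75) and is kept, while "spa" returns one (recall 0.25) and is dropped.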

Publication date: 09-06-2022

ORGANIZING FRAGMENTS OF MEANINGFUL TEXT

Number: US20220179896A1
Assignee:

Organizing and/or aligning fragments of text that are included in a set of physical and/or digital documents so that the arrangement of the text fragments is in a readily understandable and meaningful format for a given reader. This organization and/or alignment uses a relation model of the various text fragments to correlate a meaning between and amongst the various text fragments to ultimately determine the final alignment and/or arrangement of those text fragments.

1. A computer-implemented method (CIM) comprising: receive, from a corpus of text documents, a plurality of text fragments, with the plurality of text fragments being text strings that are organized in a staggered manner; identify, from the plurality of text fragments, a first arrangement of the plurality of text fragments; responsive to the identification of the first arrangement, extracting a first set of text fragments to obtain an extracted set of text fragments; generating a relation model based upon the extracted set of text fragments, with the relation model including information indicative of a semantic relatedness value defining the extracted set of text fragments; clustering the extracted set text fragments into a first group, based, at least in part, upon the relation model; combining the clustered set of text fragments from the first group into composite text objects; and optimizing an alignment of the composite text objects to obtain an organized text fragment.

2. The CIM of wherein the semantic relatedness value defining the extracted set of text fragments is determined from a cosine similarity of the extracted set of text fragments.

3. The CIM of wherein a vector for the composite text objects is a function of the clustered set of text fragments.

4. The CIM of wherein the semantic relatedness value for the composite text objects is determined through the use of a semantic web graph.

5. The CIM of wherein the combination of the clustered set of text fragments from the first group into composite text ...

Publication date: 09-06-2022

Computerized grouping of news articles by activity and associated phase of focus

Number: US20220179916A1
Assignee: International Business Machines Corp

A computer categorizes a news article by an activity and an associated activity phase. The computer receives at least one news article from an article source. The computer assigns to each news article an activity of focus selected from a list of target activities, using a first machine learning model. The computer identifies, for each news article, at least one activity phase candidate selected from a list of activity phases associated with the activity of focus, using a second machine learning model. The computer determines, for each news article, an activity phase of focus from among the activity phase candidates. The determination is based, at least in part, on a confirmation attribute associated with the article. The computer categorizes each of the articles by said activity of focus and said activity phase of focus.

Publication date: 27-04-2017

System and method for maintaining a dynamic dictionary

Number: US20170116331A1
Author: Yitshak Yishay
Assignee: Verint Systems Ltd

An apparatus and techniques for constructing and utilizing a “dynamic dictionary” that is not a compiled dictionary, and therefore does not need to be recompiled in order to be updated. The dynamic dictionary includes respective data structures that represent (i) a management automaton that includes a plurality of management nodes, and (ii) a runtime automaton that is derived from the management automaton and includes a plurality of runtime nodes. The runtime automaton may be used to search input data, such as communication traffic over a network, for keywords of interest, while the management automaton manages the addition of keywords to the dynamic dictionary. Typically, at least two (e.g., exactly two) such dynamic dictionaries are used in combination with a static dictionary.

Publication date: 09-06-2022

Systems and methods for patient record matching

Number: US20220180988A1

An AI record matching system includes processors that may compare patient records using one or more rules, criteria, or parameters and determine whether any of the patient records include different demographic information but include same medical information for a same person based on comparing the patient records using the one or more rules, criteria, or parameters. The processors may receive feedback data indicating an overmatching of the patient records to the same person or an undermatching of the patient records to the same person. The processors may be trained by modifying the one or more rules, criteria, or parameters based on the feedback data. The processors may iteratively repeat one or more of examining the patient records, determining whether any of the patient records include the different demographic information but the same medical information for the same person, receiving the feedback data, and training the one or more processors.

Publication date: 09-04-2020

SYSTEMS AND METHODS FOR CLASSIFYING ELECTRONIC ACTIVITIES BASED ON SENDER AND RECIPIENT INFORMATION

Number: US20200110750A1
Assignee: People.ai, Inc.

The system and methods described herein can classify electronic activities based on sender and recipient information. The system can determine a relationship between a sender of an electronic activity and at least one recipient of the electronic activity using a sender node profile and a recipient node profile. The system can assign a tag to the electronic activity based on the relationship between the sender and one or more recipients of the electronic activity. The system can process the electronic activity based on the assigned tag.

1. A method comprising: maintaining, by one or more processors, a plurality of node profiles corresponding to a plurality of unique entities, each node profile including a plurality of fields, each field of the plurality of fields including one or more value data structures, each value data structure of the one or more value data structures including a value and one or more entries corresponding to respective one or more data points that include the value of the value data structure; accessing, by the one or more processors, a plurality of electronic activities transmitted or received via electronic accounts associated with one or more data source providers, the one or more processors configured to update the plurality of node profiles using the plurality of electronic activities; identifying, by the one or more processors, an electronic activity of the plurality of electronic activities to process; determining, by the one or more processors, a relationship between a sender of the electronic activity and at least one recipient of the one or more recipients of the electronic activity using the node profiles of the sender and the at least one recipient included in the plurality of node profiles; assigning, by the one or more processors, a tag to the electronic activity based on the relationship between the sender and the one or more recipients; and processing, by the one or more processors, the electronic activity based on the assigned tag.

2. ...
