Total found: 2165. Displayed: 199.
Publication date: 18-08-2022

Method and system for converting speech to text

Number: RU2778380C2

The group of inventions relates to the field of speech recognition and can be used to convert speech to text. The technical result is improved recognition accuracy. The method comprises the steps of: receiving a user utterance corresponding to speech; determining, using a local graph, a first predicted text corresponding to the user utterance and a first confidence score corresponding to the first predicted text; transmitting the user utterance to a server; receiving from the server a second predicted text corresponding to the user utterance and a second confidence score corresponding to the second predicted text; and, if the first confidence score is higher than the second confidence score, outputting the first predicted text. 3 independent and 14 dependent claims, 8 figures.
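The selection step in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Prediction` type and `select_text` function are hypothetical names, and falling back to the server text when the local confidence is not higher is an assumption, since the abstract only states the case where the local score wins.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    text: str          # predicted transcript
    confidence: float  # confidence score for that transcript


def select_text(local: Prediction, server: Prediction) -> str:
    """Pick the transcript to output from a local and a server prediction."""
    # Stated in the abstract: output the local text when its score is higher.
    if local.confidence > server.confidence:
        return local.text
    # Assumption: otherwise use the server text.
    return server.text
```

For example, `select_text(Prediction("turn on the light", 0.92), Prediction("turn on the lights", 0.85))` returns the local text.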

Publication date: 26-08-2021

Number: RU2020108161A3

Publication date: 03-11-2021

Number: RU2019145083A3

Publication date: 01-09-2020

METHOD AND SYSTEM FOR GENERATING A TEXT REPRESENTATION OF A FRAGMENT OF A USER'S SPEECH

Number: RU2731334C1

The group of inventions relates to the field of natural language processing. The technical result is the generation of a text representation of a fragment of a user's speech that takes into account the user's characteristics and the acoustic properties of the speech fragment. The method includes: receiving an indication of a fragment of the user's speech; generating at least two hypotheses; generating, by an electronic device, from these at least two hypotheses a set of paired hypotheses, one of which contains a first hypothesis paired with a second hypothesis; determining a pairwise score for one pair of hypotheses from the set of paired hypotheses; generating a set of features of the speech fragment indicating one or more characteristics associated with this fragment of the user's speech; ranking the first hypothesis and the second hypothesis based on at least the pairwise score and the set of features of this speech fragment; and selecting the first hypothesis as the text representation of this fragment of the user's speech ...

Publication date: 14-11-2007

Automatic speech recognition method and apparatus

Number: GB0000719453D0

Publication date: 23-06-2010

A speech processing method and system

Number: GB0201007524D0

Publication date: 19-01-2022

End of speech detection using one or more neural networks

Number: GB0002597126A

An Automatic Speech Recognition/voice transcription system indicates an End of Speech segment based on one or more characters predicted to be within the segment, especially a particular percentage of blank (non-speech) characters within a sliding window 352, fig. 3B (e.g. 95% within 500 ms). Audio data is input to a Connectionist Temporal Classification (CTC) neural network model 304 to generate character probabilities based on extracted features (e.g. mel-spectrogram 204, fig. 2). The Start (11) and End (12) Of Speech segments are then detected 310 via a greedy (e.g. ArgMax) decoder 308.
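The sliding-window rule in this abstract can be illustrated with a short sketch. It assumes the greedy decoder has already produced one character per frame; the `BLANK` symbol, the function name, and the default of 25 frames (500 ms at a hypothetical 20 ms per frame) are illustrative assumptions, as is the requirement that some speech has been seen before an end is declared.

```python
from collections import deque
from typing import Iterable, Optional

BLANK = "_"  # hypothetical symbol for the CTC blank (non-speech) character


def detect_end_of_speech(chars: Iterable[str],
                         window_frames: int = 25,
                         blank_ratio: float = 0.95) -> Optional[int]:
    """Return the frame index at which end of speech is declared, or None."""
    window = deque(maxlen=window_frames)
    blanks = 0
    seen_speech = False
    for i, ch in enumerate(chars):
        # Once the window is full, the append below evicts the oldest frame.
        if len(window) == window_frames and window[0] == BLANK:
            blanks -= 1
        window.append(ch)
        if ch == BLANK:
            blanks += 1
        else:
            seen_speech = True
        # Assumption: only declare an end after some speech has been observed.
        if (seen_speech and len(window) == window_frames
                and blanks / window_frames >= blank_ratio):
            return i
    return None
```

The running blank count keeps the per-frame cost constant, which matters when the check runs on streaming audio.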

Publication date: 30-06-2021

End of speech detection using one or more neural networks

Number: GB202107009D0

Publication date: 15-08-2009

PROCEDURE FOR PERSONALIZING A SERVICE

Number: AT0000439665T

Publication date: 15-11-2011

LANGUAGE MODEL COMPRESSION WITH GOLOMB CODING

Number: AT0000529852T

Publication date: 15-02-2004

NETWORK AND LANGUAGE MODELS FOR USE IN A SPEECH RECOGNITION SYSTEM

Number: AT0000258332T

Publication date: 04-06-2020

Systems and methods for adaptive proper name entity recognition and understanding

Number: AU2018365166A1
Assignee: RnB IP Pty Ltd

Various embodiments contemplate systems and methods for performing automatic speech recognition (ASR) and natural language understanding (NLU) that enable high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted. Recognition of the other words in the utterances in question, e.g. words that are not part of the proper name entities, may occur at regular, high recognition accuracy. Various embodiments provide as output not only accurately transcribed running text of the complete utterance, but also a symbolic representation of the meaning of the input, including appropriate symbolic representations of proper name entities, adequate to allow a computer system to respond appropriately to the spoken request without further analysis of the user's input.

Publication date: 11-12-2001

Unified language model CFG and n-grams

Number: AU0006341701A

Publication date: 05-03-2013

SEMANTIC OBJECT SYNCHRONOUS UNDERSTANDING FOR HIGHLY INTERACTIVE INTERFACE

Number: CA0002467134C
Inventor: WANG, KUANSAN
Assignee: MICROSOFT CORPORATION

... A method and system provide a speech input mode which dynamically reports partial semantic parses, while audio captioning is still in progress. The semantic parses can be evaluated with an outcome immediately reported back to the user. The net effect is that tasks conventionally performed in the system turn are now carried out in the midst of the user turn, thereby presenting a significant departure from the turn-taking nature of a spoken dialogue. ...

Publication date: 07-08-2018

METHOD AND SYSTEM FOR AUTOMATIC SPEECH RECOGNITION

Number: CA0002899537C

An automatic speech recognition method includes at a computer having one or more processors and a memory for storing one or more programs to be executed by the processors, obtaining a plurality of speech corpus categories through classifying and calculating raw speech corpus (801); obtaining a plurality of classified language models that respectively correspond to the plurality of speech corpus categories through language model training applied on each speech corpus category (802); obtaining an interpolation language model through implementing a weighted interpolation on each classified language model and merging the interpolated plurality of classified language models (803); constructing a decoding resource in accordance with an acoustic model and the interpolation language model (804); decoding input speech using the decoding resource, and outputting a character string with a highest probability as the recognition result of the input speech (805).
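The weighted interpolation step (803) can be sketched for the simplest case. The example below assumes each classified language model is a plain unigram distribution stored as a dictionary; the function name, the data layout, and the weights are illustrative assumptions and are not taken from the patent.

```python
from typing import Dict, Sequence


def interpolate(models: Sequence[Dict[str, float]],
                weights: Sequence[float]) -> Dict[str, float]:
    """Merge per-category unigram models into one interpolated model."""
    if len(models) != len(weights):
        raise ValueError("one weight is required per model")
    total = sum(weights)
    vocab = set().union(*models)  # union of all words seen by any model
    # P(w) = sum_k lambda_k * P_k(w), with the weights normalised to sum to 1.
    return {
        w: sum(wt / total * m.get(w, 0.0) for m, wt in zip(models, weights))
        for w in vocab
    }


news = {"stock": 0.6, "game": 0.4}
sports = {"game": 0.9, "stock": 0.1}
mixed = interpolate([news, sports], [0.75, 0.25])
```

Because every input distribution sums to one and the weights are normalised, the merged distribution also sums to one.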

Publication date: 27-04-1991

METHOD AND APPARATUS FOR FINDING THE BEST SPLITS IN A DECISION TREE FOR A LANGUAGE MODEL FOR A SPEECH RECOGNIZER

Number: CA0002024382A1

Publication date: 16-03-1997

AUTOMATED PHRASE GENERATION

Number: CA0002180687A1
Assignee: KIRBY EADES GALE BAKER

A methodology for automated task selection is provided, where the selected task is identified in natural speech of a user making such a selection. A set of meaningful phrases are determined by a grammatical inference algorithm which operates on a predetermined corpus of speech utterances, each such utterance being associated with a specific task objective, and wherein each utterance is marked with its associated task objective. Each meaningful phrase developed by the grammatical inference algorithm can be characterized as having both a Mutual Information value and a Salience value (relative to an associated task objective) above a predetermined threshold.

Publication date: 29-07-2008

SPEECH INDEX PRUNING

Number: KR1020080069990A

A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value. © KIPO & WIPO 2008 ...

Publication date: 12-07-2012

SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION METHOD, AND SPEECH RECOGNITION PROGRAM

Number: WO2012093451A1

This speech recognition device is provided with: a hypothesis search means, which, with respect to inputted speech data, searches for the optimum solution by generating hypotheses involving chains of words to be searched as candidates for recognition results; a restatement determination means, which calculates the likelihood of a word or a word sequence included in a hypothesis being searched by the hypothesis search means being a restatement, and which determines whether the word or word sequence is a restatement; and a filtered word hypothesis generation means, which, when it has been determined that the word or word sequence is a restatement, generates a filtered word hypothesis, which is a hypothesis that treats the word or word sequence included in non-flowing segments or repaired segments among restatement segments that include the word or word sequence as filtered words.

Publication date: 11-07-2002

COMPUTER-IMPLEMENTED LANGUAGE MODEL PARTITIONING METHOD AND SYSTEM

Number: WO2002054384A1

A computer-implemented method and system for generating speech models for use in speech recognition of a user speech input. Word conceptual networks are formed by grouping words with pre-selected pivot words. The groupings of words form phrases directed to pre-selected concepts. Phoneme networks are associated with the words in the word conceptual networks. The phoneme networks contain probabilities for recognizing the words in the word conceptual networks. A language model is partitioned into sub-language models based upon the pivot words. The sub-language models include the phoneme networks that are associated with the words grouped with the sub-language models' respective pivot words.

Publication date: 12-03-2015

SYSTEM AND METHOD FOR COMBINING GEOGRAPHIC METADATA IN AUTOMATIC SPEECH RECOGNITION LANGUAGE AND ACOUSTIC MODELS

Number: US20150073793A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device and portable device then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.

Publication date: 10-05-2007

Speech index pruning

Number: US20070106512A1
Assignee: Microsoft Corporation

A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
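A minimal sketch of the pruning step follows. It assumes the index maps each word to a list of `(position, probability)` pairs and that entries whose probability falls below the threshold are the ones eliminated; both the layout and the direction of the comparison are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, float]  # (position in the segment, probability of the word)


def prune_index(index: Dict[str, List[Entry]],
                threshold: float) -> Dict[str, List[Entry]]:
    """Drop index entries whose word probability is below the threshold."""
    pruned: Dict[str, List[Entry]] = {}
    for word, entries in index.items():
        kept = [(pos, p) for pos, p in entries if p >= threshold]
        if kept:  # words with no surviving entries leave the index entirely
            pruned[word] = kept
    return pruned


index = {
    "recognize": [(0, 0.70), (3, 0.05)],
    "wreck": [(0, 0.20)],
    "beach": [(1, 0.02)],
}
small = prune_index(index, threshold=0.10)
```

Pruning trades a little recall on unlikely alternatives for a smaller index.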

Publication date: 02-10-2003

Methods and apparatus for generating dialog state conditioned language models

Number: US20030187648A1

Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.

Publication date: 15-02-2022

Natural language processing models for conversational computing

Number: US0011250839B2
Assignee: Microsoft Technology Licensing, LLC

In non-limiting examples of the present disclosure, systems, methods and devices for training conversational language models are presented. An embedding library may be generated and maintained. Exemplary target inputs and associated intent types may be received. The target inputs may be encoded into contextual embeddings. The embeddings may be added to the embedding library. When a conversational entity receives a new natural language input, that new input may be encoded into a contextual embedding. The new embedding may be added to the embedding library. A similarity score model may be applied to the new embedding and one or more embeddings for the exemplary target inputs. Similarity scores may be calculated based on the application of the similarity score model. A response may be generated by the conversational entity for an intent type for which a similarity score exceeds a threshold value.

Publication date: 30-01-2001

Method and apparatus using probabilistic language model based on confusable sets for speech recognition

Number: US6182039B1

The speech recognizer incorporates a language model that reduces the number of acoustic pattern matching sequences that must be performed by the recognizer. The language model is based on knowledge of a pre-defined set of syntactically defined content and includes a data structure that organizes the content according to acoustic confusability. A spelled name recognition system based on the recognizer employs a language model based on classes of letters that the recognizer frequently confuses for one another. The language model data structure is optionally an N-gram data structure, a tree data structure, or an incrementally configured network that is built during a training sequence. The incrementally configured network has nodes that are selected based on acoustic distance from a predetermined lexicon.

Publication date: 09-02-1999

Method and apparatus for an improved language recognition system

Number: US5870706A

Methods and apparatus for a language model and language recognition systems are disclosed. The method utilizes a plurality of probabilistic finite state machines having the ability to recognize a pair of sequences, one sequence scanned leftwards, the other scanned rightwards. Each word in the lexicon of the language model is associated with one or more such machines which model the semantic relations between the word and other words. Machine transitions create phrases from a set of word string hypotheses, and incrementally calculate costs related to the probability that such phrases represent the language to be recognized. The cascading lexical head machines utilized in the methods and apparatus capture the structural associations implicit in the hierarchical organization of a sentence, resulting in a language model and language recognition systems that combine the lexical sensitivity of N-gram models with the structural properties of dependency grammar.

Publication date: 19-10-2021

Engaging in human-based social interaction for performing tasks using a persistent companion device

Number: US0011148296B2

A persistent companion robot detects human interaction cues through analysis of a range of sensory inputs. Based on the detected cue, the robot expresses a skill that involves interacting with human through verbal and non-verbal means to determine a second interaction cue in response to which the robot performs a second skill such as facilitating social interactions between humans, performing utilitarian tasks, informing humans, and entertaining humans.

Publication date: 06-11-2018

System and method for combining geographic metadata in automatic speech recognition language and acoustic models

Number: US0010121468B2

Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device and portable device then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.

Publication date: 22-08-2002

Method for preserving contextual accuracy in an extendible speech recognition language model

Number: US2002116194A1

A method of generating language model statistics for a new word added to a language model incorporating at least one class file containing contextually related words. The method can include the following steps: First, language model statistics can be computed based on references to at least one incorporated class file. Second, a new word can be substituted for each reference to a selected class file. Additionally, the language model statistics can be re-computed based on the new word having been substituted for the reference. Third, the re-computed language model statistics can be displayed in a user interface and modifications can be accepted to the re-computed language model statistics through the user interface. Fourth, the language model statistics can be further re-computed based on the modifications. In consequence, the language model statistics are re-computed for the new word without introducing contextual inaccuracies in the language model.

Publication date: 06-03-2003

Supervised automatic text generation based on word classes for language modeling

Number: US2003046078A1

A system and method is provided that randomly generates text with a given structure. The structure is taken from a number of learning examples. The structure of training examples is captured by word classification and the definition of the relationships between word classes in a given language. The text generated with this procedure is intended to replicate the information given by the original learning examples. The resulting text may be used to better model the structure of a language in a stochastic language model.

Publication date: 24-06-2021

RECOMMENDING MULTIMEDIA BASED ON USER UTTERANCES

Number: US20210193130A1
Assignee: FUJITSU LIMITED

A method may include obtaining a dialogue of a user and a pre-trained language model. The method may include obtaining a corpus of dialogues and a corpus of response materials. The method may include modifying the pre-trained language model. The method may include identifying a dialogue topic of the dialogue of the user and identifying a set of response topics. The method may include selecting a set of response materials from the corpus of response materials. The method may include determining a first plurality of probabilities and, for each response material of the set of response materials, a respective second plurality of probabilities. The method may include comparing the first plurality of words with each respective second plurality of words associated with each respective response material of the set of response materials. The method may include selecting a response material of the set of response materials based on the comparison.

Publication date: 12-03-2019

Allowing spelling of arbitrary words

Number: US10229109B1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a first voice input from a user device; generating a first recognition output; receiving a user selection of one or more terms in the first recognition output; receiving a second voice input spelling a correction of the user selection; determining a corrected recognition output for the selected portion; and providing a second recognition output that merges the first recognition output and the corrected recognition output.

Publication date: 19-10-2023

ELECTRONIC DEVICE AND METHOD FOR PROVIDING CONVERSATIONAL SERVICE

Number: US20230335131A1

A method, performed by an electronic device, of providing a conversational service includes: receiving an utterance input; identifying a temporal expression representing a time in a text obtained from the utterance input; determining a time point related to the utterance input based on the temporal expression; selecting a database corresponding to the determined time point from among a plurality of databases storing information about a conversation history of a user using the conversational service; interpreting the text based on information about the conversation history of the user, the conversation history information being acquired from the selected database; generating a response message to the utterance input based on a result of the interpreting; and outputting the generated response message.

Publication date: 26-09-2001

Generation of a language model and an acoustic model for a speech recognition system

Number: EP0001136982A3

Abstract of corresponding document US2001029453: The invention relates to a method of generating a language model and a method of generating an acoustic model for a speech recognition system. It is proposed to successively reduce the respective training material by training material portions in dependence on application-specific data, or to extend it, to obtain the respective training material for generating a language model and the acoustic model ...

Publication date: 08-07-1998

Method and system for speaker-independent recognition of user-defined phrases

Number: EP0000852374A3

Publication date: 05-10-1994

COMPOSITE EXPERT

Number: EP0000617827A1

In a continuous speech recognizer which includes at least one acoustic expert and one linguistic expert which generate respective scores, a method is disclosed for adjusting the relative weighting to be applied to those scores employing training data utilizing the words to be recognized in multiple word phrases. Multiple word test phrases are applied to the acoustic expert to determine, for each phrase, plural multi-word hypotheses (15) each having corresponding cumulative scores. The linguistic expert generates corresponding cumulative linguistic scores (17). An objective function is calculated (25) for each test phrase having a value which is variable as a function of the difference between the combined score of any correct hypothesis and that of the most easily confused incorrect hypothesis. The objective function values are cumulated (27) and a gradient descent procedure is used (31) to adjust the relative weighting of the acoustic and linguistic scores in obtaining a combined score ...

Publication date: 06-04-2011

Automatic speech recognition method and apparatus

Number: GB0002453366B

Publication date: 08-04-2009

Automatic speech recognition method and apparatus

Number: GB2453366A

A system for calculating the look ahead probabilities at the nodes in a language model look ahead tree, wherein the words of the vocabulary of the language are located at the leaves of the tree, said apparatus comprising: means to assign a language model probability to each of the words of the vocabulary using a first low order language model; means to calculate the language look ahead probabilities for all nodes in said tree using said first language model; means to determine if the language model probability of one or more words of said vocabulary can be calculated using a higher order language model and updating said words with the higher order language model; and means to update the look ahead probability at only the nodes which are affected by the words where the language model has been updated. The system and method may be used for automatic speech recognition as part of speech to speech translation, optical character recognition (OCR), or handwriting recognition systems.

Publication date: 15-10-2004

HIERARCHICAL LANGUAGE MODELS

Number: AT0000276568T

Publication date: 15-11-2011

UNIFIED TREATMENT OF DATA SPARSENESS AND DATA OVERFITTING IN MAXIMUM ENTROPY MODELING

Number: AT0000531034T

Publication date: 07-10-1999

INFORMATION RETRIEVAL AND SPEECH RECOGNITION BASED ON LANGUAGE MODELS

Number: CA0002321112A1

A language model (70) is used in a speech recognition system (60) which has access to a first, smaller data store (72) and a second, larger data store (74). The language model (70) is adapted by formulating an information retrieval query based on information contained in the first data store (72) and querying the second data store (74). Information retrieved from the second data store (74) is used in adapting the language model (70). Also, language models are used in retrieving information from the second data store (74). Language models are built based on information in the first data store (72), and based on information in the second data store (74). The perplexity of a document in the second data store (74) is determined, given the first language model, and given the second language model. Relevancy of the document is determined based upon the first and second perplexities. Documents are retrieved which have a relevancy measure that exceeds a threshold level.
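The perplexity-based relevancy test can be sketched with unigram models. The abstract does not say how the two perplexities are combined, so the ratio used here is an assumption, as are the function names and the probability floor for unseen words.

```python
import math
from typing import Dict, Sequence


def perplexity(tokens: Sequence[str], model: Dict[str, float],
               floor: float = 1e-6) -> float:
    """Unigram perplexity of a token sequence; unseen words get a floor probability."""
    if not tokens:
        raise ValueError("cannot compute perplexity of an empty document")
    log_sum = sum(math.log(model.get(t, floor)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def relevancy(tokens: Sequence[str], first_lm: Dict[str, float],
              second_lm: Dict[str, float]) -> float:
    # Assumed measure: large when the document fits the first (smaller-store)
    # model much better than the second (general) model.
    return perplexity(tokens, second_lm) / perplexity(tokens, first_lm)


first_lm = {"speech": 0.5, "model": 0.5}
second_lm = {"speech": 0.1, "model": 0.1, "cat": 0.8}
score = relevancy(["speech", "model"], first_lm, second_lm)
```

A document would then be retrieved when `score` exceeds a chosen threshold.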

Publication date: 08-02-2011

A SYSTEM AND METHOD OF USING META-DATA IN SPEECH-PROCESSING

Number: CA0002486125C
Assignee: AT&T CORP.

Methods relate to generating a language model for use in, for example, a spoken dialog system or some other application. The method comprises building a class- based language model, generating at least one sequence network and replacing class labels in the class-based language model with the at least one sequence network. In this manner, placeholders or tokens associated with classes can be inserted into the models at training time and word/phone networks can be built based on meta-data information at test time. Finally, the placeholder token can be replaced with the word/phone networks at run time to improve recognition of difficult words such as proper names.

Publication date: 07-08-2014

METHOD AND SYSTEM FOR AUTOMATIC SPEECH RECOGNITION

Number: CA0002899537A1

An automatic speech recognition method includes at a computer having one or more processors and a memory for storing one or more programs to be executed by the processors, obtaining a plurality of speech corpus categories through classifying and calculating raw speech corpus (801); obtaining a plurality of classified language models that respectively correspond to the plurality of speech corpus categories through language model training applied on each speech corpus category (802); obtaining an interpolation language model through implementing a weighted interpolation on each classified language model and merging the interpolated plurality of classified language models (803); constructing a decoding resource in accordance with an acoustic model and the interpolation language model (804); decoding input speech using the decoding resource, and outputting a character string with a highest probability as the recognition result of the input speech (805).

Publication date: 02-08-1994

METHOD AND APPARATUS FOR FINDING THE BEST SPLITS IN A DECISION TREE FOR A LANGUAGE MODEL FOR A SPEECH RECOGNIZER

Number: CA0002024382C

A method and apparatus for finding the best or near best binary classification of a set of observed events, according to a predictor feature X so as to minimize the uncertainty in the value of a category feature Y. Each feature has three or more possible values. First, the predictor feature value and the category feature value of each event is measured. From the measured predictor feature values, the joint probabilities of each category feature value and each predictor feature value are estimated. The events are then split, arbitrarily, into two sets of predictor feature values. From the estimated joint probabilities, the conditional probability of an event falling into one set of predictor feature values is calculated for each category feature value. A number of pairs of sets of category feature values are then defined where each set SYj contains only those category feature values having the j lowest values of the conditional probability. From among these pairs of sets, an optimum pair ...

Publication date: 12-09-1996

SPEECH RECOGNITION

Number: CA0002211636A1

A recogniser is provided with a priori probability values (e.g. from some previous recognition) indicating how likely the various words of the recogniser's vocabulary are to occur in the particular context, and recognition "scores" are weighted by these values before a result (or results) is chosen. The recogniser also employs "pruning" whereby low-scoring partial results are discarded, so as to speed the recognition process. To avoid premature pruning of the more likely words, probability values are applied before the pruning decisions are made. A method of applying these probability values is described.
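The ordering described here, weighting by a priori probabilities before the pruning decision, can be sketched as follows. Log-domain scores, the beam-width pruning rule, and the floor for words without a prior are assumptions chosen for illustration; the abstract does not specify them.

```python
import math
from typing import Dict


def prune_with_priors(scores: Dict[str, float], priors: Dict[str, float],
                      beam: float, floor: float = 1e-6) -> Dict[str, float]:
    """Apply a priori probabilities to log scores, then beam-prune."""
    if not scores:
        return {}
    # Weight first, so that likely words are not discarded prematurely.
    weighted = {w: s + math.log(priors.get(w, floor)) for w, s in scores.items()}
    best = max(weighted.values())
    # Keep only hypotheses within `beam` of the best weighted score.
    return {w: s for w, s in weighted.items() if s >= best - beam}


acoustic = {"yes": -2.0, "no": -1.0}
context_priors = {"yes": 0.9, "no": 0.01}
survivors = prune_with_priors(acoustic, context_priors, beam=2.0)
```

In this example "no" has the better raw score, but its low prior pushes it outside the beam once the weighting is applied.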

Publication date: 06-05-1999

SELECTION OF SUPERWORDS BASED ON CRITERIA RELEVANT TO BOTH SPEECH RECOGNITION AND UNDERSTANDING

Number: CA0002275774A1

This invention is directed to the selection of superwords based on a criterion relevant to speech recognition and understanding. Superwords are used to refer to those word combinations which are so often spoken that they are recognized as units or should have models to reflect them in the language model. The selected superwords are placed in a lexicon along with selected meaningful phrases. The lexicon is then used by a speech recognizer to improve recognition of input speech utterances for the proper routing of a user's task objectives.

Publication date: 30-03-2018

BILINGUAL CORPUS UPDATE METHOD, BILINGUAL CORPUS UPDATE APPARATUS AND UPDATE PROGRAM THEREOF

Number: CN0107861937A

Publication date: 14-02-2020

Multiple recognizer speech recognition

Number: CN0110797027A

Publication date: 23-04-2012

VOICE RECOGNITION DEVICE CAPABLE OF ENHANCING THE VOICE RECOGNITION PERFORMANCE AND A METHOD THEREOF

Number: KR1020120038198A

PURPOSE: A voice recognition device and a method thereof are provided to select, for each word of a word string, the higher-scoring language model from a forward language model and a backward language model, and to calculate a bidirectional language model score. CONSTITUTION: A first voice recognition unit (110) generates word lattice information through voice recognition of an input voice. A word string generating unit (120) generates at least one word string from the word lattice information. A language model score calculating unit (130) calculates a bidirectional language model score for each word of the word string by using a forward language model (160) and a backward language model. A sentence output unit (140) outputs a word string with a high score as the voice recognition result. COPYRIGHT KIPO 2012 ...
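The bidirectional scoring described above can be sketched with bigram tables. This is a hedged illustration only: the dictionary layout keyed by `(context, word)`, the log-probability floor, the sentence boundary markers, and summing the per-word maxima are all assumptions not stated in the abstract.

```python
from typing import Dict, List, Sequence, Tuple

Bigram = Dict[Tuple[str, str], float]  # (context word, word) -> log probability


def bidirectional_score(words: Sequence[str], forward_lm: Bigram,
                        backward_lm: Bigram, floor: float = -10.0) -> float:
    """Sum, over the words, of the better of the forward and backward scores."""
    total = 0.0
    for i, w in enumerate(words):
        prev = words[i - 1] if i > 0 else "<s>"
        nxt = words[i + 1] if i + 1 < len(words) else "</s>"
        fwd = forward_lm.get((prev, w), floor)   # word given its left context
        bwd = backward_lm.get((nxt, w), floor)   # word given its right context
        total += max(fwd, bwd)
    return total


def best_word_string(candidates: Sequence[List[str]], forward_lm: Bigram,
                     backward_lm: Bigram) -> List[str]:
    """Pick the candidate word string with the highest bidirectional score."""
    return max(candidates,
               key=lambda ws: bidirectional_score(ws, forward_lm, backward_lm))


forward = {("<s>", "recognize"): -1.0, ("recognize", "speech"): -0.5}
backward = {("speech", "recognize"): -0.3, ("</s>", "speech"): -0.2}
candidates = [["wreck", "beach"], ["recognize", "speech"]]
winner = best_word_string(candidates, forward, backward)
```

Here the candidates stand in for word strings read off the word lattice.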

Publication date: 13-12-2007

LANGUAGE MODEL LEARNING SYSTEM, LANGUAGE MODEL LEARNING METHOD, AND LANGUAGE MODEL LEARNING PROGRAM

Number: WO000002007142102A1
Assignee:

A language model learning system for learning a language model on an identifiable basis relating to the word error rate used in speech recognition. The language model learning system (10) comprises recognizing means (101) for recognizing an input speech by using a sound model and a language model and outputting the recognized word sequence as the recognition result, reliability degree computing means (103) for computing the degree of reliability of the word sequence, and language model parameter updating means (104) for updating the parameters of the language model by using the degree of reliability. The language model parameter updating means updates the parameters of the language model to heighten the degree of reliability of the word sequence the computed degree of reliability of which is low when the recognizing means recognizes by using the updated language model and the reliability degree computing means computes the degree of reliability.

Publication date: 02-06-2000

OPTIMIZATION DEVICE FOR OPTIMIZING A VOCABULARY OF A SPEECH RECOGNITION DEVICE

Number: WO2000031727A1
Assignee:

The invention relates to an optimization device (19) for optimizing the vocabulary of a speech recognition device (2). The latter comprises a lexicon memory (10) in which word information (WI) comprising at least a first and a second word forming the vocabulary of the device (2) can be stored. The device also comprises a language model memory (11) in which at least a probability of occurrence of the second word after the first word, in a word sequence composed of these words, can be stored as transition probability information (UWI). In addition, the device is provided with word definition means (21) for defining a third word and storing it as word information (WI) in the lexicon memory (10), and for storing at least the transition probability information (UWI) of the probability of occurrence of the third word in a word sequence after ...

Publication date: 22-03-2022

Generative adversarial network based modeling of text for natural language processing

Number: US0011281976B2

Mechanisms are provided to implement a generative adversarial network (GAN) for natural language processing. With these mechanisms, a generator neural network of the GAN is configured to generate a bag-of-ngrams (BoN) output based on a noise vector input and a discriminator neural network of the GAN is configured to receive a BoN input, where the BoN input is either the BoN output from the generator neural network or a BoN input associated with an actual portion of natural language text. The mechanisms further configure the discriminator neural network of the GAN to output an indication of a probability as to whether the input BoN is from the actual portion of natural language text or is the BoN output of the generator neural network. Moreover, the mechanisms train the generator neural network and discriminator neural network based on a feedback mechanism that compares the output indication from the discriminator neural network to an indicator of whether the input BoN is from the actual ...

Publication date: 11-09-2012

Speech recognition system for providing voice recognition services using a conversational language model

Number: US0008265933B2

Embodiments of the present invention provide a method, system and article of manufacture for adjusting a language model within a voice recognition system, based on text received from an external application. The external application may supply text representing the words of one participant to a text-based conversation. In such a case, changes may be made to a language model by analyzing the external text received from the external application.

Publication date: 04-07-2017

Acoustic model training

Number: US0009697823B1

A method, executed by a computer, includes receiving a channel recording corresponding to a conversation, receiving a transcription for the conversation, generating a conversation-specific language model for the conversation using the transcription, and conducting speech recognition on the channel recording using the conversation-specific language model to provide time boundaries and written language corresponding to utterances within the channel recording. The method further includes determining sentence or phrase boundaries for the transcription, aligning written language within the one or more transcriptions with the written language corresponding to the utterances with the channel recording to provide sentence or phrase boundaries for the channel recording, and training a speech recognizer according to the sentence or phrase boundaries for the transcription and the sentence or phrase boundaries for the channel recording. A computer system and computer program product corresponding to ...

Publication date: 04-08-2020

Processing text sequences using neural networks

Number: US0010733390B2

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling. In one aspect, a system comprises: a masked convolutional decoder neural network that comprises a plurality of masked convolutional neural network layers and is configured to generate a respective probability distribution over a set of possible target embeddings at each of a plurality of time steps; and a modeling engine that is configured to use the respective probability distribution generated by the decoder neural network at each of the plurality of time steps to estimate a probability that a string represented by the target embeddings corresponding to the plurality of time steps belongs to the natural language.

Publication date: 15-05-2018

Allowing spelling of arbitrary words

Number: US0009971758B1
Assignee: Google LLC, Google Inc.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a first voice input from a user device; generating a first recognition output; receiving a user selection of one or more terms in the first recognition output; receiving a second voice input spelling a correction of the user selection; determining a corrected recognition output for the selected portion; and providing a second recognition output that merges the first recognition output and the corrected recognition output.

Publication date: 22-06-2017

MULTI-SPEAKER SPEECH SEPARATION

Number: US20170178666A1
Author: DONG YU
Assignee:

The technology described herein uses a multiple-output layer RNN to process an acoustic signal comprising speech from multiple speakers to trace an individual speaker's speech. The multiple-output layer RNN has multiple output layers, each of which is meant to trace one speaker (or noise) and represent the mask for that speaker (or noise). The output layer for each speaker (or noise) can have the same dimensions and can be normalized for each output unit across all output layers. The rest of the layers in the multiple-output layer RNN are shared across all the output layers. The result from the previous frame is used as input to the output layer or to one of the hidden layers of the RNN to calculate results for the current frame. This pass back of results allows the model to carry information from previous frames to future frames to trace the same speaker.

Publication date: 24-03-2020

Voice recognition device configured to start voice recognition in response to user instruction

Number: US0010600422B2
Assignee: TOSHIBA TEC KABUSHIKI KAISHA, TOSHIBA TEC KK

A voice recognition device includes a memory and a processor. The processor is configured to store in the memory, digital voice data corresponding to a voice signal input from a voice input unit, recognize a spoken voice utterance from the voice data after a voice input start instruction is received, determine whether to correct the recognition result of the spoken voice utterance based on a time interval from a time when the voice input start instruction is received to a time when the voice signal is input via the voice input unit, and correct the recognition result of the voice utterance based on the time interval.

Publication date: 09-08-2011

System and method of using meta-data in speech processing

Number: US0007996224B2

Systems and methods relate to generating a language model for use in, for example, a spoken dialog system or some other application. The method comprises building a class-based language model, generating at least one sequence network and replacing class labels in the class-based language model with the at least one sequence network. In this manner, placeholders or tokens associated with classes can be inserted into the models at training time and word/phone networks can be built based on meta-data information at test time. Finally, the placeholder token can be replaced with the word/phone networks at run time to improve recognition of difficult words such as proper names.

Publication date: 28-07-2005

Method and apparatus for identifying programming object attributes

Number: US20050165767A1
Assignee: Microsoft Corporation

The present invention provides a method and computer-readable medium for searching for programming objects on a computer system. Under one aspect of the invention, optional search attributes are used to order a list of references to found programming objects. Under a second aspect of the invention, object attributes that are stored outside of a static attribute storage area are inspected during the search for programming objects.

Publication date: 02-07-2020

SCALABLE DYNAMIC CLASS LANGUAGE MODELING

Number: US20200211537A1
Assignee:

This document generally describes systems and methods for dynamically adapting speech recognition for individual voice queries of a user using class-based language models. The method may include receiving a voice query from a user that includes audio data corresponding to an utterance of the user, and context data associated with the user. One or more class models are then generated that collectively identify a first set of terms determined based on the context data, and a respective class to which the respective term is assigned for each respective term in the first set of terms. A language model that includes a residual unigram may then be accessed and processed for each respective class to insert a respective class symbol at each instance of the residual unigram that occurs within the language model. A transcription of the utterance of the user is then generated using the modified language model.

Publication date: 28-08-2014

CONVERSION OF NON-BACK-OFF LANGUAGE MODELS FOR EFFICIENT SPEECH DECODING

Number: US20140244261A1
Assignee:

Techniques for conversion of non-back-off language models for use in speech decoders. For example, an apparatus is configured to convert a non-back-off language model to a back-off language model. The converted back-off language model is pruned. The converted back-off language model is usable for decoding speech.

Publication date: 19-05-2016

DYNAMIC LANGUAGE MODEL

Number: US20160140218A1
Assignee:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving a base language model for speech recognition including a first word sequence having a base probability value; receiving a voice search query associated with a query context; determining that a customized language model is to be used when the query context satisfies one or more criteria associated with the customized language model; obtaining the customized language model, the customized language model including the first word sequence having an adjusted probability value being the base probability value adjusted according to the query context; and converting the voice search query to a text search query based on one or more probabilities, each of the probabilities corresponding to a word sequence in a group of one or more word sequences, the group including the first word sequence having the adjusted probability value.

Publication date: 12-03-2013

Method and system for using a statistical language model and an action classifier in parallel with grammar for better handling of out-of-grammar utterances

Number: US0008396713B2

A method (and system) of handling out-of-grammar utterances includes building a statistical language model for a dialog state using, generating sentences and semantic interpretations for the sentences using finite state grammar, building a statistical action classifier, receiving user input, carrying out recognition with the finite state grammar, carrying out recognition with the statistical language model, using the statistical action classifier to find semantic interpretations, comparing an output from the finite state grammar and an output from the statistical language model, deciding which output of the output from the finite state grammar and the output from the statistical language model to keep as a final recognition output, selecting the final recognition output, and outputting the final recognition result, wherein the statistical action classifier, the finite state grammar and the statistical language model are used in conjunction to carry out speech recognition and interpretation ...

Publication date: 01-09-2022

METHODS AND APPARATUSES FOR DISCRIMINATIVE PRE-TRAINING FOR LOW RESOURCE TITLE COMPRESSION

Number: US20220277735A1
Assignee:

A system for generating compressed product titles that can be used in conversational transactions includes a computing device configured to obtain product title data characterizing descriptive product titles of products available on an ecommerce marketplace and to determine compressed product titles based on the product title data using a machine learning model that is pre-trained using a replaced-token detection task. The computing device also stores the compressed product titles for use during conversational transactions.

Publication date: 26-05-2022

Adapting Hotword Recognition Based On Personalized Negatives

Number: US20220165277A1
Assignee: Google LLC

A method for adapting hotword recognition includes receiving audio data characterizing a hotword event detected by a first stage hotword detector in streaming audio captured by a user device. The method also includes processing, using a second stage hotword detector, the audio data to determine whether a hotword is detected by the second stage hotword detector in a first segment of the audio data. When the hotword is not detected by the second stage hotword detector, the method includes classifying the first segment of the audio data as containing a negative hotword that caused a false detection of the hotword event in the streaming audio by the first stage hotword detector. Based on the first segment of the audio data classified as containing the negative hotword, the method includes updating the first stage hotword detector to prevent triggering the hotword event in subsequent audio data that contains the negative hotword.

Publication date: 24-03-1993

Method and apparatus for recognizing uttered words in a speech signal

Number: EP0000533261A2
Assignee:

In the recognition of coherently uttered speech, speech models which take into consideration, for example, the probabilities of word combinations, particularly of word pairs, are used for increasing the reliability of recognition. For this purpose, a speech model value corresponding to this probability is added at word boundaries. In some recognition methods, for example if the vocabulary is built up in the form of a tree of phonemes, it is not known at the beginning of the continuation of a hypothesis after a word end which word will actually follow so that a speech model value can only be taken into consideration at the end of the following word. Measures are specified how this can be carried out in such a manner that, if possible, the optimum predecessor word or the optimum predecessor word sequence is taken into consideration for the speech model value without having to set up a copy of the search tree for every predecessor word sequence ending at the same time. ...

Publication date: 18-12-2002

Method and apparatus for speech recognition

Number: EP0000801378B1
Author: Alshawi, Hiyan
Assignee: LUCENT TECHNOLOGIES INC.

Publication date: 11-06-2009

SPOKEN DATA RETRIEVAL SYSTEM

Number: JP2009128508A
Author: SAGAWA HIROHIKO
Assignee:

PROBLEM TO BE SOLVED: To provide a speech data retrieval system capable of retrieving, with high accuracy and at high speed, the parts where a user-specified keyword is uttered, even when the amount of speech data becomes large. SOLUTION: Candidate periods are narrowed down in advance on the basis of a sequence of subwords generated from a keyword, and the count value of each candidate period containing the subwords is then calculated by adding up certain values. Through this simple process, the candidate periods are prioritized and then selected as retrieved results. In addition, the sequence of subwords generated from the keyword is complemented on the assumption that speech recognition errors occur, and candidate period generation and selection are then performed on the basis of the complemented sequence of subwords. COPYRIGHT: (C)2009,JPO&INPIT ...

Publication date: 10-11-2005

SYNCHRONOUS UNDERSTANDING OF SEMANTIC OBJECTS FOR A HIGHLY INTERACTIVE INTERFACE

Number: RU2004116303A
Assignee:

... 1. A computer-implemented method of interacting with a computer system, comprising the steps of receiving input from a user and capturing the input for processing, performing recognition on the input to obtain semantic information pertaining to a first portion of the input, and outputting a semantic object comprising data in a format to be processed by a computer application and corresponding to the recognized input, together with the semantic information for the first portion, wherein the recognition and the output of the semantic object are performed while capture of subsequent portions of the input continues. 2. The computer-implemented method of claim 1, further comprising the step of establishing a language model for performing recognition and understanding, wherein the language model is adapted to provide data in a format to be processed by the computer application and corresponding to the received input, and to provide semantic information for the received ...

Publication date: 16-08-2018

VEHICLE CONTROL SYSTEMS AND METHODS FOR ENTERING MULTIPLE QUERIES BY VOICE INPUT

Number: DE102018103211A1
Assignee:

An infotainment system of a vehicle includes: a primary intent module configured to determine, by means of automatic speech recognition (ASR), a primary intent contained in the voice input; and an execution module configured to execute the primary intent via a first hardware output device of the vehicle. A secondary intent module is configured to: determine a first domain of the primary intent based on the primary intent; determine a second domain based on the first domain of the primary intent; and, based on the voice input and the second domain, determine by means of ASR a secondary intent contained in the voice input. A display control module is configured to display a request for user input indicating whether the secondary intent should be executed. The execution module is further configured to, via a second hardware output device ...

Publication date: 16-09-1999

MULTI-PART EXPERT SYSTEM

Number: DE0069229124T2

Publication date: 22-08-2002

Speech recognition system, training device and method for calculating iteration values for free parameters of a maximum entropy language model

Number: DE0010106580A1
Assignee:

The invention relates to a speech recognition system and a method for calculating iteration values for the free parameters λ_α of the maximum entropy language model. It is known in the prior art to approximate these free parameters λ_α cyclically and iteratively, e.g. with the aid of a GIS training algorithm. Cyclic means in this case that, at each iteration step n, a cyclically predetermined feature group Ai(n) of the language model is evaluated to calculate the (n + 1)-th iteration value for the free parameters. However, such a rigidly cyclically assigned feature group Ai(n) is not always best suited to making the GIS training algorithm converge as quickly and effectively as possible in a given situation. The invention therefore proposes a method for selecting the feature group best suited in this respect, wherein the degree of adaptation of iteration boundary values to the respectively associated desired boundary values ...

Publication date: 24-09-2009

Method for personalizing a service

Number: DE602005015984D1
Assignee: SWISSCOM AG

Publication date: 08-08-2001

Method and apparatus for voice annotation and retrieval of multimedia data

Number: GB0000114490D0
Author:
Assignee:

Publication date: 15-05-2005

LANGUAGE IDENTIFICATION

Number: AT0000295604T
Assignee:

Publication date: 15-05-2010

SPEECH RECOGNITION BY MEANS OF A STATISTICAL LANGUAGE MODEL USING SQUARE ROOT SMOOTHING

Number: AT0000466361T
Assignee:

Publication date: 19-06-2000

Method for identifying the language of individual words

Number: AU0001623200A
Assignee:

Publication date: 12-09-2002

HIERARCHICAL LANGUAGE MODELS

Number: CA0002437620A1
Author: EPSTEIN, MARK EDWARD
Assignee:

The invention disclosed herein concerns a method of converting speech to text using a hierarchy of contextual models. The hierarchy of contextual models can be statistically smoothed into a language model. The method can include processing text with a plurality of contextual models. Each one of the plurality of contextual models can correspond to a node in a hierarchy of the plurality of contextual models. Also included can be identifying at least one of the contextual models relating to the text and processing subsequent user spoken utterances with the identified at least one contextual model.

Publication date: 23-02-2006

SYSTEM AND METHOD OF LATTICE-BASED SEARCH FOR SPOKEN UTTERANCE RETRIEVAL

Number: CA0002515613A1
Assignee:

A system and method are disclosed for retrieving audio segments from a spoken document. The spoken document preferably is one having moderate word error rates such as telephone calls or teleconferences. The method comprises converting speech associated with a spoken document into a lattice representation and indexing the lattice representation of speech. These steps are performed typically off line. Upon receiving a query from a user, the method further comprises searching the indexed lattice representation of speech and returning retrieved audio segments from the spoken document that match the user query.

Publication date: 30-09-2003

SELECTION OF SUPERWORDS BASED ON CRITERIA RELEVANT TO BOTH SPEECH RECOGNITION AND UNDERSTANDING

Number: CA0002275774C
Assignee: AT&T CORP.

This invention is directed to the selection of superwords based on a criterion relevant to speech recognition and understanding. Superwords are used to refer to those word combinations which are so often spoken that they are recognized as units or should have models to reflect them in the language model. The selected superwords are placed in a lexicon along with selected meaningful phrases. The lexicon is then used by a speech recognizer to improve recognition of input speech utterances for the proper routing of a user's task objectives.

Publication date: 10-12-2004

TRAINING FOR DISCRIMINATING CML LANGUAGE MODEL TO CLASSIFY TEXT AND SPEECH

Number: KR20040104420A
Assignee:

PURPOSE: Training for discriminating a CML(Conditional Maximum Likelihood) language model to classify text and speech is provided to increase accuracy of classification by training the language model that conditional likelihood of a class gets to be maximal if a word string is given. CONSTITUTION: One specified class model is generated to each class or task(402). When natural language is inputted, the specified class model is executed for corresponding information of each class(406). Output of each language model is multiplied to a prior probability for the corresponding class(408). The class having the highest result value is corresponding to a target class(410). © KIPO 2005 ...
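The classification steps (402 to 410) can be illustrated with a small sketch: one language model per class, each model's output multiplied by the class prior, and the highest result taken as the target class. The sketch assumes add-one-smoothed unigram models estimated by simple counting; it does not implement the conditional maximum likelihood training itself, and all names and data are hypothetical.

```python
import math
from collections import Counter

def train_class_models(labelled_texts):
    """Build one unigram count table per class from (label, text) pairs."""
    counts = {}
    for label, text in labelled_texts:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(text, class_counts, priors):
    """Return the class maximising log P(words | class) + log P(class)."""
    vocab = set().union(*class_counts.values())
    scores = {}
    for label, counter in class_counts.items():
        total = sum(counter.values())
        score = math.log(priors[label])  # prior probability of the class
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counter[word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

data = [
    ("weather", "will it rain today"),
    ("weather", "rain forecast tomorrow"),
    ("music", "play some jazz"),
    ("music", "play the next song"),
]
models = train_class_models(data)
priors = {"weather": 0.5, "music": 0.5}
```

Calling `classify("play jazz", models, priors)` then picks the class whose model and prior together give the highest score.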

Publication date: 05-06-2014

SPEECH TRANSCRIPTION INCLUDING WRITTEN TEXT

Number: WO2014085049A1
Assignee:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transcribing utterances into written text are disclosed. The methods, systems, and apparatus include actions of obtaining a lexicon model that maps phones to spoken text and obtaining a language model that assigns probabilities to written text. The actions further include generating a transducer that maps the written text to the spoken text, the transducer mapping multiple items of the written text to an item of the spoken text. Additionally, the actions include constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model.

Publication date: 22-04-2004

LANGUAGE MODEL CREATION/ACCUMULATION DEVICE, SPEECH RECOGNITION DEVICE, LANGUAGE MODEL CREATION METHOD, AND SPEECH RECOGNITION METHOD

Number: WO2004034378A1
Assignee:

A language model creation/accumulation device (10) for creating and accumulating a language model for speech recognition includes: an upper node class N-gram creation/accumulation section (11) for creating and accumulating an upper node N-gram language model obtained by modeling a plurality of texts as a word string containing a word string cluster having a particular language characteristic; and a lower node class-dependent word N-gram creation/accumulation section (12) for creating and accumulating a lower node N-gram language model obtained by modeling a word string in a word string cluster.

Publication date: 24-05-2016

Learning parsing rules and argument identification from crowdsourcing of proposed command inputs

Number: US0009348805B1
Assignee: Google Inc.

Systems, methods and apparatus for learning parsing rules and argument identification from crowdsourcing of proposed command inputs are disclosed. Crowdsourcing techniques are used to generate rules for parsing input sentences. A parse is used to determine whether the input sentence invokes a specific action, and if so, what arguments are to be passed to the invocation of the action.

Publication date: 05-12-2019

Question Answering Method and Apparatus

Number: US2019371299A1
Assignee:

A question answering method includes obtaining target question information; determining a candidate question and answer pair based on the target question information; calculating a confidence of answer information in the candidate question and answer pair, where the confidence is used to indicate a probability that question information in the candidate question and answer pair belongs to an answer database or an adversarial database; determining whether the confidence is less than a first preset threshold; and when the confidence is less than the first preset threshold, outputting information indicating that the question cannot be answered.
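The final thresholding step might look like the sketch below. How the confidence is computed is outside its scope; the function name `respond`, the default threshold of 0.5 and the refusal message are assumptions made for illustration.

```python
UNABLE_TO_ANSWER = "Unable to answer this question."

def respond(answer, confidence, threshold=0.5):
    """Return the candidate answer only if its confidence reaches the threshold."""
    if confidence < threshold:
        # Confidence below the preset threshold: decline instead of answering.
        return UNABLE_TO_ANSWER
    return answer
```

For example, `respond("Paris", 0.2)` returns the refusal message, while `respond("Paris", 0.9)` returns the candidate answer.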

Publication date: 22-03-2016

Sub-lexical language models with word level pronunciation lexicons

Number: US0009292489B1
Assignee: Google Inc.

An automatic speech recognition (ASR) system and method are provided for using sub-lexical language models together with word level pronunciation lexicons. These approaches operate by introducing a transduction between sequences of sub-lexical units and sequences of words.

Publication date: 19-02-2019

Disambiguation of a spoken query term

Number: US0010210267B1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing spoken query terms. In one aspect, a method includes performing speech recognition on an audio signal to select two or more textual, candidate transcriptions that match a spoken query term, and to establish a speech recognition confidence value for each candidate transcription, obtaining a search history for a user who spoke the spoken query term, where the search history references one or more past search queries that have been submitted by the user, generating one or more n-grams from each candidate transcription, where each n-gram is a subsequence of n phonemes, syllables, letters, characters, words or terms from a respective candidate transcription, and determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries, and a weighting value that is based on the respective frequency.
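A rough sketch of the n-gram step is given below, assuming word-level n-grams and a weighting value equal to the n-gram's frequency divided by the number of past queries. The abstract does not specify the weighting formula, so that choice, like the function names and sample data, is purely illustrative.

```python
def word_ngrams(text, n):
    """Return the word-level n-grams of a string, in order."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_weights(candidate, history, n=2):
    """Map each n-gram of a candidate transcription to (frequency, weight)."""
    result = {}
    for gram in word_ngrams(candidate, n):
        # Frequency: how often the n-gram occurs in the past search queries.
        frequency = sum(word_ngrams(query, n).count(gram) for query in history)
        # Hypothetical weighting: frequency normalised by history size.
        result[gram] = (frequency, frequency / max(len(history), 1))
    return result

history = ["new york pizza", "new york weather", "weather today"]
weights = ngram_weights("new york times", history)
```

With this history, the bigram "new york" is found twice and "york times" not at all, so a candidate transcription containing "new york" would receive the larger weighting value.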

Publication date: 26-01-2012

Speech to Text Conversion

Number: US20120022867A1
Assignee: Google LLC

Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device.
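The weighted combination of base language models can be sketched as a linear interpolation. The unigram tables, the model names and the weights (imagined here as having been derived from contextual metadata) are hypothetical; the abstract does not state how the weighting is obtained.

```python
def interpolate(base_models, weights):
    """Linearly interpolate unigram models: P(w) = sum_i lambda_i * P_i(w).

    base_models: model name -> {word: probability}
    weights:     model name -> non-negative weight (normalised below)
    """
    total = sum(weights.values())
    vocab = set().union(*base_models.values())
    return {
        word: sum(
            weights[name] / total * model.get(word, 0.0)
            for name, model in base_models.items()
        )
        for word in vocab
    }

base_models = {
    "maps": {"pizza": 0.2, "route": 0.8},
    "email": {"pizza": 0.1, "meeting": 0.9},
}
# Hypothetical weights for a device whose context suggests a navigation app.
weights = {"maps": 3.0, "email": 1.0}
interpolated = interpolate(base_models, weights)
```

Because the weights are normalised and each base model sums to one, the interpolated model is again a probability distribution over the combined vocabulary.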

Publication date: 23-02-2012

Retrieval and presentation of network service results for mobile device using a multimodal browser

Number: US20120046950A1
Assignee: Nuance Communications Inc

A method of obtaining information using a mobile device can include receiving a request including speech data from the mobile device, and querying a network service using query information extracted from the speech data, whereby search results are received from the network service. The search results can be formatted for presentation on a display of the mobile device. The search results further can be sent, along with a voice grammar generated from the search results, to the mobile device. The mobile device then can render the search results.

Publication date: 25-10-2012

Speech recognition using multiple language models

Number: US20120271631A1
Assignee: ROBERT BOSCH GMBH

In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
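The frequency-based partition of the training utterances can be sketched as follows. Only the split is shown; building the grammar-based and statistical models from the two sets is not. The function name, the threshold value and the sample utterances are assumptions, and utterances whose count equals the threshold are left out of both sets because the abstract assigns them to neither.

```python
from collections import Counter

def split_by_frequency(utterances, threshold):
    """Partition distinct utterances into high- and low-frequency sets."""
    counts = Counter(utterances)
    # High-frequency set: training data for the grammar-based model.
    high = sorted(u for u, c in counts.items() if c > threshold)
    # Low-frequency set: training data for the statistical model.
    low = sorted(u for u, c in counts.items() if c < threshold)
    return counts, high, low

utterances = (
    ["call home"] * 5
    + ["play music"] * 3
    + ["navigate to the nearest pharmacy"]
)
counts, high, low = split_by_frequency(utterances, threshold=2)
```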

More details
Publication date: 20-06-2013

Retrieval and presentation of network service results for mobile device using a multimodal browser

Number: US20130158994A1
Assignee: Nuance Communications Inc

A method of obtaining information using a mobile device can include receiving a request including speech data from the mobile device, and querying a network service using query information extracted from the speech data, whereby search results are received from the network service. The search results can be formatted for presentation on a display of the mobile device. The search results further can be sent, along with a voice grammar generated from the search results, to the mobile device. The mobile device then can render the search results.

More details
Publication date: 04-07-2013

Speech recognition apparatus, speech recognition method, and speech recognition program

Number: US20130173267A1
Author: Nobuyuki Washio
Assignee: Fujitsu Ltd

An apparatus includes: a storage unit to store a model representing a relationship between a relative time and an occurrence probability; a first detection unit to detect a first speech period of a first speaker; a second period detection unit to detect a second speech period of a second speaker; a unit to calculate a feature value of the first speech period; a detection unit to detect a word using the calculated feature value; an adjustment unit to make an adjustment such that, in detecting a word for a reply by the detection unit, the adjustment unit retrieves an occurrence probability corresponding to a relative position of the reply in the second speech period and adjusts a word score or a detection threshold value for the reply; and a second detection unit to re-detect, using the adjusted word score or the adjusted detection threshold value, the word detected by the detection unit.

More details
Publication date: 06-01-2022

CROSS-CONTEXT NATURAL LANGUAGE MODEL GENERATION

Number: US20220005463A1
Assignee:

Provided is a method including obtaining a corpus and an associated set of domain indicators. The method includes learning a set of vectors in an embedding space based on n-grams of the corpus. The method includes updating ontology graphs comprising a set of vertices and edges associating the set of vertices with each other. The method also includes determining a vector cluster using hierarchical clustering based on distances of the set of vectors with respect to each other in the embedding space and determining a hierarchy of the ontology graphs based on a set of domain indicators of a respective set of vertices corresponding to vectors of the vector cluster. The method also includes updating an index based on the ontology graphs.
1. A computer-implemented method of using domain-specific ontologies to provide summaries of documents in a corpus of natural-language text documents, the method comprising: obtaining, with a computer system, a set of user-specific context parameters and a natural-language text document; determining, with the computer system, a first domain of knowledge based on the set of user-specific context parameters, wherein the first domain of knowledge maps to a first ontology amongst a plurality of ontologies, and wherein ontologies in the plurality of ontologies map n-grams onto a set of concepts to which the n-grams refer; scoring, with the computer system, a first set of n-grams of the natural-language text document using a scoring model based on relations between members of the first set of n-grams; selecting, with the computer system, text sections of the natural-language text based on n-gram scores provided by the scoring model; determining, with the computer system, an initial set of n-grams of the n-grams, wherein each respective n-gram of the initial set of n-grams maps to a respective concept of the set of concepts, and wherein each respective n-gram is identified by an ontology other than the first ontology; determining, with the ...

More details
Publication date: 05-01-2017

Speech recognition apparatus, speech recognition method, and electronic device

Number: US20170004824A1
Assignee: SAMSUNG ELECTRONICS CO LTD

A speech recognition apparatus includes a probability calculator configured to calculate phoneme probabilities of an audio signal using an acoustic model; a candidate set extractor configured to extract a candidate set from a recognition target list; and a result returner configured to return a recognition result of the audio signal based on the calculated phoneme probabilities and the extracted candidate set.

More details
Publication date: 07-01-2021

TRAINING ARTIFICIAL INTELLIGENCE TO USE ANSWER PASSAGE MERGING FOR FULL SENTENCES RESPONSES IN A QUESTION ANSWERING SYSTEM

Number: US20210004673A1
Assignee:

A method trains and utilizes an artificial intelligence (AI) system. The AI system receives a question that has contextual features. The method trains the AI system to identify entries in a corpus that have one or more of the contextual features from the question. The method further trains the AI system to: form a set of answers to the question based on identified contextual entries in the corpus; identify and name an entry in the corpus that has a highest quantity of contextual features that match the contextual features in the question as an initial answer to the question; identify and merge multiple other answers to the question from the corpus; and replace the initial answer with the merged answer in order to create a fully trained AI system. The fully trained AI system is then utilized to answer the question with the merged answer.
1. A method comprising: receiving, by an artificial intelligence (AI) system, a question that has contextual features; training the AI system to identify entries in a corpus that have one or more of the contextual features from the question; further training the AI system to identify multiple answers to the question, wherein the multiple answers are derived from the corpus; further training the AI system to merge the multiple answers into a merged answer to the question in order to create a fully trained AI system; and utilizing the fully trained AI system to answer the question with the merged answer.
2. The method of claim 1, further comprising: further training the AI system to identify an entry in the corpus that has a highest quantity of contextual features that match the contextual features in the question; further training the AI system to name the entry in the corpus that has the highest quantity of contextual features that match the contextual features in the question as an initial answer to the question; further training the AI system to identify merged answers that have a total quantity of contextual features that exceed the ...

More details
Publication date: 04-01-2018

Speech Recognition

Number: US20180005628A1
Author: XUE Shaofei
Assignee:

A speech recognition method includes clustering feature vectors of training data to obtain clustered feature vectors of training data, performing interpolation calculation on feature vectors of data to be recognized using the clustered feature vectors of training data, and inputting the feature vectors of data to be recognized after the interpolation calculation into a speech recognition model to adaptively adjust the speech recognition model. The techniques of the present disclosure improve speech recognition accuracy and adaptive processing efficiency.
1. A method comprising: clustering, by one or more processors of a computing device, feature vectors of training data to obtain clustered feature vectors of training data; performing interpolation calculation on feature vectors of data to be recognized using the clustered feature vectors of training data; and inputting the feature vectors of data to be recognized into a speech recognition model to adaptively adjust the speech recognition model.
2. The method of claim 1, further comprising: performing adaptive training of the speech recognition model using the clustered feature vectors of training data to obtain the speech recognition model after obtaining the clustered feature vectors of training data.
3. The method of claim 1, further comprising: after clustering the feature vectors of training data and before obtaining the clustered feature vectors of training data, performing weight average processing on clustered feature vectors of training data that belong to a cluster.
4. The method of claim 1, wherein the performing interpolation calculation on the feature vectors of data to be recognized using the clustered feature vectors of training data comprises: calculating a cosine distance between the feature vectors of data to be recognized and the clustered feature vectors of training data; and performing interpolation calculation on the feature vectors of data to be recognized using a predetermined number of clustered ...

More details
Publication date: 02-01-2020

SPEECH RECOGNITION METHOD AND SPEECH RECOGNITION DEVICE

Number: US20200005774A1
Author: YUN Hwan Sik
Assignee:

Disclosed is a speech recognition method capable of communicating with other electronic devices and an external server in a 5G communication condition by performing speech recognition by executing an artificial intelligence (AI) algorithm and/or a machine learning algorithm. The speech recognition method may comprise performing speech recognition by using an acoustic model and a language model stored in a speech database, determining whether the speech recognition of the spoken sentence is successful, storing speech recognition failure data when the speech recognition of the spoken sentence fails, analyzing the speech recognition failure data of the spoken sentence, updating the acoustic model or the language model by adding the recognition failure data to a learning database of the acoustic model or the language model when the cause of the speech recognition failure is due to the acoustic model or the language model, and machine-learning the acoustic model or the language model.
1. A speech recognition method comprising: receiving a spoken sentence speech spoken by a user; performing speech recognition using an acoustic model and a language model stored in a speech database; determining whether the speech recognition is successful; storing speech recognition failure data when the speech recognition fails; analyzing the speech recognition failure data to determine whether a cause of the speech recognition failure is due to the acoustic model or the language model; and updating the acoustic model by adding the recognition failure data to a learning database of the acoustic model when the cause of the speech recognition failure is due to the acoustic model and machine-learning the acoustic model based on the added learning database of the acoustic model and updating the language model by adding the recognition failure data to a learning database of the language model when the cause of the speech recognition failure is due to the language model and machine-learning the ...

More details
Publication date: 20-01-2022

SPEECH SIGNAL PROCESSING METHOD AND APPARATUS

Number: US20220020362A1
Author: KANG Tae Gyoon
Assignee: SAMSUNG ELECTRONICS CO., LTD.

A speech signal processing method and apparatus are disclosed. The speech signal processing method includes receiving an input token that is based on a speech signal, calculating first probability values respectively corresponding to candidate output tokens based on the input token, adjusting at least one of the first probability values based on a priority of each of the first probability values, and processing the speech signal based on an adjusted probability value obtained by the adjusting.
1. A speech signal processing method, comprising: receiving an input token that is based on a speech signal; calculating first probability values respectively corresponding to candidate output tokens based on the input token; adjusting at least one of the first probability values based on a priority of each of the first probability values; and processing the speech signal based on an adjusted probability value obtained by the adjusting.
2. The speech signal processing method of claim 1, wherein the adjusting comprises: determining whether a first probability value corresponding to a first candidate output token from among the candidate output tokens is included in a predetermined priority; and adjusting the first probability value based on a result of the determining.
3. The speech signal processing method of claim 2, wherein the first candidate output token is a token corresponding to an end of a sentence.
4. The speech signal processing method of claim 2, wherein the adjusting of the first probability value based on the result of the determining comprises: decreasing the first probability value, in response to the first probability value not being included in the predetermined priority.
5. The speech signal processing method of claim 4, wherein the decreasing of the first probability value comprises: adjusting a logarithmic value of the first probability value to negative infinity.
6. The speech signal processing method of claim 1, wherein the processing comprises: outputting a text ...
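
Claims 2 to 5 of this entry can be illustrated with a short sketch. It assumes that "included in a predetermined priority" means the token ranks among the `top_k` most probable candidates; that reading, the function name `adjust_eos`, and the toy probabilities are assumptions, not details from the patent.

```python
import math


def adjust_eos(log_probs, eos_token, top_k):
    """Set the end-of-sentence log-probability to -inf unless it ranks in the top_k."""
    ranked = sorted(log_probs, key=log_probs.get, reverse=True)
    adjusted = dict(log_probs)  # leave the caller's dictionary untouched
    if eos_token not in ranked[:top_k]:
        adjusted[eos_token] = -math.inf
    return adjusted


# Toy log-probabilities for three candidate output tokens.
log_probs = {"hello": math.log(0.5), "world": math.log(0.3), "</s>": math.log(0.2)}
blocked = adjust_eos(log_probs, "</s>", top_k=2)  # "</s>" ranks third, so it is suppressed
kept = adjust_eos(log_probs, "</s>", top_k=3)     # "</s>" is within the top 3, so it is kept
```

A decoder using this adjustment could not end the sentence at a step where the end-of-sentence token is not among the highest-ranked candidates.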

More details
Publication date: 27-01-2022

METHOD AND APPARATUS FOR AUTOMATICALLY EXTRACTING NEW FUNCTION OF VOICE AGENT BASED ON USAGE LOG ANALYSIS

Number: US20220028386A1
Assignee:

A method and apparatus for generating a new function of a voice agent are provided. Usage logs of users of the voice agent may be analyzed to extract a set of utterances of the users with respect to a new function of the voice agent, proto capsules for the set of utterances may be generated based on the set of utterances, ranks of importance of the proto capsules may be determined, a vocabulary of a proto capsule having a higher rank than a preset criterion may be identified, and a source code stub for a new function of the voice agent corresponding to the proto capsule having the higher rank may be generated based on the identified vocabulary.
1. A method of generating a new function of a voice agent, the method comprising: extracting a set of utterances of users of the voice agent with respect to the new function of the voice agent by analyzing usage logs of the users; generating proto capsules for the set of utterances based on the set of utterances; determining a rank of importance of each of the proto capsules; identifying a vocabulary of a proto capsule having a rank higher than a preset criterion; and generating a source code stub for the new function of the voice agent corresponding to the proto capsule having the higher rank, based on the identified vocabulary.
2. The method of claim 1, wherein the utterances of the users with respect to the new function of the voice agent comprise utterances of the users input to the voice agent for an operation that is unable to be performed with an existing function of the voice agent.
3. The method of claim 1, further comprising: obtaining an identified intention of the proto capsule having the higher rank, wherein the source code stub is generated based on the identified vocabulary and the identified intention.
4. The method of claim 3, wherein identifying the intention comprises one of an additional clustering method or a method of determining semantic similarity through comparison with already-developed ...

More details
Publication date: 10-01-2019

SYLLABLE BASED AUTOMATIC SPEECH RECOGNITION

Number: US20190013009A1
Assignee:

Systems, methods, and computer programs are described which utilize the structure of syllables as an organizing element of automated speech recognition processing to overcome variations in pronunciation, to efficiently resolve confusable aspects, to exploit context, and to map the speech to orthography.
1. A data processing method comprising: receiving, at a computing system, a production symbol stream produced from spoken words of a particular language from an acoustic processing system; extracting, from the production symbol stream, a plurality of production patterns; using a stored production to canonical mapping data comprising conditional probabilities for one or more mappings of production patterns to canonical patterns, generating candidate syllables and a probability of each candidate syllable from the plurality of production patterns; using a stored syllable to orthographic pattern mapping comprising conditional probabilities for one or more mappings, generating candidate orthographic patterns and a probability of each candidate orthographic pattern from the candidate syllables; based, at least in part, on the probabilities for each candidate orthographic pattern, generating an orthographic representation of the production symbol stream.
2. The data processing method of claim 1, wherein the production stream is segmented into phonotactic units comprising intervowel consonant (IVC) and vowel neighborhood (VN) units, by performing sequentially for each symbol of the production symbol stream: initializing a three-symbol buffer to zero and an IVC accumulator buffer to zero and adding production symbols sequentially to the three-symbol buffer; after adding a symbol to the three-symbol buffer, determining if the middle symbol of the three-symbol buffer is a vowel and that the three symbols therefore comprise a VN, storing the VN; if an added symbol is a consonant, appending that consonant to the IVC accumulator; if the next added symbol is not a consonant, ...

More details
Publication date: 09-01-2020

Acoustic information based language modeling system and method

Number: US20200013391A1
Assignee: LG ELECTRONICS INC

Disclosed are a speech data based language modeling system and method. The speech data based language modeling method includes transcription of text data, and generation of a regional dialect corpus based on the text data and regional dialect-containing speech data and generation of an acoustic model and a language model using the regional dialect corpus. The generation of an acoustic model and a language model is performed by machine learning of an artificial intelligence (AI) algorithm using speech data and marking of word spacing of a regional dialect sentence using a speech data tag. A user is able to use a regional dialect speech recognition service which is improved using 5G mobile communication technologies of eMBB, URLLC, or mMTC.

More details
Publication date: 09-01-2020

IMPLEMENTING A WHOLE SENTENCE RECURRENT NEURAL NETWORK LANGUAGE MODEL FOR NATURAL LANGUAGE PROCESSING

Number: US20200013393A1
Assignee:

A computer selects a test set of sentences from among sentences applied to train a whole sentence recurrent neural network language model to estimate the probability of likelihood of each whole sentence processed by natural language processing being correct. The computer generates imposter sentences from among the test set of sentences by substituting one word in each sentence of the test set of sentences. The computer generates, through the whole sentence recurrent neural network language model, a first score for each sentence of the test set of sentences and at least one additional score for each of the imposter sentences. The computer evaluates an accuracy of the natural language processing system in performing sequential classification tasks based on an accuracy value of the first score in reflecting a correct sentence and the at least one additional score in reflecting an incorrect sentence.
1. A method, comprising: selecting, by a computer system, a test set of sentences from among a plurality of sentences applied to train a whole sentence recurrent neural network language model to estimate the probability of likelihood of each whole sentence processed by natural language processing being correct; generating, by the computer system, a plurality of imposter sentences from among the test set of sentences by substituting one word in each sentence of the test set of sentences; generating, by the computer system, through the whole sentence recurrent neural network language model, a first score for each sentence of the test set of sentences and at least one additional score for each of the plurality of imposter sentences; and evaluating, by the computer system, an accuracy of the natural language processing system in performing sequential classification tasks based on an accuracy value of the first score in reflecting a correct sentence and the at least one additional score in reflecting an incorrect sentence.
2. The method according to claim 1, wherein generating ...

More details
Publication date: 17-01-2019

WORD HASH LANGUAGE MODEL

Number: US20190019503A1
Author: Henry Shawn
Assignee:

A language model may be used in a variety of natural language processing tasks, such as speech recognition, machine translation, sentence completion, part-of-speech tagging, parsing, handwriting recognition, or information retrieval. A natural language processing task may use a vocabulary of words, and a word hash vector may be created for each word in the vocabulary. A sequence of input words may be received, and a hash vector may be obtained for each word in the sequence. A language model may process the hash vectors for the sequence of input words to generate an output hash vector that describes words that are likely to follow the sequence of input words. One or more words may then be selected using the output word hash vector and used for a natural language processing task.
1. A computer-implemented method for implementing a neural network language model, the method comprising: obtaining a word hash vector for each word of a vocabulary of words; receiving a first sequence of words for processing by the neural network language model to select a word to follow the first sequence of words; generating a first sequence of word hash vectors by retrieving a word hash vector for each word of the first sequence of words; processing the first sequence of word hash vectors with a layer of the neural network language model to compute a first output vector; quantizing the first output vector to obtain a first output word hash vector; determining a distance between the first output word hash vector and a first hash vector for a first word in the vocabulary; and selecting the first word from the vocabulary using the distance between the first output word hash vector and the first hash vector for the first word.
2. The computer-implemented method of claim 1, further comprising using the selected first word to perform at least one of: speech recognition, machine translation, sentence completion, part-of-speech tagging, parsing, handwriting ...
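
The quantize-and-select steps of claim 1 might look like the sketch below. It assumes binary hash vectors, quantization by thresholding at 0.5, and Hamming distance; none of these choices is stated in the excerpt, and the vocabulary, hash values, and function names are made up for illustration.

```python
def quantize(vector, threshold=0.5):
    """Turn a real-valued output vector into a binary hash vector."""
    return tuple(1 if value >= threshold else 0 for value in vector)


def hamming(a, b):
    """Number of positions at which two equal-length hash vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def select_word(output_vector, word_hashes):
    """Pick the vocabulary word whose hash vector is closest to the quantized output."""
    output_hash = quantize(output_vector)
    return min(word_hashes, key=lambda word: hamming(output_hash, word_hashes[word]))


# Hypothetical 4-bit hash vectors for a three-word vocabulary.
word_hashes = {
    "cat": (1, 0, 1, 0),
    "dog": (0, 1, 1, 0),
    "car": (1, 1, 0, 1),
}
chosen = select_word([0.9, 0.2, 0.7, 0.4], word_hashes)
```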

More details
Publication date: 16-01-2020

Question Answering Using Trained Generative Adversarial Network Based Modeling of Text

Number: US20200019642A1
Assignee:

Mechanisms are provided for implementing a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag-of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation. The QA system obtains a plurality of candidate answers to a natural language question, where each candidate answer comprises one or more ngrams. For each candidate answer, a confidence score is generated based on a comparison of the one or more ngrams in the candidate answer to ngrams in the BoN output of the generator neural network of the GAN. A final answer to the input natural language question is selected from the plurality of candidate answers based on the confidence scores associated with the candidate answers, and is output.
1. A method, in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to configure the processor to implement a Question Answering (QA) system, the method comprising: training a generator neural network of a generative adversarial network (GAN) to generate a bag-of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation; obtaining, by the QA system, a plurality of candidate answers to a natural language question, wherein each candidate answer comprises one or more ngrams; generating, by the QA system, for each candidate answer in the plurality of candidate answers, a confidence score associated with the candidate answer based on a comparison of the one or more ngrams in the candidate answer to ngrams in the BoN output of the generator neural network of the GAN, wherein the confidence score represents a confidence that the candidate answer is a correct answer to the input natural language question; selecting, by the QA system, at least one final answer to the input natural language question from the plurality ...

More details
Publication date: 16-01-2020

Generative Adversarial Network Based Modeling of Text for Natural Language Processing

Number: US20200019863A1
Assignee: International Business Machines Corp

Mechanisms are provided to implement a generative adversarial network (GAN) for natural language processing. With these mechanisms, a generator neural network of the GAN is configured to generate a bag-of-ngrams (BoN) output based on a noise vector input, and a discriminator neural network of the GAN is configured to receive a BoN input, where the BoN input is either the BoN output from the generator neural network or a BoN input associated with an actual portion of natural language text. The mechanisms further configure the discriminator neural network of the GAN to output an indication of a probability as to whether the input BoN is from the actual portion of natural language text or is the BoN output of the generator neural network. Moreover, the mechanisms train the generator neural network and discriminator neural network based on a feedback mechanism that compares the output indication from the discriminator neural network to an indicator of whether the input BoN is from the actual portion of natural language text or is the BoN output of the generator neural network.

More details
Publication date: 16-01-2020

PREDICTING USER ACTIONS ON UBIQUITOUS DEVICES

Number: US20200020326A1
Assignee:

A method includes, for each model from multiple models, evaluating a model prediction accuracy based on a dataset of a user over a first time duration. The dataset includes a sequence of actions with corresponding contexts based on electronic device interactions. Each model is trained to predict a next action at a time point within the first time duration, based on a first behavior sequence over a first time period from the dataset before the time point, a second behavior sequence over a second time period from the dataset before the time point, and context at the time point. A model is selected from the multiple models based on its model prediction accuracy for the user based on a domain. An action to be initiated at a later time using an electronic device of the user is recommended using the selected model during a second time duration.
1. A method, comprising: for each model from a plurality of models, evaluating a model prediction accuracy based on a dataset of a user over a first time duration, wherein: the dataset comprises a sequence of actions with corresponding contexts based on electronic device interactions; and each model is trained to predict a next action at a time point within the first time duration, based on a first behavior sequence over a first time period from the dataset before the time point, a second behavior sequence over a second time period from the dataset before the time point, and context at the time point; selecting a model from the plurality of models based on its model prediction accuracy for the user based on a domain; and recommending an action to be initiated at a later time using an electronic device of the user using the selected model during a second time duration.
2. The method of claim 1, wherein evaluating the model prediction accuracy further comprises: observing an actual action taking place at the time point; and calculating the model prediction accuracy of the model based on difference between the predicted next ...

More details
Publication date: 16-01-2020

METHOD AND APPARATUS FOR RECOGNIZING A VOICE

Number: US20200020327A1
Assignee: LG ELECTRONICS INC.

Disclosed are a speech recognition device and a speech recognition method which perform speech recognition by executing an artificial intelligence (AI) algorithm and/or a machine learning algorithm installed thereon, to communicate with other electronic devices and an external server in a 5G communication environment. The speech recognition method according to an embodiment of the present disclosure may include converting a series of spoken utterance signals to a text item, extracting a discordant named-entity that is discordant with a parent domain inferred from the text, calculating probabilities of candidate words associated with the discordant named-entity based on calculated distances between a term representing the parent domain and each candidate word associated with the discordant named-entity, and, based on the calculated probabilities, modifying the discordant named-entity in the text to one of the candidate words associated with the discordant named-entity.
1. A speech recognition method by a speech recognition device, comprising: converting a series of spoken utterance signals to a text item; extracting a discordant named-entity discordant with a parent domain inferred from the text item; calculating probabilities for candidate words associated with the discordant named-entity, based on calculation of distances between each candidate word associated with the discordant named-entity and a term representing the parent domain; and modifying, based on the calculated probabilities, the discordant named-entity in the text item to one candidate word among the candidate words associated with the discordant named-entity.
2. The speech recognition method of claim 1, wherein the converting includes: extracting the candidate words by analyzing a pronunciation and a context of a word included in a spoken utterance, using an acoustic model and a language model; calculating probabilities of concordance between each candidate word and the word included in the spoken ...

More details
Publication date: 21-01-2021

LANGUAGE MODELS USING DOMAIN-SPECIFIC MODEL COMPONENTS

Number: US20210020170A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
1. A method comprising: obtaining, at data processing hardware, a plurality of training language examples each labeled as occurring in one or more particular aspects of non-linguistic context; and for each training language example of the plurality of training language examples: from among multiple domain-specific model components, triggering, by the data processing hardware, for use in training a language model on the corresponding training language example in unison with a baseline model component, one or more of the domain-specific model components that correspond to at least one of the one or more particular aspects of non-linguistic context that the corresponding training language example is labeled as occurring in; generating, by the data processing hardware, as output from the language model, using both the baseline model component and the one or more triggered domain-specific model components, a language model score; and updating, by the data processing hardware, using the language model score output from the language model, corresponding weights of each of the one or more triggered domain-specific model components without updating corresponding weights of the baseline model component.
2. The method of claim 1, wherein ...

Publication date: 21-01-2021

METHOD, APPARATUS, DEVICE AND COMPUTER READABLE STORAGE MEDIUM FOR RECOGNIZING AND DECODING VOICE BASED ON STREAMING ATTENTION MODEL

Number: US20210020175A1
Assignee:

A method, apparatus, device, and computer readable storage medium for recognizing and decoding a voice based on a streaming attention model are provided. The method may include generating a plurality of acoustic paths for decoding the voice using the streaming attention model, and then merging acoustic paths with identical last syllables of the plurality of acoustic paths to obtain a plurality of merged acoustic paths. The method may further include selecting a preset number of acoustic paths from the plurality of merged acoustic paths as retained candidate acoustic paths. Embodiments of the present disclosure present a concept that acoustic score calculating of a current voice fragment is only affected by its last voice fragment and has nothing to do with earlier voice history, and merge acoustic paths with the identical last syllables of the plurality of candidate acoustic paths.
1. A method for recognizing and decoding a voice based on a streaming attention model, comprising: generating a plurality of acoustic paths for decoding the voice using the streaming attention model; merging acoustic paths with identical last syllables among the plurality of acoustic paths to obtain a plurality of merging acoustic paths; and selecting a preset number of acoustic paths from the plurality of merging acoustic paths.
2. The method according to claim 1, wherein the generating a plurality of acoustic paths for decoding the voice using the streaming attention model comprises: generating the plurality of acoustic paths based on a candidate acoustic path of a voice prior to the last voice and a plurality of modeling units of the streaming attention model.
3. The method according to claim 1, wherein the generating a plurality of acoustic paths for decoding the voice using the streaming attention model comprises: determining an acoustic model score of each of the plurality of acoustic paths using the streaming attention model; determining a language model score of each of the plurality ...
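
The merge-then-select step described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `AcousticPath` type, the `merge_and_prune` function, the use of a single combined score where higher is better, and the rule that the best-scoring path survives a merge are all hypothetical and are not taken from the patent text.

```python
from typing import Dict, List, NamedTuple, Tuple


class AcousticPath(NamedTuple):
    """Hypothetical candidate path: a non-empty syllable sequence and a score."""
    syllables: Tuple[str, ...]
    score: float  # assumption: higher (less negative) is better


def merge_and_prune(paths: List[AcousticPath], beam_size: int) -> List[AcousticPath]:
    """Merge paths that end in the same syllable, then keep the top `beam_size`."""
    best_by_last: Dict[str, AcousticPath] = {}
    for path in paths:
        last = path.syllables[-1]
        kept = best_by_last.get(last)
        # Assumption: when paths share a last syllable, the best-scoring one survives.
        if kept is None or path.score > kept.score:
            best_by_last[last] = path
    merged = sorted(best_by_last.values(), key=lambda p: p.score, reverse=True)
    return merged[:beam_size]


candidates = [
    AcousticPath(("ni", "hao"), -1.2),
    AcousticPath(("li", "hao"), -2.5),  # same last syllable as the first path
    AcousticPath(("ni", "gao"), -1.9),
    AcousticPath(("ni", "bao"), -3.0),
]
retained = merge_and_prune(candidates, beam_size=2)
```

Because merging keys only on the last syllable, the number of surviving paths is bounded by the number of distinct final syllables, which fits the abstract's statement that the current score depends only on the preceding fragment.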

Publication date: 26-01-2017

UNSUPERVISED TRAINING METHOD, TRAINING APPARATUS, AND TRAINING PROGRAM FOR AN N-GRAM LANGUAGE MODEL BASED UPON RECOGNITION RELIABILITY

Number: US20170025118A1
Assignee:

A computer-based, unsupervised training method for an N-gram language model includes reading, by a computer, recognition results obtained as a result of speech recognition of speech data; acquiring, by the computer, a reliability for each of the read recognition results; referring, by the computer, to the recognition result and the acquired reliability to select an N-gram entry; and training, by the computer, the N-gram language model about selected one or more of the N-gram entries using all recognition results.
1. A non-transitory, computer readable storage medium having instructions stored thereon that, when executed by a computer, implement a training method for an N-gram language model, the method comprising: reading recognition results obtained as a result of speech recognition of speech data; acquiring a reliability for each of the read recognition results; referring to each recognition result's acquired reliability to select a subset of one or more N-gram entries based upon their respective reliabilities; and training the N-gram language model for one or more entries of the subset of N-gram entries using all recognition results, wherein the processing device is further configured to select from a first corpus, a second corpus, and a third corpus, each of the N-gram entries, whose sum of a first number of appearances in the first corpus as a set of all the recognition results, a second number of appearances in a second corpus as a subset of the recognition results with the reliability higher than or equal to a predetermined threshold value, and a third number of appearances in the third corpus as a baseline of the N-gram language model exceeds a predetermined number of times, where each of the first number of appearances, the second number of appearances, and the third number of appearances is given a different weight, respectively.
2. The computer readable storage medium of claim 1, wherein each of the weights respectively given to each of the first number ...
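
The selection rule in claim 1 above (a weighted sum of appearance counts over three corpora compared with a threshold) can be illustrated with a small sketch. The function names, the whitespace tokenization, the toy sentences, the weights, and the threshold value are all hypothetical; the claim itself does not specify them.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def select_entries(all_results: List[str], reliable_results: List[str],
                   baseline: List[str], n: int,
                   weights: Tuple[float, float, float],
                   min_total: float) -> Set[NGram]:
    """Select n-grams whose weighted count over the three corpora exceeds min_total."""
    counts = [Counter(), Counter(), Counter()]
    for counter, corpus in zip(counts, (all_results, reliable_results, baseline)):
        for sentence in corpus:
            counter.update(ngrams(sentence.split(), n))  # naive whitespace tokens
    candidates = set().union(*counts)
    return {
        gram for gram in candidates
        if sum(w * c[gram] for w, c in zip(weights, counts)) > min_total
    }


# Toy data: the "reliable" corpus is the subset of results above a reliability threshold.
all_results = ["open the door", "open the window", "open a door"]
reliable_results = ["open the door"]
baseline = ["close the door"]
selected = select_entries(all_results, reliable_results, baseline,
                          n=2, weights=(1.0, 2.0, 0.5), min_total=2.0)
```

Giving the reliable subset a larger weight, as in this toy setting, lets entries confirmed by high-confidence recognition pass the threshold with fewer raw appearances.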

Publication date: 26-01-2017

Apparatus and method of acoustic score calculation and speech recognition

Number: US20170025119A1
Assignee: SAMSUNG ELECTRONICS CO LTD

An apparatus for calculating acoustic score, a method of calculating acoustic score, an apparatus for speech recognition, a method of speech recognition, and an electronic device including the same are provided. An apparatus for calculating acoustic score includes a preprocessor configured to sequentially extract audio frames into windows and a score calculator configured to calculate an acoustic score of a window by using a deep neural network (DNN)-based acoustic model.

Publication date: 26-01-2017

Business Listing Search

Number: US20170025123A1
Assignee:

A method of operating a voice-enabled business directory search system includes receiving category-business pairs, each category-business pair including a business category and a specific business, and establishing a data structure having nodes based on the category-business pairs. Each node of the data structure is associated with one or more business categories and a speech recognition language model for recognizing specific businesses associated with the one or more business categories.
1) A method of searching a business listing with voice commands over the Internet, the method comprising: receiving, over the Internet, from a user terminal, a query spoken by a user, wherein the query spoken by the user includes a speech utterance representing a category of businesses and a speech utterance representing a geographic location; recognizing the geographic location with a speech recognition engine based on the speech utterance representing the geographic location; recognizing the category of businesses with the speech recognition engine based on the speech utterance representing the category of businesses; searching, with one or more processors, a business listing for businesses within both the recognized category of businesses and the recognized geographic location to select businesses responsive to the query spoken by the user; and sending the user terminal at least some of the responsive businesses.
2) The method of claim 1, comprising selecting, from a set of speech recognition language models for recognizing speech, a subset of speech recognition language models, wherein the subset of speech recognition language models is selected based on the recognized location or the recognized category of businesses.
3) The method of claim 2, wherein the set of speech recognition language models includes N-grams in which a probability of a word in a vocabulary is estimated by counting the occurrences of that word in the context of a last N words.
4) ...

Publication date: 25-01-2018

Generation device, recognition system, and generation method for generating finite state transducer

Number: US20180025723A1
Inventor: Manabu Nagao
Assignee: Toshiba Corp

A generation device includes a receiving unit and a generating unit. The receiving unit receives a model representing correspondence between one or more phonetic symbols and one or more words. The generating unit generates a first finite state transducer based on the model, the first finite state transducer at least including, as outgoing transitions from a first state representing transition destination of a first transition which has a first phonetic symbol of a predetermined type as input symbol, a second transition that has a second phonetic symbol, which is different than a particular symbol representing part or whole of input symbol of the first transition, as input symbol, and a third transition that has a third phonetic symbol, which represents the particular symbol or silence, as input symbol.

Publication date: 23-01-2020

PROCESSING TEXT SEQUENCES USING NEURAL NETWORKS

Number: US20200026765A1
Assignee:

A computer-implemented method for training a neural network that is configured to generate a score distribution over a set of multiple output positions. The neural network is configured to process a network input to generate a respective score distribution for each of a plurality of output positions including a respective score for each token in a predetermined set of tokens that includes n-grams of multiple different sizes. Example methods described herein provide trained neural networks which produce results with improved accuracy compared to the state of the art, e.g. translations that are more accurate compared to the state of the art, or more accurate speech recognition compared to the state of the art.
1. A computer-implemented method comprising: obtaining training data for training a neural network, wherein the neural network is configured to receive a network input and to process the network input in accordance with a plurality of parameters of the neural network to generate a respective score distribution for each of a plurality of output positions, wherein the respective score distribution for each of the output positions comprises a respective score for each token in a predetermined set of tokens, wherein the predetermined set of tokens includes n-grams of multiple different sizes, wherein, for each output position, the respective score for each of the tokens in the score distribution for the output position represents a likelihood that the token is a token at the output position in an output sequence for the network input, and wherein the training data comprises a plurality of training inputs, and for each training input, a respective target output sequence comprising one or more words; processing the training input using the neural network in accordance with current values of the parameters of the neural network to generate a respective score distribution for each of a plurality of output positions; sampling, from a plurality of ...

Publication date: 28-01-2021

Ambiguity resolution with dialogue search history

Number: US20210027771A1
Assignee: Microsoft Technology Licensing LLC

A method comprising recognizing a user utterance including an ambiguity. The method further comprises using a previously-trained code-generation machine to produce, from the user utterance, a data-flow program including a search-history function. The search-history function is configured to select a highest-confidence disambiguating concept from one or more candidate concepts stored in a context-specific dialogue history.

Publication date: 28-01-2021

Unsupervised automated extraction of conversation structure from recorded conversations

Number: US20210027772A1
Assignee:

A method for information processing includes computing, over a corpus of conversations, a conversation structure model including (i) a sequence of conversation parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts. For a given conversation, a segmentation of the conversation is computed based on the computed conversation structure model. Action is taken on the given conversation according to the segmentation.
1. A method for information processing, the method comprising: computing, over a corpus of conversations, a conversation structure model comprising (i) a sequence of conversation parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts; computing, for a given conversation, a segmentation of the conversation based on the computed conversation structure model; and acting on the given conversation according to the segmentation.
2. The method according to , wherein computing the probabilistic model comprises assigning a probability to an occurrence of each word.
3. The method according to , wherein assigning the probability comprises running a Gibbs sampling process.
4. The method according to claim 2, wherein assigning the probability comprises using a prior probability distribution for one or more of the conversation parts.
5. The method according to claim 1, wherein computing the conversation structure model comprises pre-specifying a fixed number of the conversation parts.
6. The method according to claim 1, wherein computing the conversation structure model comprises selecting a subset of the conversations based on one or more business rules.
7. The method according to claim 1, wherein computing the segmentation of the conversation comprises finding the segmentation that best matches the conversation structure model.
8. The method according to claim 1, and comprising computing a coherence score, which quantifies an extent of fit between the given conversation and ...

Publication date: 02-02-2017

Collaborative language model biasing

Number: US20170032781A1
Assignee: Google LLC

Methods, including computer programs encoded on a computer storage medium, for collaborative language model biasing. In one aspect, a method includes receiving (i) data including a set of terms associated with a target user, and, (ii) from each of multiple other users, data including a set of terms associated with the other user, selecting a particular other user based at least on comparing the set of terms associated with the target user to the sets of terms associated with the other users, selecting one or more terms from the set of terms that is associated with the particular other user, obtaining, based on the selected terms that are associated with the particular other user, a biased language model, and providing the biased language model to an automated speech recognizer.

Publication date: 04-02-2016

Speech-Based Search Using Descriptive Features of Surrounding Objects

Number: US20160035348A1
Assignee: NUANCE COMMUNICATIONS, INC.

A natural language query arrangement is described for a mobile environment. An automatic speech recognition (ASR) engine can process an unknown speech input from a user to produce corresponding recognition text. A natural language understanding module can extract natural language concept information from the recognition text. A query classifier uses the recognition text and the natural language concept information to assign to the speech input a query intent related to one or more objects in the mobile environment. An environment database contains information descriptive of objects in the mobile environment. A query search engine searches the environment database based on the query intent, the natural language concept information, and the recognition text to determine corresponding search results, which can be delivered to the user.
1. A method for processing natural language queries in a mobile environment employing at least one hardware implemented computer processor, the method comprising: extracting natural language concept information from text recognized by an automatic speech recognition (ASR) engine; using the text and the natural language concept information to assign to the speech input a query intent related to one or more objects in the mobile environment; and based on the query intent, the natural language concept information, and the recognition text, searching an environment database containing information descriptive of the one or more objects in the mobile environment to determine corresponding search results.
2. The method of further comprising processing an unknown speech input from a user with the ASR engine to produce the text.
3. The method of wherein the mobile environment includes an environment in and around a vehicle containing the user.
4. The method of to wherein the mobile environment includes an environment around a smartphone receiving the speech input and delivering the search results.
5. The method of to wherein the natural language concept ...

Publication date: 30-01-2020

Systems and methods for multisensory semiotic communications

Number: US20200034025A1
Assignee: Individual

The present disclosure relates generally to systems and methods for receiving visual and communication inputs, and generating, modifying, and outputting multisensory semiotic communications. The multisensory semiotic communications can include an avatar, a dynamic image, an expressed phrase, and a visual text. The multisensory semiotic communications can be modified based on one or more customization selections. The customization selections can include a gender selection, an age selection, an emotion selection, a race selection, a location selection, a nationality selection, and a language selection.

Publication date: 31-01-2019

METHOD AND APPARATUS FOR MEASURING ORAL READING RATE

Number: US20190035300A1
Inventor: Bernstein Jared C.
Assignee:

A method and apparatus for measuring oral reading rate are described. Oral reading rate represents how fast a reader can decipher printed text and correctly speak the written words. The rate can be a representation of speed only, or in the case of accurate oral reading rate, it can measure the rate of oral reading for only words read correctly. In the method described, a passage of text is subdivided into units. The duration of speaking each unit is measured and these durations are mapped to values that are used in a polytomous probabilistic model. Through a series of steps using this model, a new value of each reader's reading rate called the Unified Reading Rate (URR) is estimated. An apparatus that measures the URR using the above method is also described. This apparatus is software that runs on one or more computational devices and includes a speech recognition engine as well as other components that produce the URR.
1. A method, comprising: measuring one or more durations it takes a reader to speak one or more units of a passage of text presented to the reader; performing a mathematical operation that adjusts the measured one or more durations using one or more baseline duration values that are estimates of durations it would take representative good readers to say the one or more units; mapping the adjusted values onto a scale through a transformation process; using the mapped adjusted value as input scores to a probabilistic model; and generating an estimate of the reader's unified reading rate using the probabilistic model.
2. An apparatus for measuring oral reading rate, comprising: a speech input interface configured to accept speech signals and apply the speech signals as inputs to a speech recognition engine and speech processing component for automatically determining text units that a reader spoke, and automatically measuring a duration it took the reader to speak each of the text units; an adjustment computation component for receiving and adjusting the ...

Publication date: 31-01-2019

HIERARCHICAL SPEECH RECOGNITION DECODER

Number: US20190035389A1
Assignee:

A speech interpretation module interprets the audio of user utterances as sequences of words. To do so, the speech interpretation module parameterizes a literal corpus of expressions by identifying portions of the expressions that correspond to known concepts, and generates a parameterized statistical model from the resulting parameterized corpus. When speech is received, the speech interpretation module uses a hierarchical speech recognition decoder that uses both the parameterized statistical model and language sub-models that specify how to recognize a sequence of words. The separation of the language sub-models from the statistical model beneficially reduces the size of the literal corpus needed for training, reduces the size of the resulting model, provides more fine-grained interpretation of concepts, and improves computational efficiency by allowing run-time incorporation of the language sub-models.
1. A computer-implemented method, comprising: receiving, over a computer network, an utterance of a user, the utterance having been accepted from the user at a client device as spoken input; storing the utterance, the storing comprising identifying a plurality of sub-expressions by applying a parameterized statistical model that determines likely n-grams of literal word tokens and concept placeholders included in the utterance and storing each of the sub-expressions in the data structure as either: a set of literal word tokens representing the sub-expression, or a concept placeholder representing the sub-expression and providing an indication of a language sub-model; determining likely textual representations of the sub-expressions stored as concept placeholders by applying the indicated language sub-models to the sub-expressions; generating a user-specific textual interpretation of the utterance, the textual interpretation being a combination of the literal word tokens and the determined likely textual representations of the sub-expressions.
2. The computer- ...

Publication date: 30-01-2020

AUGMENTED GENERALIZED DEEP LEARNING WITH SPECIAL VOCABULARY

Number: US20200035219A1
Assignee:

Systems and methods are disclosed for customizing a neural network for a custom dataset, when the neural network has been trained on data from a general dataset. The neural network may comprise an output layer including one or more nodes corresponding to candidate outputs. The values of the nodes in the output layer may correspond to a probability that the candidate output is the correct output for an input. The values of the nodes in the output layer may be adjusted for higher performance when the neural network is used to process data from a custom dataset.

Publication date: 04-02-2021

NEURAL NETWORK MODEL WITH EVIDENCE EXTRACTION

Number: US20210034813A1
Assignee: 3M INNOVATIVE PROPERTIES COMPANY

Aspects of the present disclosure relate to a system. The system includes one or more computers having processing circuitry and a memory storing instructions which, when executed by the processing circuitry, cause the processing circuitry to perform operations including determining a labeled classification from a collection of documents corresponding to an encounter. The collection of documents comprises a first plurality of n-grams. The operations also include determining an evidence score for an n-gram based on contribution of the n-gram to the labeled classification, ranking at least some of the first plurality of n-grams based on the evidence score for each n-gram, selecting an n-gram from the first plurality of n-grams as an explanation evidence based on the ranking and performing at least one operation in response to selecting the n-gram.
1. A system comprising: one or more computers, comprising: a processing circuitry; and a memory storing instructions which, when executed by the processing circuitry, cause the processing circuitry to perform operations comprising: determining, using the processing circuitry operating a neural network model, a labeled classification from a collection of documents corresponding to an encounter, the collection of documents comprises a first plurality of n-grams; determining an evidence score for an n-gram based on contribution of the n-gram to the labeled classification; ranking at least some of the first plurality of n-grams based on the evidence score for each n-gram; selecting an n-gram from the first plurality of n-grams as an explanation evidence based on the ranking; and performing at least one operation in response to selecting the n-gram as the explanation evidence.
2. The system of claim 1, further comprising a display, wherein performing at least one operation comprises presenting the explanation evidence on the display.
3. The system of claim 2, wherein the display is on a client device.
4. The ...

Publication date: 04-02-2021

DEEP LEARNING INTERNAL STATE INDEX-BASED SEARCH AND CLASSIFICATION

Number: US20210035565A1
Assignee:

Systems and methods are disclosed for generating internal state representations of a neural network during processing and using the internal state representations for classification or search. In some embodiments, the internal state representations are generated from the output activation functions of a subset of nodes of the neural network. The internal state representations may be used for classification by training a classification model using internal state representations and corresponding classifications. The internal state representations may be used for search, by producing a search feature from a search input and comparing the search feature with one or more feature representations to find the feature representation with the highest degree of similarity.
1. A system comprising one or more processors, and a non-transitory computer-readable medium including one or more sequences of instructions that, when executed by the one or more processors, cause the system to perform operations comprising: providing a trained speech recognition neural network, the speech recognition neural network including a plurality of layers each having a plurality of nodes; transcribing speech audio by the speech recognition neural network; generating one or more feature representations from a subset of the nodes; receiving a first set of classifications for a first portion of the speech audio; providing a trained classification model, the classification model trained on a first set of feature representations corresponding to the first portion of the speech audio and the first set of classifications; and determining a second set of classifications for a second portion of the speech audio by inputting a second set of feature representations corresponding to the second portion of the speech audio into the trained classification model, the second set of feature representations comprising a second subset of the feature representations generated during the speech audio transcription.
2. ...

Publication date: 04-02-2021

DYNAMIC INTERPOLATION FOR HYBRID LANGUAGE MODELS

Number: US20210035569A1
Assignee: SoundHound, Inc.

In order to improve the accuracy of ASR, an utterance is transcribed using a plurality of language models, such as, for example, an N-gram language model and a neural language model. The language models are trained separately. They each output a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While recognizing an utterance, interpolation weights are chosen or updated dynamically, in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.
1. A method of speech transcription, the method comprising: computing a transcription hypothesis from a sequence of phonemes; computing, according to a first model, a first model score for the transcription; computing, according to a second model, a second model score for the transcription; computing a hybrid score by interpolation between the first model score and second model score using interpolation weights, where the interpolation weights are in dependence upon a dynamic variable.
2. The method of claim 1, wherein the dynamic variable is conditioned on the content of the transcription.
3. The method of claim 2, wherein the conditioning is based on word presence.
4. The method of claim 2, wherein the conditioning is based on semantic information.
5. The method of claim 1, further comprising computing a second transcription hypothesis from the sequence of phonemes wherein the dynamic variable depends on the content of the second hypothesized transcription.
6. The method of claim 1, wherein the first model is an n-gram model and the second model is a neural network.
7. The method of claim 1, wherein the interpolation weights are generated using rule-based logic.
8. The method of claim 1, wherein the interpolation weights are generated using a neural network.
9. The method of claim 1, wherein the first model score and second ...
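
As a rough illustration of claims 1, 3 and 7 above, the sketch below picks an interpolation weight with a simple word-presence rule and combines two model scores. It assumes linear interpolation in the probability domain over log-probability inputs. The function names, the weight values 0.7 and 0.3, and the domain term list are hypothetical and do not come from the patent.

```python
import math
from typing import Iterable, Set


def interpolation_weight(hypothesis_words: Iterable[str], domain_terms: Set[str]) -> float:
    """Hypothetical rule: lean on the n-gram model when a known domain term is present."""
    return 0.7 if any(word in domain_terms for word in hypothesis_words) else 0.3


def hybrid_score(ngram_logprob: float, neural_logprob: float, weight: float) -> float:
    """Assumed form: linear interpolation of probabilities, returned as a log-probability."""
    mixed = weight * math.exp(ngram_logprob) + (1.0 - weight) * math.exp(neural_logprob)
    return math.log(mixed)


domain_terms = {"thermostat", "playlist"}  # illustrative only
hypothesis = ["set", "the", "thermostat"]

weight = interpolation_weight(hypothesis, domain_terms)
score = hybrid_score(math.log(0.2), math.log(0.4), weight)
```

The weight is recomputed for each partial hypothesis, so the mix between the two models can change as decoding proceeds. A learned weight predictor, as in claim 8, could replace the rule without changing `hybrid_score`.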

Publication date: 08-02-2018

System and method for speech-enabled access to media content by a ranked normalized weighted graph using speech recognition

Number: US20180039481A1
Assignee: Nuance Communications Inc

Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information. The information can be a list of actors, directors, composers, titles, and/or locations. The graph that models how the media are interconnected can further model pieces of common information between two or more media. The method can further cause the computing device to weight the graph based on the retrieved information, wherein the weighted graph is further normalized to help with speech query searching of media content using speech recognition. The graph can further model relative popularity information in the list. The method can rank information based on a PageRank algorithm.
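
To make the ranking step concrete, here is a minimal PageRank power iteration over a toy media graph. The graph contents, the node names, the damping factor of 0.85, and the iteration count are assumptions chosen for illustration. The abstract names PageRank only as one possible ranking method and does not describe this particular formulation.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain power-iteration PageRank; every neighbor must also be a key of `graph`."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for neighbor in graph[node]:
                new_rank[neighbor] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


# Hypothetical catalog: titles link to the people they share, and vice versa.
media_graph = {
    "Film 1": ["Actor A"],
    "Film 2": ["Actor A", "Director B"],
    "Film 3": ["Actor A"],
    "Actor A": ["Film 1", "Film 2", "Film 3"],
    "Director B": ["Film 2"],
}
rank = pagerank(media_graph)
ranked_terms = sorted(rank, key=rank.get, reverse=True)
```

In a setting like the one the abstract describes, higher-ranked names could be given more weight when the speech recognition model is built, so that well-connected actors or titles are recognized more readily.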

Publication date: 11-02-2016

SYSTEM AND METHOD FOR ROBUST ACCESS AND ENTRY TO LARGE STRUCTURED DATA USING VOICE FORM-FILLING

Number: US20160042732A1
Assignee:

A method, apparatus and machine-readable medium are provided. A phonotactic grammar is utilized to perform speech recognition on received speech and to generate a phoneme lattice. A document shortlist is generated based on using the phoneme lattice to query an index. A grammar is generated from the document shortlist. Data for each of at least one input field is identified based on the received speech and the generated grammar.
2. The method of claim 1, wherein the insignificant units comprise silence and filler words.
3. The method of claim 1, wherein the index is generated based on a plurality of training phoneme lattices and factors of interest from valid entries in a database, wherein the factors of interest comprise trigrams.
4. The method of claim 3, wherein the factors of interest further comprise N-grams based on the valid entries in the database.
5. The method of claim 1, further comprising using the shortlist of recognized speech possibilities for automatic speech recognition.
6. The method of claim 5, wherein the automatic speech recognition is further performed using a grammar.
7. The method of claim 6, wherein the grammar is an N-gram phonotactic grammar.
8. The method of claim 7, wherein the N-gram phonotactic grammar is unsmoothed, recognizing only N-grams which have been seen in data used to train the N-gram phonotactic grammar.
9. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: identifying an index of words and a phone lattice; normalizing costs in the revised phone lattice such that a cost of a best path is set to zero; generating a cost-normalized query using factors of interest, wherein the index of words is indexed by the factors of interest; and generating a shortlist of recognized speech possibilities using the revised phone lattice, the index of words, and indices contained ...

Publication date: 24-02-2022

COMPUTERIZED DIALOG SYSTEM IMPROVEMENTS BASED ON CONVERSATION DATA

Number: US20220059097A1
Assignee:

The computer receives a group of conversation data associated with the escalation node, identifies agent responses in the conversation data, and clusters them into agent response types. The computer identifies dialog state feature value sets for the conversations. The computer identifies feature value set associations with response types and generates Boolean expressions representing the feature value sets associated with each of the response types. The computer makes a recommendation to add at least one child node for the escalation node, with the child node corresponding to one of the response types. The child node has, as an entry condition, the Boolean expression for the response type to which the child node corresponds. The child node has an action which, according to some aspects, provides a response representative of the cluster of agent responses for the response type to which the child node corresponds.
1. A computer implemented method to modify a dialog system execution graph, comprising: receiving, by said computer, an execution graph including an escalation node; receiving, by said computer, conversation data for conversations associated with said escalation node; identifying, by said computer, agent responses in said conversation data and clustering said agent responses into response types; determining, by said computer, dialog state feature value sets for said conversations at said escalation node; identifying, by said computer, feature value set associations with said response types; generating, by said computer, for each response type, a Boolean expression representing the associated feature value sets; and making a recommendation, by said computer, to add to said execution graph at least one child node for said escalation node, said child node corresponding to one of said response types, wherein said at least one child node has, as an entry condition, the Boolean expression for the response type to which the child node corresponds, and wherein ...

Publication date: 07-02-2019

TIME CAPSULE BASED SPEAKING AID

Number: US20190043490A1
Assignee:

A system, apparatus, method, and computer program product for a speaking aid. The system includes network interface circuitry to receive speech input from a user. The speech input includes a partial sentence with a missing word or the partial sentence with a stuttered word. The system also includes a processor coupled to the network interface circuitry and one or more memory devices coupled to the processor. The one or more memory devices include instructions that, when executed by the processor, cause the system to detect a stutter or pause in the speech input, predict the stuttered word or the missing word, present a predicted word from an n-best list to the user, and, if a prompt is received from the user, present a next word from the n-best list until the user speaks a correct word to replace the stutter or the pause.
1. A speaking aid system comprising: network interface circuitry to receive speech input from a user, the speech input including a partial sentence with a missing word or the partial sentence with a stuttered word; a processor coupled to the network interface circuitry; one or more memory devices coupled to the processor, the one or more memory devices including instructions, which when executed by the processor, cause the system to: detect a stutter or pause in the speech input; predict the stuttered word or the missing word; present a predicted word from an n-best list to the user; and if a prompt is received from the user, present a next word from the n-best list until the user speaks a correct word to replace the stutter or the pause.
2. The system of claim 1, wherein instructions to present a predicted word from an n-best list to the user comprises instructions to whisper the predicted word into an earpiece of the user.
3. The system of claim 1, wherein instructions to present a predicted word from an n-best list to the user comprises instructions to display the predicted word on a display.
4. The system of claim 1, wherein instructions to predict ...
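
The prompting loop described in the abstract can be sketched in a few lines. This is only an illustration, assuming the n-best list has already been predicted; the function name `present_candidates` and the `wants_next` callback are hypothetical and do not come from the filing.

```python
def present_candidates(n_best, wants_next):
    """Offer words from an n-best list one at a time.

    n_best: candidate words, best first (assumed to be predicted elsewhere).
    wants_next: hypothetical callback that returns True when the user prompts
        for another suggestion and False when the offered word is accepted.
    Returns the words that were offered, in order.
    """
    offered = []
    for word in n_best:
        offered.append(word)          # e.g. whisper into an earpiece or display
        if not wants_next(word):      # user spoke the word: stop prompting
            break
    return offered
```

For example, with candidates `["station", "stadium", "statue"]` and a user who accepts "stadium", the loop offers the first two words and stops.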

Publication date: 06-02-2020

MINIMUM WORD ERROR RATE TRAINING FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS

Number: US20200043483A1
Assignee:

Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
1. A method performed by one or more computers of a speech recognition system, the method comprising: receiving, by the one or more computers, audio data indicating acoustic characteristics of an utterance; generating, by the one or more computers, a sequence of feature vectors indicative of the acoustic characteristics of the utterance; processing, by the one or more computers, the sequence of feature vectors using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model comprising an encoder, an attention module, and a decoder, wherein the encoder and decoder each comprise one or more recurrent neural network layers; obtaining, by the one or more computers as a result of the processing with the speech recognition model, a sequence of output vectors representing distributions over a predetermined set of linguistic units; determining, by the one or more computers, a transcription for the utterance based on the ...
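
The abstract says only that the loss "uses N-best lists of decoded hypotheses"; it does not give a formula. One plausible reading, assumed here purely for illustration, is the expected number of word errors over the N-best list, with hypothesis probabilities renormalized over that list. The function names below are hypothetical.

```python
import math


def word_errors(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,              # drop h from the hypothesis
                           cur[j - 1] + 1,           # add r to the hypothesis
                           prev[j - 1] + (h != r)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def expected_word_errors(nbest, ref):
    """Expected word errors over an N-best list (assumed loss formulation).

    nbest: list of (tokens, log_prob) pairs from the decoder.
    ref: reference token list.
    """
    top = max(lp for _, lp in nbest)
    weights = [math.exp(lp - top) for _, lp in nbest]  # stable softmax numerators
    z = sum(weights)
    return sum(w / z * word_errors(toks, ref)
               for (toks, _), w in zip(nbest, weights))
```

With two equally likely hypotheses, one correct and one with a single substitution, the expected error count is 0.5; a trainable version would compute the same quantity on differentiable log-probabilities.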

Publication date: 03-03-2022

CONTENT GENERATION SYSTEM AND METHOD

Number: US20220062770A1
Assignee: Sony Interactive Entertainment Inc.

A content generation system, the system comprising an input obtaining unit operable to obtain one or more samples of input text and/or audio relating to a first content, an input analysis unit operable to generate n-grams representing one or more elements of the obtained inputs, a representation generating unit operable to generate a visual representation of one or more of the generated n-grams, and a display generation unit operable to generate second content comprising one or more elements of the visual representation in association with the first content.

Publication date: 03-03-2022

USER MEDIATION FOR HOTWORD/KEYWORD DETECTION

Number: US20220068268A1
Assignee:

Techniques are described herein for improving performance of machine learning model(s) and thresholds utilized in determining whether automated assistant function(s) are to be initiated. A method includes: receiving, via one or more microphones of a client device, audio data that captures a spoken utterance of a user; processing the audio data using a machine learning model to generate a predicted output that indicates a probability of one or more hotwords being present in the audio data; determining that the predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in the audio data than is a primary threshold; in response to determining that the predicted output satisfies the secondary threshold, prompting the user to indicate whether or not the spoken utterance includes a hotword; receiving, from the user, a response to the prompting; and adjusting the primary threshold based on the response.
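
The two-threshold decision can be sketched as below. The abstract does not say how the primary threshold is adjusted, so the fixed-step update, the default values, and the class name `HotwordGate` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HotwordGate:
    primary: float = 0.8     # at or above this score, activate without asking
    secondary: float = 0.6   # between secondary and primary, ask the user
    step: float = 0.05       # hypothetical adjustment size

    def decide(self, score):
        """Map a model score to an action."""
        if score >= self.primary:
            return "activate"
        if score >= self.secondary:
            return "ask_user"
        return "ignore"

    def apply_feedback(self, was_hotword):
        """Adjust the primary threshold from the user's answer (assumed rule)."""
        if was_hotword:
            # A near miss was a real hotword: make activation easier.
            self.primary = max(self.secondary, self.primary - self.step)
        else:
            # A near miss was a false alarm: make activation harder.
            self.primary = min(1.0, self.primary + self.step)
```

A score of 0.7 falls between the two thresholds, so the user is prompted; confirming the hotword lowers the primary threshold from 0.8 to about 0.75.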

Publication date: 03-03-2022

ADAPTIVE BATCHING TO REDUCE RECOGNITION LATENCY

Number: US20220068269A1
Assignee:

Embodiments may include collection of a first batch of acoustic feature frames of an audio signal, the number of acoustic feature frames of the first batch equal to a first batch size, input of the first batch to a speech recognition network, collection, in response to detection of a word hypothesis output by the speech recognition network, of a second batch of acoustic feature frames of the audio signal, the number of acoustic feature frames of the second batch equal to a second batch size greater than the first batch size, and input of the second batch to the speech recognition network.
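
A rough sketch of the batching schedule follows. It assumes the small batch size is reused until the network emits a word hypothesis, which the abstract does not state explicitly; `emits_word` is a hypothetical stand-in for the speech recognition network, and the sizes are arbitrary.

```python
def adaptive_batch_sizes(frames, emits_word, first_size=4, second_size=16):
    """Return the sizes of the batches fed to the recognizer.

    frames: sequence of acoustic feature frames.
    emits_word: hypothetical callback; True when the recognizer outputs a
        word hypothesis for the batch it was just given.
    """
    sizes = []
    size = first_size
    pos = 0
    while pos < len(frames):
        batch = frames[pos:pos + size]
        pos += len(batch)
        sizes.append(len(batch))
        if emits_word(batch):
            size = second_size   # switch to the larger batch after a hypothesis
    return sizes
```

Small initial batches keep the time to the first hypothesis short; larger later batches trade latency for throughput once recognition is under way.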

Publication date: 14-02-2019

METHOD FOR PROCESSING A RECOGNITION RESULT OF AN AUTOMATIC ONLINE SPEECH RECOGNIZER FOR A MOBILE END DEVICE AS WELL AS COMMUNICATION EXCHANGE DEVICE

Number: US20190051295A1
Author: VOIGT Christoph
Assignee: Audi AG

A method for processing a recognition result of an automatic online speech recognizer for a mobile end device by a communication exchange device, wherein the recognition result for a phrase spoken by a user is received from the online speech recognizer as a text. A language model of permitted phrases is received from the mobile end device. A specification of meaning relating to a meaning of the phrase is assigned to each permitted phrase by the language model, and, through a decision-making logic of the communication exchange device, the text of the recognition result is compared with the permitted phrases defined by the language model and, for a matching permitted phrase in accordance with a predetermined matching criterion, the specification of meaning thereof is determined and the specification of meaning is provided to the mobile end device.
1. A method for processing a recognition result of an automatic online speech recognizer for a mobile end device by a communication exchange device, wherein the recognition result for a phrase spoken by a user is received from the online speech recognizer as a text, comprising: receiving a language model of permitted phrases from the mobile end device, wherein a specification of meaning relating to a meaning of the phrase is assigned to each permitted phrase by the language model, and, through a decision-making logic of the communication exchange device, the text of the recognition result is compared with the permitted phrases defined by the language model, and, for a matching permitted phrase in accordance with a predetermined matching criterion, the specification of meaning thereof is determined, and the specification of meaning is provided to the mobile end device.
2. The method according to claim 1, wherein the comparison of the text of the recognition result with the permitted phrases is made by a 1-to-1 comparison.
3. The method according to claim 1, wherein, in the comparison of the text of the recognition ...
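
The 1-to-1 comparison of claim 2 can be illustrated with a simple lookup. This is a sketch under assumptions: the language model is represented as a plain mapping from permitted phrase to meaning, and the case and whitespace normalization is an added convenience not mentioned in the text.

```python
def lookup_meaning(recognized_text, language_model):
    """Return the meaning assigned to a matching permitted phrase, else None.

    language_model: dict mapping permitted phrase -> specification of meaning
        (a hypothetical representation of the model sent by the device).
    """
    def normalize(text):
        # Assumption: ignore letter case and repeated whitespace.
        return " ".join(text.lower().split())

    table = {normalize(phrase): meaning
             for phrase, meaning in language_model.items()}
    return table.get(normalize(recognized_text))
```

For instance, with the permitted phrases "open the sunroof" and "call home", the recognized text "  Call   HOME " resolves to the meaning stored for "call home", while "play music" yields no match.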

Publication date: 25-02-2021

DATA-DRIVEN AND RULE-BASED SPEECH RECOGNITION OUTPUT ENHANCEMENT

Number: US20210056956A1
Assignee:

According to some embodiments, a multi-layer speech recognition transcript post processing system may include a data-driven, statistical layer associated with a trained automatic speech recognition model that selects an initial transcript. A rule-based layer may receive the initial transcript from the data-driven, statistical layer and execute at least one pre-determined rule to generate a first modified transcript. A machine learning approach layer may receive the first modified transcript from the rule-based layer and perform a neural model inference to create a second modified transcript. A human editor layer may receive the second modified transcript from the machine learning approach layer along with an adjustment from at least one human editor. The adjustment may create, in some embodiments, a final transcript that may be used to fine-tune the data-driven, statistical layer.
1. A multi-layer speech recognition transcript post processing system, comprising: a data-driven, statistical layer associated with a trained automatic speech recognition model that selects an initial transcript; a rule-based layer that receives the initial transcript from the data-driven, statistical layer and executes at least one pre-determined rule to generate a first modified transcript; and a machine learning approach layer that receives the first modified transcript from the rule-based layer and performs a neural model inference to create a second modified transcript.
2. The system of claim 1, wherein the data-driven, statistical layer selects a best initial transcript from a set of N most probable speech recognition transcripts.
3. The system of claim 2, wherein the selection of the best initial transcript is augmented by external attention comprising multiple text documents.
4. The system of claim 1, wherein the pre-determined rule is associated with at least one of: (i) a white list, (ii) a black list, and (iii) a rule approach.
5. The system of claim 4, ...

Publication date: 22-02-2018

LANGUAGE MODELS USING DOMAIN-SPECIFIC MODEL COMPONENTS

Number: US20180053502A1
Assignee:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
1. A method performed by one or more computers, the method comprising: obtaining context data for an utterance, the context data indicating (i) a linguistic context that includes one or more words preceding the utterance, and (ii) a non-linguistic context; selecting, from among multiple domain-specific model components of a language model, a domain-specific model component based on the non-linguistic context of the utterance; generating a score for a candidate transcription for the utterance using the language model, the score being generated using (i) the selected domain-specific model component, and (ii) a baseline model component of the language model that is domain-independent; determining a transcription for the utterance using the score; and providing the transcription as output of an automated speech recognition system.
2. The method of claim 1, wherein the domain-specific model components each correspond to a different domain in a set of multiple domains, and the baseline model does not correspond to any of the multiple domains.
3. The method of claim 1, wherein the baseline model component is configured to provide a language model score independent of non-linguistic context information.
4. The method of claim 1, ...
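
How the baseline and the selected domain component might be combined is not spelled out in the abstract. The sketch below assumes per-word log-scores that are simply added, with the domain component chosen by a context key such as an application name; every name and number in it is hypothetical.

```python
import math

UNSEEN_LOG_SCORE = math.log(1e-6)  # assumed floor for words the baseline lacks


def score_transcription(words, baseline, domain_components, context):
    """Score a candidate transcription with baseline plus one domain component.

    baseline: dict word -> log-score, independent of context.
    domain_components: dict context key -> (dict word -> log-score adjustment).
    context: non-linguistic context key used to select the component.
    """
    component = domain_components.get(context, {})  # no component: baseline only
    total = 0.0
    for word in words:
        total += baseline.get(word, UNSEEN_LOG_SCORE)
        total += component.get(word, 0.0)
    return total
```

With a baseline that scores "play" and "pay" equally, a music-app component that boosts "play" makes the "play" candidate win in that context, while a banking-app component favors "pay".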

Publication date: 13-02-2020

DETERMINISTIC MULTI-LENGTH SLIDING WINDOW PROTOCOL FOR CONTIGUOUS STRING ENTITY

Number: US20200051552A1
Assignee:

A system for extracting verifiable entities from a user-utterance received on an automated calling service is provided. The system may include a receiver configured to receive a user-utterance, a processor and a non-transitory computer-readable media comprising computer-executable instructions. The processor may be configured to execute the instructions, which canonicalize the user-utterance into a plurality of tokens, determine the number of tokens of the user-utterance, and generate, using a sliding-window protocol, a comprehensive number of n-gram sequences from the user-utterance. The processor may be configured to process a plurality of threads of execution that may include a series of actions executed on the n-gram sequences to identify and extract verified entities from the user-utterance.
1. A method for extracting verifiable entities from a user-utterance received on an automated calling service, the method comprising: receiving a user-utterance; canonicalizing the user-utterance into a plurality of tokens; determining the number of tokens of the user-utterance; generating, using a sliding-window protocol, a comprehensive number of n-gram sequences from the user-utterance, the number of n-gram sequences equal to the number of determined tokens, each n-gram sequence including a window-size equal to a value of n in the n-gram sequence; retrieving a first n-gram from each n-gram sequence; in the event that the first n-gram or the subsequent n-gram is not determined to be verifiable, the method further comprising, retrieving the subsequent n-gram from each n-gram sequence and repeating the series of actions for the subsequent n-gram; determining the first n-gram or a subsequent n-gram to be verifiable, said verifiable verifying the n-gram including a noun; in the event that an entity-verifier associated with the verified n-gram is not found in the database, the method further comprising, retrieving the subsequent n-gram from each n-gram sequence and ...
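
The sliding-window step of claim 1 produces one n-gram sequence per window size, so an utterance of k tokens yields k sequences (n = 1 to k). A minimal sketch follows; the canonicalization shown (lowercasing and whitespace splitting) is an assumption, since the text does not define it.

```python
def canonicalize(utterance):
    """Assumed canonicalization: lowercase and split on whitespace."""
    return utterance.lower().split()


def ngram_sequences(tokens):
    """Map each window size n (1..len(tokens)) to its contiguous n-grams."""
    k = len(tokens)
    return {n: [tuple(tokens[i:i + n]) for i in range(k - n + 1)]
            for n in range(1, k + 1)}
```

For "My Account Number" this gives three sequences: three unigrams, two bigrams, and one trigram. Later steps would walk each sequence looking for a verifiable n-gram.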

Publication date: 21-02-2019

Multi-modal input on an electronic device

Number: US20190056909A1
Assignee: Google LLC

A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application.

Publication date: 20-02-2020

METHOD AND DEVICE FOR UPDATING LANGUAGE MODEL AND PERFORMING SPEECH RECOGNITION BASED ON LANGUAGE MODEL

Number: US20200058294A1
Assignee:

A method of updating a grammar model used during speech recognition includes obtaining a corpus including at least one word, obtaining the at least one word from the corpus, splitting the at least one obtained word into at least one segment, generating a hint for recombining the at least one segment into the at least one word, and updating the grammar model by using at least one segment comprising the hint.
1-14. (canceled)
15. A method of recognizing a voice signal, the method comprising: obtaining a voice signal; applying a first voice recognition model to the obtained voice signal; based on at least the applying of the first voice recognition model, determining whether the obtained voice signal includes a signal corresponding to at least one word from among a plurality of predefined words; selecting at least one model from among a plurality of second voice recognition models, based on at least the determining that the obtained voice signal includes the signal corresponding to the at least one word, each of the plurality of the second voice recognition models corresponding to each of the plurality of predefined words; and providing a voice recognition result based on the selected second voice recognition model.
16. The method of claim 15, wherein each of the plurality of predefined words, which is obtained based on the applying of the first voice recognition model, is a word including an entry token.
17. The method of claim 16, wherein the voice signal is recognized in a class recognition mode converted from a general mode, based on the determining that the obtained voice signal includes the signal corresponding to the predefined word including the entry token, in the general mode, the voice signal is recognized based on the first voice recognition model, and in the class recognition mode, the voice signal is recognized based on the second voice recognition model.
18. The method of claim 17, wherein the voice signal is recognized ...

Publication date: 20-02-2020

ELECTRONIC DEVICE AND CONTROL METHOD THEREFOR

Number: US20200058298A1
Assignee:

Provided are an electronic device and a control method. The electronic device comprises: a storage unit for storing a user-based dictionary; an input unit for receiving an input sentence including a user-specific word and at least one word learned by a neural network-based language model; and a processor for determining a concept category of the user-specific word on the basis of semantic information of the input sentence, adding the user-specific word to the user-based dictionary to perform update, and when text corresponding to semantic information of the at least one learned word is input, providing the user-specific word as an autocomplete recommendation word which can be input subsequent to the text.
1. An electronic device for supporting a personalization service, the electronic device comprising: a storage storing a user-based dictionary; an inputter configured to receive an input of a sentence comprising a user-specific word and at least one word learned by a neural network-based language model; and a processor configured to: identify a concept category of the user-specific word based on semantic information of the input sentence when the user-specific word is not included in the neural network-based language model, add the user-specific word to the user-based dictionary to update the user-based dictionary, and provide the user-specific word as an automatic completion recommendation word that is input after the text when a text corresponding to the semantic information of the at least one learned word is input.
2. The electronic device as claimed in claim 1, wherein the processor is further configured to provide the user specific word along with at least one word recommended from the neural network-based language model as the automatic completion recommendation word that is input after the text.
3. The electronic device as claimed in claim 2, wherein the processor is further configured to provide an upper number of words having specified priority as ...

Publication date: 02-03-2017

BUILDING OF N-GRAM LANGUAGE MODEL FOR AUTOMATIC SPEECH RECOGNITION (ASR)

Number: US20170061960A1
Assignee:

A method, a system, and a computer program product for building an n-gram language model for an automatic speech recognition. The method includes reading training text data and additional text data both for the n-gram language model from a storage, and building the n-gram language model by a smoothing algorithm having discount parameters for n-gram counts. The additional text data includes plural sentences having at least one target keyword. Each discount parameter for each target keyword is tuned using development data which are different from the additional text data so that a predetermined balance between precision and recall is achieved.
1. A computer-implemented method for building an n-gram language model for an automatic speech recognition, comprising: reading training text data and additional text data for the n-gram language model from storage, wherein the additional text data comprises a plurality of sentences having at least one target keyword; and building the n-gram language model by a smoothing algorithm having discount parameters for n-gram counts, wherein each discount parameter for each target keyword is tuned using development data which are different from the additional text data so that a predetermined balance between precision and recall is achieved.
2. The method according to claim 1, wherein building the n-gram language model comprises: counting the n-gram counts using the training text data and the additional text data; and discounting the n-gram counts by the smoothing algorithm.
3. The method according to claim 1, wherein the discount parameter for the target keyword is tuned to be larger to decrease erroneous detection of the target keyword, which erroneous detection occurred in a spoken term detection done by using the development data.
4. The method according to claim 1, wherein the additional text data further comprises at least one sentence having at least one word which was erroneously detected as the target keyword by the ...
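
To make the role of a per-keyword discount concrete, here is a small sketch of a bigram model with interpolated absolute discounting in which each word may carry its own discount. The text names no specific smoothing algorithm, so this choice, the default discount of 0.5, and the function name are assumptions; tuning the discounts on development data is not shown.

```python
from collections import Counter, defaultdict


def build_bigram_lm(sentences, discounts, default_discount=0.5):
    """Return prob(w, h) = P(w | h) with word-specific absolute discounts.

    discounts: dict word -> discount subtracted from bigram counts ending
        in that word (a hypothetical hook for target keywords).
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        unigrams.update(tokens)
        for h, w in zip(tokens, tokens[1:]):
            bigrams[h][w] += 1
    total = sum(unigrams.values())

    def prob(w, h):
        p_uni = unigrams[w] / total
        followers = bigrams.get(h)
        if not followers:
            return p_uni                     # unseen history: back off fully

        def removed(x):
            # Mass taken from the count of (h, x), capped at the count itself.
            return min(discounts.get(x, default_discount), followers[x])

        c_h = sum(followers.values())
        freed = sum(removed(x) for x in followers)
        seen = followers[w] - removed(w) if w in followers else 0.0
        # Discounted bigram estimate plus redistributed mass via the unigram.
        return seen / c_h + (freed / c_h) * p_uni

    return prob
```

Raising the discount for a keyword lowers its bigram probability after a given history, which is the direction claim 3 describes for reducing erroneous detections; the probabilities for each seen history still sum to one.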

Publication date: 04-03-2021

Electronic device and method for providing conversational service

Number: US20210065705A1
Assignee: SAMSUNG ELECTRONICS CO LTD

A method, performed by an electronic device, of providing a conversational service includes: receiving an utterance input; identifying a temporal expression representing a time in a text obtained from the utterance input; determining a time point related to the utterance input based on the temporal expression; selecting a database corresponding to the determined time point from among a plurality of databases storing information about a conversation history of a user using the conversational service; interpreting the text based on information about the conversation history of the user, the conversation history information being acquired from the selected database; generating a response message to the utterance input based on a result of the interpreting; and outputting the generated response message.

Publication date: 08-03-2018

Apparatus and method for training a neural network language model, speech recognition apparatus and method

Number: US20180068652A1
Assignee: Toshiba Corp

According to one embodiment, an apparatus trains a neural network language model. The apparatus includes a calculating unit and a training unit. The calculating unit calculates probabilities of n-gram entries based on a training corpus. The training unit trains the neural network language model based on the n-gram entries and the probabilities of the n-gram entries.

Publication date: 27-02-2020

Unsupervised Learning of Interpretable Conversation Models from Conversation Logs

Number: US20200066255A1
Assignee:

Methods, systems, and computer program products for unsupervised learning of interpretable conversation models from conversation logs are provided herein. A computer-implemented method includes obtaining multiple human-to-human conversation logs; training a deep learning model by (i) learning, in an unsupervised manner, semantic labels for dialog contexts in the multiple human-to-human conversation logs, (ii) mapping the learned semantic labels to query responses across the multiple human-to-human conversation logs, and (iii) inferring one or more entities from the multiple conversation logs based at least in part on the mapping; constructing a human-interpretable conversation model based at least in part on patterns determined via the trained deep learning model; and outputting the human-interpretable conversation model to at least one user.
1. A computer-implemented method, the method comprising steps of: obtaining multiple human-to-human conversation logs; training a deep learning model by (i) learning, in an unsupervised manner, semantic labels for dialog contexts in the multiple human-to-human conversation logs, (ii) mapping the learned semantic labels to query responses across the multiple human-to-human conversation logs, and (iii) inferring one or more entities from the multiple conversation logs based at least in part on the mapping; constructing a human-interpretable conversation model based at least in part on one or more patterns determined via the trained deep learning model; and outputting the human-interpretable conversation model to at least one user; wherein the steps are carried out by at least one computing device.
2. The computer-implemented method of claim 1, wherein the semantic labels comprise one or more intents.
3. The computer-implemented method of claim 1, wherein said training the deep learning model comprises determining probability distributions for transitioning between instances of the learned semantic labels.
4. The computer-implemented method of ...

Publication date: 27-02-2020

SESSION INFORMATION PROCESSING METHOD AND DEVICE AND STORAGE MEDIUM

Number: US20200066262A1
Author: LIN Fen, SHU Yue
Assignee:

This application discloses a session information processing method and device, and a storage medium. The method includes: extracting a to-be-analyzed sentence and preceding sentences of the to-be-analyzed sentence from a session; performing word segmentation on the to-be-analyzed sentence and the preceding sentences, to obtain a first feature set including a plurality of first features; extracting a second feature set including one or more second features from a first word set corresponding to the to-be-analyzed sentence and a second word set corresponding to the preceding sentences, one second feature including a phrase or sentence including a first word and a second word, the first word being one or more words in the first word set, and the second word being one or more words in the second word set; and determining, according to the first feature set and the second feature set, a sentence category to which the to-be-analyzed sentence belongs.
1. A session information processing method performed at a computing device having one or more processors and memory storing programs to be executed by the one or more processors, the method comprising: extracting a to-be-analyzed sentence and a preset quantity of preceding sentences of the to-be-analyzed sentence from a session; performing word segmentation on the to-be-analyzed sentence and the preset quantity of preceding sentences, to obtain a first feature set comprising a plurality of first features; extracting a second feature set comprising one or more second features from a first word set corresponding to the to-be-analyzed sentence and a second word set corresponding to the preset quantity of preceding sentences, one second feature comprising a phrase or sentence comprising a first word and a second word, the first word being one or more words in the first word set, and the second word being one or more words in the second word set; and determining, according to the first feature set and the second feature set, a ...

Publication date: 11-03-2021

SUPPORT FOR GRAMMAR INFLECTIONS WITHIN A SOFTWARE DEVELOPMENT FRAMEWORK

Number: US20210073333A1
Assignee:

A natural language understanding server includes grammars specified in a modified extended Backus-Naur form (MEBNF) that includes an agglutination metasymbol not supported by conventional EBNF grammar parsers, as well as an agglutination preprocessor. The agglutination preprocessor applies one or more sets of agglutination rewrite rules to the MEBNF grammars, transforming them to EBNF grammars that can be processed by conventional EBNF grammar parsers. Permitting grammars to be specified in MEBNF form greatly simplifies the authoring and maintenance of grammars supporting inflected forms of words in the languages described by the grammars.
1. A computer-implemented method of transforming modified Extended Backus-Naur Form (MEBNF) phrase grammars, the computer-implemented method comprising: obtaining a MEBNF grammar whose rules contain at least one agglutination metasymbol distinct from standard EBNF metasymbols; storing a plurality of agglutination rewrite rules, each agglutination rewrite rule when applied to a MEBNF expression producing a transformed MEBNF expression, while preserving the language generated by the MEBNF grammar; transforming the MEBNF grammar to an equivalent EBNF grammar by applying the agglutination rewrite rules to rules of the MEBNF grammar one or more times, wherein each application of an agglutination rewrite rule to a rule of the MEBNF grammar: removes an agglutination metasymbol from the grammar rule; and adds zero or more agglutination metasymbols to the grammar rule, wherein the rules of the transformed MEBNF grammar no longer contain an agglutination metasymbol.
2. The computer-implemented method of claim 1, wherein the agglutination rewrite rules specify that the agglutination of two terminal strings is their string concatenation.
3. The computer-implemented method of claim 1, wherein all instances of an agglutination metasymbol in a rewrite rule are followed by a terminal suffix or preceded by a terminal prefix.
4. The computer- ...
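
The simplest rewrite rule, from claim 2, treats the agglutination of two terminal strings as their concatenation. The sketch below applies that one rule repeatedly until no metasymbol is left between terminals. The filing does not name the metasymbol, so `^` and the double-quoted terminal syntax are assumptions, and rules involving non-terminals are not handled.

```python
import re

# Hypothetical syntax: "stem" ^ "suffix", with ^ as the agglutination metasymbol.
TERMINAL_PAIR = re.compile(r'"([^"]*)"\s*\^\s*"([^"]*)"')


def rewrite_agglutinations(rule):
    """Apply the terminal-concatenation rewrite until it no longer matches."""
    while True:
        # Each application removes one metasymbol and adds none.
        rewritten = TERMINAL_PAIR.sub(
            lambda m: '"' + m.group(1) + m.group(2) + '"', rule, count=1)
        if rewritten == rule:
            return rule
        rule = rewritten
```

For example, `verb = "walk" ^ "ed" | "talk" ^ "ing" ;` becomes `verb = "walked" | "talking" ;`, which an ordinary EBNF parser could then consume.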

Publication date: 11-03-2021

DETERMINING STATE OF AUTOMATED ASSISTANT DIALOG

Number: US20210074279A1
Assignee:

Determining a dialog state of an electronic dialog that includes an automated assistant and at least one user, and performing action(s) based on the determined dialog state. The dialog state can be represented as one or more slots and, for each of the slots, one or more candidate values for the slot and a corresponding score (e.g., a probability) for each of the candidate values. Candidate values for a slot can be determined based on language processing of user utterance(s) and/or system utterance(s) during the dialog. In generating scores for candidate value(s) of a given slot at a given turn of an electronic dialog, various features are determined based on processing of the user utterance and the system utterance using a memory network. The various generated features can be processed using a scoring model to generate scores for candidate value(s) of the given slot at the given turn.

1. A method implemented by one or more processors, comprising: identifying a conversation context of an electronic dialog that includes an automated assistant and a user, the conversation context based at least in part on a system utterance of the automated assistant, and a user utterance of the user, the system utterance and the user utterance provided during a turn of the electronic dialog; determining, based on the conversation context, one or more candidate values for a slot; identifying a textual descriptor, for the slot, that describes the parameters that can be defined by the candidate values for the slot; generating, based on processing the conversation context using one or more memory networks: one or more representations for the system utterance and the user utterance, and candidate value features for each of the candidate values for the slot, wherein generating the candidate value features for each of the candidate values for the slot comprises processing the textual descriptor, for the slot, using one or more of the memory networks; generating, based on processing the one ...
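
The slot-based dialog state described in this abstract can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the slot names, the scores, and the helper `best_values` are hypothetical, and the memory-network feature extraction and scoring model from the abstract are not modeled here.

```python
def best_values(dialog_state):
    """Pick, for each slot, the candidate value with the highest score.

    dialog_state maps a slot name to a dict of {candidate value: score}.
    Slots without candidates are skipped.
    """
    return {
        slot: max(candidates, key=candidates.get)
        for slot, candidates in dialog_state.items()
        if candidates
    }


# Hypothetical state after one dialog turn; scores play the role of probabilities.
state = {
    "cuisine": {"thai": 0.7, "italian": 0.2},
    "time": {"7pm": 0.55, "8pm": 0.45},
}
chosen = best_values(state)
```

Keeping every candidate with its score, rather than only the winner, lets a later turn revise the choice when new evidence arrives.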

Publication date: 11-03-2021

AUTOMATICALLY DETERMINING LANGUAGE FOR SPEECH RECOGNITION OF SPOKEN UTTERANCE RECEIVED VIA AN AUTOMATED ASSISTANT INTERFACE

Number: US20210074280A1
Assignee:

Determining a language for speech recognition of a spoken utterance received via an automated assistant interface for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant, without necessitating a user explicitly designate a language to be utilized for each interaction. Implementations determine a user profile that corresponds to audio data that captures a spoken utterance, and utilize language(s), and optionally corresponding probabilities, assigned to the user profile in determining a language for speech recognition of the spoken utterance. Some implementations select only a subset of languages, assigned to the user profile, to utilize in speech recognition of a given spoken utterance of the user. Some implementations perform speech recognition in each of multiple languages assigned to the user profile, and utilize criteria to select only one of the speech recognitions as appropriate for generating and providing content that is responsive to the spoken utterance.

1. A method implemented by one or more processors, the method comprising: processing audio data, wherein the audio data is based on detection of spoken input of a user at a client device, the client device including an automated assistant interface for interacting with the automated assistant; determining, based on processing of the audio data, that at least a portion of the audio data matches a user profile accessible to the automated assistant; identifying at least one probabilistic metric assigned to the user profile and corresponding to a particular speech recognition model, for a particular language; and based on the at least ...: selecting the particular speech recognition model, for the particular language, for processing the audio data, and processing the audio data, using the particular speech recognition model for the particular language, to generate text, in the particular language, that corresponds to the spoken input; and

Publication date: 17-03-2016

METHOD AND APPARATUS FOR DISCOVERING TRENDING TERMS IN SPEECH REQUESTS

Number: US20160078860A1
Assignee:

Systems and processes are disclosed for discovering trending terms in automatic speech recognition. Candidate terms (e.g., words, phrases, etc.) not yet found in a speech recognizer vocabulary or having low language model probability can be identified based on trending usage in a variety of electronic data sources (e.g., social network feeds, news sources, search queries, etc.). When candidate terms are identified, archives of live or recent speech traffic can be searched to determine whether users are uttering the candidate terms in dictation or speech requests. Such searching can be done using open vocabulary spoken term detection to find phonetic matches in the audio archives. As the candidate terms are found in the speech traffic, notifications can be generated that identify the candidate terms, provide relevant usage statistics, identify the context in which the terms are used, and the like.

1. A method for discovering trending terms in automatic speech recognition, the method comprising: at an electronic device having a processor and memory: identifying a candidate term based on a frequency of occurrence of the term in an electronic data source; in response to identifying the candidate term, searching for the candidate term in an archive of speech traffic of an automatic speech recognizer using phonetic matching; and in response to finding the candidate term in the archive, generating a notification comprising the candidate term.
2. The method of claim 1, wherein identifying the candidate term comprises: identifying one or more terms in the electronic data source; determining a frequency of occurrence of the one or more terms in the electronic data source; and selecting the candidate term based on the determined frequency of occurrence of the one or more terms.
3. The method of claim 2, wherein selecting the candidate term comprises selecting from the one or more terms a term having a highest frequency of occurrence in the electronic data source.
4. The ...
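
The frequency-based selection in claims 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the patented method: the function name `select_candidate_term`, the whitespace tokenization, and the sample feed are assumptions, and the phonetic search of the speech archive is not shown.

```python
from collections import Counter


def select_candidate_term(documents, known_vocabulary):
    """Return the most frequent term not in the recognizer vocabulary, or None.

    documents: iterable of text snippets from an electronic data source.
    known_vocabulary: set of lower-case words the recognizer already knows.
    """
    counts = Counter(
        word
        for text in documents
        for word in text.lower().split()
        if word not in known_vocabulary
    )
    if not counts:
        return None
    term, _ = counts.most_common(1)[0]
    return term


# Hypothetical data: "zorblatt" is a made-up trending word.
feed = ["zorblatt launch today", "everyone loves zorblatt", "launch delayed"]
vocabulary = {"launch", "today", "everyone", "loves", "delayed"}
candidate = select_candidate_term(feed, vocabulary)
```

In this toy feed the only out-of-vocabulary word is the made-up term, so it becomes the candidate that would then be searched for in the speech archive.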

Publication date: 24-03-2022

Contextual sentence embeddings for natural language processing applications

Number: US20220093088A1
Assignee: Apple Inc

Methods and systems for embedding natural language sentences within a highly-dimensional vector space are provided. Various applications relating to natural language processing are also provided. Such applications include digital assistants and search engines, as well as systems for classifying, sorting, organizing, and/or pairing content that is associated with natural language objects. The sentence vector embeddings encode various semantic features of the sentence. Two separate language models, arranged in a serial architecture, are employed to generate a sentence vector. The first language model generates token vectors for each of the tokens included in the sentence. The token vectors are employed as inputs to the second language model. The second language model generates the sentence vector for the sentence. A sentence vector embeds the semantic context of the corresponding natural language object within the vector space. The second language model may be trained via supervised learning on multiple semantic-related tasks.

Publication date: 24-03-2022

MODEL CONSTRUCTING METHOD FOR AUDIO RECOGNITION

Number: US20220093089A1
Assignee:

A model constructing method for audio recognition is provided. In the method, audio data is obtained. A predicted result of the audio data is determined by using a classification model that is trained by a machine learning algorithm. The predicted result includes a label defined by the classification model. A prompt message is provided according to a loss level of the predicted result. The loss level is related to a difference between the predicted result and a corresponding actual result. The prompt message is used to query a correlation between the audio data and the label. The classification model is modified according to a confirmation response of the prompt message, and the confirmation response is related to a confirmation of the correlation between the audio data and the label. Accordingly, the labeling efficiency and predicting correctness can be improved.

1. A model construction method for audio recognition, comprising: obtaining audio data; determining a predicted result of the audio data by using a classification model, wherein the classification model is trained based on a machine learning algorithm, and the predicted result comprises a label defined by the classification model; providing a prompt message according to a loss level of the predicted result, wherein the loss level is related to a difference between the predicted result and a corresponding actual result, and the prompt message is provided to query a correlation between the audio data and the label; and modifying the classification model according to a confirmation response of the prompt message, wherein the confirmation response is related to a confirmation of the correlation between the audio data and the label.
2. The model construction method for audio recognition according to claim 1, wherein the prompt message comprises the audio data and an inquiry content, the inquiry content is to query whether the audio data belongs to the label, and the steps of providing the ...

Publication date: 18-03-2021

DOCUMENT IDENTIFICATION DEVICE, DOCUMENT IDENTIFICATION METHOD, AND PROGRAM

Number: US20210082415A1

A document identification device that improves class identification precision of multi-stream documents is provided. The document identification device includes: a primary stream expression generation unit that generates a primary stream expression, which is a fixed-length vector of a word sequence corresponding to each speaker's speech recorded in a setting including a plurality of speakers, for each speaker; a primary multi-stream expression generation unit that generates a primary multi-stream expression obtained by integrating the primary stream expression; a secondary stream expression generation unit that generates a secondary stream expression, which is a fixed-length vector generated based on the word sequence of each speaker and the primary multi-stream expression, for each speaker; and a secondary multi-stream expression generation unit that generates a secondary multi-stream expression obtained by integrating the secondary stream expression.

1. A document identification device comprising: a primary stream expression generation unit that generates a primary stream expression for each speaker, the primary stream expression being a fixed-length vector of a word sequence corresponding to each speaker's speech recorded in a setting including a plurality of speakers; a primary multi-stream expression generation unit that generates a primary multi-stream expression obtained by integrating the primary stream expression; a secondary stream expression generation unit that generates a secondary stream expression for each speaker, the secondary stream expression being a fixed-length vector generated based on the word sequence of each speaker and the primary multi-stream expression; and a secondary multi-stream expression generation unit that generates a secondary multi-stream expression obtained by integrating the secondary stream expression.
2. The document identification device according to claim 1, comprising: a class identification unit that calculates a posteriori ...

Publication date: 05-03-2020

SYSTEM FOR PROCESSING VOICE RESPONSES USING A NATURAL LANGUAGE PROCESSING ENGINE

Number: US20200076949A1
Assignee:

A system for processing voice responses is disclosed. The system is configured to store a correlation table identifying relationships between self-service routines, tags, and corresponding actions. The system receives a call from a user and issues a query in response to the call. The system receives an utterance from the user in response to the query and determines whether the utterance matches a pre-defined response. If there is no match, the system analyzes the utterance with a pre-defined statistical language model and identifies a service tag for the utterance. The system then associates the utterance with the service tag and a self-service routine that is associated with the call. The system identifies an action from the correlation table that correlates to the service tag and the self-service routine.

1. A system for processing voice responses, comprising: a memory configured to store: a plurality of self-service routines, a plurality of service tags, and a plurality of actions, each action correlating to a pair of a self-service routine and a service tag; and a plurality of pre-defined responses associated with the plurality of self-service routines, each self-service routine associated with a subset of the plurality of pre-defined responses; an interactive voice response engine communicatively coupled to the memory and configured to receive a first utterance from the user in response to a query; and analyze the first utterance with a pre-defined statistical language model; identify one or more keywords of the first utterance based on the analysis; determine a service tag of the first utterance based on the one or more keywords; compare the service tag of the first utterance with each of the plurality of service tags; in response to determining that the service tag of the first utterance matches a first service tag, associate the first utterance with the first service tag and the first self-service routine; and identify a first action for ...

Publication date: 14-03-2019

LANGUAGE MODEL GENERATING DEVICE, LANGUAGE MODEL GENERATING METHOD, AND RECORDING MEDIUM

Number: US20190080688A1
Author: ITSUI Hiroyasu
Assignee: Mitsubishi Electric Corporation

A language model generating device according to the present invention includes: a paraphrase generating unit to generate, by using morphemes of a phrase included in learning example sentences that include a plurality of sentences and using synonyms for original expressions of the morphemes, a plurality of paraphrases that include a combination of an original expression of a morpheme and a synonym for an original expression of a morpheme and a combination of synonyms for original expressions of morphemes; and a language model generating unit to generate a language model that is based on an n-gram model from the plurality of paraphrases generated and the learning example sentences.

1. A language model generating device comprising: a paraphrase generator to generate, by using morphemes of a phrase included in learning example sentences that include a plurality of sentences and using synonyms for original expressions of the morphemes, a plurality of paraphrases that include a combination of an original expression of a morpheme and a synonym for an original expression of a morpheme and a combination of synonyms for original expressions of morphemes; a paraphrase sentence extractor to extract and output, from a corpus that includes a plurality of sentences, a paraphrase sentence that includes any of the plurality of paraphrases; an original sentence extractor to extract and output, from the learning example sentences, an original sentence that includes the phrase; a likelihood calculator to calculate likelihood that indicates whether the paraphrase sentence input from the paraphrase sentence extractor is similar in context to the original sentence input from the original sentence extractor; a paraphrase extractor to extract a paraphrase included in a paraphrase sentence whose likelihood has a value that indicates a higher degree of similarity in context to the original sentence than a threshold value; and a language model generator to generate a language model that is based on ...
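
The paraphrase generation and n-gram counting described in this abstract can be illustrated with a short Python sketch. The function names, the synonym table, and the choice of bigrams are assumptions made for illustration; the corpus extraction and likelihood filtering steps of claim 1 are not modeled.

```python
from collections import Counter
from itertools import product


def generate_paraphrases(morphemes, synonyms):
    """Build every combination of original expressions and their synonyms.

    The first combination is the original phrase itself; the rest mix
    originals with synonyms or consist of synonyms only.
    """
    options = [[m] + synonyms.get(m, []) for m in morphemes]
    return [" ".join(choice) for choice in product(*options)]


def bigram_counts(sentences):
    """Count bigrams with sentence boundary markers, the raw material of an n-gram model."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical phrase and synonym table.
phrase = ["loud", "sound"]
synonym_table = {"loud": ["noisy"], "sound": ["audio"]}
paraphrases = generate_paraphrases(phrase, synonym_table)
counts = bigram_counts(paraphrases)
```

Here `paraphrases` holds four strings: the original phrase plus three variants, and `counts` records how often each word pair occurs across them.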

Publication date: 22-03-2018

SYNTACTIC RE-RANKING OF POTENTIAL TRANSCRIPTIONS DURING AUTOMATIC SPEECH RECOGNITION

Number: US20180082680A1
Assignee:

A system and method for syntactic re-ranking of possible transcriptions generated by automatic speech recognition are disclosed. A computer system accesses acoustic data for a recorded spoken language and generates a plurality of potential transcriptions for the acoustic data. The computer system scores the plurality of potential transcriptions to create an initial likelihood score for the plurality of potential transcriptions. For a particular potential transcription in the plurality of transcriptions, the computer system generates a syntactic likelihood score. The computer system creates an adjusted score for the particular potential transcription by combining the initial likelihood score and the syntactic likelihood score for the particular potential transcription.

1. A system for syntactic re-ranking in automatic speech recognition, the system comprising: a computer-readable memory storing computer-executable instructions that, when executed by one or more hardware processors, configure the system to: access acoustic data for a recorded spoken language; generate a plurality of potential transcriptions for the acoustic data; score the plurality of potential transcriptions to create an initial likelihood score for the plurality of potential transcriptions; and generate a syntactic likelihood score for the particular potential transcription, wherein the syntactic likelihood score is generated by evaluation of a syntactic structure for the particular potential transcription, and wherein the syntactic structure includes relationships between words included in the particular potential transcription; create an adjusted score for the particular potential transcription by combining the initial likelihood score and the syntactic likelihood score for the particular potential transcription; and output the particular potential transcription based on the adjusted score of the particular potential transcription being greater than adjusted scores of other members of the ...
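
The score combination in this abstract can be sketched as follows. The abstract does not say how the two scores are combined, so the weighted average, the `syntactic_weight` parameter, and the sample hypotheses below are assumptions chosen only to make the idea concrete.

```python
def rerank(candidates, syntactic_weight=0.5):
    """Re-rank transcriptions by an adjusted score.

    candidates: list of (text, initial_score, syntactic_score) tuples.
    The adjusted score is a weighted average of the two scores (an assumed
    combination rule). Returns (text, adjusted_score) pairs, best first.
    """
    adjusted = [
        (text, (1 - syntactic_weight) * initial + syntactic_weight * syntactic)
        for text, initial, syntactic in candidates
    ]
    return sorted(adjusted, key=lambda item: item[1], reverse=True)


# Hypothetical hypotheses: the second has a higher initial score but poor syntax.
hypotheses = [
    ("recognize speech", 0.60, 0.90),
    ("wreck a nice beach", 0.65, 0.30),
]
best_text, best_score = rerank(hypotheses)[0]
```

With equal weights the syntactically plausible hypothesis overtakes the one that the initial scoring preferred, which is the effect the re-ranking is meant to achieve.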

Publication date: 22-03-2018

Bilingual corpus update method, bilingual corpus update apparatus, and recording medium storing bilingual corpus update program

Number: US20180082681A1

A third sentence obtained by replacing a first phrase of a first sentence with a second phrase is input, and it is judged whether a third phrase is included in a first database including at least a phrase used in written text. If the third phrase is not included, a first evaluation value in the first database is calculated for a seventh phrase obtained by replacing the second phrase of the third phrase with a sixth phrase. It is judged whether the third phrase is included in a second database including at least a phrase used in spoken text and whether a second evaluation value calculated from the first evaluation value satisfies a predetermined condition. If the third phrase is included, and the second evaluation value satisfies the predetermined condition, the third sentence and the second sentence as a pair are added to a bilingual corpus.

Publication date: 24-03-2016

System and method for using semantic and syntactic graphs for utterance classification

Number: US20160086601A1
Assignee: AT&T Intellectual Property II LP

Disclosed herein is a system, method and computer readable medium storing instructions related to semantic and syntactic information in a language understanding system. The method embodiment of the invention is a method for classifying utterances during a natural language dialog between a human and a computing device. The method comprises receiving a user utterance; generating a semantic and syntactic graph associated with the received utterance; extracting all n-grams as features from the generated semantic and syntactic graph; and classifying the utterance. Classifying the utterance may be performed in any number of ways, such as using the extracted n-grams, using syntactic and semantic graphs, or writing rules.

Publication date: 24-03-2016

Automated Speech Recognition Proxy System for Natural Language Understanding

Number: US20160086606A1
Assignee:

An interactive response system mixes HSR subsystems with ASR subsystems to facilitate overall capability of voice user interfaces. The system permits imperfect ASR subsystems to nonetheless relieve burden on HSR subsystems. An ASR proxy is used to implement an IVR system, and the proxy dynamically determines how many ASR and HSR subsystems are to perform recognition for any particular utterance, based on factors such as confidence thresholds of the ASRs and availability of human resources for HSRs.

1. A computer-implemented system for processing an interaction, the interaction including an utterance requiring recognition before being usable for further computer-implemented processing, the system comprising: an application configured to provide the utterance, the utterance received from a device of a customer over a computer network; a recognition decision engine configured to: receive the utterance for recognition, identify a grammar to which the utterance is expected to conform, determine a time length of the utterance, dynamically select, based at least in part on the identified grammar and the time length of the utterance, one or more recognizers from: an automated speech recognizer, and a second type of recognizer, different from the automated speech recognizer, and communicating over a computer network with devices located at locations remote from the computer-implemented system; and a results decision engine coupled with the one or more recognizers and configured to provide a recognition result responsive to results of processing by the one or more recognizers.
2. The system of claim 1, further comprising a system status subsystem operably connected to the recognition decision engine, the recognition decision engine taking as input system load information from the system status subsystem for use in the dynamically selecting.
3. The system of claim 1, wherein a subset of the one or more recognizers is configured to provide a confidence ...

Publication date: 12-03-2020

Concealing phrases in audio traveling over air

Number: US20200082837A1
Assignee: Intel Corp

An example apparatus for concealing phrases in audio includes a receiver to receive a detected phrase via a network. The detected phrase is based on audio captured near a source of an audio stream. The apparatus also includes a speech recognizer to generate a trigger in response to detecting that a section of the audio stream contains a confirmed phrase. The apparatus further includes a phrase concealer to conceal the section of the audio stream in response to the trigger.

Publication date: 25-03-2021

AUTOMATIC ASSIGNMENT OF COOPERATIVE PLATFORM TASKS

Number: US20210090556A1
Assignee:

Systems and methods for automatically assigning cooperative platform tasks to appropriate participants are disclosed. In embodiments, a method includes receiving new task data for a new task posted to a remote server; transforming the new task data by natural language processing to produce transformed new task data; representing the new task as a vector in a vector space based on the transformed new task data, wherein the vector space includes representations of completed tasks, and the completed tasks are associated with respective participants; calculating distances between the new task and the respective completed tasks represented in the vector space; ranking the respective participants based on the distances between the new task and the completed tasks associated with respective participants; determining a select participant of the respective participants to be assigned to the new task based on the ranking; and initiating automatic assignment of the new task to the select participant.

1. A computer-implemented method comprising: receiving, by a computing device, new task data for a new task posted to a remote server, via a network; transforming, by the computing device, the new task data by natural language processing to produce transformed new task data; representing, by the computing device, the new task as a vector in a vector space based on the transformed new task data, wherein the vector space includes representations of completed tasks, and the completed tasks are associated with respective participants; calculating, by the computing device, distances between the new task and the respective completed tasks represented in the vector space; ranking, by the computing device, the respective participants based on the distances between the new task and the completed tasks associated with respective participants; determining, by the computing device, a select participant of the respective participants to be assigned to the new task based on the ranking; and initiating, ...
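
The distance-and-ranking steps of this method can be sketched in Python. This is an illustration under assumptions: Euclidean distance, ranking each participant by their closest completed task, and the two-dimensional sample vectors are all choices made here, since the abstract does not fix the metric or the aggregation rule. The natural language processing step that produces the vectors is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_participants(new_task_vector, completed_tasks):
    """Rank participants by the distance of their closest completed task.

    completed_tasks: list of (participant, vector) pairs.
    Returns participant names, nearest first.
    """
    best = {}
    for participant, vector in completed_tasks:
        distance = euclidean(new_task_vector, vector)
        if participant not in best or distance < best[participant]:
            best[participant] = distance
    return sorted(best, key=best.get)


# Hypothetical task history with made-up participants and vectors.
history = [
    ("alice", [1.0, 0.0]),
    ("bob", [0.0, 1.0]),
    ("alice", [0.9, 0.2]),
]
ranking = rank_participants([0.8, 0.1], history)
selected = ranking[0]
```

The first name in `ranking` plays the role of the select participant to whom the new task would be assigned.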

Publication date: 25-03-2021

DIALOGUE SYSTEM, DIALOGUE PROCESSING METHOD, TRANSLATING APPARATUS, AND METHOD OF TRANSLATION

Number: US20210090557A1
Assignee:

A dialogue system includes: a speech recognizer configured to generate an input sentence by converting a speech of a user into a text; a dialogue manager configured to generate a meaning representation for the input sentence; and a result processor configured to generate a plurality of output sentences corresponding to the meaning representation. The dialogue manager generates a meaning representation for each of the plurality of output sentences. The result processor generates a system response based on the meaning representation for the input sentence and the meaning representation for each of the plurality of output sentences.

1. A dialogue system, comprising: a speech recognizer configured to generate an input sentence by converting a speech of a user into a text; a dialogue manager configured to generate a meaning representation for the input sentence; a result processor configured to generate a plurality of output sentences corresponding to the meaning representation for the input sentence, wherein the dialogue manager generates a meaning representation for each of the plurality of output sentences, and wherein the result processor generates a system response based on the meaning representation for the input sentence and the meaning representation for each of the plurality of output sentences.
2. The system according to claim 1, wherein the result processor determines a rank of the plurality of output sentences using an N-best algorithm.
3. The system according to claim 2, wherein the result processor determines the rank of the plurality of output sentences again based on a similarity degree between the meaning representation for the input sentence and the meaning representation for each of the plurality of output sentences.
4. The system according to claim 1, wherein the result processor assigns a confidence score to each of the plurality of output sentences using the N-best algorithm.
5. The system according to claim 4, wherein the result processor assigns a ...

Подробнее
25-03-2021 дата публикации

SPEECH RECOGNITION WITH SELECTIVE USE OF DYNAMIC LANGUAGE MODELS

Номер: US20210090569A1
Принадлежит: Google LLC

A computer-implemented method for transcribing an utterance includes receiving, at a computing system, speech data that characterizes an utterance of a user. A first set of candidate transcriptions of the utterance can be generated using a static class-based language model that includes a plurality of classes that are each populated with class-based terms selected independently of the utterance or the user. The computing system can then determine whether the first set of candidate transcriptions includes class-based terms. Based on whether the first set of candidate transcriptions includes class-based terms, the computing system can determine whether to generate a dynamic class-based language model that includes at least one class that is populated with class-based terms selected based on a context associated with at least one of the utterance and the user.

1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising: receiving an audio signal characterizing an utterance captured by a computing device associated with a user; generating, using a class-based language model, a word lattice representing a transcription hypothesis path that traverses a sequence of terms corresponding to a respective transcription for the utterance, the sequence of terms including a class identifier; expanding the word lattice with a list of user-specific class-based terms derived from a context associated with the user, the list of user-specific class-based terms belonging to a class associated with the class identifier included in the sequence of terms; and providing, using the expanded word lattice, a speech recognition result for the utterance.
2. The computer-implemented method of claim 1, wherein the list of user-specific class-based terms comprise names derived from a contact list of the user.
3. The computer-implemented method of claim 1, wherein the word lattice is represented by a finite state ...

Подробнее
31-03-2016 дата публикации

SYSTEM AND METHOD FOR GENERATING CUSTOMIZED TEXT-TO-SPEECH VOICES

Номер: US20160093287A1
Принадлежит:

A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.

1. A method comprising: receiving a selection of an animated character to guide a user on a website; collecting text data from a pre-existing text data source, to yield collected text data, wherein the text data is associated with a domain of the website; selecting synthesis speech units specific to the domain from a pre-existing inventory of synthesis speech units using the collected text data; caching the synthesis speech units specific to the domain as an in-domain inventory of synthesis speech units; and generating, via a processor, a custom text-to-speech voice for a specific task in the domain utilizing the in-domain inventory of synthesis speech units, wherein the animated character will use the custom text-to-speech voice.
2. The method of claim 1, further comprising determining whether the custom text-to-speech voice conforms to a selected level of synthesis quality.
3. The method of claim 2, further comprising: when the custom text-to-speech voice does not conform to the selected level of synthesis quality, collecting additional text ...

Publication date: 31-03-2016

SYSTEM AND METHOD FOR MACHINE-MEDIATED HUMAN-HUMAN CONVERSATION

Number: US20160093296A1
Author: Bangalore Srinivas
Assignee:

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing speech. A system configured to practice the method monitors user utterances to generate a conversation context. Then the system receives a current user utterance independent of non-natural language input intended to trigger speech processing. The system compares the current user utterance to the conversation context to generate a context similarity score, and if the context similarity score is above a threshold, incorporates the current user utterance into the conversation context. If the context similarity score is below the threshold, the system discards the current user utterance. The system can compare the current user utterance to the conversation context based on an n-gram distribution, a perplexity score, and a perplexity threshold. Alternately, the system can use a task model to compare the current user utterance to the conversation context.

1. A method comprising: generating a conversation context model based on user utterances and facial recognition data, wherein the conversation context model comprises a model of a speech dialog occurring between a speech dialog system and a speaker; continuously comparing the speech dialog to the conversation context model, to yield a context similarity score; modifying the context similarity score based on a head orientation of the speaker, to yield a modified context similarity score; when the modified context similarity score is above a threshold, incorporating a current user utterance into the conversation context model for use in the speech dialog; and when the modified context similarity score is one of equaling the threshold and below the threshold, suppressing the current user utterance such that the current user utterance is not incorporated into the conversation context model and the speech dialog produces speech as though the current user utterance is not in the conversation context model.
2. The method of claim ...
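
The perplexity-based comparison mentioned in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: it uses an add-one-smoothed unigram model built from earlier utterances, and the function names and the threshold value are hypothetical. Because low perplexity means high similarity, the utterance is kept when its perplexity falls below the threshold.

```python
import math
from collections import Counter


def unigram_perplexity(context_utterances, utterance):
    """Perplexity of a non-empty `utterance` under an add-one-smoothed
    unigram model of the context (an assumed, simplified stand-in for the
    n-gram distribution named in the abstract)."""
    counts = Counter(w for u in context_utterances for w in u.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen words
    words = utterance.lower().split()
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


def update_context(context_utterances, utterance, perplexity_threshold):
    """Return the context with the utterance appended when it is similar
    enough (low perplexity); otherwise return the context unchanged."""
    if unigram_perplexity(context_utterances, utterance) < perplexity_threshold:
        return context_utterances + [utterance]
    return context_utterances
```

With a context about booking a flight, an on-topic utterance scores a lower perplexity than an unrelated one, so only the former is added to the context.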

Published: 31-03-2016

PARSIMONIOUS HANDLING OF WORD INFLECTION VIA CATEGORICAL STEM + SUFFIX N-GRAM LANGUAGE MODELS

Number: US20160093301A1
Assignee:

Systems and processes are disclosed for predicting words using a categorical stem and suffix word n-gram language model. A word prediction includes determining a stem probability using a stem language model. The word prediction also includes determining a suffix probability using a suffix language model decoupled from the stem model, in view of one or more stem categories. The word prediction also includes determining a probability of the stem belonging to the stem category. A joint probability is determined based on the foregoing, and one or more word predictions having sufficient likelihood are provided. In this way, the categorical stem and suffix language model constrains predicted suffixes to those that would be grammatically valid with predicted stems, thereby producing word predictions with grammatically valid stem and suffix combinations.

1. A method for predicting words, the method comprising: at an electronic device: receiving input from a user; determining, using an n-gram language model, a probability of a predicted word based on a previously-input word in the received input, wherein the predicted word comprises a stem and a suffix; determining a probability of the suffix being grammatically valid for the stem; determining an integrated probability of the predicted word based on the probability of the predicted word and the probability of the suffix being grammatically valid for the stem; and providing output of the predicted word, based on the integrated probability.
2. The method of claim 1, wherein the stem is associated with a stem category, and wherein determining the probability of the predicted word comprises determining a probability of the suffix based on the stem category and the previously-input word.
3. The method of claim 2, wherein the stem is associated with a stem category, and wherein determining the probability of the suffix being grammatically valid for the stem comprises determining a probability of the stem being associated with the stem ...
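
One way to read the joint probability described above is P(stem | previous word) multiplied by the sum over stem categories of P(category | stem) · P(suffix | category, previous word). The sketch below follows that reading. The factorization, the table layout and every name in it are assumptions made for illustration, not details taken from the patent.

```python
def word_probability(stem, suffix, prev_word, stem_lm, category_given_stem, suffix_lm):
    """Joint probability of stem+suffix after `prev_word` (assumed factorization).

    stem_lm:             {(prev_word, stem): P(stem | prev_word)}
    category_given_stem: {stem: {category: P(category | stem)}}
    suffix_lm:           {(prev_word, category, suffix): P(suffix | category, prev_word)}
    """
    p_stem = stem_lm.get((prev_word, stem), 0.0)
    # Marginalize over the categories the stem may belong to; a suffix that is
    # invalid for every category of the stem contributes zero probability.
    p_suffix = sum(
        p_cat * suffix_lm.get((prev_word, category, suffix), 0.0)
        for category, p_cat in category_given_stem.get(stem, {}).items()
    )
    return p_stem * p_suffix
```

Because the suffix table is keyed by category rather than by stem, a suffix seen with one verb stem can be reused for any other stem in the verb category, which is the parsimony the title refers to.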

Published: 29-03-2018

APPARATUS AND METHODS FOR DYNAMICALLY CHANGING A LANGUAGE MODEL BASED ON RECOGNIZED TEXT

Number: US20180090147A1
Author: Corfield Charles
Assignee:

The technology of the present application provides a method and apparatus to manage speech resources. The method includes using a text recognizer to detect a change in a speech application that requires the use of different resources. On detection of the change, the method loads the different resources without the user needing to exit the currently executing speech application.

1. A method performed on at least one processor for managing speech resources of a speech recognition engine, the method comprising the steps of: initiating a speech recognition engine with a first language model; converting audio received by the speech recognition engine to interim text; determining whether the interim text matches at least one trigger; and if it is determined that the interim text does not match the at least one trigger, outputting the interim text as recognized text; if it is determined that the interim text does match the at least one trigger, replacing the first language model with a second language model.
2. The method of wherein the initiating step comprises initiating the speech recognition engine with a first user profile and the replacing step further comprises replacing the first user profile with a second user profile.
3. The method of claim 1, wherein, if it is determined that the interim text does match the at least one trigger, the method comprises the steps of: pausing the converting step until the first language model is replaced with the second language model and resuming the converting step.
4. The method of wherein the step of converting the audio to interim text comprises correlating the audio and the text.
5. The method of wherein correlating the audio and the text comprises creating a plurality of small audio files from the audio and converting the plurality of small audio files into a corresponding plurality of interim text files and wherein the outputted recognized text is concatenated from the plurality of interim text files.
6. The method of wherein ...
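
The trigger check in claim 1 can be sketched as a small dispatch function. This is a hedged illustration only: the trigger table, the model names and the exact-match comparison are hypothetical, and a real engine would load model resources rather than return a name.

```python
def process_interim_text(interim_text, active_model, triggers):
    """Return (recognized_text, model_name) for one piece of interim text.

    `triggers` maps a normalized trigger phrase to the name of the language
    model that should replace the active one (hypothetical structure).
    A trigger produces no recognized text; any other text passes through
    unchanged and the active model stays in place.
    """
    key = interim_text.strip().lower()
    if key in triggers:
        return None, triggers[key]
    return interim_text, active_model
```

For example, with `triggers = {"open radiology report": "radiology_lm"}`, the phrase "Open radiology report" switches the model, while ordinary dictation is returned as recognized text.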

Published: 05-05-2022

PREDICTION DEVICE, PREDICTION METHOD, AND PROGRAM

Number: US20220139381A1
Author: IJIMA Yusuke

An estimation device that estimates a duration of a speech section includes: a representation conversion unit that performs representation conversion of a plurality of words included in learning utterance information to a plurality of pieces of numeric representation data; an estimation data generation unit that generates estimation data by using a plurality of pieces of the learning utterance information and the plurality of pieces of numeric representation data; an estimation model learning unit that learns an estimation model by using the estimation data and the durations of the plurality of words; and an estimation unit that estimates the duration of a predetermined speech section based on utterance information of a user by using the estimation model.

Published: 19-03-2020

Recognizing transliterated words using suffix and/or prefix outputs

Number: US20200089759A1
Assignee: International Business Machines Corp

A computer-implemented method includes: receiving, by a computing device, an input file defining correct spellings of one or more transliterated words; generating, by the computing device, suffix outputs based on the one or more transliterated words; generating, by the computing device, a dictionary that maps the suffix outputs to the one or more transliterated words; recognizing, by the computing device, an alternatively spelled transliterated word included in a document as one of the one or more correctly spelled transliterated words using the dictionary; and outputting, by the computing device, information corresponding to the recognized transliterated word.
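
The abstract does not define "suffix outputs", so the following sketch makes a loud assumption: a suffix output is taken to be any trailing substring of a correctly spelled word with at least `min_len` characters. Under that assumption, a dictionary from suffixes to canonical spellings lets an alternatively spelled word be recognized by its longest matching suffix. All names here are hypothetical.

```python
def build_suffix_dictionary(correct_words, min_len=4):
    """Map each trailing substring (length >= min_len) to its canonical word.
    When two words share a suffix, the first word listed wins."""
    suffix_to_word = {}
    for word in correct_words:
        lower = word.lower()
        for start in range(len(lower) - min_len + 1):
            suffix_to_word.setdefault(lower[start:], word)
    return suffix_to_word


def recognize(token, suffix_to_word, min_len=4):
    """Return the canonical spelling whose suffix matches the longest
    trailing substring of `token`, or None when nothing matches."""
    lower = token.lower()
    for start in range(len(lower) - min_len + 1):  # longest suffix first
        match = suffix_to_word.get(lower[start:])
        if match is not None:
            return match
    return None
```

With the canonical words "Mohammed" and "Hussein", the variants "Muhammed" and "Husein" resolve through the shared suffixes "hammed" and "sein". A short `min_len` raises recall but also the risk of false matches.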

Published: 19-03-2020

MANAGEMENT SERVER, COMMUNICATION METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Number: US20200092414A1
Author: MORISHITA Shota
Assignee: NEC Corporation

A management server includes a call detection unit that detects a call from a first communication terminal to a second communication terminal, an information acquisition unit that acquires, when the call detection unit detects the call, essential information associated with the first communication terminal, an inquiry unit that transmits inquiry information to the first communication terminal to inquire about an information transmission means to the second communication terminal, and a transmission unit that transmits the essential information to the second communication terminal using an information transmission means indicated by response information transmitted from the first communication terminal in response to the inquiry information.

1. A management server comprising: a call detection unit configured to detect a call from a first communication terminal to a second communication terminal; an information acquisition unit configured to acquire, when the call detection unit detects the call, essential information associated with the first communication terminal; an inquiry unit configured to transmit, to the first communication terminal, inquiry information for inquiring about an information transmission means to the second communication terminal; and a transmission unit configured to transmit the essential information to the second communication terminal using an information transmission means indicated by response information transmitted from the first communication terminal in response to the inquiry information.
2. The management server according to claim 1, wherein the inquiry unit transmits inquiry information for inquiring which of a text communication or a voice communication is to be selected as the information transmission means.
3. The management server according to claim 1, wherein the transmission unit transmits, to the second communication terminal, input information transmitted from the first communication terminal using the ...

Published: 06-04-2017

METHODS AND SYSTEMS TO TRAIN CLASSIFICATION MODELS TO CLASSIFY CONVERSATIONS

Number: US20170098443A1
Assignee:

Methods and systems for training a conversation-classification model are disclosed. A first set of conversations in a source domain and a second set of conversations in a target domain are received. Each of the first set of conversations has an associated predetermined tag. One or more features are extracted from the first set of conversations and from the second set of conversations. Based on the similarity of content in the first set of conversations and the second set of conversations, a first weight is assigned to each conversation of the first set of conversations. Further, a second weight is assigned to the one or more features of the first set of conversations based on the similarity of the one or more features of the first set of conversations and of the second set of conversations. A conversation-classification model is trained based on the first weight and the second weight.

1. A method for training a conversation classification model, said method comprising: receiving, by a transceiver, a first set of conversations corresponding to a source domain and a second set of conversations corresponding to a target domain, wherein each conversation in said first set of conversations has an associated predetermined tag, and wherein each conversation in the first set of conversations and the second set of conversations corresponds to an audio conversation; generating, by one or more processors, a transcript for each conversation in the first set of conversations and the second set of conversations based on a speech to text conversion technique; extracting, by the one or more processors, one or more features from the transcript of each of said first set of conversations and said second set of conversations; assigning, by the one or more processors, a first weight to each conversation in said first set of conversations based on at least a similarity between content of said first set of conversations and content of said second set of conversations, wherein the similarity ...

Published: 06-04-2017

System and Method of Automated Language Model Adaptation

Number: US20170098445A1
Assignee: Verint Systems Ltd

Systems and methods of automated adaptation of a language model for transcription of audio data include obtaining audio data. The audio data is transcribed with a language model to produce a plurality of audio file transcriptions. A quality of the plurality of audio file transcriptions is evaluated. At least one best transcription from the plurality of audio file transcriptions is selected based upon the evaluated quality. Statistics are calculated from the selected at least one best transcription from the plurality of audio file transcriptions. The language model is modified from the calculated statistics.
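
The select-then-adapt loop in this abstract can be sketched with word counts standing in for the language model. This is a minimal illustration under assumptions: transcriptions arrive already paired with a quality score, the "statistics" are plain unigram counts, and the function name and `top_k` parameter are hypothetical.

```python
from collections import Counter


def adapt_language_model(base_counts, transcriptions, top_k=1):
    """Return updated word counts after adding the best transcriptions.

    transcriptions: list of (text, quality_score) pairs; how the quality
    score is computed is outside this sketch.
    """
    # Keep only the top_k transcriptions by evaluated quality.
    best = sorted(transcriptions, key=lambda t: t[1], reverse=True)[:top_k]
    adapted = Counter(base_counts)
    for text, _score in best:
        adapted.update(text.lower().split())  # statistics from the selected text
    return adapted
```

Filtering by quality before counting keeps recognition errors from low-scoring transcriptions out of the adapted model.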

Published: 28-03-2019

Multiple Voice Recognition Model Switching Method And Apparatus, And Storage Medium

Number: US20190096396A1

Embodiments of the present disclosure disclose a method and apparatus for switching multiple speech recognition models. The method includes: acquiring at least one piece of speech information in user input speech; recognizing the speech information and matching a linguistic category for the speech information to determine a corresponding target linguistic category based on a matching degree; and switching a currently used speech recognition model to a speech recognition model corresponding to the target linguistic category. The embodiments of the present disclosure determine the corresponding target linguistic category based on the matching degree by recognizing the speech information and matching the linguistic category for the speech information, and switch the currently used speech recognition model to the speech recognition model corresponding to the target linguistic category.

1. A method for switching multiple speech recognition models, the method comprising: acquiring at least one piece of speech information in user input speech; recognizing the speech information and matching a linguistic category for the speech information to determine a corresponding target linguistic category based on a matching degree; and switching a currently used speech recognition model to a speech recognition model corresponding to the target linguistic category.
2. The method according to claim 1, wherein the recognizing the speech information and matching a linguistic category for the speech information to determine a corresponding target linguistic category based on a matching degree comprises: recognizing the speech information based on features of at least two linguistic categories to obtain a similarity between the speech information and each of the linguistic categories, and defining the similarity as the matching degree of the linguistic category.
3. The method according to claim 1, wherein the recognizing the speech information and matching a linguistic category for the speech ...

Published: 14-04-2016

PHRASE-BASED DIALOGUE MODELING WITH PARTICULAR APPLICATION TO CREATING RECOGNITION GRAMMARS FOR VOICE-CONTROLLED USER INTERFACES

Number: US20160104481A1
Assignee: NANT HOLDINGS IP, LLC

The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). More specifically, the invention concerns a phrase-based modeling of generic structures of verbal interaction and use of these models for the purpose of automating part of the design of such grammar networks.

1-69. (canceled)
70. A speech recognition computer system comprising: an acoustic signal analyzer that generates acoustic features from a digital speech signal representing spoken input; at least one recognition grammar specifying legitimate word sequences representing abstract meaning from alternative phrase variants of a language; a phonetic dictionary having phonetic transcriptions of acoustic features; and an acoustic decoder coupled with the acoustic signal analyzer, the recognition grammar, and the phonetic dictionary and that: computes a set of most probable word hypotheses based on the acoustic features from the digital speech signal and the phonetic transcriptions in the phonetic dictionary; constrains the word hypotheses to a narrowed set of commands according to the abstract meaning of the legitimate word sequences from the recognition grammar; and provides the narrowed set of commands to a natural language processing component that translates the narrowed set of commands to a formalized set of instructions that can be processed by an application.
71. The system of claim 70, wherein the acoustic signal analyzer is configured to obtain the digital speech signal from a microphone.
72. The system of claim 70, wherein the acoustic signal analysis is configured to obtain a speech signal from a telephone.
73. The system of claim 70, further comprising a speech recognition front end that comprises the acoustic signal analyzer, the acoustic decoder, the recognition grammar, and the phonetic dictionary.
74. The system of claim 70, wherein ...

Published: 08-04-2021

INTENT-BASED CONVERSATIONAL KNOWLEDGE GRAPH FOR SPOKEN LANGUAGE UNDERSTANDING SYSTEM

Number: US20210104234A1
Assignee: PricewaterhouseCoopers LLP

Described are systems, methods, apparatuses, and computer program product embodiments for automatically processing intent-based spoken language for SLU. The disclosed solution uses a scale-free network structured conversational knowledge graph that stores nodes representative of actions, objects, and intent names and edges representative of relationships between the nodes. For all phrases (including a sentence) from the same intent, the system calculates a mean feature vector using a Universal Sentence Embedding (USE) model as a feature element. The system also employs a multi-step intent detection strategy. A graph query technique may be used to match all potential intent nodes from the trained knowledge graph. The system may compute a covariance matrix between the feature element of an input phrase and feature elements of all potential intents. The major component of the covariance matrix along with the maximum covariance may be used to determine the final intent.

1. A method for training a spoken language understanding (SLU) system, the method comprising: receiving one or more phrases; creating a knowledge graph, the creation comprising: determining one or more words in the one or more phrases, adding one or more nodes to the knowledge graph, the one or more nodes corresponding to the one or more words in the one or more phrases, determining one or more intents of the one or more phrases, adding one or more intent nodes to the knowledge graph, the one or more intent nodes corresponding to the one or more determined intents, adding one or more edges to the knowledge graph, and using the one or more edges to form connections between the one or more determined intents and the corresponding one or more words; and wherein the SLU system is configured to determine an intent of a user input phrase using the connections between the one or more determined intents and the corresponding one or more words.
2. The method of claim 1, further comprising: receiving a user ...

Published: 08-04-2021

Techniques for incremental computer-based natural language understanding

Number: US20210104236A1
Assignee: Disney Enterprises Inc

Various embodiments disclosed herein provide techniques for performing incremental natural language understanding on a natural language understanding (NLU) system. The NLU system acquires a first audio speech segment associated with a user utterance. The NLU system converts the first audio speech segment into a first text segment. The NLU system determines a first intent based on a text string associated with the first text segment, wherein the text string represents a portion of the user utterance. The NLU system generates a first response based on the first intent prior to when the user utterance completes.

Published: 04-04-2019

Method and apparatus for correcting input speech based on artificial intelligence, and storage medium

Number: US20190103097A1
Author: Kuai LI

The present disclosure provides a method and an apparatus for correcting an input speech based on artificial intelligence. The method includes: receiving a speech input by a user; performing recognition on the speech to obtain a current recognition text; obtaining at least one candidate phrase of a first phrase to be corrected in the current recognition text and displaying the at least one candidate phrase to the user; detecting a select operation of the user, the select operation being configured to select one of the at least one candidate phrase as a target candidate phrase; and correcting the first phrase in the current recognition text by using the target candidate phrase, to obtain a target recognition text.

Published: 02-06-2022

MACHINE LEARNING TO PROPOSE ACTIONS IN RESPONSE TO NATURAL LANGUAGE QUESTIONS

Number: US20220172712A1
Assignee: INTUIT INC.

A method including embedding, by a trained issue MLM (machine learning model), a new natural language issue statement into an issue vector. An inner product of the issue vector with an actions matrix is calculated. The actions matrix includes centroid-vectors calculated using a clustering method from a second output of a trained action MLM which embedded prior actions expressed in natural language action statements taken as a result of prior natural issue statements. Calculating the inner product results in probabilities associated with the prior actions. Each of the probabilities represents a corresponding estimate that a corresponding prior action is relevant to the issue vector. A list of proposed actions relevant to the issue vector is generated by comparing the probabilities to a threshold value and selecting a subset of the prior actions with corresponding probabilities above the threshold. The list of proposed actions is transmitted to a user device.

1-7. (canceled)
8. A method of using a trained issue machine learning model (MLM), comprising: embedding, by the trained issue MLM, a new natural language issue statement into an issue vector; calculating an inner product of the issue vector with an actions matrix, wherein the actions matrix comprises a plurality of centroid-vectors calculated using a clustering method from a second output of a trained action MLM which embedded a plurality of prior actions expressed in natural language action statements taken as a result of prior natural issue statements, wherein calculating the inner product results in a plurality of probabilities associated with the plurality of prior actions, and wherein each of the plurality of probabilities represents a corresponding estimate that a corresponding prior action is relevant to the issue vector; generating a list of proposed actions relevant to the issue vector by comparing the plurality of probabilities to a threshold value and selecting a subset of the plurality of ...
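
The scoring step can be sketched as an inner product of the issue vector with each centroid row, followed by a threshold filter. The abstract does not say how inner products become probabilities, so the softmax used below is an assumption, as are the function name, the toy vectors and the action labels.

```python
import math


def propose_actions(issue_vector, action_centroids, action_names, threshold):
    """Return (action_name, probability) pairs whose probability exceeds
    `threshold`. Each row of `action_centroids` is one centroid vector."""
    # Inner product of the issue vector with every row of the actions matrix.
    scores = [sum(a * b for a, b in zip(issue_vector, c)) for c in action_centroids]
    # Softmax (an assumption) turns the scores into probabilities;
    # subtracting the peak keeps the exponentials numerically stable.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return [(n, p) for n, p in zip(action_names, probs) if p > threshold]
```

With an issue vector aligned to the first centroid, only that centroid's action clears a threshold of 0.5, so the proposed list contains a single entry.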

Published: 19-04-2018

METHOD AND APPARATUS FOR DISCOVERING TRENDING TERMS IN SPEECH REQUESTS

Number: US20180108346A1
Assignee:

Systems and processes are disclosed for discovering trending terms in automatic speech recognition. Candidate terms (e.g., words, phrases, etc.) not yet found in a speech recognizer vocabulary or having low language model probability can be identified based on trending usage in a variety of electronic data sources (e.g., social network feeds, news sources, search queries, etc.). When candidate terms are identified, archives of live or recent speech traffic can be searched to determine whether users are uttering the candidate terms in dictation or speech requests. Such searching can be done using open vocabulary spoken term detection to find phonetic matches in the audio archives. As the candidate terms are found in the speech traffic, notifications can be generated that identify the candidate terms, provide relevant usage statistics, identify the context in which the terms are used, and the like.

1. A method for discovering trending terms in automatic speech recognition, the method comprising: at an electronic device having a processor and memory: identifying a candidate term based on a frequency of occurrence of the term in one or more electronic data sources; in response to identifying the candidate term, searching for the candidate term in an archive of speech traffic of an automatic speech recognizer using phonetic matching; and in response to finding the candidate term in the archive, updating a vocabulary of the automatic speech recognizer with the candidate term.
2. The method of claim 1, wherein identifying the candidate term comprises: identifying one or more terms in the one or more electronic data sources; determining a frequency of occurrence of the one or more terms in the one or more electronic data sources; and selecting the candidate term based on the determined frequency of occurrence of the one or more terms.
3. The method of claim 1, wherein the one or more electronic data sources include a news source or a social media feed.
4. The method of ...
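
The first step of claim 1, picking candidate terms by frequency, can be sketched as below. It covers only the text side: the phonetic search of the speech archive is not shown, and the whitespace tokenization, the `min_count` cutoff and the function name are assumptions for illustration.

```python
from collections import Counter


def find_candidate_terms(documents, vocabulary, min_count):
    """Return, in sorted order, terms that occur at least `min_count` times
    across `documents` and are missing from the recognizer `vocabulary`."""
    counts = Counter(w for d in documents for w in d.lower().split())
    return sorted(
        term for term, count in counts.items()
        if count >= min_count and term not in vocabulary
    )
```

A frequent word that the recognizer already knows is skipped, so only out-of-vocabulary terms move on to the archive search.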

Published: 29-04-2021

Voice recognition method and device

Number: US20210125611A1
Assignee: LG ELECTRONICS INC

Disclosed is a voice recognition device and method. According to the disclosure, the voice recognition device, upon failing to grasp the intent of the user's utterance from the original utterance which is divided into a head utterance and a tail utterance, figures out the intent from the head utterance to thereby complete the original utterance and provides the result of voice recognition processing on the original utterance. According to an embodiment, the voice recognition device may be related to artificial intelligence (AI) modules, robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.

Published: 26-04-2018

METHOD OF SELECTING TRAINING TEXT FOR LANGUAGE MODEL, AND METHOD OF TRAINING LANGUAGE MODEL USING THE TRAINING TEXT, AND COMPUTER AND COMPUTER PROGRAM FOR EXECUTING THE METHODS

Number: US20180114524A1
Assignee:

Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods. The present invention provides for selecting training text for a language model that includes: generating a template for selecting training text from a corpus in a first domain according to generation techniques of: (i) replacing one or more words in a word string selected from the corpus in the first domain with a special symbol representing any word or word string, and adopting the word string after replacement as a template for selecting the training text; and/or (ii) adopting the word string selected from the corpus in the first domain as the template for selecting the training text; and selecting text covered by the template as the training text from a corpus in a second domain different from the first domain.

1. A computer-implemented method for selecting training text for a language model, the method comprising: generating, from a first corpus in a first domain, a template for selecting the training text, wherein generating the template comprises: identifying a first plurality of substrings in a word string selected from the first corpus; replacing a respective word in each substring of the first plurality of substrings with a special symbol to generate a second plurality of substrings; and adding the second plurality of substrings to the template; identifying text that is included in a second corpus in a second domain different from the first domain; determining that the text is covered by the template; selecting the text as at least a portion of the training text; and training the language model using the selected training text.
2. The method of claim 1, wherein the word string is a first word string, and wherein selecting the text as the at least a portion of the training text comprises: generating a third plurality of substrings of a second word string selected from the ...
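
Template generation and coverage, as described in claim 1, can be sketched with word trigrams. Several simplifications are assumed here and are not from the patent: substrings are fixed-length n-grams, the special symbol `*` stands for exactly one word, and a sentence counts as covered when any of its n-grams matches a template.

```python
def _wildcard_variants(gram):
    """All copies of `gram` with exactly one word replaced by '*'."""
    return [tuple(gram[:j] + ["*"] + gram[j + 1:]) for j in range(len(gram))]


def build_templates(sentence, n=3):
    """Templates from an in-domain sentence: every n-gram with one word
    replaced by the special symbol."""
    words = sentence.lower().split()
    templates = set()
    for i in range(len(words) - n + 1):
        templates.update(_wildcard_variants(words[i:i + n]))
    return templates


def is_covered(sentence, templates, n=3):
    """True when some n-gram of the sentence matches a template."""
    words = sentence.lower().split()
    return any(
        variant in templates
        for i in range(len(words) - n + 1)
        for variant in _wildcard_variants(words[i:i + n])
    )
```

Templates built from "play some jazz music" cover "please play some rock" through the pattern `play some *`, so that out-of-domain sentence would be selected as training text.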

Published: 09-06-2022

Dialogue system, dialogue processing method, translating apparatus, and method of translation

Number: US20220180864A1
Assignee: Hyundai Motor Co, Kia Corp

A translating apparatus includes: a translator configured to translate an input sentence of a first language into a second language to generate a plurality of output sentences; a first dialogue manager configured to generate a meaning representation for the input sentence of the first language; a second dialogue manager configured to generate a meaning representation for each of the plurality of output sentences of the second language; and a determiner configured to determine a final output sentence among the plurality of output sentences of the second language based on the meaning representation for the input sentence of the first language and the meaning representation for each of the plurality of output sentences of the second language.
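
The determiner's role can be sketched as picking the candidate translation whose meaning representation agrees best with that of the input sentence. The abstract does not specify the representation or the comparison, so the dictionaries of slots and the slot-overlap count below are assumptions, and the function name is hypothetical.

```python
def choose_translation(input_meaning, candidates):
    """Return the candidate sentence whose meaning representation shares
    the most slot values with `input_meaning`.

    candidates: non-empty list of (sentence, meaning_dict) pairs.
    """
    def overlap(meaning):
        # Count slots whose value matches the input's meaning representation.
        return sum(1 for k, v in input_meaning.items() if meaning.get(k) == v)

    return max(candidates, key=lambda c: overlap(c[1]))[0]
```

A candidate that preserves both the intent and the time slot of the input wins over one whose intent could not be determined.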

Published: 04-05-2017

Language model training method and device

Number: US20170125013A1
Author: Zhiyong Yan

The present disclosure provides a language model training method and device, including: obtaining a universal language model in an offline training mode, and clipping the universal language model to obtain a clipped language model; obtaining a log language model of logs within a preset time period in an online training mode; fusing the clipped language model with the log language model to obtain a first fusion language model used for carrying out first time decoding; and fusing the universal language model with the log language model to obtain a second fusion language model used for carrying out second time decoding. The method is used for solving the problem that a language model obtained offline in the prior art has poor coverage on new corpora, resulting in a reduced language recognition rate.
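
The clip-and-fuse pipeline can be sketched with unigram probability tables. The abstract does not state how clipping or fusion is performed, so two assumptions are made: clipping drops entries below a probability floor, and fusion is a linear interpolation. The function names and the weight are hypothetical.

```python
def clip_model(model, min_prob):
    """Drop low-probability entries to shrink the model (assumed meaning of
    'clipping'; the result is not renormalized in this sketch)."""
    return {w: p for w, p in model.items() if p >= min_prob}


def fuse_models(model_a, model_b, weight_a=0.5):
    """Linear interpolation of two word-probability tables (assumed fusion)."""
    vocab = set(model_a) | set(model_b)
    return {
        w: weight_a * model_a.get(w, 0.0) + (1.0 - weight_a) * model_b.get(w, 0.0)
        for w in vocab
    }
```

Fusing the clipped universal model with the log model yields the smaller model for first-pass decoding, while fusing the full universal model with the log model yields the larger one for second-pass decoding. In both cases words seen only in recent logs receive non-zero probability.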

Published: 25-08-2022

INTELLIGENT NATURAL LANGUAGE DIALOGUE SYSTEMS AND METHODS FOR CREATING INTELLIGENT NATURAL LANGUAGE DIALOGUES FOR EFFICIENT RETRIEVAL OF ITEMS IN ONE OR MORE LARGE DATABASES

Number: US20220269734A1
Author: Russell Dale W.
Assignee:

Intelligent natural language dialogue systems and methods are described for creating intelligent natural language dialogues for efficient retrieval of items in one or more large databases. An intelligent dialogue model is trained with feature-value pair(s) corresponding to natural language phrase(s) defining respective feature-specific intents. The intelligent dialogue model interprets the audible utterance of a user to determine user-specific feature-value pair(s), where the audible utterance comprises a natural language phrase corresponding to a feature-specific intent. An item is identified based on a matching score determined from the user-specific feature-value pair(s) compared with one or more item definitions defined in a database.

1. An intelligent natural language dialogue system configured to create intelligent natural language dialogues for efficient retrieval of items in one or more large databases, the intelligent natural language dialogue system comprising: a computing device comprising one or more processors; an intelligent dialogue model accessible by the one or more processors, the intelligent dialogue model trained with one or more of feature-value pairs corresponding to one or more natural language phrases defining respective feature-specific intents; and a voice assistant application (app) comprising computing instructions, wherein the computing instructions, when executed by the one or more processors, are configured to cause the one or more processors to: interpret, with the intelligent dialogue model, an audible utterance of a user to determine one or more user-specific feature-value pairs, wherein the audible utterance comprises a natural language phrase corresponding to a feature-specific intent, and identify an item based on a matching score determined from the user-specific feature-value pairs compared with one or more item definitions defined in a database.
2. The intelligent natural language dialogue system of claim 1, wherein the one or ...

Publication date: 25-08-2022

VIDEO-AIDED UNSUPERVISED GRAMMAR INDUCTION

Number: US20220270596A1
Author: SONG Linfeng
Assignee: Tencent America LLC

A method of training a natural language neural network comprises obtaining at least one constituency span; obtaining a training video input; applying a multi-modal transform to the video input, thereby generating a transformed video input; comparing the at least one constituency span and the transformed video input using a compound Probabilistic Context-Free Grammar (PCFG) model to match the at least one constituency span with corresponding portions of the transformed video input; and using results from the comparison to learn a constituency parser.

1. A natural language neural network training method, performed by a computer device, the method comprising: obtaining at least one constituency span; obtaining a training video input; applying a multi-modal transform to the video input, thereby generating a transformed video input; comparing the at least one constituency span and the transformed video input using a compound Probabilistic Context-Free Grammar (PCFG) model to match the at least one constituency span with corresponding portions of the transformed video input; and using results from the comparison to learn a constituency parser.
2. The natural language neural network training method according to claim 1, wherein after obtaining the training video input, and before the multi-modal transform is applied, the training video input is divided into feature sequence projections (F) according to the formula $F^m = \{f_i^m\}_{i=1}^{L^m}$, where $f_i^m$ and $L^m$ are an ith feature and a total number of features of an mth expert, the expert being an extraction of a video representation from M models trained on different tasks.
3. The natural language neural network training method according to claim 2, wherein the feature sequence projections (F) are used as an input to the multi-modal transform.
4. The natural language neural network training method according to claim 3, wherein the feature sequence projections (F), before being used as the input to the multi- ...

Publication date: 27-05-2021

SYSTEM AND METHOD TO IMPROVE PERFORMANCE OF A SPEECH RECOGNITION SYSTEM BY MEASURING AMOUNT OF CONFUSION BETWEEN WORDS

Number: US20210158804A1
Author: SHU Chang, Tiwari Sanchita
Assignee:

Systems and methods to improve the performance of an automatic speech recognition (ASR) system using a confusion index indicative of the amount of confusion between words are described. A confusion index (CI) or score is calculated by receiving a first word (Word1) and a second word (Word2), calculating an acoustic score (A12) indicative of the phonetic difference between Word1 and Word2, and calculating a weighted language score (W(U1+U2)) indicative of a weighted likelihood (or word frequency) of Word1 and Word2 occurring in the corpus. The confusion index CI incorporates both the acoustic score and the weighted language score, such that the CI for words that sound alike and have a high likelihood of occurring in the corpus will be higher than the CI for words that sound alike and do not have a high likelihood of occurring in the corpus. In some embodiments, the CI may be used to artificially boost uncommon words in a corpus to improve their visibility, to add context to uncommon words in a corpus to avoid conflict with common words, and to remove unimportant words from the lexicon to avoid conflicts with other corpus words.

1. A method for improving the performance of an automatic speech recognition (ASR) system that uses a language model based on a corpus data set which includes words from a generic corpus data set and a domain-specific data set, using a confusion index indicative of the amount of confusion between words from the data sets, comprising: receiving a first word from the generic corpus data set and a second word from the domain specific data set; calculating an acoustic score A12 indicative of an acoustic distance between the first word and the second word using a lexicon having a phonetic breakdown of the first word and the second word; calculating a weighted language score indicative of the likelihood of the first word and the second word occurring in the corpus data set, comprising performing an equation: W(U1+ ...
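
The listing truncates the claim before the confusion index formula is given, so the following is only a rough sketch under stated assumptions. It assumes the acoustic score is a normalized phoneme edit distance turned into a similarity, that the language score is a weight times the sum of two unigram frequencies, and that the two are multiplied. The multiplication, the function names, and the toy lexicon are all hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        curr = [i]
        for j, pb in enumerate(b, 1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def acoustic_similarity(p1: Sequence[str], p2: Sequence[str]) -> float:
    """1.0 for identical pronunciations, 0.0 for entirely different ones."""
    longest = max(len(p1), len(p2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(p1, p2) / longest


def confusion_index(
    word1: str,
    word2: str,
    lexicon: Dict[str, List[str]],
    unigram: Dict[str, float],
    weight: float = 1.0,
) -> float:
    """Assumed combination: acoustic similarity times weight * (U1 + U2)."""
    similarity = acoustic_similarity(lexicon[word1], lexicon[word2])
    language_score = weight * (unigram[word1] + unigram[word2])
    return similarity * language_score


if __name__ == "__main__":
    # Toy phonetic lexicon and word frequencies, invented for illustration.
    lexicon = {"write": ["R", "AY", "T"], "right": ["R", "AY", "T"],
               "rite": ["R", "AY", "T"]}
    unigram = {"write": 0.02, "right": 0.03, "rite": 0.0001}
    print(confusion_index("write", "right", lexicon, unigram))
    print(confusion_index("write", "rite", lexicon, unigram))
```

With these toy numbers, the pair of frequent homophones scores higher than the pair involving a rare word, which is the ordering the abstract describes.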

Publication date: 27-05-2021

AUTOMATIC TURN DELINEATION IN MULTI-TURN DIALOGUE

Number: US20210158812A1
Assignee: Microsoft Technology Licensing, LLC

A method of automatically delineating turns in a multi-turn dialogue between a user and a conversational computing interface. Audio data encoding speech of the user in the multi-turn dialogue is received. The audio data is analyzed to recognize, in the speech of the user, an utterance followed by a silence. The utterance is recognized as a last utterance in a turn of the multi-turn dialogue responsive to the silence exceeding a context-dependent duration dynamically updated based on a conversation history of the multi-turn dialogue and features of the received audio, wherein the conversation history includes one or more previous turns of the multi-turn dialogue taken by the user and one or more previous turns of the multi-turn dialogue taken by the conversational computing interface.

1. A method of automatically delineating turns in a multi-turn dialogue between a user and a conversational computing interface, comprising: receiving audio data encoding speech of the user in the multi-turn dialogue; analyzing the received audio to recognize, in the speech of the user, an utterance followed by a silence; and recognizing the utterance as a last utterance in a turn of the multi-turn dialogue responsive to the silence exceeding a context-dependent duration dynamically updated based on a conversation history of the multi-turn dialogue and features of the received audio, wherein the conversation history includes one or more previous turns of the multi-turn dialogue taken by the user and one or more previous turns of the multi-turn dialogue taken by the conversational computing interface.
2. The method of claim 1, wherein the features of the received audio include one or more acoustic features.
3. The method of claim 2, wherein the one or more acoustic features include an intonation of the user relative to a baseline speaking pitch of the user.
4. The method of claim 2, wherein the one or more acoustic features include one or more of 1) a baseline speaking rate for the user ...
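
The claims describe a silence threshold that adapts to the conversation history and to acoustic features, but they do not state how it is updated. The sketch below is one hypothetical reading, not the patented method. It assumes the threshold grows when the user speaks more slowly than their baseline rate and shrinks when the user's earlier turns were short. The `Turn` class, the constants, and both heuristics are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str  # "user" or "system"
    text: str


def silence_threshold(
    history: List[Turn],
    speaking_rate: float,
    baseline_rate: float,
    base_seconds: float = 0.8,
) -> float:
    """Context-dependent silence duration (seconds) that ends a turn."""
    threshold = base_seconds

    # Assumed acoustic heuristic: slower-than-baseline speech suggests the
    # user is still thinking, so wait longer (factor clamped to [0.5, 2.0]).
    if speaking_rate > 0 and baseline_rate > 0:
        factor = baseline_rate / speaking_rate
        threshold *= min(2.0, max(0.5, factor))

    # Assumed history heuristic: if the user's previous turns were short
    # (three words or fewer on average), end turns sooner.
    user_turns = [t for t in history if t.speaker == "user"]
    if user_turns:
        avg_words = sum(len(t.text.split()) for t in user_turns) / len(user_turns)
        if avg_words <= 3:
            threshold *= 0.75

    return threshold


def is_end_of_turn(silence_seconds: float, threshold: float) -> bool:
    """The utterance is the last in the turn if the silence exceeds the threshold."""
    return silence_seconds > threshold


if __name__ == "__main__":
    history = [Turn("system", "Which city?"), Turn("user", "Boston")]
    threshold = silence_threshold(history, speaking_rate=2.0, baseline_rate=4.0)
    print(threshold, is_end_of_turn(1.0, threshold))
```

Recomputing the threshold after every utterance is what makes it "dynamically updated" in this sketch; a production system would more plausibly learn the adjustment from data than hard-code it.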
