Total found: 22356. Displayed: 100.
12-01-2012 publication date

System and Method for Unsupervised and Active Learning for Automatic Speech Recognition

Номер: US20120010885A1
Assignee: AT&T Intellectual Property II LP

A system and method are provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in performance given the transcribed and un-transcribed data.

26-01-2012 publication date

Speech recognition circuit and method

Номер: US20120022862A1
Assignee: Individual

A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying the scores by adding the score updates to the scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to the scores.
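
The score read-modify-write loop this abstract describes can be sketched in software. The sketch below is a loose illustration, not the patented hardware circuit; all names (`ScoreStore`, `accumulate`, `select_best`) and the beam width are invented for illustration.

```python
# Illustrative software analogue of the lexical-tree scoring loop:
# a memory structure holding per-node scores, an accumulator that adds
# score updates, and a selector that keeps the best-scoring nodes.
from dataclasses import dataclass, field

@dataclass
class ScoreStore:
    """Memory structure: node id -> accumulated score (log domain)."""
    scores: dict = field(default_factory=dict)

    def lookup(self, node_id):
        return self.scores.get(node_id, float("-inf"))

    def write_back(self, node_id, score):
        self.scores[node_id] = score

def accumulate(store, updates):
    """Accumulator: add per-node score updates (e.g. acoustic
    log-likelihoods derived from audio input) and write the results back."""
    for node_id, delta in updates.items():
        old = store.lookup(node_id)
        new = delta if old == float("-inf") else old + delta
        store.write_back(node_id, new)

def select_best(store, beam=2):
    """Selector: keep the `beam` highest-scoring lexical-tree nodes."""
    ranked = sorted(store.scores.items(), key=lambda kv: kv[1], reverse=True)
    return [node_id for node_id, _ in ranked[:beam]]

store = ScoreStore()
accumulate(store, {"cat": -1.0, "car": -2.5, "dog": -4.0})  # frame 1
accumulate(store, {"cat": -0.5, "car": -0.5})               # frame 2
best = select_best(store, beam=2)
```

The same read/modify/write cycle would run once per audio frame in a real decoder, with the selector implementing beam pruning.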

19-04-2012 publication date

Automatically providing a user with substitutes for potentially ambiguous user-defined speech commands

Номер: US20120095765A1
Assignee: Nuance Communications Inc

A method for alleviating ambiguity issues of new user-defined speech commands. An original command for a user-defined speech command can be received. It can then be determined if the original command is likely to be confused with a set of existing speech commands. When confusion is unlikely, the original command can be automatically stored. When confusion is likely, a substitute command that is unlikely to be confused with existing commands can be automatically determined. The substitute can be presented as an alternative to the original command and can be selectively stored as the user-defined speech command.
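
The ambiguity check the abstract describes can be approximated with a string-similarity test. A minimal sketch, assuming a similarity threshold of 0.8 and an invented synonym table; the patent specifies neither.

```python
# Sketch: flag a new user-defined command as confusable when it is too
# similar to an existing command, and propose a substitute instead.
from difflib import SequenceMatcher

EXISTING = {"call home", "call work"}          # assumed existing commands
SUBSTITUTES = {"call mom": "dial mother"}      # hypothetical synonym table

def is_confusable(cmd, existing, threshold=0.8):
    """True when cmd is close enough to an existing command to be confused."""
    return any(SequenceMatcher(None, cmd, e).ratio() >= threshold
               for e in existing)

def register_command(cmd):
    """Return the command to store: the original when unambiguous,
    otherwise a substitute that is unlikely to be confused."""
    if not is_confusable(cmd, EXISTING):
        return cmd
    return SUBSTITUTES.get(cmd, cmd + " (please rephrase)")
```

A production system would measure acoustic confusability rather than character similarity; the edit-ratio stands in for that here.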

26-04-2012 publication date

Multi-state barge-in models for spoken dialog systems

Номер: US20120101820A1
Author: Andrej Ljolje
Assignee: AT&T INTELLECTUAL PROPERTY I LP

A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.

14-06-2012 publication date

Male acoustic model adaptation based on language-independent female speech data

Номер: US20120150541A1
Assignee: GENERAL MOTORS LLC

A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.

14-06-2012 publication date

Method and system for reconstructing speech from an input signal comprising whispers

Номер: US20120150544A1
Assignee: NANYANG TECHNOLOGICAL UNIVERSITY

A system for reconstructing speech from an input signal comprising whispers is disclosed. The system comprises an analysis unit configured to analyse the input signal to form a representation of the input signal; an enhancement unit configured to modify the representation of the input signal to adjust a spectrum of the input signal, wherein the adjusting of the spectrum of the input signal comprises modifying a bandwidth of at least one formant in the spectrum to achieve a predetermined spectral energy distribution and amplitude for the at least one formant; and a synthesis unit configured to reconstruct speech from the modified representation of the input signal.

05-07-2012 publication date

Dialect translator for a speech application environment extended for interactive text exchanges

Номер: US20120173225A1
Assignee: Nuance Communications Inc

The present solution includes a real-time automated communication method. In the method, a real-time communication session can be established between a text exchange client and a speech application. A translation table can be identified that includes multiple entries, each entry including a text exchange item and a corresponding conversational translation item. A text exchange message can be received that was entered into a text exchange client. Content in the text exchange message that matches a text exchange item in the translation table can be substituted with a corresponding conversational item. The translated text exchange message can be sent as input to a voice server. Output from the voice server can be used by the speech application, which performs an automatic programmatic action based upon the output.
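
The translation-table substitution step can be sketched as a whole-word lookup applied before the message reaches the voice server. The table entries and function names below are invented examples, not from the patent.

```python
# Sketch: replace text-exchange shorthand with conversational phrases,
# leaving unknown words untouched.
import re

TRANSLATION_TABLE = {          # text exchange item -> conversational item
    "brb": "be right back",
    "ty": "thank you",
    "u": "you",
}

def translate(message):
    """Substitute each whole-word text-exchange item with its
    conversational translation item."""
    def sub(match):
        word = match.group(0)
        return TRANSLATION_TABLE.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", sub, message)
```

Matching whole alphabetic tokens (rather than raw substrings) prevents, say, the "u" entry from corrupting "sun".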

05-07-2012 publication date

Subspace Speech Adaptation

Номер: US20120173240A1
Assignee: Microsoft Corp

Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.

05-07-2012 publication date

Apparatus and method for voice command recognition based on a combination of dialog models

Номер: US20120173244A1
Assignee: SAMSUNG ELECTRONICS CO LTD

Provided are a voice command recognition apparatus and method capable of determining the intention of a voice command input through a voice dialog interface, by combining a rule-based dialog model and a statistical dialog model. The voice command recognition apparatus includes a command intention determining unit configured to correct an error in recognizing a voice command of a user, and an application processing unit configured to check whether the final command intention determined by the command intention determining unit comprises the input factors for execution of an application.

19-07-2012 publication date

System and method of performing user-specific automatic speech recognition

Номер: US20120185237A1
Assignee: AT&T Intellectual Property II LP

Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition allow deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as the office, home, or vehicle, while maintaining the accuracy of the speech recognition.

23-08-2012 publication date

Hearing assistance system for providing consistent human speech

Номер: US20120215532A1
Assignee: Apple Inc

Broadly speaking, the embodiments disclosed herein describe an apparatus, system, and method that allows a user of a hearing assistance system to perceive consistent human speech. The consistent human speech can be based upon user specific preferences.

20-09-2012 publication date

Erroneous detection determination device, erroneous detection determination method, and storage medium storing erroneous detection determination program

Номер: US20120239394A1
Author: Chikako Matsumoto
Assignee: Fujitsu Ltd

An erroneous detection determination device includes: a signal acquisition unit configured to acquire, from each of microphones, a plurality of audio signals relating to ambient sound including sound from a sound source in a certain direction; a result acquisition unit configured to acquire a recognition result including voice activity information indicating the inclusion of a voice activity relating to at least one of the audio signals; a calculation unit configured to calculate, for each of audio signals on the basis of the signals in respective unit times and the certain direction, a speech arrival rate representing the proportion of the sound from the certain direction to the ambient sound in each of the unit times; and an error detection unit configured to determine, on the basis of the recognition result and the speech arrival rate, whether or not the voice activity information is the result of erroneous detection.
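
The speech arrival rate reduces to a per-unit-time energy ratio. A hedged sketch, assuming directional and total energies are available per unit time and using an invented decision threshold of 0.5 (the patent does not give one):

```python
# Sketch: compute the proportion of ambient sound arriving from the target
# direction per unit time, then flag a voice-activity detection as
# erroneous when too little energy came from that direction.

def speech_arrival_rate(directional_energy, total_energy):
    """Per unit time: fraction of ambient energy attributable to the
    sound source in the target direction."""
    return [d / t if t > 0 else 0.0
            for d, t in zip(directional_energy, total_energy)]

def is_erroneous(voice_activity_detected, rates, threshold=0.5):
    """A VAD hit is judged erroneous when the mean arrival rate over the
    analysed unit times falls below the threshold."""
    mean_rate = sum(rates) / len(rates)
    return voice_activity_detected and mean_rate < threshold

rates = speech_arrival_rate([1.0, 3.0], [4.0, 4.0])
```

In the patented device the directional energies would come from beamforming across the microphone array; here they are given as plain numbers.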

04-10-2012 publication date

System and method for rapid customization of speech recognition models

Номер: US20120253799A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.

18-10-2012 publication date

Apparatus and method for processing voice command

Номер: US20120265536A1
Assignee: Hyundai Motor Co

Disclosed is a technique for processing voice commands. In particular, the disclosed technique increases the voice recognition rate without requiring a separate voice-command input process, by updating a voice command table based on interaction with the user: similar commands input by the user are stored once those commands have been confirmed by the user as similar commands.

08-11-2012 publication date

Enhanced accuracy for speech recognition grammars

Номер: US20120284025A1
Assignee: AT&T Intellectual Property II LP

Disclosed herein are methods and systems for recognizing speech. A method embodiment comprises comparing received speech with a precompiled grammar based on a database and, if the received speech matches data in the precompiled grammar, returning a result based on the matched data. If the received speech does not match data in the precompiled grammar, a new grammar is dynamically compiled based only on new data added to the database after the compiling of the precompiled grammar. The database may comprise a directory of names.
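
The two-stage matching logic can be sketched as a set lookup with a fallback grammar built only from records added after the precompile. The class, method names, and data below are illustrative assumptions, not the patented implementation.

```python
# Sketch: match against the precompiled grammar first; on a miss, compile
# a small grammar from only the database entries added since the
# precompile and match against that.

class GrammarRecognizer:
    def __init__(self, database, precompiled_count):
        self.database = database                      # e.g. a directory of names
        self.precompiled = set(database[:precompiled_count])
        self.precompiled_count = precompiled_count

    def recognize(self, speech_text):
        if speech_text in self.precompiled:           # fast precompiled path
            return ("precompiled", speech_text)
        # Dynamic path: grammar built only from new entries.
        new_grammar = set(self.database[self.precompiled_count:])
        if speech_text in new_grammar:
            return ("dynamic", speech_text)
        return ("no-match", None)

rec = GrammarRecognizer(["alice", "bob", "carol"], precompiled_count=2)
```

Compiling only the delta keeps the dynamic grammar small, which is the point of the claimed accuracy/latency trade-off.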

29-11-2012 publication date

Number-assistant voice input system, number-assistant voice input method for voice input system and number-assistant voice correcting method for voice input system

Номер: US20120303368A1
Author: Ting Ma
Assignee: Mitac International Corp

The present invention discloses a number-assistant voice input system, a number-assistant voice input method for a voice input system and a number-assistant voice correcting method for a voice input system, which applies software to drive a voice input system of an electronic device to provide a voice input logic circuit module. The voice input logic circuit module defines the pronunciation of numbers 1 to 26 as the paths to respectively input letters A to Z in the voice input system and allows users to selectively input or correct a letter by reading a number from 1 to 26 instead of a letter from A to Z.
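
The defined mapping is simply spoken number n → the n-th letter of the alphabet. A minimal sketch of the lookup (function names are invented):

```python
# Sketch: map spoken numbers 1..26 to letters A..Z, so a user can spell
# a word by reading numbers instead of letters.

def number_to_letter(n):
    """Map a spoken number 1..26 to the letter A..Z it stands for."""
    if not 1 <= n <= 26:
        raise ValueError("number must be between 1 and 26")
    return chr(ord("A") + n - 1)

def spell_by_numbers(numbers):
    """Enter a whole word by reading out numbers instead of letters."""
    return "".join(number_to_letter(n) for n in numbers)
```

The motivation in the abstract is that digits are far easier for a recognizer to distinguish than acoustically confusable letter names (B/D/E, M/N).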

06-12-2012 publication date

Pattern processing system specific to a user group

Номер: US20120310647A1
Author: Peter Beyerlein
Assignee: Nuance Communications Inc

Methods and apparatus for identifying a user group in connection with user group-based speech recognition. An exemplary method comprises receiving, from a user, a user group identifier that identifies a user group to which the user was previously assigned based on training data. The user group comprises a plurality of individuals including the user. The method further comprises using the user group identifier, identifying a pattern processing data set corresponding to the user group, and receiving speech input from the user to be recognized using the pattern processing data set.

13-12-2012 publication date

Voice recognition grammar selection based on context

Номер: US20120316878A1
Assignee: Google LLC

The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.
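
Grammar selection from derived geographic information can be sketched as a table lookup keyed by location. The locations, grammar identifiers, and fallback below are invented examples, not from the patent.

```python
# Sketch: a non-verbal action (e.g. panning a map) yields a geographic
# location; the location selects a grammar indicator for recognition.

GRAMMARS_BY_LOCATION = {                  # assumed location -> grammar table
    "new york": "grammar-nyc-points-of-interest",
    "paris": "grammar-paris-points-of-interest",
}
DEFAULT_GRAMMAR = "grammar-generic"

def select_grammar(geo_info):
    """Return a grammar indicator for the location implied by the
    user's non-verbal action, falling back to a generic grammar."""
    return GRAMMARS_BY_LOCATION.get(geo_info.lower(), DEFAULT_GRAMMAR)
```

Constraining the recognizer to a location-specific grammar shrinks the search space, which is what makes the context signal valuable.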

03-01-2013 publication date

Automatic Language Model Update

Номер: US20130006640A1
Assignee: Google LLC

A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
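
The probability-revision step can be sketched as interpolating baseline unigram probabilities with frequencies from recent queries. The interpolation weight and the unigram simplification are assumptions; the patent does not specify the update rule.

```python
# Sketch: fold recent search-query word frequencies into a baseline
# unigram model via linear interpolation.
from collections import Counter

def update_language_model(baseline_probs, recent_queries, weight=0.5):
    """Interpolate baseline word probabilities with recent-usage
    frequencies; `weight` (assumed) controls how fast the model adapts."""
    counts = Counter(w for q in recent_queries for w in q.split())
    total = sum(counts.values())
    updated = {}
    vocab = set(baseline_probs) | set(counts)
    for w in vocab:
        recent_p = counts[w] / total if total else 0.0
        updated[w] = (1 - weight) * baseline_probs.get(w, 0.0) + weight * recent_p
    return updated

model = update_language_model({"weather": 0.6, "news": 0.4},
                              ["weather today", "weather radar"])
```

Words absent from the baseline (here "today", "radar") still receive mass, which is how the model picks up recent usage.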

07-02-2013 publication date

Apparatus and method for recognizing voice

Номер: US20130035938A1
Author: Ho Young JUNG

The present invention includes a hierarchical search process. The hierarchical search process includes three steps. In a first step, a word boundary is determined using a recognition method of determining a following word dependent on a preceding word, and a word boundary detector. In a second step, word unit based recognition is performed in each area by dividing an input voice into a plurality of areas based on the determined word boundary. Finally, in a third step, a language model is applied to induce an optimal sentence recognition result with respect to a candidate word that is determined for each area. The present invention may improve the voice recognition performance, and particularly, the sentence unit based consecutive voice recognition performance.

28-02-2013 publication date

Truly handsfree speech recognition in high noise environments

Номер: US20130054235A1
Assignee: Sensory Inc

Embodiments of the present invention improve content manipulation systems and methods using speech recognition. In one embodiment, the present invention includes a method comprising configuring a recognizer to recognize utterances in the presence of a background audio signal having particular audio characteristics. A composite signal comprising a first audio signal and a spoken utterance of a user is received by the recognizer, where the first audio signal comprises the particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal. The spoken utterance is recognized in the presence of the first audio signal when the spoken utterance is one of the predetermined utterances. An operation is performed on the first audio signal.

04-04-2013 publication date

System and Method of Semi-Supervised Learning for Spoken Language Understanding Using Semantic Role Labeling

Номер: US20130085756A1
Assignee: AT&T Corp.

A system and method are disclosed for providing semi-supervised learning for a spoken language understanding module using semantic role labeling. The method embodiment relates to a method of generating a spoken language understanding module. Steps in the method comprise selecting at least one predicate/argument pair as an intent from a set of the most frequent predicate/argument pairs for a domain, labeling training data using mapping rules associated with the selected at least one predicate/argument pair, training a call-type classification model using the labeled training data, re-labeling the training data using the call-type classification model, and iteratively repeating several of the above steps until training set labels converge.

1. A method comprising: selecting an intent from a list of predicate/argument pairs associated with a spoken dialog system; labeling training data using mapping rules associated with the intent, wherein the mapping rules specify rules for selecting a call-type label for an utterance; and, while the training data and a classification model associated with the call-type label have a divergence above a threshold, iteratively: training the classification model using the training data; and re-labeling the training data using the classification model.
2. The method of claim 1, further comprising assigning the verbs "be" and "have" as special predicates.
3. The method of claim 1, further comprising distinguishing verbs from utterances which do not have a predicate by assigning the verbs to a special class.
4. The method of claim 1, wherein the method is semi-supervised.
5. The method of claim 1, further comprising capturing infrequent call types using an active-learning approach.
6. The method of claim 1, wherein the selecting of the intent is performed independent of a domain.
7. The method of claim 1, wherein the mapping rules specify that the call-type is represented by multiple predicate/argument pairs.
8. A system comprising: a processor; and ...

11-04-2013 publication date

PHONOLOGICALLY-BASED BIOMARKERS FOR MAJOR DEPRESSIVE DISORDER

Номер: US20130090927A1
Assignee: Massachusetts Institute of Technology

A system and a method for assessing a condition in a subject. Phones from speech of the subject are recognized, one or more prosodic or speech-excitation-source features of the phones are extracted, and an assessment of a condition of the subject is generated based on a correlation between the features of the phones and the condition.

1. A method of assessing a condition in a subject, the method comprising: recognizing phones from speech of the subject; extracting one or more prosodic or speech-excitation-source features of the phones from the speech of the subject; and generating an assessment of a condition of the subject, based on a correlation between the one or more features of the phones and the condition.
2. The method of claim 1, wherein the condition has a measure and the correlation is between the measure and the one or more features of the phones.
3. The method of claim 1, wherein the speech of the subject is running speech.
4. The method of claim 1, wherein the phones are recognized automatically.
5. The method of claim 1, wherein the condition is selected from traumatic brain injury, post-traumatic stress disorder, Parkinson's disease, aphasia, autism, Alzheimer's disease, stroke, sleep disorders, anxiety disorders, multiple sclerosis, cerebral palsy, and major depressive disorder (MDD).
6. The method of claim 2, wherein the condition is MDD and the measure is a Hamilton-D (HAMD) score or a subscore thereof.
7. The method of claim 1, wherein the one or more features of the phones extracted from speech of the subject are selected from one or more of duration of phones, energy of phones, pitch of phones, aspiration of phones, glottal flow of phones or frequency-dependent energy of phones.
8. The method of claim 1, wherein the one or more features of the phones extracted from speech of the subject include durations of one or ...

11-04-2013 publication date

SYSTEM AND METHOD FOR PROCESSING SPEECH RECOGNITION

Номер: US20130090928A1
Assignee: AT&T INTELLECTUAL PROPERTY II, L.P.

An automatic speech recognition (ASR) system and method are provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.

1. A method comprising: receiving a speech utterance from a client device, the speech utterance spoken by a user; identifying a location associated with the speech utterance; determining a probability associated with the user and the location, the probability being an expected type of background noise for the user at the location; and, based on the probability, applying a background noise model to recognize the speech utterance.
2. The method of claim 1, wherein the client device is a mobile device.
3. The method of claim 1, wherein determining the probability is based on accessing a stored series of background noises from a background environment associated with the user.
4. The method of claim 1, wherein applying the background noise model comprises compensating a speech recognition model with the background noise model.
5. The method of claim 1, wherein determining the probability comprises accessing a profile of the user.
6. The method of claim 1, wherein applying the background noise model is performed based on characteristics of the client device.
7. The method of claim 1, further comprising: identifying a time associated with the speech utterance; and determining the ...

02-05-2013 publication date

ENABLING SPEECH WITHIN A MULTIMODAL PROGRAM USING MARKUP

Номер: US20130110517A1
Assignee: NUANCE COMMUNICATIONS, INC.

A method for speech enabling an application can include the step of specifying a speech input within a speech-enabled markup. The speech-enabled markup can also specify an application operation that is to be executed responsive to the detection of the speech input. After the speech input has been defined within the speech-enabled markup, the application can be instantiated. The specified speech input can then be detected and the application operation can be responsively executed in accordance with the specified speech-enabled markup.

1. A method for speech enabling an application comprising the steps of: specifying a speech input with a speech-enabled markup; defining within said speech-enabled markup at least one operation of an application that is to be executed upon a detection of said specified speech input; after said defining step, instantiating said application; detecting said specified speech input; and executing said application operation responsive to said detecting step.
2. The method of claim 1, wherein said application is a multimodal Web browser.
3. The method of claim 1, further comprising the steps of: providing a speech-enabled markup interpreter within an operating system upon which said application executes, wherein said speech-enabled markup interpreter is used to detect said speech input and responsively initiate said application operation.
4. The method of claim 3, further comprising the steps of: rendering a Web page within said application, wherein said Web page includes speech-enabled markup for at least one element of said Web page, and wherein said speech-enabled markup interpreter speech-enables said Web page element.
5. The method of claim 1, further comprising the steps of: associating said speech-enabled markup with a graphical user interface element of said application; determining that said graphical user interface element receives focus; and, responsive to said determination, activating said speech-enabled markup so that said application ...

02-05-2013 publication date

Active Input Elicitation by Intelligent Automated Assistant

Номер: US20130110518A1
Assignee: Apple Inc.

Methods, systems, and computer readable storage medium related to operating an intelligent automated assistant are disclosed. A user request is received through a conversation interface of the intelligent automated assistant, the user request including at least a speech input received from a user. One or more candidate domains relevant to the user request are identified from a plurality of predefined domains, where each predefined domain presents a respective area of service offered by the intelligent automated assistant, and the identifying is based on respective degrees of match between words derived from the user request and words representing vocabulary and entities associated with each predefined domain. Feedback is provided to the user through the conversation interface of the intelligent automated assistant, where the feedback presents a paraphrase of the user request and elicits additional input from the user to specify one or more parameters associated with a particular candidate domain.

1. A method for operating an intelligent automated assistant, comprising: at an ...: receiving a user request through a conversation interface of the intelligent automated assistant, the user request comprising at least a speech input received from a user; identifying one or more candidate domains relevant to the user request from a plurality of predefined domains, wherein each predefined domain presents a respective area of service offered by the intelligent automated assistant, and wherein the identifying is based on respective degrees of match between words derived from the user request and words representing vocabulary and entities associated with each predefined domain; and providing feedback to the user through the conversation interface of the intelligent automated assistant, wherein the feedback presents a paraphrase of the user request and elicits additional input from the user to specify one or more parameters associated with a particular candidate domain.

02-05-2013 publication date

Intent Deduction Based on Previous User Interactions with Voice Assistant

Номер: US20130110520A1
Assignee: Apple Inc.

Methods, systems, and computer readable storage medium related to operating an intelligent digital assistant are disclosed. A text string is obtained from a speech input received from a user. Information is derived from a communication event that occurred at the electronic device prior to receipt of the speech input. The text string is interpreted to derive a plurality of candidate interpretations of user intent. One of the candidate user intents is selected based on the information relating to the communication event.

1. A method for operating an automated assistant, comprising: at an electronic device comprising a processor and memory storing instructions for execution by the processor: obtaining a text string from a speech input received from a user; deriving information from a communication event that occurred at the electronic device prior to receipt of the speech input; interpreting the text string to derive a plurality of candidate interpretations of user intent; and selecting one of the candidate user intents based on the information relating to the communication event.
2. The method of claim 1, wherein the information includes a name of a person that is associated with the communication event.
3. The method of claim 2, wherein the text string includes a pronoun, and wherein selecting one of the candidate user intents comprises determining that the pronoun refers to the person.
4. The method of claim 3, wherein selecting the candidate user intent includes determining whether the candidate user intent satisfies a predetermined confidence threshold.
5. The method of claim 1, wherein the communication event is selected from the group consisting of: a telephone call; an email; and a text message.
6. A system for operating an intelligent automated assistant, comprising: one or more processors; and ... obtaining a text string from a speech input received from a user; deriving information from a communication event that occurred at the ...

Подробнее
16-05-2013 дата публикации

Providing Programming Information in Response to Spoken Requests

Number: US20130124205A1
Author: Genly Christopher H.
Assignee:

A system allows a user to obtain information about television programming and to make selections of programming using conversational speech. The system includes a speech recognizer that recognizes spoken requests for television programming information. A speech synthesizer generates spoken responses to the spoken requests for television programming information. A user may use a voice user interface as well as a graphical user interface to interact with the system to facilitate the selection of programming choices.
1. A computer executed method comprising: receiving an utterance related to a television selection; identifying, in the utterance, a where variable that identifies a requested action; identifying, in said utterance, a select variable that identifies an object of the action; storing any identified select or where variables in a structural history; if the utterance includes both a select and a where variable, processing the utterance without using structural history; and if one of the select or where variables is missing from the utterance, using structural history to derive the missing variable.
2. The method of claim 1 including providing conversational speech recognition.
3. The method of including providing a graphical user interface which generates information in a visual form about television programming and a voice user interface which responds to voice requests from the user, and communicating the focus of one of said interfaces to the other of said interface.
4. The method of including storing an indication when a generated response includes a recognized attribute from the spoken request.
5. The method of including parsing a select variable and a where variable from a spoken request.
6. The method of including storing meanings derived from current and historical requests and using the historical requests to supplement the meaning derived from said current requests.
7. The method of including parsing and storing time attributes in a request.
8. The method of ...

Publication date: 16-05-2013

REAL-TIME DISPLAY OF SYSTEM INSTRUCTIONS

Number: US20130124208A1
Assignee: Intellisist, Inc.

A system and method for reviewing inputted voice instructions in a vehicle-based telematics control unit. The system includes a microphone, a speech recognition processor, and an output device. The microphone receives voice instructions from a user. Coupled to the microphone is the speech recognition processor that generates a voice signal by performing speech recognition processing of the received voice instructions. The output device outputs the generated voice signal to the user. The system also includes a user interface for allowing the user to approve the outputted voice signal, and a communication component for wirelessly sending the generated voice signal to a server over a wireless network upon approval by the user.
1. A method for reviewing inputted voice instructions in a vehicle-based telematics control unit, the method comprising: recording voice instructions from a user; generating a voice signal by performing speech recognition of the recorded voice instructions; and outputting the generated voice signal over an output device associated with the telematics control unit for review.
2. The method of claim 1, further comprising wirelessly sending at least one of the generated voice signal or the inputted voice instructions to a server over a wireless network upon approval by a user.
3. The method of claim 1, further comprising: generating a digest including the generated voice signals; sending the digest to a human operator system; and connecting the human operator system to the telematics control unit.
4. The method of claim 1, wherein outputting comprises generating and displaying text based on the generated voice signal.
5. The method of claim 1, wherein outputting comprises generating and outputting voice based on the generated voice signal.
6. A system for reviewing inputted voice instructions in a vehicle-based telematics control unit, the system comprising: a microphone for receiving voice instructions from a user; a speech recognition processor ...

Publication date: 16-05-2013

SYSTEM AND METHOD FOR ENHANCED COMMUNICATIONS VIA SMALL DATA RATE COMMUNICATION SYSTEMS

Number: US20130124211A1
Author: McDonough John G.
Assignee: SHORTHAND MOBILE, INC.

A system and method for interacting with an interactive communication system include processing a profile associated with an interactive communication system; generating a user interface based on the processing of the profile to solicit a user response correlating to a response required by the interactive communication system; receiving the user response via the user interface; updating the user interface using the profile based on the user response; and sending a signal to the interactive communication system based on one or more user responses.
1. A method for interacting with an interactive communication system comprising: processing a profile associated with an interactive communication system; generating a user interface based on the processing of the profile to solicit a user response correlating to a response required by the interactive communication system; receiving the user response via the user interface; updating the user interface using the profile based on the user response; and sending a signal to the interactive communication system based on one or more user responses.
This application is a continuation of U.S. patent application Ser. No. 12/122,619, filed May 16, 2008, which claims the benefit of priority of U.S. Provisional Pat. App. No. 60/938,969, filed May 18, 2007, entitled “System and Method For Communicating With Text Messaging Systems” and U.S. Provisional Pat. App. No. 60/938,965, filed May 18, 2007, entitled “System and Method for Communicating with Interactive Service Systems” all of which are hereby incorporated by reference.
1. Field of the Invention
The present invention relates to communication with interactive service systems, such as service systems that use short message service (SMS), interactive voice response (IVR) systems, and websites or other data systems.
2. Related Art
Many companies currently use interactive service systems, such as text messaging systems and IVR systems for various tasks as a first line of customer support and ...

Publication date: 23-05-2013

VOICE ACTIVITY SEGMENTATION DEVICE, VOICE ACTIVITY SEGMENTATION METHOD, AND VOICE ACTIVITY SEGMENTATION PROGRAM

Number: US20130132078A1
Assignee: NEC Corporation

Provided is a noise-robust voice activity segmentation device which updates parameters used in the determination of voice-active segments without burdening the user, and also provided are a voice activity segmentation method and a voice activity segmentation program.
1. A voice activity segmentation device comprising: a first voice activity segmentation unit which determines a voice-active segment, which is a first voice-active segment, and a voice-inactive segment, which is a first voice-inactive segment, in a time-series of input sound by comparing a threshold value and a feature value of the time-series of the input sound; a second voice activity segmentation unit which determines, after a reference speech acquired from a reference speech storage unit has been superimposed on a time-series of the first voice-inactive segment, a voice-active segment and a voice-inactive segment in the time-series of the superimposed first voice-inactive segment by comparing the threshold value and a feature value of the time-series of the superimposed first voice-inactive segment; and a threshold value update unit which updates the threshold value in such a way that a discrepancy rate between the determination result of the second voice activity segmentation unit and a correct segmentation calculated from the reference speech is decreased.
2. The voice activity segmentation device according to claim 1, further comprising: a gain and frequency characteristic correction unit which corrects a gain or a frequency characteristic of the reference speech, which is superimposed in the first voice-inactive segment, by use of at least either a gain or a frequency characteristic, which is acquired from the time-series of the input sound in the first voice-active segment, so that the gain or the frequency characteristic of the reference speech is equal to the gain or the frequency characteristic respectively, which is acquired from the time-series of the input sound in the first voice-active segment.
3.
The ...
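The update loop this abstract describes can be sketched minimally in Python. This is an illustrative reconstruction, not the patented implementation: the per-frame energies, the simple energy-threshold detector, and the exhaustive threshold search over candidate values are all assumptions.

```python
def vad(frames, threshold):
    """Mark a frame voice-active when its energy exceeds the threshold."""
    return [energy > threshold for energy in frames]

def update_threshold(inactive_frames, reference_frames):
    """Superimpose known reference speech on frames previously judged
    voice-inactive, then pick the threshold that minimizes the discrepancy
    between the detector's output and the known correct segmentation."""
    mixed = [a + b for a, b in zip(inactive_frames, reference_frames)]
    # Correct labels: active exactly where reference speech was added.
    correct = [r > 0 for r in reference_frames]
    candidates = sorted(set(mixed))
    return min(candidates,
               key=lambda t: sum(d != c for d, c in zip(vad(mixed, t), correct)))

# Background-noise energies (first-pass "inactive") and a reference utterance.
noise = [0.2, 0.3, 0.25, 0.2, 0.3, 0.2]
reference = [0.0, 0.0, 1.0, 1.2, 0.0, 0.0]
threshold = update_threshold(noise, reference)
mixed = [a + b for a, b in zip(noise, reference)]
```

With the updated threshold, the detector segments the superimposed reference speech correctly, which is exactly the discrepancy-rate criterion the threshold value update unit optimizes.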

Publication date: 23-05-2013

System and method for crowd-sourced data labeling

Number: US20130132080A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until at least one of an accuracy threshold is reached and m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses.
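The incremental collection the abstract describes (stop once an accuracy threshold is reached or m responses arrive) can be sketched as follows. The vote-share stopping rule and all names are assumptions for illustration; the patent does not specify how the accuracy threshold is computed from response characteristics.

```python
from collections import Counter

def crowd_label(responses, m=7, agreement=0.8, min_votes=3):
    """Collect responses incrementally; stop once the leading answer's share
    of the votes reaches the agreement threshold (a simple accuracy proxy),
    or once m responses have been received, whichever comes first."""
    collected = []
    for response in responses:           # responses arrive one at a time
        collected.append(response)
        top, votes = Counter(collected).most_common(1)[0]
        if len(collected) >= min_votes and votes / len(collected) >= agreement:
            break                        # accuracy threshold reached early
        if len(collected) == m:
            break                        # hard cap: m responses received
    return top, collected

label, used = crowd_label(["cat", "cat", "cat", "dog", "cat"])
```

Here three unanimous answers already satisfy the agreement threshold, so the remaining crowd workers are never queried.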

Publication date: 23-05-2013

GENERIC FRAMEWORK FOR LARGE-MARGIN MCE TRAINING IN SPEECH RECOGNITION

Number: US20130132083A1
Assignee: MICROSOFT CORPORATION

A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
1. A method of training an acoustic model in a speech recognition system, comprising: utilizing a training corpus, having training tokens, to calculate an initial acoustic model; computing, using the initial acoustic model, a plurality of scores for each training token with regard to a correct class and a plurality of competing classes; calculating a sample-adaptive window bandwidth for each training token; determining a value for a loss function based on the computed scores and the calculated sample-adaptive window bandwidth for each training token; updating parameters in the current acoustic model to create a revised acoustic model based upon the loss value; and outputting the revised acoustic model.
2. The method of claim 1 and further comprising: deriving the loss function from a Bayesian viewpoint.
3. The method of claim 2 wherein deriving the loss function from a Bayesian viewpoint further comprises utilizing a margin-free Bayes risk function.
4.
The method of claim 2 wherein deriving the loss function from a Bayesian viewpoint further ...
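The margin mechanism the abstract describes can be written compactly. The following is a common large-margin MCE formulation from the literature, given here in assumed notation rather than as a quotation from the patent: the per-token loss is a sigmoid of a misclassification measure shifted by a margin.

```latex
% Misclassification measure for token X_i with correct class c:
% discriminant of the correct class vs. a soft-max over competing classes.
d_i(X_i;\Lambda) = -g_c(X_i;\Lambda)
  + \log\Big[\frac{1}{N-1}\sum_{j \neq c} e^{\eta\, g_j(X_i;\Lambda)}\Big]^{1/\eta}

% Large-margin MCE loss: the margin m(I) >= 0 shifts the decision boundary,
% maximizing token-to-boundary distances for correct tokens near it.
% m(I) may be fixed or grow monotonically with training iteration I.
\ell_i = \frac{1}{1 + e^{-\gamma\,(d_i(X_i;\Lambda) + m(I))}}
```

A correct token has $d_i < 0$; adding the margin $m(I)$ penalizes correct tokens that lie within $m(I)$ of the boundary, which is how the boundary gets pushed away from them.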

Publication date: 23-05-2013

METHODS AND SYSTEMS FOR ADAPTING GRAMMARS IN HYBRID SPEECH RECOGNITION ENGINES FOR ENHANCING LOCAL SR PERFORMANCE

Number: US20130132086A1
Author: Feng Zhe, Weng Fuliang, Xu Kui
Assignee:

A speech recognition method includes providing a processor communicatively coupled to each of a local speech recognition engine and a server-based speech recognition engine. A first speech input is inputted into the server-based speech recognition engine. A first recognition result from the server-based speech recognition engine is received at the processor. The first recognition result is based on the first speech input. The first recognition result is stored in a memory device in association with the first speech input. A second speech input is inputted into the local speech recognition engine. The first recognition result is retrieved from the memory device. A second recognition result is produced by the local speech recognition engine. The second recognition result is based on the second speech input and is dependent upon the retrieved first recognition result. 1. A speech recognition method , comprising the steps of:providing a processor communicatively coupled to each of a local speech recognition engine and a server-based speech recognition engine;inputting a first speech input into the server-based speech recognition engine;receiving at the processor a first recognition result from the server-based speech recognition engine, the first recognition result being based on the first speech input;storing the first recognition result in a memory device, the first recognition result being stored in association with the first speech input;inputting a second speech input into the local speech recognition engine;retrieving the first recognition result from the memory device; andproducing a second recognition result by the local speech recognition engine, the second recognition result being based on the second speech input and being dependent upon the retrieved first recognition result.2. The method of claim 1 , comprising the further step of receiving at the processor a confidence score from the server-based speech recognition engine claim 1 , the confidence score ...

Publication date: 23-05-2013

Voice Data Retrieval System and Program Product Therefor

Number: US20130132090A1
Author: KANDA Naoyuki
Assignee: Hitachi, Ltd.

A voice data retrieval system including an inputting device of inputting a keyword, a phoneme converting unit of converting the inputted keyword in a phoneme expression, a voice data retrieving unit of retrieving a portion of a voice data at which the keyword is spoken based on the keyword in the phoneme expression, a comparison keyword creating unit of creating a set of comparison keywords having a possibility of a confusion of a user in listening to the keyword based on a phoneme confusion matrix for each user, and a retrieval result presenting unit of presenting a retrieval result from the voice data retrieving unit and the comparison keyword from the comparison keyword creating unit to a user. 1. A voice data retrieval system comprising:an inputting device of inputting a keyword;a phoneme converting unit of converting the inputted keyword in a phoneme expression;a voice data retrieving unit of retrieving a portion of a voice data at which the keyword is spoken based on the keyword in the phoneme expression;a comparison keyword creating unit of creating a set of comparison keywords separately from the keyword having a possibility of a confusion of a user in listening to the keyword based on the keyword in the phoneme expression; anda retrieval result presenting unit of presenting a retrieval result from the voice data retrieving unit and the comparison keyword from the comparison keyword creating unit to the user.2. The voice data retrieval system according to claim 1 , further comprising:a phoneme confusion matrix for each user;wherein the comparison keyword creating unit creates the comparison keyword based on the phoneme confusion matrix.3. 
The voice data retrieval system according to claim 2 , further comprising:a language information inputting unit of inputting a piece of information of a language which the user can understand; anda phoneme confusion matrix creating unit of creating the phoneme confusion matrix based on the piece of information provided from ...
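The comparison-keyword idea (generate alternatives the listener might confuse with the query, driven by a per-user phoneme confusion matrix) can be sketched as below. The single-substitution strategy, the probability cutoff, and the toy confusion values are assumptions for illustration.

```python
def comparison_keywords(phonemes, confusion, min_prob=0.3):
    """For each phoneme position, substitute any phoneme the user is known
    to confuse with it (probability above min_prob), yielding keywords the
    listener might mistake for the query keyword."""
    results = set()
    for i, p in enumerate(phonemes):
        for alt, prob in confusion.get(p, {}).items():
            if alt != p and prob >= min_prob:
                results.add(tuple(phonemes[:i] + [alt] + phonemes[i + 1:]))
    return sorted("".join(w) for w in results)

# Hypothetical per-user confusion probabilities: P(heard | spoken).
confusion = {"r": {"l": 0.4, "r": 0.6}, "b": {"v": 0.35, "b": 0.65}}
alts = comparison_keywords(list("rob"), confusion)
```

Presenting these alternatives alongside the retrieval result lets the user rule out hits that match a confusable keyword rather than the intended one.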

Publication date: 23-05-2013

Systems and Techniques for Producing Spoken Voice Prompts

Number: US20130132096A1
Assignee: Eliza Corporation

Methods and systems are described in which spoken voice prompts can be produced in a manner such that they will most likely have the desired effect, for example to indicate empathy, or produce a desired follow-up action from a call recipient. The prompts can be produced with specific optimized speech parameters, including duration, gender of speaker, and pitch, so as to encourage participation and promote comprehension among a wide range of patients or listeners. Upon hearing such voice prompts, patients/listeners can know immediately when they are being asked questions that they are expected to answer, when they are being given information, and which information is considered sensitive.
1. A method of producing spoken voice prompts for telephony-based informational interaction, the method comprising: for one or more voice prompts, determining words that receive an optimized speech parameter, based on context and/or meaning of the text of the one or more voice prompts; recording the one or more voice prompts, producing one or more spoken voice prompts; and conveying the one or more spoken voice prompts to a listener over a telephone system.
2. The method of claim 1, further comprising determining the number of words that receive an optimized speech parameter based on context and/or meaning of the one or more voice prompts.
3. The method of claim 1, wherein the optimized speech parameter comprises one or more pitch accents.
4. The method of claim 3, wherein the one or more pitch accents yield a pause lengthening pattern.
5. The method of claim 3, wherein the one or more pitch accents comprise a phrase-final lengthening pattern.
6. The method of claim 3, further comprising one or more boundary tones, wherein the one or more pitch accents and boundary tones comprise a defined intonation pattern.
7. The method of claim 6, wherein the defined intonation pattern comprises specific rises or falls of the fundamental frequency of a spoken prompt.
8. The ...

Publication date: 30-05-2013

Speech recognition apparatus based on cepstrum feature vector and method thereof

Number: US20130138437A1

A speech recognition apparatus, includes a reliability estimating unit configured to estimate reliability of a time-frequency segment from an input voice signal; and a reliability reflecting unit configured to reflect the reliability of the time-frequency segment to a normalized cepstrum feature vector extracted from the input speech signal and a cepstrum average vector included for each state of an HMM in decoding. Further, the speech recognition apparatus includes a cepstrum transforming unit configured to transform the cepstrum feature vector and the average vector through a discrete cosine transformation matrix and calculate a transformed cepstrum vector. Furthermore, the speech recognition apparatus includes an output probability calculating unit configured to calculate an output probability value of time-frequency segments of the input speech signal by applying the transformed cepstrum vector to the cepstrum feature vector and the average vector.
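The "discrete cosine transformation matrix" in this abstract maps between the cepstrum domain and the log-spectral (filter-bank) domain. A minimal sketch of that mapping, assuming an orthonormal DCT-II matrix (the specific matrix, sizes, and values are illustrative, not taken from the patent):

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that y = C x and x = C^T y."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                     for i in range(n)])
    return rows

def apply(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

n = 4
C = dct_matrix(n)
Ct = [list(col) for col in zip(*C)]      # transpose = inverse (orthonormal)
cepstrum = [1.0, -0.5, 0.25, 0.1]        # a toy normalized cepstrum vector
log_spectral = apply(Ct, cepstrum)       # cepstrum -> log-spectral domain
roundtrip = apply(C, log_spectral)       # transform back
```

Because the matrix is orthonormal, the transform is lossless, which is what lets per-band reliability weights be applied in the log-spectral domain and the result carried back into the cepstral HMM computation.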

Publication date: 30-05-2013

VOICE-SCREEN ARS SERVICE SYSTEM, METHOD FOR PROVIDING SAME, AND COMPUTER-READABLE RECORDING MEDIUM

Number: US20130138443A1
Author: Kim David, KIM Yong Jin
Assignee: CALL GATE CO., LTD.

A method for providing a voice-screen ARS service on a terminal, according to an embodiment of the present invention, uses an application installed on the terminal to connect to an IVR system of a client company via a voice call and connects a data call to a VARS service server. Menu information including a plurality of menu items related to a client is received through a data call and displayed on a screen, and voice information related to the menu is received through a voice call and output in audio. Accordingly, when a user uses the ARS, both voice and on-screen information services are provided simultaneously, thereby decreasing the limitations and inaccuracies of the provided voice information and increasing user convenience.
1. A method for providing a voice-screen ARS (automatic response system) service in a terminal, the method comprising the steps of: providing a connection means for allowing a connection to be made to a client company IVR (interactive voice response) system; when a user makes a request for a connection to the client company IVR system by using the connection means, connecting a voice call to the client company IVR system through an Internet network or a mobile communication network and connecting a data call to a VARS (visual ARS) service server; receiving menu information including a plurality of menu items from the VARS service server through the data call and displaying the received menu information including the menu items; transmitting information on a menu item selected by the user from among the displayed menu items to the VARS service server through the data call and to the client company IVR system through the voice call at the same time; receiving screen information corresponding to the selected menu item from the VARS service server, and displaying the received screen information; and receiving voice information corresponding to the selected menu item from the client company IVR system, and outputting the received voice information.
2.
( ...

Publication date: 30-05-2013

MODIFICATION OF OPERATIONAL DATA OF AN INTERACTION AND/OR INSTRUCTION DETERMINATION PROCESS

Number: US20130138444A1
Author: George Michael
Assignee: SANOFI-AVENTIS DEUTSCHLAND GMBH

It is inter alia disclosed to perform at least one of operating an interaction process with a user of the medical apparatus and determining, based on a representation of at least one instruction given by the user, at least one instruction operable by the medical apparatus. Therein, the at least one of the operating and the determining at least partially depends on operational data. It is further disclosed to receive modification information for modifying at least a part of the operational data, wherein the modification information is at least partially determined based on an analysis of a representation of at least one instruction given by the user.
1.-15. (canceled)
16. A medical apparatus, comprising: a processor configured to perform at least one of operating an interaction process with a user of said medical apparatus and determining, based on a respective representation of at least one instruction given by said user, at least one instruction operable by said medical apparatus, wherein said at least one of said operating an interaction process and said determining at least one instruction at least partially depends on operational data; a communication unit configured to receive modification information for modifying at least a part of said operational data, said modification information at least partially determined based on an analysis of a respective representation of at least one instruction given by a user.
17. The medical apparatus according to claim 16, wherein said at least one instruction is given acoustically by said user and wherein said determining is at least partially based on speech recognition of said respective representation of said at least one instruction given by said user.
18.
The medical apparatus according to claim 17 , wherein said speech recognition at least partially depends on said operational data claim 17 , and wherein at least a part of said modification information is determined to improve said speech recognition with respect to said ...

Publication date: 13-06-2013

SYSTEM AND METHOD FOR STANDARDIZED SPEECH RECOGNITION

Number: US20130151252A1
Assignee: AT&T Intellectual Property I, L.P.

Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
1. A method comprising: receiving speech from a user; determining, via a processor, to apply one of supervised training and unsupervised training; and when supervised training is selected: determining whether available data are sufficient to build a new speech recognition model; when the available data is sufficient to build the new speech recognition model, building the new speech recognition model using the available data; and when the available data is not sufficient to build the new speech recognition model: selecting an existing speech recognition model; and generating an adapted speech recognition model based on transformations generated from the existing speech recognition model based on the speech and associated transcriptions.
2. The method of claim 1, wherein the new speech recognition model, the existing speech recognition model and the adapted speech recognition model are standardized speech models.
3.
The method of claim 1 , wherein one of the new speech recognition ...

Publication date: 13-06-2013

System and Method for Targeted Tuning of a Speech Recognition System

Number: US20130151253A1

A system and method of targeted tuning of a speech recognition system are disclosed. A particular method includes detecting that a frequency of occurrence of a particular type of utterance satisfies a threshold. The method further includes tuning a speech recognition system with respect to the particular type of utterance. 1. A method comprising:detecting that a frequency of occurrence of a particular type of utterance satisfies a threshold; andin response to detecting that the frequency satisfies the threshold, tuning a speech recognition system with respect to the particular type of utterance.2. The method of claim 1 , further comprising determining the frequency based on a group of received utterances.3. The method of claim 1 , wherein the threshold is determined by a system administrator.4. The method of claim 1 , wherein the threshold is user programmable.5. The method of claim 1 , wherein tuning the speech recognition system includes inputting a collection of utterances of the particular type of utterance into a learning module of the speech recognition system.6. The method of claim 5 , wherein inputting the collection of utterances includes playing one or more files that represent recordings of the particular type of utterance.7. The method of claim 1 , wherein system recognition of the particular type of utterance is dependent on a particular speaker.8. The method of claim 1 , wherein system recognition of the particular type of utterance is independent of a particular speaker.9. The method of claim 1 , wherein the utterance is one of a single word spoken by a speaker claim 1 , a phrase spoken by the speaker claim 1 , or a sentence spoken by the speaker.10. The method of claim 1 , wherein the utterance corresponds to a request that indicates an action to be taken on an object.11. The method of claim 10 , wherein the request is one of a request to pay a bill claim 10 , a request for an account balance claim 10 , a request to change services claim 10 , a ...
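The triggering condition in this abstract (tune the recognizer for an utterance type once its frequency of occurrence satisfies a threshold) can be sketched simply. The relative-frequency criterion, the 0.5 cutoff, and the toy labels are assumptions for illustration:

```python
from collections import Counter

def utterance_types_to_tune(utterances, threshold=0.5):
    """Return the utterance types whose frequency of occurrence among the
    received utterances satisfies the threshold; the recognizer would then
    be tuned specifically for those types."""
    counts = Counter(utterances)
    total = len(utterances)
    return {u for u, c in counts.items() if c / total >= threshold}

# Requests observed by the system, labeled by type.
observed = ["pay bill", "balance", "pay bill", "agent", "pay bill"]
targets = utterance_types_to_tune(observed)
```

Each flagged type would then be fed (e.g., as recorded utterance files) into the recognizer's learning module, per claims 5 and 6.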

Publication date: 20-06-2013

VOICE RECOGNITION APPARATUS AND NAVIGATION SYSTEM

Number: US20130158999A1
Author: ISHII Jun, Maruta Yuzo
Assignee: Mitsubishi Electric Corporation

A voice recognition apparatus creates a voice recognition dictionary of words which are cut out from address data constituting words that are a voice recognition target, and which have an occurrence frequency not less than a predetermined value, compares a time series of acoustic features of an input voice with the voice recognition dictionary, selects the most likely word string as the input voice from the voice recognition dictionary, carries out partial matching between the selected word string and the address data, and outputs the word that partially matches as a voice recognition result.
1.-3. (canceled)
4. A voice recognition apparatus comprising: an acoustic analyzer unit for carrying out acoustic analysis of an input voice signal to convert the input voice signal to a time series of acoustic features; a vocabulary storage unit for recording words which are a voice recognition target; a dictionary storage unit for storing a voice recognition dictionary composed of a prescribed category of words; an acoustic data matching unit for comparing the time series of acoustic features of the input voice acquired by the acoustic analyzer unit with the voice recognition dictionary read out of the dictionary storage unit, and for selecting a most likely word string as the input voice from the voice recognition dictionary; and a partial matching unit for carrying out partial matching between the word string selected by the acoustic data matching unit and the words the vocabulary storage unit stores, and for selecting as a voice recognition result a word that partially matches to the word string selected by the acoustic data matching unit from among the words the vocabulary storage unit stores.
5. The voice recognition apparatus according to claim 4, wherein the prescribed category of words is a numeral.
6.
The voice recognition apparatus according to claim 4 , further comprising:a garbage model storage unit for storing a garbage model; anda recognition dictionary creating unit ...

Publication date: 27-06-2013

AUTOMATIC DISCLOSURE DETECTION

Number: US20130166293A1
Assignee: AT&T Intellectual Property I, L.P.

A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication.
1. A method of detecting pre-determined phrases to determine compliance quality, comprising: specifying a plurality of pre-determined phrases in association with an event and a precursor event; receiving audible input from a sender and a recipient in a communication over a communications network; determining by a processor whether the event has occurred based on the communication; if the event has not occurred, determining whether a trigger phrase associated with the precursor event is present in the audible input received; if the trigger phrase associated with the precursor event is present, determining whether a pre-determined phrase of the plurality of pre-determined phrases associated with the precursor event is present in the audible input received; and rating the recipient based on a presence of the pre-determined phrase associated with the precursor event in the communication.
2. The method of claim 1, further comprising: if the event has occurred, determining whether a pre-determined phrase of the plurality of pre-determined phrases associated with the event is present in the audible input received by selecting a pre-determined phrase of the plurality of pre-determined phrases associated with the event, comparing the pre-determined phrase selected with a transcript of the audible input received, and determining whether the pre-determined phrase selected is present in the transcript.
3.
The method of claim 1, wherein the determining whether a ...

27-06-2013 publication date

Frame Erasure Concealment Technique for a Bitstream-Based Feature Extractor

Number: US20130166294A1
Assignee: AT&T Intellectual Property II LP

A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
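The concealment technique in this abstract amounts to deleting each frame in which an erasure is declared, shortening the observation sequence before recognition. A minimal Python sketch of that step (the function name and data shapes are assumptions, not the patented implementation):

```python
# Illustrative sketch: drop frames flagged as erased so the recognizer
# sees a shorter, erasure-free observation sequence.
def conceal_erasures(frames, erased):
    """frames: list of per-frame feature vectors
    erased: list of booleans, True where a frame erasure was declared
    Returns the feature sequence with erased frames deleted."""
    return [f for f, bad in zip(frames, erased) if not bad]
```

For example, with three frames of which the middle one is erased, only the first and third reach the recognizer.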

27-06-2013 publication date

Discriminative Training of Document Transcription System

Number: US20130166297A1
Assignee: MULTIMODAL TECHNOLOGIES, LLC

A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript. 1. In a system including a first document containing at least some information in common with a spoken audio stream , a method comprising steps of:(A) identifying text in the first document representing a concept having a plurality of spoken forms;(B) replacing the identified text with a context-free grammar specifying the plurality of spoken forms of the concept to produce a second document;(C) generating a first language model based on the second document;(D) using the first language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a third document;(E) filtering text from the third document by reference to the second document to produce a filtered document in which text filtered from the third document is marked as unreliable; and (F)(1) applying a first speech recognition process to the spoken audio stream using a set of base acoustic models and a grammar network based on the filtered document to produce a first set of recognition structures; (F)(2) applying a second speech recognition process to the spoken audio stream ...

11-07-2013 publication date

SPEECH RECOGNITION APPARATUS

Number: US20130179154A1
Author: OKUNO Hiroyuki
Assignee: Denso Corporation

A speech recognition apparatus includes a first recognition dictionary, a speech input unit, a speech recognition unit, a speech transmission unit, a recognition result receipt unit, and a control unit. The speech recognition unit recognizes a speech based on a first recognition dictionary, and outputs a first recognition result. A server recognizes the speech based on a second recognition dictionary, and outputs a second recognition result. The control unit determines a likelihood level of a selected candidate obtained based on the first recognition result, and accordingly controls an output unit to output at least one of the first recognition result and the second recognition result. When the likelihood level of the selected candidate is equal to or higher than a threshold level, the control unit controls the output unit to output the first recognition result irrespective of whether the second recognition result is received from the server. 1. A speech recognition apparatus comprising:a first recognition dictionary that stores a plurality of first phoneme strings, which are respectively converted from a plurality of text data;a speech input unit that inputs a speech made by a user;a speech recognition unit that recognizes the speech based on the first recognition dictionary and outputs a first recognition result;a speech transmission unit that transmits the speech to a server, the server including a second recognition dictionary that stores a plurality of second phoneme strings respectively converted from the plurality of text data, the server recognizing the speech based on the second recognition dictionary and outputting a second recognition result;a recognition result receipt unit that receives the second recognition result from the server; anda control unit that determines a likelihood level of a selected candidate obtained based on the first recognition result, and controls an output unit to output at least one of the first recognition result and the second 
...
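The arbitration described above (output the on-device result when its likelihood clears a threshold, otherwise fall back to the server's result) can be sketched in a few lines. The function name, the 0.8 default threshold, and the use of None for a missing server reply are assumptions for illustration, not the apparatus's interface:

```python
def choose_result(local_result, local_likelihood, server_result, threshold=0.8):
    """Prefer the on-device recognition result when its likelihood level
    is at or above the threshold, irrespective of the server reply;
    otherwise use the server result when one is available."""
    if local_likelihood >= threshold:
        return local_result
    return server_result if server_result is not None else local_result
```

The design point is latency: a confident local result is emitted without waiting for the round trip to the server.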

11-07-2013 publication date

METHOD AND APPARATUS FOR EXECUTING A USER FUNCTION USING VOICE RECOGNITION

Number: US20130179173A1
Author: Lee Dongyeol, PARK Sehwan
Assignee: SAMSUNG ELECTRONICS CO., LTD.

A method and an apparatus for executing a user function using voice recognition. The method includes displaying a user function execution screen; confirming a function to be executed according to voice input; displaying a voice command corresponding to the confirmed function on the user function execution screen; recognizing a voice input by a user, while a voice recognition execution request is continuously received; and executing the function associated with the input voice command, when the recognized voice input is at least one of the displayed voice command. 1. A method for executing a user function by an electronic device using voice recognition , the method comprising:displaying a user function execution screen;confirming a function to be executed according to voice input;displaying a voice command corresponding to the confirmed function on the user function execution screen;recognizing a voice input by a user, while a voice recognition execution request is continuously received; andexecuting the function associated with the input voice command, when the recognized voice input is at least one of the displayed voice command.2. The method of claim 1 , wherein the voice command is displayed around an image component of the user function execution screen or in a blanket of the user function execution screen.3. The method of claim 1 , wherein the voice command is displayed around an image component associated with a function corresponding to the voice command.4. The method of claim 1 , wherein the voice command is displayed around a mounted location of a key input unit generating a key input event claim 1 , when a function executed according to the voice input is a function executed by the key input event.5. The method of claim 1 , further comprises determining whether the voice input by the user corresponds to at least one of the displayed voice command.6. 
The method of claim 1 , wherein the function includes one of a function executed when a touch event and a ...

18-07-2013 publication date

NOISE REDUCTION METHOD, PROGRAM PRODUCT AND APPARATUS

Number: US20130185067A1

A probability model represented as the product of the probability distribution of a mismatch vector g (or clean speech x) with an observed value y as a factor and the probability distribution of a mismatch vector g (or clean speech x) with a confidence index β for each band as a factor, executes MMSE estimation on the probability model, and estimates a clean speech estimated value x̂. As a result, each band influences the result of MMSE estimation, with a degree of contribution in accordance with the level of its confidence. Further, the higher the S/N ratio of observation speech, the more the output value becomes shifted to the observed value. As a result, the output of a front-end is optimized. 1. A noise reduction method comprising:a step of generating a confidence index for each band on the basis of a spectrum of observation speech;a step of generating a probability model represented as a mixture multi-dimensional normal distribution having a dimension for each band, each normal distribution being represented as a product of a first normal distribution and a second normal distribution; anda step of estimating a mismatch vector estimated value by executing MMSE estimation on the probability model, and deriving a clean speech estimated value on the basis of the mismatch vector estimated value,wherein the first normal distribution is a probability distribution of a mismatch vector generated based on the observation speech, andwherein the second normal distribution has a zero mean, and a variance defined as a function that outputs a smaller value as the confidence index becomes greater.2. 
A noise reduction method comprising:a step of generating a confidence index for each band on the basis of a spectrum of observation speech;a step of generating a probability model represented as a mixture multi-dimensional normal distribution having a dimension for each band, each normal distribution being represented as a product of a first normal distribution and a second normal ...
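The per-band estimate described above is the mean of a product of two Gaussians: one centered on the observed mismatch, and a zero-mean prior whose variance shrinks as the band's confidence index grows. A one-band sketch under that reading; the specific variance function k/confidence is an illustrative choice, not taken from the claims:

```python
def posterior_mean(g_mean, g_var, confidence, k=1.0):
    """Combine N(g_mean, g_var) with a zero-mean prior N(0, k/confidence).
    Returns the mean of the product Gaussian, i.e. a per-band MMSE-style
    estimate of the mismatch. High confidence => small prior variance =>
    the estimated mismatch is pulled toward zero, so the clean-speech
    output stays close to the observed value."""
    prior_var = k / confidence
    w = prior_var / (prior_var + g_var)   # weight on the observed mismatch
    return w * g_mean
```

This reproduces the behavior in the abstract: the higher the confidence (S/N ratio) of a band, the more the output shifts toward the observed value.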

18-07-2013 publication date

SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD AND PROGRAM

Number: US20130185068A1
Assignee: NEC Corporation

The present invention provides a speech recognition device that includes a threshold value candidate generation unit which extracts a feature indicating likeliness of being speech from a temporal sequence of input sound, and generates a plurality of threshold value candidates for discriminating between speech and non-speech; a speech determination unit which, by comparing the feature indicating likeliness of being speech with the plurality of threshold value candidates, determines respective speech sections, and outputs determination information as a result of the determination; a search unit which corrects each of the speech sections represented by the determination information, using a speech model and a non-speech model; and a parameter update unit which estimates a threshold value for determining a speech section, on the basis of distribution profiles of the feature respectively in utterance sections and in non-utterance sections, within each of the corrected speech sections, and makes an update with the threshold value. 1.-10. (canceled) 11. A speech recognition device comprising:a threshold value candidate generation unit which extracts a feature indicating likeliness of being speech from a temporal sequence of input sound, and generates a threshold value candidate for discriminating between speech and non-speech;a speech determination unit which, by comparing said feature indicating likeliness of being speech with a plurality of said threshold value candidates, determines respective speech sections and outputs determination information as a result of the determination;a search unit which corrects each of said speech sections represented by said determination information using a speech model and a non-speech model; and a parameter update unit which estimates a threshold value for determining a speech section, on the basis of distribution profiles of said feature respectively in utterance sections and in non-utterance sections, within each of said corrected speech ...
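As a toy version of the parameter update step, one could place the decision threshold midway between the mean feature values observed in the utterance and non-utterance sections. The claims estimate it from full distribution profiles, so this is a deliberate simplification with assumed names:

```python
def estimate_threshold(speech_feats, nonspeech_feats):
    """Place the speech/non-speech threshold halfway between the mean
    'likeliness of being speech' feature in utterance sections and the
    mean in non-utterance sections (a crude stand-in for the claimed
    distribution-profile-based estimate)."""
    mean = lambda xs: sum(xs) / len(xs)
    return 0.5 * (mean(speech_feats) + mean(nonspeech_feats))
```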

18-07-2013 publication date

USER SPEECH INTERFACES FOR INTERACTIVE MEDIA GUIDANCE APPLICATIONS

Number: US20130185080A1
Assignee: UNITED VIDEO PROPERTIES, INC.

A user speech interface for interactive media guidance applications, such as television program guides, guides for audio services, guides for video-on-demand (VOD) services, guides for personal video recorders (PVRs), or other suitable guidance applications is provided. Voice commands may be received from a user and guidance activities may be performed in response to the voice commands. 1.-153. (canceled) 154. A system for generating a customized voice control media interface, comprising: processing circuitry configured to: receive a voice command entered by a user at a first device, wherein the voice command comprises a media control request; identify the user based on data stored in memory at the processing circuitry associating the user with the first device; generate a customized feature based on the identified user; and execute the media control requested by the user. 155. The system of claim 154, wherein: the media control request comprises a request to store media content; and the customized feature comprises storing the media content to a file associated with the identified user. 156. The system of claim 154, wherein: the media control request comprises a request to play media content; and the customized feature comprises presenting media content from a media content source associated with the identified user. 157. The system of claim 154, wherein the customized feature comprises a targeted advertisement selected based on the identified user. 158. The system of claim 154, wherein the customized feature is a favorites list that comprises preferred media content or media sources associated with the identified user. 159. The system of claim 154, wherein the first device comprises a display and speaker configured to present media content to the user. 160.
The system of claim 159 , wherein:the media control request includes a request for information identifying media content that is available on the first device; andthe first device is configured to generate ...

Подробнее
25-07-2013 дата публикации

COMPUTERIZED INFORMATION AND DISPLAY APPARATUS

Number: US20130188055A1
Author: Gazdzinski Robert F.
Assignee: WEST VIEW RESEARCH, LLC

Apparatus useful for obtaining and displaying information. In one embodiment, the apparatus includes a network interface, display device, and speech recognition apparatus configured to receive user speech input and enable performance of various tasks via a remote entity, such as obtaining desired information relating to maps or directions, or any number of other topics. The downloaded data may also, in one variant, be displayed with contextually related advertising or other content. 1.-40. (canceled) 41. Computerized information and display apparatus , comprising:a network interface;processing apparatus in data communication with the network interface;a display device; and a storage apparatus comprising at least one computer program, said at least one program being configured to, when executed:obtain digitized speech generated based on speech received from a user, the digitized speech relating to a query for desired information which the user wishes to find; and cause, based at least in part on the digitized speech, access of a remote network entity to cause retrieval of the desired information;wherein the apparatus is further configured to display advertising content on the display device, the content received via the network interface and selected based at least in part on the digitized speech.42. The apparatus of claim 41 , wherein the received content is selected from a plurality of advertising content that is contextually related to the desired information.43. The apparatus of claim 42 , wherein the desired information comprises information relating to an entity or location.44. The apparatus of claim 43 , wherein the desired information comprises information relating to a business entity or organization, and the contextual relationship comprises a contextual relationship between the selected content and an industry or type of the business entity or organization.45. The apparatus of claim 41 , wherein the desired action comprises obtaining information ...

25-07-2013 publication date

Automatic Door

Number: US20130191123A1
Author: Clough Bradford A.
Assignee: Altorr Corporation

In some implementations a storage device having a voice-recognition engine stored thereon is coupled to a microcontroller, and a device-controller for an automatic door is operably coupled to the microcontroller. 1. An apparatus comprising:a command receiver that is operable to detect a command to open a door;a door opener that is operably coupled to the command receiver and that is operable to initiate opening of the door when the command receiver detects the command to open the door;an obstacle detector that is operably coupled to the command receiver and that is operable to be initiated when the command receiver detects the command to open the door, the obstacle detector also operable to perform an obstacle detection process while the door is opening, the obstacle detector also operable to evaluate an obstacle warning parameter when an obstacle is being detected; and a device controller that is operably coupled to the door opener and the obstacle detector, the device controller being operable to halt the door opening when the obstacle warning parameter is set to NO, the device controller also operable to initialize a warning counter to a maximum number of iterations of a warning when the obstacle warning parameter is set to YES, the device controller also operable to perform a loop for the maximum number of iterations indicated by the warning counter when the warning counter is initialized, the loop providing an obstacle warning, and polling for a response, the device controller also operable to perform a predetermined default action when no response to the obstacle warning is received, the device controller also operable to perform a door command in accordance with the response when a response to the obstacle warning is received.2.
The apparatus of claim 1 , wherein the loop further comprises:responsive to the warning counter being initialized, decrement the warning counter by 1;provide the obstacle warning;start a timer;performing a door command in accordance with ...
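The warning loop in the claims (warn up to the counter's maximum, poll for a response after each warning, fall back to a default action if nobody answers) can be sketched as follows. All callback names and the use of None for "no response" are assumptions for this sketch:

```python
def handle_obstacle(warn, poll_response, do_command, default_action,
                    max_iterations=3):
    """Warn up to max_iterations times; after each warning, poll for a
    response. If a response arrives, perform the corresponding door
    command; if none ever arrives, perform the predetermined default."""
    for _ in range(max_iterations):
        warn()                     # provide the obstacle warning
        response = poll_response() # poll for a user response
        if response is not None:
            return do_command(response)
    return default_action()
```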

25-07-2013 publication date

VOICE PROCESSING APPARATUS, METHOD AND PROGRAM

Number: US20130191124A1
Assignee: SONY CORPORATION

Provided is a voice processing apparatus including a feature quantity calculation section extracting a feature quantity from a target frame of an input voice signal, a sound pressure estimation candidate point updating section making each frame of the input voice signal a sound pressure estimation candidate point, retaining the feature quantity of each sound pressure estimation candidate point, and updating the sound pressure estimation candidate point based on the feature quantity of the sound pressure estimation candidate point and the feature quantity of the target frame, a sound pressure estimation section calculating an estimated sound pressure of the input voice signal, based on the feature quantity of the sound pressure estimation candidate point, a gain calculation section calculating a gain applied to the input voice signal based on the estimated sound pressure, and a gain application section performing a gain adjustment of the input voice signal based on the gain. 1. A voice processing apparatus , comprising:a feature quantity calculation section which extracts a feature quantity from a target frame of an input voice signal;a sound pressure estimation candidate point updating section which makes each of a plurality of frames of the input voice signal a sound pressure estimation candidate point, retains the feature quantity of each sound pressure estimation candidate point, and updates the sound pressure estimation candidate point based on the feature quantity of the sound pressure estimation candidate point and the feature quantity of the target frame;a sound pressure estimation section which calculates an estimated sound pressure of the input voice signal, based on the feature quantity of the sound pressure estimation candidate point;a gain calculation section which calculates a gain applied to the input voice signal based on the estimated sound pressure; anda gain application section which performs a gain adjustment of the input voice signal based on 
the ...
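The gain calculation and gain application sections described above can be illustrated with a simple scalar sketch: derive a gain from the estimated sound pressure so the output reaches a target level, then scale the signal. The function name, the target-level parameter, and the scalar signal representation are assumptions:

```python
def apply_gain(signal, estimated_pressure, target_pressure=1.0):
    """Scale the input signal so that its estimated sound pressure
    matches the target level. The small epsilon guards against a zero
    pressure estimate (e.g. silence)."""
    gain = target_pressure / max(estimated_pressure, 1e-12)
    return [s * gain for s in signal]
```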

25-07-2013 publication date

Information Processing Device, Large Vocabulary Continuous Speech Recognition Method, and Program

Number: US20130191129A1

System and method for performing speech recognition using acoustic invariant structure for large vocabulary continuous speech. An information processing device receives sound as input and performs speech recognition. The information processing device includes: a speech recognition processing unit for outputting a speech recognition score, a structure score calculation unit for calculation of a structure score that is a score that, with respect for each hypothesis concerning all phoneme pairs comprising the hypothesis, is found by applying phoneme pair-by-pair weighting to phoneme pair inter-distribution distance likelihood and then performing summation, and a ranking unit for ranking the multiple hypotheses based on a sum value of speech recognition score and structure score. 1. A large vocabulary continuous speech recognition method executed by a computer;the method comprises the steps of:(a) acquiring by said computer a speech data as input;(b) performing by said computer speech recognition with respect to said acquired speech data, and outputting a plurality of hypotheses that are a recognition result with a plurality of speech recognition scores, each speech recognition score being a score indicating correctness of a speech recognition result for each hypothesis;(c) calculating by said computer a structure score for each hypothesis, the structure score being obtained by, for all pairs of phonemes consisting of the hypothesis, multiplying a likelihood of inter-distribution distance of a pair of phonemes by weighting for said pair of phonemes and performing summation; and(d) determining by said computer a total value of said structure score and said speech recognition score for each hypothesis, and based on said total value, ranking said plurality of hypotheses.2. 
The large vocabulary continuous speech recognition method according to ;wherein said method further comprises the step of:(e) performing by said computer steps (b) and (c) with respect to speech data for ...
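The scoring described above (a structure score summed over all phoneme pairs of a hypothesis, weighted per pair, then added to the recognition score for ranking) can be sketched as follows. The pair likelihood and weight are passed in as placeholder functions, since their actual forms come from the trained models:

```python
from itertools import combinations

def structure_score(phonemes, pair_loglik, pair_weight):
    """Sum, over all phoneme pairs in the hypothesis, of the pairwise
    weight times the inter-distribution-distance log-likelihood."""
    return sum(pair_weight(a, b) * pair_loglik(a, b)
               for a, b in combinations(phonemes, 2))

def rank_hypotheses(hyps, pair_loglik, pair_weight):
    """hyps: list of (phoneme_sequence, recognition_score) pairs.
    Rank by recognition score plus structure score, best first."""
    return sorted(
        hyps,
        key=lambda h: h[1] + structure_score(h[0], pair_loglik, pair_weight),
        reverse=True)
```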

15-08-2013 publication date

SYSTEM AND METHOD FOR PROVIDING A NATURAL LANGUAGE VOICE USER INTERFACE IN AN INTEGRATED VOICE NAVIGATION SERVICES ENVIRONMENT

Number: US20130211710A1
Assignee: VoiceBox Technologies, Inc.

A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment. 1. A method for providing a natural language voice user interface , comprising:receiving a natural language utterance from an input device associated with a navigation device, wherein the natural language utterance relates to navigation;determining a current location of the computing device;selecting, from among a plurality of sets of location-specific grammar information, a set of location-specific grammar information based on proximity between the current location and a location associated with the set of location-specific grammar information;generating a recognition grammar with the set of location-specific grammar information;generating one or more interpretations of the natural language utterance using the recognition grammar;determining, from the one or more interpretations, a destination having a first full or partial address;determining a route from the current location associated with the navigation device to the first full or partial address of the destination;receiving subsequent natural language utterances from the input device;determining a second full or partial address from the subsequent natural language utterances; andupdating the destination with the second full or partial address.2. 
The method of ...
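The grammar selection step in the claims (pick the set of location-specific grammar information whose associated location is closest to the current position) can be sketched as a nearest-neighbor choice. The dictionary keys and the squared-Euclidean distance on (lat, lon) pairs are simplifying assumptions:

```python
def select_grammar(current, grammars):
    """Choose, from candidate grammar-info sets, the one whose anchor
    location is nearest to the current position."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(grammars, key=lambda g: dist2(current, g['location']))
```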

15-08-2013 publication date

Speech recognition circuit and method

Number: US20130211835A1
Assignee: Zentian Ltd

A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to said scores.
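The accumulate-then-select behavior of the circuit (add per-state score updates to the scores held in the memory structure, then select the best-scoring nodes of the lexical tree) can be mimicked in software. This is a dictionary-based analogy for the hardware data path, not the circuit itself; the beam-style top-N selection is an assumption:

```python
def accumulate_and_select(scores, updates, beam):
    """scores: state_id -> stored score (the memory structure)
    updates: state_id -> score update from the audio frame
    Adds each update to the stored score, writes it back, and returns
    the `beam` best-scoring state ids."""
    for state_id, delta in updates.items():
        scores[state_id] = scores.get(state_id, 0.0) + delta
    return sorted(scores, key=scores.get, reverse=True)[:beam]
```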

22-08-2013 publication date

Sound Recognition Operation Apparatus and Sound Recognition Operation Method

Number: US20130218562A1
Author: Igarashi Yoshihiro
Assignee: KABUSHIKI KAISHA TOSHIBA

According to one embodiment, a sound recognition operation apparatus includes a sound detection module, a keyword detection module, an audio mute module, and a transmission module. The sound detection module is configured to detect sound. The keyword detection module is configured to detect a particular keyword using voice recognition when the sound detection module detects sound. The audio mute module is configured to transmit an operation signal for muting audio sound when the keyword detection module detects the keyword. The transmission module is configured to recognize the voice command after the keyword is detected by the keyword detection module, and transmit an operation signal corresponding to the voice command. 1. (canceled)2. An electronic device comprising:a word recognizer configured to recognize a predetermined word by voice recognition;a command recognizer configured to recognize a voice command if the predetermined word is recognized; anda transmitter configured to transmit a signal corresponding to the recognized voice command.3. The electronic device of claim 2 , wherein the word recognizer is configured to recognize the predetermined word indicating an electronic device to be controlled.4. The electronic device of claim 3 , wherein the predetermined word comprises a word of “television”.5. The electronic device of claim 2 , wherein the predetermined word comprises a predetermined specific keyword.6. The electronic device of claim 2 , wherein the command recognizer is configured to recognize a voice command for controlling an electronic device.7. The electronic device of claim 6 , wherein the electronic device comprises a television broadcast receiving apparatus.8. The electronic device of claim 2 , further comprising a microphone configured to receive the predetermined word and the voice command.9. The electronic device of claim 2 , further comprising a notifier configured to notify one of a set state and an operation state of the electronic ...

22-08-2013 publication date

System and Method for Providing a Natural Language Interface to a Database

Number: US20130218564A1
Assignee: AT&T INTELLECTUAL PROPERTY II, L.P.

A system and method for providing a natural language interface to a database or the Internet. The method provides a response from a database to a natural language query. The method comprises receiving a user query, extracting key data from the user query, submitting the extracted key data to a data base search engine to retrieve a top n pages from the data base, processing of the top n pages through a natural language dialog engine and providing a response based on processing the top n pages. 1. A method comprising:extracting, via a processor, key data from a user query;submitting the key data to a search engine to perform a search and to retrieve a set of top n pages from a database, wherein in response to a restriction to access a restricted page of the set of top n pages, the processor provides data to the database to overcome the restriction independent of a user navigation to the restricted page;providing, at a first time, a response to the user query;after providing the response at the first time, continuing, without further user input, to search for information associated with the user query using a machine learning process to expand the search; andpresenting an option to a device associated with a user, at a second time which is later than the first time, to view the related information separate from the response.2. The method of claim 1 , wherein the response is text-based and audible.3. The method of claim 1 , wherein the user query is one of a natural language speech query and a text-based query.4. The method of claim 1 , wherein the key data comprises one of keywords and key phrases.5. The method of claim 1 , wherein submitting of the key data to the search engine further comprises submitting the key data to a plurality of search engines.6. The method of claim 1 , wherein the user query is received via a speech recognizer.7. The method of claim 1 , wherein the response is a natural language response provided via synthetic speech.8. 
The method of claim 1 ...

22-08-2013 publication date

Management and Prioritization of Processing Multiple Requests

Number: US20130218574A1
Assignee: MICROSOFT CORPORATION

Systems and methods are described for systems that utilize an interaction manager to manage interactions—also known as requests or dialogues—from one or more applications. The interactions are managed properly even if multiple applications use different grammars. The interaction manager maintains a priority for each of the interactions, such as via an interaction list, where the priority of the interactions corresponds to an order in which the interactions are to be processed. Interactions are normally processed in the order in which they are received. However, the systems and method described herein may provide a grace period after processing a first interaction and before processing a second interaction. If a third interaction that is chained to the first interaction is received during this grace period, then the third interaction may be processed before the second interaction. 1. A system, comprising: memory; one or more processors; and an interaction manager maintained in the memory and executable by the one or more processors to: assign processing priorities to a plurality of requests, wherein each request corresponds to a priority in which the request is to be processed, such that a first request having a higher priority is processed before a second request having a lower priority; and provide a grace period after processing the first request, wherein in response to determining that the system receives a third request chained to the first request during the grace period, the system is configured to process the third request prior to processing the second request. 2. The system as recited in claim 1, wherein the interaction manager interrupts a request currently being processed when a received request is assigned a priority higher than the interrupted request, the interrupted request resuming processing after the received request is processed. 3.
The system as recited in claim 1 , wherein the interaction manager interrupts a request currently being ...
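The grace-period chaining described above can be sketched as a toy scheduler. All class and method names are hypothetical (the patent specifies no API), and the grace period is modeled as a flag rather than a timer:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Interaction:
    name: str
    priority: int                      # lower number = higher priority
    chained_to: Optional[str] = None   # name of an earlier interaction

class InteractionManager:
    """Interactions run in priority order, but an interaction chained to
    the one just processed jumps the queue if it arrives during the
    grace period (a real system would expire the period on a timer)."""

    def __init__(self):
        self.queue = []
        self.processed = []
        self.last = None
        self.in_grace_period = False

    def submit(self, interaction):
        if (self.in_grace_period and self.last is not None
                and interaction.chained_to == self.last.name):
            # Chained follow-up received during the grace period:
            # promote it ahead of everything already queued.
            self.queue.insert(0, interaction)
        else:
            self.queue.append(interaction)
            self.queue.sort(key=lambda i: i.priority)

    def process_next(self):
        if not self.queue:
            return None
        nxt = self.queue.pop(0)
        self.processed.append(nxt.name)
        self.last = nxt
        self.in_grace_period = True
        return nxt
```

A chained request submitted right after its parent is processed thus overtakes an older, higher-priority request still in the queue.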

29-08-2013 publication date

SPOKEN CONTROL FOR USER CONSTRUCTION OF COMPLEX BEHAVIORS

Number: US20130226580A1
Assignee: Fluential, LLC

A device interface system is presented. Contemplated device interfaces allow for construction of complex device behaviors by aggregating device functions. The behaviors are triggered based on conditions derived from environmental data about the device. 1. A device interface comprising: a dialog interface module disposed within a device and configured to accept a signal comprising a representation of a spoken utterance; a data source connection configured to acquire environment data from a plurality of data sources and representative of a device environment; a device function database storing primitive device functions indexed by device state attributes; an interaction history database storing previous interactions indexed by environment data attributes; and a triggering module coupled with the dialog interface, data source interface, and the interaction history database and configured to: obtain previous interactions from the interaction history database by submitting a query to the interaction history database, the query instantiated based on environment data attributes derived from the environment data; derive a device state from at least one of the environment data and the previous interactions; obtain a set of primitive device functions from the device function database based on the device state; instantiate a future device behavior constructed from the set of primitive functions and the dialog signal; create a trigger as a function of the future device behavior and the device state; and configure the device to exhibit the future device behavior upon satisfaction of the trigger. 2. The interface of claim 1, wherein the device state comprises a current device state. 3. The interface of claim 1, wherein the device state comprises a previous device state. 4. The interface of claim 1, wherein the device state comprises a future device state. 5. The interface of claim 1, wherein the device state comprises a functional state of the device. 6.
The interface of ...

05-09-2013 publication date

SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND COMPUTER READABLE MEDIUM

Number: US20130231929A1
Assignee: NEC Corporation

The present invention can increase the types of noise that can be dealt with, enabling speech recognition with high accuracy. 1. A speech recognition device comprising: a coefficient storage unit which stores a suppression coefficient representing an amount of noise suppression and an adaptation coefficient representing an amount of adaptation which is generated on the basis of a predetermined noise and is synthesized into a clean acoustic model generated on the basis of a voice which does not include noise, in a manner to relate them to each other; a noise estimation unit which estimates noise from an input signal; a noise suppression unit which suppresses, from the input signal, a portion of the noise estimated by said noise estimation unit, the portion being specified by a suppression amount specified on the basis of the suppression coefficient; an acoustic model adaptation unit which generates an adapted acoustic model which is noise-adapted, by synthesizing the noise model, which is generated on the basis of the noise estimated by said noise estimation unit in accordance with an amount of adaptation specified on the basis of the adaptation coefficient, into the clean acoustic model; and a search unit which recognizes voice on the basis of the input signal whose noise has been suppressed by said noise suppression unit and the adapted acoustic model generated by said acoustic model adaptation unit. 2. The speech recognition device according to claim 1, wherein the sum of the suppression coefficient and the adaptation coefficient is a predetermined value or matrix.
The speech recognition device according to claim 1 , comprising:a training data storage unit which stores training data including at least one noise; anda coefficient determination unit which updates the suppression coefficient and the adaptation coefficient stored in said coefficient storage unit; whereinsaid coefficient determination unit takes a noise included in the training data as observed ...
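The key constraint in claim 2, that the suppression and adaptation coefficients sum to a fixed total, can be sketched numerically. The per-bin spectral model below is a simplification, not the patent's actual signal processing:

```python
def split_noise_handling(input_spectrum, noise_estimate, suppression_coeff, total=1.0):
    """Split the estimated noise between suppression (subtracted from the
    input) and model adaptation (synthesized into the clean acoustic
    model).  The two coefficients sum to a fixed total, as in claim 2;
    quantities are per-frequency-bin magnitudes in this toy model."""
    adaptation_coeff = total - suppression_coeff
    # Suppress only the specified share of the estimated noise.
    suppressed = [max(x - suppression_coeff * n, 0.0)
                  for x, n in zip(input_spectrum, noise_estimate)]
    # The remaining share would be synthesized into the acoustic model.
    for_adaptation = [adaptation_coeff * n for n in noise_estimate]
    return suppressed, for_adaptation
```

Because the coefficients are complementary, noise that is not suppressed from the input is accounted for in the adapted model instead, rather than being ignored.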

05-09-2013 publication date

Speech Recognition on Large Lists Using Fragments

Number: US20130231934A1
Inventor: Schwarz Markus
Assignee: NUANCE COMMUNICATIONS, INC.

A system and method is provided for recognizing a speech input and selecting an entry from a list of entries. The method includes recognizing a speech input. A fragment list of fragmented entries is provided and compared to the recognized speech input to generate a candidate list of best matching entries based on the comparison result. The system includes a speech recognition module, and a data base for storing the list of entries and the fragmented list. The speech recognition module may obtain the fragmented list from the data base and store a candidate list of best matching entries in memory. A display may also be provided to allow a user to select from a list of best matching entries. 1. A speech recognition method in which an entry corresponding to a speech input is selected from a list of entries , the method comprising:detecting the speech input;recognizing a phoneme sequence of the speech input;providing a list of fragments of entries in a list of entries, the fragments being based on a subword or phoneme level; andcomparing the phoneme sequence of the recognized speech input to the list of fragments to generate a candidate list of best matching entries based on comparison scores,wherein a comparison score is calculated for a fragment when the recognized speech input is compared to the fragment, the comparison score being a measure of how well the recognized speech input fits to the fragment, wherein a score for one list entry is calculated based on the comparison scores of all the fragments that build the list entry.2. The method of claim 1 , where at least one fragment is provided for each entry of the list.3. The method of further comprising:providing a list of fragments containing substantially all different fragments of the entries, where for generating the candidate list, the recognized speech input is compared to the list of fragments.4. The method of further comprising calculating a score for each fragment of the list of fragments.5. 
The method of ...
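The fragment-scoring idea, where each list entry's score is built from the comparison scores of the fragments that compose it, can be sketched as follows. The longest-common-run measure stands in for a real phoneme-level aligner, and all data is illustrative:

```python
from difflib import SequenceMatcher

def fragment_score(recognized, fragment):
    """How well a fragment fits the recognized phoneme string: the
    longest common run, normalized by the fragment length."""
    m = SequenceMatcher(None, recognized, fragment).find_longest_match(
        0, len(recognized), 0, len(fragment))
    return m.size / len(fragment)

def rank_entries(recognized, entries):
    """entries maps a list entry to the fragments that build it; the
    entry score combines the comparison scores of all its fragments,
    producing a candidate list of best matching entries."""
    scores = {name: sum(fragment_score(recognized, f) for f in frags) / len(frags)
              for name, frags in entries.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Scoring shared fragments once and reusing them across entries is what makes this approach cheaper than matching every full entry against the recognized input.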

05-09-2013 publication date

Context Sensitive Overlays In Voice Controlled Headset Computer Displays

Number: US20130231937A1
Assignee: Kopin Corporation

In headset computers that leverage voice commands, often the user does not know what voice commands are available. In one embodiment, a method includes providing a user interface in a headset computer and, in response to user utterance of a cue toggle command, displaying at least one cue in the user interface. Each cue can correspond to a voice command associated with code to execute. In response to user utterance of the voice command, the method can also include executing the code associated with the voice command. The user can therefore ascertain what voice commands are available. 1. A method comprising:providing a user interface in a headset computer;in response to user utterance of a cue toggle command, displaying at least one cue, each cue corresponding to a voice command associated with code to execute, in the user interface; andin response to user utterance of the voice command, executing the code associated with the voice command.2. The method of claim 1 , further comprising:displaying the interface without the cue at least one of prior to the cue toggle command and after a subsequent cue toggle command.3. The method of claim 1 , wherein displaying the cue includes displaying words that activate the voice command.4. The method of claim 1 , wherein displaying the cue includes displaying the cue in the user interface corresponding to the voice command associated with the control claim 1 , the control displayed in the user interface.5. The method of claim 1 , wherein displaying the cue includes displaying the cue in the user interface corresponding to the voice command associated with the control claim 1 , the control hidden from the user interface.6. The method of claim 1 , wherein displaying the cue includes displaying the cue in the user interface corresponding to the voice command associated with the control claim 1 , the control being a global headset control.7. The method of claim 1 , wherein the cue is loaded from a control claim 1 , the control ...
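The cue-toggle behavior can be sketched as a small state machine. The toggle phrase, command words, and handler model are illustrative, not from the patent:

```python
class CueOverlay:
    """Voice-command cues stay hidden until the user utters the cue
    toggle command; uttering a cued command executes its code."""

    TOGGLE = "show commands"   # illustrative toggle phrase

    def __init__(self, commands):   # {spoken words: handler}
        self.commands = commands
        self.cues_visible = False
        self.results = []

    def hear(self, utterance):
        """Returns the cues currently displayed in the user interface."""
        if utterance == self.TOGGLE:
            self.cues_visible = not self.cues_visible
        else:
            handler = self.commands.get(utterance)
            if handler is not None:
                self.results.append(handler())
        return sorted(self.commands) if self.cues_visible else []
```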

12-09-2013 publication date

System and Method for Automatically Generating a Dialog Manager

Number: US20130238333A1
Assignee: AT&T Intellectual Property I, L.P.

Disclosed herein are systems, methods, and computer-readable storage media for automatically generating a dialog manager for use in a spoken dialog system. A system practicing the method receives a set of user interactions having features, identifies an initial policy, evaluates all of the features in a linear evaluation step of the algorithm to identify a set of most important features, performs a cubic policy improvement step on the identified set of most important features, repeats the previous two steps one or more times, and generates a dialog manager for use in a spoken dialog system based on the resulting policy and/or set of most important features. Evaluating all of the features can include estimating a weight for each feature which indicates how much each feature contributes to at least one of the identified policies. The system can ignore features not in the set of most important features. 1. A method comprising:identifying, via a processor, features from a set of user interactions;identifying a policy for using the features in developing a dialog manager;performing, based on the policy, a linear evaluation on the features, to yield a set of features;repeating a cubic policy process on the set of features until the set of features results in a reduced set of features having a quantity below a threshold; andgenerating the dialog manager using a modified set of user interactions, the modified set of user interactions being selected based on the reduced set of features.2. The method of claim 1 , wherein the cubic policy process comprises a least-squares policy iteration algorithm.3. The method of claim 1 , wherein the linear evaluation comprises estimating a weight for each feature in the features.4. The method of claim 3 , wherein the weight of each feature indicates how much each feature contributes to the policy.5. 
The method of claim 1 , further comprising ignoring claim 1 , during generation of the dialog manager claim 1 , features which are not in the ...
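The alternation of a cheap linear evaluation with a costly policy-improvement step on a shrinking feature set can be sketched as a pruning loop. The weight-estimation callback and keep-ratio are stand-ins; the patent's actual steps (e.g. least-squares policy iteration) are only marked by a comment:

```python
def prune_features(features, estimate_weights, threshold, keep_ratio=0.5):
    """Repeatedly estimate a weight per feature (its contribution to the
    policy), keep only the most important features, and rerun the costly
    policy-improvement step on the survivors, until the feature set is
    smaller than the threshold."""
    features = list(features)
    while len(features) > threshold:
        weights = estimate_weights(features)
        ranked = sorted(features, key=lambda f: abs(weights[f]), reverse=True)
        keep = max(int(len(features) * keep_ratio), threshold)
        features = ranked[:keep]
        # ... the cubic policy-improvement step (e.g. least-squares
        # policy iteration) would run here on `features` only ...
    return features
```

Features dropped from the set are simply ignored thereafter, which is what keeps the cubic step tractable.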

12-09-2013 publication date

ENDPOINT DETECTION APPARATUS FOR SOUND SOURCE AND METHOD THEREOF

Number: US20130238335A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

An apparatus for detecting endpoints of sound signals when sound sources vocalized from a remote site are processed even if a plurality of speakers exists and an interference sound being input from a direction different from a direction of one speaker, and a method thereof, wherein in an environment in which a plurality of sound sources exists, the existence and the length of the sound source being input according to each direction is determined and the endpoint is found, thereby improving the performance of the post-processing, and speech being input from a direction other than a direction of speech from a speaker vocalized at a remote area from a sound source collecting unit is distinguished while the speech from the speaker is being recorded, thereby enabling a remote sound source recognition without restriction on the installation region of a microphone. 1. An apparatus for detecting endpoints of a plurality of sounds signals from a plurality of sound sources , the apparatus comprising:a plurality of microphones configured to receive the plurality of sound source signals from the plurality of sound sources;a sound source position detecting unit configured to detect positions of the plurality of sound sources from the sound source signals received through the plurality of microphones;a sound source position change determination unit configured to determine a change in position of the sound source according to each direction by reading the positions of the plurality of sound sources detected through the sound source position detecting unit;a sound source maintenance time calculating unit configured to calculate a sound source maintenance time of the sound source at a predetermined position by reading the positions of the plurality of sound sources detected through the sound source position detecting unit; andan endpoint determination unit configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by 
...
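The per-direction bookkeeping described above, tracking how long a source at each position persists and where it ends, can be sketched on discretized frames. The contiguity assumption and frame threshold are simplifications:

```python
def detect_endpoints(frames, min_frames=2):
    """frames: per-frame lists of active source directions (degrees).
    Each direction's endpoint is the last frame it was active; sources
    maintained for fewer than min_frames are treated as transient
    interference and dropped.  Assumes each source stays contiguous."""
    spans = {}
    for t, directions in enumerate(frames):
        for d in directions:
            start, _ = spans.get(d, (t, t))
            spans[d] = (start, t)
    return {d: (start, end) for d, (start, end) in spans.items()
            if end - start + 1 >= min_frames}
```

A brief interjection from a different direction thus never produces an endpoint for the speaker being recorded.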

19-09-2013 publication date

System and Method of Providing a Spoken Dialog Interface to a Website

Number: US20130246069A1
Assignee: AT&T Intellectual Property II LP

Disclosed is a method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes selecting anchor texts within a website based on a term density, weighting those anchor texts based on a percent of salient words to total words, and incorporating the weighted anchor texts into a live spoken dialog interface, the weights determining a level of incorporation into the live spoken dialog interface.
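The two-stage selection described above, picking anchor texts by term density and then weighting each by its share of salient words, can be sketched directly. The density cutoff and salient-word set are illustrative:

```python
def weight_anchor_texts(anchors, salient_words, density_cutoff=0.5):
    """anchors: {anchor text: term density}.  Anchors above the density
    cutoff are kept and weighted by their fraction of salient words; the
    weight would then determine the level of incorporation into the live
    spoken dialog interface."""
    weighted = {}
    for text, density in anchors.items():
        if density < density_cutoff:
            continue
        words = text.lower().split()
        salient = sum(1 for w in words if w in salient_words)
        weighted[text] = salient / len(words)
    return weighted
```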

19-09-2013 publication date

ELECTRONIC DEVICE AND METHOD FOR CONTROLLING POWER USING VOICE RECOGNITION

Number: US20130246071A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

An electronic apparatus and a power controlling method are provided. The electronic apparatus includes: a voice input unit which receives an audio input in a stand-by mode of the electronic apparatus; a voice sensing unit which determines whether the received audio input is a user voice, and if the user voice is input, outputs a power control signal; and a power control voice recognition unit which, if the power control signal is received from the voice recognition unit, turns on and performs voice recognition regarding the input user voice. 1. An electronic apparatus , comprising:a voice input unit which receives an audio input in a stand-by mode of the electronic apparatus;a voice sensing unit which determines whether the received audio input is a user voice , and outputs a first power control signal in response to determining that the received audio input is the user voice; anda power control voice recognition unit which, in response to receiving the first power control signal from the voice recognition unit, turns on and performs voice recognition regarding the received audio input.2. The apparatus as claimed in claim 1 , wherein the power control voice recognition unit determines whether the received audio input is to control power of the electronic apparatus.3. The apparatus as claimed in claim 2 , further comprising:a main control unit which controls the electronic apparatus,wherein the power control voice recognition unit transmits a second power control signal to the main control unit in response to determining that the received audio input is to control the power of the electronic apparatus, andwherein the main control unit converts a mode of the electronic apparatus from the stand-by mode into an operation mode in response to receiving the second power control signal from the power control voice recognition unit.4. The apparatus as claimed in claim 3 , wherein the power control voice recognition unit turns off after a predetermined time elapses upon ...
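The two-stage wake-up can be sketched as a simple state machine: a low-power sensing stage only decides whether the audio is a human voice, and only then is the recognition stage powered on to check for a power-control phrase. Wake phrases and mode names are illustrative:

```python
class PowerController:
    """Stand-by power control via staged voice detection."""

    WAKE_PHRASES = {"power on", "turn on"}   # illustrative commands

    def __init__(self):
        self.mode = "stand-by"
        self.recognizer_powered = False

    def audio_input(self, is_voice, words=""):
        if self.mode != "stand-by" or not is_voice:
            return                       # sensing stage rejects non-voice audio
        self.recognizer_powered = True   # first power control signal
        if words in self.WAKE_PHRASES:   # second power control signal
            self.mode = "operation"
```

Keeping the full recognizer off until voice is sensed is what makes the stand-by mode cheap.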

19-09-2013 publication date

System and Method for Customized Voice Response

Number: US20130246072A1
Inventor: Duffield Nicholas
Assignee: AT&T Intellectual Property I, L.P.

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages. 1. A method comprising:collecting a user-specific services list associated with a user about to use an interactive voice response system;for each service in the user-specific services list, generating country-specific weights;selecting an interactive voice response system language model based on an aggregation of the country-specific weights; andrecognizing speech received from the user via the interactive voice response system based on the interactive voice response system language model.2. The method of claim 1 , wherein the interactive voice response system changes a user interface based on the interactive voice response language model.3. The method of claim 1 , wherein the interactive voice response system selects language options for a splash screen based on the country-specific weights.4. The method of claim 1 , wherein the interactive voice response system tunes a voice recognition algorithm based on the country-specific weights.5. The method of claim 1 , wherein the ...
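The aggregation step, summing country-specific weights across the user's services into a summary weight and selecting a language model from it, can be sketched as follows. Model names and weights are illustrative:

```python
from collections import defaultdict

def pick_language_model(service_weights, models):
    """service_weights: one {country: weight} dict per service on the
    user's list.  The summary weight aggregates them, and the language
    model with the largest summary weight is selected."""
    summary = defaultdict(float)
    for weights in service_weights:
        for country, w in weights.items():
            summary[country] += w
    best = max(summary, key=summary.get)
    return models[best], dict(summary)
```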

26-09-2013 publication date

Speech Conversation Support Apparatus, Method, and Program

Number: US20130253924A1
Assignee: KABUSHIKI KAISHA TOSHIBA

According to one embodiment, a speech conversation support apparatus includes a division unit, an analysis unit, a detection unit, an estimation unit and an output unit. The division unit divides a speech data item including a word item and a sound item into a plurality of divided speech data items. The analysis unit obtains an analysis result. The detection unit detects, for each divided speech data item, at least one clue expression indicating one of an instruction by a user and a state of the user. The estimation unit estimates, if the clue expression is detected, playback data item from at least one divided speech data item corresponding to a speech uttered before the clue expression is detected. The output unit outputs the playback data item. 1. A speech conversation support apparatus , comprising:a division unit configured to divide, a speech data item including a word item and a sound item, into a plurality of divided speech data items, in accordance with at least one of a first characteristic of the word item and a second characteristic of the sound item;an analysis unit configured to obtain an analysis result on the at least one of the first characteristic and the second characteristic, for each divided speech data item;a first detection unit configured to detect, for each divided speech data item, at least one clue expression indicating one of an instruction by a user and a state of the user in accordance with at least one of an utterance by the user and an action by the user;an estimation unit configured to estimate, if the clue expression is detected, at least one playback data item from at least one divided speech data item corresponding to a speech uttered before the clue expression is detected, based on the analysis result; andan output unit configured to output the playback data item.2. The apparatus according to claim 1 , further comprising an indication unit configured to generate claim 1 , if the clue expression detected by the first detection ...

26-09-2013 publication date

Speech Recognition in a Lighting Apparatus

Number: US20130253925A1
Inventor: Jonsson Karl
Assignee:

Acoustic energy is received at a lighting apparatus to create acoustic data, and speech recognition is performed on the acoustic data to determine one or more words. A message based on the acoustic data is sent across a network from the lighting apparatus. 1. A method for controlling a lighting apparatus , the method comprising:receiving acoustic energy at a lighting apparatus to create acoustic data;performing speech recognition on the acoustic data to determine one or more words; andsending a message from the lighting apparatus across a network, wherein the message is based on the acoustic data.2. The method of claim 1 , wherein the lighting apparatus comprises a controller;the performing speech recognition is done by the controller; andthe message includes the one or more words.3. The method of claim 1 , further comprising controlling an aspect of the lighting apparatus based claim 1 , at least in part claim 1 , on the one or more words claim 1 , wherein the message comprises information about the controlling of the aspect of the lighting apparatus.4. The method of claim 1 , further comprising receiving the message at a computer;wherein the performing speech recognition is done by the computer.5. The method of claim 4 , further comprising sending a control message from the computer to the lighting apparatus to control an aspect of the lighting apparatus.6. The method of claim 4 , further comprising compressing at least some of the acoustic data to create compressed acoustic data claim 4 , wherein the message comprises the compressed acoustic data.7. The method of claim 1 , further comprising:determining an identity of a speaker of the one or more words using the processor; andcontrolling an aspect of the lighting apparatus based, at least in part, on the identity of the speaker of the one or more words.8. An article of manufacture comprising a non-transitory computer readable storage medium having instructions stored thereon that claim 1 , if executed by a ...

26-09-2013 publication date

CONVERSATION SUPPORTING DEVICE, CONVERSATION SUPPORTING METHOD AND CONVERSATION SUPPORTING PROGRAM

Number: US20130253932A1
Assignee: KABUSHIKI KAISHA TOSHIBA

A conversation supporting device of an embodiment of the present disclosure has an information storage unit, a recognition resource constructing unit, and a voice recognition unit. Here, the information storage unit stores the information disclosed by a speaker. The recognition resource constructing unit uses the disclosed information to construct the recognition resource, including a voice model and a language model, for recognition of voice data. The voice recognition unit uses the recognition resource to recognize the voice data. 1. A conversation supporting device comprising: a storage unit configured to store information disclosed by a speaker; a recognition resource constructing unit configured to use the disclosed information in constructing a recognition resource for voice recognition using one of an acoustic model and a language model; and a voice recognition unit configured to use the recognition resource to generate text data corresponding to the voice data. 2. The conversation supporting device of claim 1, further comprising: a voice information storage unit configured to store the voice data correlated to identification information, the identification information including an identity of a speaker of a talk contained in the voice data, and time information of the talk contained in the voice data; and a conversation interval determination unit configured to use the voice data, the identification information, and the time information to determine a conversation interval in the voice data when the voice data contains a plurality of talks from a plurality of speakers; wherein the recognition resource constructing unit is further configured to use the information disclosed by the plurality of speakers who spoke during the conversation interval to construct the recognition resource, and the voice recognition unit is further configured to recognize the voice data corresponding to the conversation interval determined by the conversation interval determination unit. 3.
...

03-10-2013 publication date

Voice-Enabled Touchscreen User Interface

Number: US20130257780A1
Inventor: Baron Charles
Assignee:

An electronic device may receive a touch selection of an element on a touch screen. In response, the electronic device may enter a listening mode for a voice command spoken by a user of the device. The voice command may specify a function which the user wishes to apply to the selected element. Optionally, the listening mode may be limited to a defined time period based on the touch selection. Such voice commands in combination with touch selections may facilitate user interactions with the electronic device. 1. A method for controlling an electronic device, comprising: receiving a touch selection of a selectable element displayed on a touch screen of the electronic device; in response to receiving the touch selection, enabling the electronic device to listen for a voice command directed to the selectable element; and in response to receiving the voice command, applying a function associated with the voice command to the selectable element. 2. The method of wherein the selectable element is one of a plurality of selectable elements represented on the touch screen. 3. The method of including: receiving a second touch selection of a second selectable element of the plurality of selectable elements; in response to receiving the second touch selection, enabling the electronic device to listen for a second voice command directed to the second selectable element; and in response to receiving the second voice command, applying a function associated with the second voice command to the second selectable element. 4. The method of including receiving the voice command using a microphone of the electronic device. 5. The method of claim 1, including, prior to enabling the electronic device to listen for the voice command, determining that an ambient sound level does not exceed a maximum noise level. 6. The method of claim 1, including, prior to enabling the electronic device to listen for the voice command, determining that an ambient sound type is not similar to spoken ...
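The touch-then-listen flow with a bounded listening window can be sketched as follows. Class names, the window length, and the injectable clock are all illustrative:

```python
import time

class TouchVoiceUI:
    """After a touch selection, listen for a voice command only for a
    limited window; a command with no selection, or arriving after the
    window expires, is ignored."""

    def __init__(self, listen_window=5.0, clock=time.monotonic):
        self.listen_window = listen_window
        self.clock = clock
        self.selected = None
        self.selected_at = 0.0
        self.applied = []

    def touch(self, element):
        self.selected = element
        self.selected_at = self.clock()

    def voice_command(self, command):
        if self.selected is None:
            return False
        if self.clock() - self.selected_at > self.listen_window:
            self.selected = None            # listening window expired
            return False
        self.applied.append((command, self.selected))
        return True
```

Making the clock injectable keeps the time-window logic testable without real delays.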

03-10-2013 publication date

SPOKEN DIALOG SYSTEM USING PROMINENCE

Number: US20130262117A1
Inventor: Heckmann Martin
Assignee: HONDA RESEARCH INSTITUTE EUROPE GMBH

The invention presents a method for analyzing speech in a spoken dialog system, comprising the steps of: accepting an utterance by at least one means for accepting acoustical signals, in particular a microphone, analyzing the utterance and obtaining prosodic cues from the utterance using at least one processing engine, wherein the utterance is evaluated based on the prosodic cues to determine a prominence of parts of the utterance, and wherein the utterance is analyzed to detect at least one marker feature, e.g. a negative statement, indicative of the utterance containing at least one part to replace at least one part in a previous utterance, the part to be replaced in the previous utterance being determined based on the prominence determined for the parts of the previous utterance and the replacement parts being determined based on the prominence of the parts in the utterance, and wherein the previous utterance is evaluated with the replacement part(s). 1. A method for analyzing speech in a spoken dialog system , comprising the steps of:accepting an utterance by at least one means for accepting acoustical signals, in particular a microphone,analyzing the utterance and obtaining prosodic cues from the utterance using at least one processing engine,wherein the utterance is evaluated based on the prosodic cues to determine a prominence of parts of the utterance, and wherein the utterance is analyzed to detect at least one marker feature, e.g. a negative statement, indicative of the utterance containing at least one part to replace at least one part in a previous utterance, the part to be replaced in the previous utterance being determined based on the prominence determined for the parts of the previous utterance and the replacement parts being determined based on the prominence of the parts in the utterance, and wherein the previous utterance is evaluated with the replacement part(s).2. The method of claim 1 , wherein the utterance is a correction of the previous ...
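The prominence-driven repair can be sketched on word lists with precomputed prominence scores: if the new utterance opens with a negative marker, its most prominent part replaces the most prominent part of the previous utterance. The marker word, the flat data model, and single-word replacement are simplifications:

```python
def apply_correction(previous, correction, marker="no"):
    """previous, correction: lists of (word, prominence) pairs.
    On a marked correction, the most prominent correction word replaces
    the most prominent word of the previous utterance."""
    if not correction or correction[0][0] != marker:
        return [w for w, _ in previous]            # not a correction
    replacement = max(correction[1:], key=lambda wp: wp[1])[0]
    target = max(range(len(previous)), key=lambda i: previous[i][1])
    repaired = [w for w, _ in previous]
    repaired[target] = replacement
    return repaired
```

The repaired utterance, not the raw correction, is then re-evaluated by the dialog system.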

10-10-2013 publication date

SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION METHOD, AND SPEECH RECOGNITION PROGRAM

Number: US20130268271A1
Assignee: NEC Corporation

A speech recognition system has: hypothesis search means which searches for an optimal solution of inputted speech data by generating a hypothesis which is a bundle of words which are searched for as recognition result candidates; self-repair decision means which calculates a self-repair likelihood of a word or a word sequence included in the hypothesis which is being searched for by the hypothesis search means, and decides whether or not self-repair of the word or the word sequence is performed; and transparent word hypothesis generation means which, when it is decided that the self-repair is performed, generates a transparent word hypothesis which is a hypothesis which regards as a transparent word a word or a word sequence included in a disfluency interval or a repair interval of a self-repair interval including the word or the word sequence. 1. A speech recognition system comprising:a hypothesis search unit which searches for an optimal solution of inputted speech data by generating a hypothesis which is a bundle of words which are searched for as recognition result candidates;a self-repair decision unit which calculates a self-repair likelihood of a word or a word sequence included in the hypothesis which is being searched for by the hypothesis search unit, and decides whether or not self-repair of the word or the word sequence is performed; anda transparent word hypothesis generation unit which, when the self-repair decision unit decides that the self-repair is performed, generates a transparent word hypothesis which is a hypothesis which regards as a transparent word a word or a word sequence included in a disfluency interval or a repair interval of a self-repair interval including the word or the word sequence,wherein the hypothesis search unit searches for an optimal solution by including as search target hypotheses the transparent word hypothesis generated by the transparent word hypothesis generation unit.2. The speech recognition system according to ...

10-10-2013 publication date

System and Method for Efficient Tracking of Multiple Dialog States with Incremental Recombination

Number: US20130268274A1
Inventor: Williams Jason
Assignee:

Disclosed herein are systems, methods, and computer-readable storage media for tracking multiple dialog states. A system practicing the method receives an N-best list of speech recognition candidates, a list of current partitions, and a belief for each of the current partitions. A partition is a group of dialog states. In an outer loop, the system iterates over the N-best list of speech recognition candidates. In an inner loop, the system performs a split, update, and recombination process to generate a fixed number of partitions after each speech recognition candidate in the N-best list. The system recognizes speech based on the N-best list and the fixed number of partitions. The split process can perform all possible splits on all partitions. The update process can compute an estimated new belief. The estimated new belief can be a product of ASR reliability, user likelihood to produce this action, and an original belief. 1. A method comprising: receiving a list of speech recognition candidates; receiving a list of current partitions, wherein each partition in the list of current partitions is a group of dialog states; in an outer loop, iterating over each of the speech recognition candidates in the list of speech recognition candidates; in an inner loop, performing a split process, an update process, and a recombination process, via a processor, wherein the processes generate a fixed number of partitions after each speech recognition candidate in the list; and recognizing speech based on the list and the fixed number of partitions. 2. The method of claim 1, wherein the split process performs all possible splits on all partitions. 3.
The method of claim 3 , wherein the estimated new belief is a product of one of a reliability of automatic speech recognition claim 3 , a likelihood that a user would produce an action ...
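The outer/inner loop structure described in the abstract can be sketched as follows. The partition representation, the binary split, the halved beliefs, and the keep-top-K recombination rule are illustrative assumptions, not the patented procedure; only the loop nesting, the belief product (ASR reliability × action likelihood × prior belief), and the fixed partition count come from the text above.

```python
def track_dialog_states(nbest, partitions, beliefs, max_partitions=3):
    """nbest: list of (asr_reliability, user_action_likelihood) per candidate.
    partitions: list of tuples (each a group of dialog states, here opaque).
    beliefs: one belief value per partition."""
    for asr_reliability, action_likelihood in nbest:      # outer loop: N-best list
        # split: each partition splits into two children (assumed rule)
        split_parts = [p + (i,) for p in partitions for i in (0, 1)]
        split_beliefs = [b / 2.0 for b in beliefs for _ in (0, 1)]
        # update: estimated new belief = ASR reliability * action likelihood * prior
        split_beliefs = [asr_reliability * action_likelihood * b
                         for b in split_beliefs]
        # recombine: keep a fixed number of the highest-belief partitions and
        # fold the residual belief mass into the best one (assumed rule)
        ranked = sorted(zip(split_beliefs, split_parts), reverse=True)
        kept = ranked[:max_partitions]
        leftover = sum(b for b, _ in ranked[max_partitions:])
        beliefs = [b for b, _ in kept]
        partitions = [p for _, p in kept]
        if leftover and beliefs:
            beliefs[0] += leftover
    return partitions, beliefs
```

After every candidate the partition list is back at the fixed size, which is what keeps the tracker's cost bounded as the N-best list grows.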

17-10-2013 publication date

CHANNEL DETECTION IN NOISE USING SINGLE CHANNEL DATA

Number: US20130275128A1
Assignee: Siemens Corporation

Methods related to Generalized Mutual Interdependence Analysis (GMIA), a low complexity statistical method for projecting data in a subspace that captures invariant properties of the data, are implemented on a processor based system. GMIA methods are applied to the signal processing problem of voice activity detection and classification. Real-world conversational speech data are modeled to fit the GMIA assumptions. Low complexity GMIA computations extract reliable features for classification of sound under noisy conditions and operate with small amounts of data. A speaker is characterized by a slow varying or invariant channel that is learned and is tracked from single channel data by GMIA methods. 1. A method for detecting own voice activity by a speaker using a microphone having a near field channel with the speaker covering a distance of 30 cm or less , comprising:a processor extracting a near-field signature from signals generated from a plurality of different voices applied to the near-field channel by using general mutual interdependence analysis (GMIA);the microphone generating a speaker signal from a voice of the speaker;the processor extracting a channel signature from the speaker signal by applying GMIA;the processor comparing the channel signature with the near-field signature to determine a channel used by the speaker.2. The method of claim 1 , wherein the channel used by the speaker is the near-field channel or a far-field channel.3. The method of claim 1 , further comprising:the processor determining that the speaker voice was transmitted over the near-field channel.4. The method of claim 1 , wherein a signal generated by an additional source is superimposed on the speaker signal and wherein a measure of the speaker signal relative to the signal generated by the additional source is a signal to noise ratio of 20 decibel (dB) or less.5. The method of claim 1 , further comprising:the processor extracting a far-field channel signature from signals ...

17-10-2013 publication date

Automatic Updating of Confidence Scoring Functionality for Speech Recognition Systems

Number: US20130275135A1
Assignee:

Automatically adjusting confidence scoring functionality is described for a speech recognition engine. Operation of the speech recognition system is revised so as to change an associated receiver operating characteristic (ROC) curve describing performance of the speech recognition system with respect to rates of false acceptance (FA) versus correct acceptance (CA). Then a confidence scoring functionality related to recognition reliability for a given input utterance is automatically adjusted such that where the ROC curve is better for a given operating point after revising the operation of the speech recognition system, the adjusting reflects a double gain constraint to maintain FA and CA rates at least as good as before revising operation of the speech recognition system. 1. A method for automatically adjusting operation of a speech recognition system comprising:revising operation of the speech recognition system so as to change an associated receiver operating characteristic (ROC) curve describing performance of the speech recognition system with respect to rates of false acceptance (FA) versus correct acceptance (CA); andautomatically adjusting a confidence scoring functionality related to recognition reliability for a given input utterance such that where the ROC curve is better for a given operating point after revising the operation of the speech recognition system, the adjusting reflects a double gain constraint to maintain FA and CA rates at least as good as before revising operation of the speech recognition system.2. A method according to claim 1 , wherein where the ROC curve is not better for a given operating point after revising the operation of the speech recognition system claim 1 , the adjusting minimizes worsening of the FA and CA rates.3. 
A method according to claim 1 , wherein automatically adjusting the confidence scoring functionality includes establishing a mapping to maintain equivalence of the confidence scoring functionality before and after ...
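The "double gain constraint" above can be illustrated with a small helper: given the revised system's ROC curve and the old operating point, keep only points whose FA and CA rates are both at least as good as before. The selection rule among admissible points (highest CA, then lowest FA) is an illustrative assumption, not the patented mapping.

```python
def double_gain_points(new_roc, fa_old, ca_old):
    """Operating points on the revised ROC curve satisfying the double gain
    constraint: false-accept rate no higher, correct-accept rate no lower."""
    return [(fa, ca) for fa, ca in new_roc if fa <= fa_old and ca >= ca_old]

def pick_operating_point(new_roc, fa_old, ca_old):
    """Among admissible points, prefer the highest CA, breaking ties by the
    lowest FA (an assumed selection rule)."""
    candidates = double_gain_points(new_roc, fa_old, ca_old)
    return max(candidates, key=lambda p: (p[1], -p[0])) if candidates else None
```

If no point on the new curve dominates the old operating point, `pick_operating_point` returns `None`, which corresponds to the claim's fallback of minimizing how much the FA and CA rates worsen.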

24-10-2013 publication date

SYSTEMS AND METHODS FOR AUDIO SIGNAL PROCESSING

Number: US20130282372A1
Assignee: QUALCOMM INCORPORATED

A method for detecting voice activity by an electronic device is described. The method includes detecting near end speech based on a near end voiced speech detector and at least one single channel voice activity detector. The near end voiced speech detector is associated with a harmonic statistic based on a speech pitch histogram. 1. A method for detecting voice activity by an electronic device , comprising:detecting near end speech based on a near end voiced speech detector and at least one single channel voice activity detector, wherein the near end voiced speech detector is associated with a harmonic statistic based on a speech pitch histogram.2. The method of claim 1 , wherein the near end voiced speech detector and the at least one single channel voice activity detector are integrated.3. The method of claim 1 , further comprising switching to a single microphone.4. The method of claim 3 , wherein switching to a single microphone comprises switching from a dual microphone to the single microphone.5. The method of claim 3 , wherein switching to a single microphone occurs when a signal-to-noise ratio exceeds a threshold.6. The method of claim 3 , wherein switching to a single microphone occurs when a speech envelope is not maintained.7. The method of claim 3 , wherein switching to a single microphone occurs when attenuated near end speech is detected.8. The method of claim 3 , wherein switching to a single microphone occurs when a harmonicity exceeds a threshold a number of times in a defined period of time.9. The method of claim 1 , further comprising:computing a statistic that is sensitive to harmonic content;creating the harmonic statistic based on the speech pitch histogram; anddetecting near end voiced speech.10. The method of claim 9 , wherein computing a statistic that is sensitive to harmonic content further comprises evaluating a pitch on an enhanced signal.11. 
The method of claim 1 , wherein the near end voiced speech detector is associated with a gain ...
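The microphone-switching conditions enumerated in the claims (SNR exceeding a threshold, harmonicity exceeding a threshold a number of times in a period) can be sketched as a single decision function. The specific threshold values and the OR-combination of triggers are illustrative assumptions.

```python
def should_switch_to_single_mic(snr_db, harmonicity_window,
                                snr_thresh=20.0, harm_thresh=0.8, min_count=5):
    """Decide whether to fall back from dual- to single-microphone processing.
    harmonicity_window: recent per-frame harmonicity values for the defined
    period of time. Thresholds here are assumed, not taken from the patent."""
    frequent_harmonicity = sum(1 for h in harmonicity_window if h > harm_thresh)
    return snr_db > snr_thresh or frequent_harmonicity >= min_count
```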

31-10-2013 publication date

Cell Phone Security, Safety, Augmentation Systems, and Associated Methods

Number: US20130288744A1
Assignee:

A mobile device has a datalog module that captures multimedia data at the mobile device and transmits the multimedia data through cell networks to a control center. The mobile device may also include a GPS sensor wherein location information is included within the multimedia data. A mobile device has a motion module that, when activated at the mobile device or through a cell network, disables communications through the mobile device when in motion. A system disables operation of a mobile device by a vehicle operator and includes a transmitter within the vehicle that generates a disabling signal that, when received by a safety receiver within the mobile device, disables operation of the mobile device. A mobile device has a microphone, and a voice augmentation module which is selectively activated to augment voice data spoken into the mobile device, by removing background noise and/or replacing or changing voice data. 1. A mobile device , comprising:a microphone;a digital camera;a voice recognition module for determining whether a voice command is spoken into the microphone; anda datalog module for capturing and off-loading multimedia data from the microphone and digital camera when activated by the voice command.2. The mobile device of claim 1 , the multimedia data comprising one or more of image data from the digital camera claim 1 , video data from the digital camera claim 1 , and voice data from the microphone.3. The mobile device of claim 1 , wherein a control center remotely stores the multimedia data for remote access and review by and through the Internet.4. The mobile device of claim 3 , further comprising a GPS sensor integrated with the mobile device claim 3 , the datalog module further capturing and off-loading location information from the GPS sensor as part of the multimedia data stored at the control center.5. The mobile device of claim 1 , wherein turn-off of the mobile device is prohibited when the datalog module is activated.6. 
A mobile device claim ...

31-10-2013 publication date

RECORDING MEDIUM

Number: US20130289982A1
Assignee:

A recording medium is provided that records a separating step of separating a mixed sound signal in which a plurality of excitations are mixed into the respective excitations, and a step of performing speech detection on the plurality of separated excitation signals, judging whether or not the plurality of excitation signals are speech and generating speech section information indicating speech/non-speech information for each excitation signal. The recording medium also includes at least one of a step of calculating and analyzing an utterance overlap duration using the speech section information for combinations of the plurality of excitation signals and a step of calculating and analyzing a silence duration. The recording medium further includes a step of calculating a degree of establishment of a conversation indicating the degree of establishment of a conversation based on the extracted utterance overlap duration or the silence duration. 1a separating step of separating a mixed sound signal in which a plurality of excitations are mixed into the respective excitations;a step of performing speech detection on the plurality of separated excitation signals, judging whether or not the plurality of excitation signals are speech and generating speech section information indicating speech/non-speech information for each excitation signal;at least one of a step of calculating and analyzing an utterance overlap duration using the speech section information for combinations of the plurality of excitation signals and a step of calculating and analyzing a silence duration; anda step of calculating a degree of establishment of a conversation indicating the degree of establishment of a conversation based on the extracted utterance overlap duration or the silence duration.. A recording medium that records: This application is a divisional of co-pending U.S. application Ser. No. 13/262,690, filed Oct. 3, 2011, which is the U.S. National Stage of International Application No. 
PCT/ ...
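Given per-frame speech/non-speech flags for two separated excitation signals, the utterance-overlap and mutual-silence durations described above are simple counts. The degree-of-establishment score below (fraction of frames in which exactly one party speaks) is an illustrative assumption; the recording medium's actual scoring formula is not given in this excerpt.

```python
def conversation_metrics(speech_a, speech_b, frame_sec=0.1):
    """speech_a / speech_b: per-frame booleans (speech vs. non-speech) from
    speech detection on two separated excitation signals."""
    overlap = sum(1 for a, b in zip(speech_a, speech_b) if a and b)
    silence = sum(1 for a, b in zip(speech_a, speech_b) if not a and not b)
    total = len(speech_a)
    # Assumed score: a conversation looks "established" when most frames have
    # exactly one active speaker (little overlap, little mutual silence).
    degree = 1.0 - (overlap + silence) / total
    return overlap * frame_sec, silence * frame_sec, degree
```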

31-10-2013 publication date

Negative Example (Anti-Word) Based Performance Improvement For Speech Recognition

Number: US20130289987A1
Assignee: Interactive Intelligence Inc

A system and method are presented for negative example based performance improvements for speech recognition. The presently disclosed embodiments address identified false positives and the identification of negative examples of keywords in an Automatic Speech Recognition (ASR) system. Various methods may be used to identify negative examples of keywords. Such methods may include, for example, human listening and learning possible negative examples from a large domain specific text source. In at least one embodiment, negative examples of keywords may be used to improve the performance of an ASR system by reducing false positives.

31-10-2013 publication date

Sampling Training Data for an Automatic Speech Recognition System Based on a Benchmark Classification Distribution

Number: US20130289989A1
Assignee:

A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system. 1. A method comprising:obtaining a benchmark classification distribution;selecting, by a computing device, training text strings, wherein the training text strings are associated with respective classifications, and wherein the training text strings are selected such that the respective classifications of the selected training text strings are in proportion to the benchmark classification distribution; andtraining an automatic speech recognition (ASR) system using the training text strings.2. The method of claim 1 , wherein the benchmark classification distribution is a distribution of topic classifications.3. The method of claim 1 , wherein obtaining the benchmark classification distribution comprises:transcribing benchmark utterances to respective benchmark text strings; anddetermining the benchmark classification distribution from the benchmark text strings.4. The method of claim 3 , wherein the benchmark utterances were made by users in a category of users claim 3 , and wherein the ASR system is configured to transcribe new utterances made by users in the category of users.5. 
The method of claim 4 , wherein the benchmark utterances were made by a single user ...
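The sampling step above, selecting training strings so their classifications are in proportion to the benchmark distribution, can be sketched directly. Deterministic head-of-list sampling and rounding of quotas are illustrative choices; a real pipeline would likely sample randomly.

```python
def sample_training_corpus(strings_by_class, benchmark_dist, corpus_size):
    """strings_by_class: classification label -> candidate text strings.
    benchmark_dist: label -> fraction of that label in the benchmark
    classification distribution. Takes strings per class in proportion
    to the benchmark distribution."""
    corpus = []
    for label, fraction in benchmark_dist.items():
        quota = round(corpus_size * fraction)
        corpus.extend(strings_by_class.get(label, [])[:quota])
    return corpus
```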

31-10-2013 publication date

VOICE RECOGNITION METHOD AND VOICE RECOGNITION APPARATUS

Number: US20130289992A1
Author: HARADA Shouji
Assignee: FUJITSU LIMITED

A voice recognition method includes: detecting a vocal section including a vocal sound in a voice, based on a feature value of an audio signal representing the voice; identifying a word expressed by the vocal sound in the vocal section, by matching the feature value of the audio signal of the vocal section and an acoustic model of each of a plurality of words; and selecting, with a processor, the word expressed by the vocal sound in a word section based on a comparison result between a signal characteristic of the word section and a signal characteristic of the vocal section. 1. A voice recognition method comprising:detecting a vocal section including a vocal sound in a voice, based on a feature value of an audio signal representing the voice;identifying a word expressed by the vocal sound in the vocal section, by matching the feature value of the audio signal of the vocal section and an acoustic model of each of a plurality of words; andselecting, with a processor, the word expressed by the vocal sound in a word section based on a comparison result between a signal characteristic of the word section and a signal characteristic of the vocal section.2. The voice recognition method according to claim 1 , whereinthe signal characteristic of the vocal section includes at least one of a signal-to-noise ratio (SNR) of the vocal section and an average power of the audio signal of the vocal section, andthe signal characteristic of the word section includes at least one of a signal-to-noise ratio (SNR) of the word section and an average power of the audio signal of the word section.3. The voice recognition method according to claim 1 , whereinthe selecting includes selecting the word expressed by the vocal sound in the word section having a signal characteristic not less than a given lower limit threshold value with respect to the signal characteristic of the vocal section.4. 
The voice recognition method according to claim 3, wherein the selecting includes using the lower ...
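The selection step, keeping only words whose section signal characteristic meets a lower-limit threshold set relative to the whole vocal section, can be sketched as a filter. Using SNR as the characteristic and a 0.5 ratio for the lower limit are illustrative assumptions.

```python
def select_words(word_candidates, vocal_section_snr_db, lower_limit_ratio=0.5):
    """word_candidates: (word, word_section_snr_db) pairs produced by
    acoustic-model matching over the vocal section. Keeps words whose
    section SNR is at least lower_limit_ratio * the vocal section's SNR."""
    threshold = lower_limit_ratio * vocal_section_snr_db
    return [w for w, snr in word_candidates if snr >= threshold]
```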

31-10-2013 publication date

Method and Device for Voice Controlling

Number: US20130289995A1
Assignee: ZTE CORPORATION

The present invention discloses a method and device for voice control, which are used to solve the problem of the low success rate of voice control in the prior art. The method includes: classifying stored recognition information used for voice recognition to obtain a syntax packet corresponding to each type of recognition information; receiving an inputted voice signal, and performing voice recognition processing respectively on the received voice signal by using each obtained syntax packet in turn; and performing a corresponding control processing based on a voice recognition result of the voice signal according to each syntax packet. 1. A method for voice control, comprising the following steps: classifying stored recognition information used for voice recognition to obtain a syntax packet corresponding to each type of recognition information; receiving an inputted voice signal, and performing a voice recognition processing respectively on the received voice signal by using each obtained syntax packet in turn; and performing a corresponding control processing based on a voice recognition result of the voice signal according to each syntax packet. 2. The method according to claim 1, wherein the step of performing a voice recognition processing on the received voice signal according to each syntax packet specifically comprises: when at least one piece of recognition information in the syntax packet can be recognized from the received voice signal, selecting an identifier corresponding to the recognized recognition information from identifiers corresponding to various recognition information in the pre-designated syntax packet as the voice recognition result of the syntax packet on the voice signal; otherwise determining that voice recognition of this time fails, and selecting an identifier corresponding to a reason for voice recognition processing failure of this time from identifiers corresponding to pre-designated various reasons for voice ...

07-11-2013 publication date

SYSTEM AND METHOD FOR CLASSIFICATION OF EMOTION IN HUMAN SPEECH

Number: US20130297297A1
Author: GUVEN Erhan
Assignee:

A system performs local feature extraction. The system includes a processing device that performs a Short Time Fourier Transform to obtain a spectrogram for a discrete-time speech signal sample. The spectrogram is subdivided based on natural divisions of frequency to humans. Time-frequency-energy is then quantized using information obtained from the spectrogram. And, feature vectors are determined based on the quantized time-frequency-energy information. 1. A method for performing local feature extraction comprising using a processing device to perform the steps of:performing a Short Time Fourier Transform to obtain a spectrogram for a discrete-time speech signal sample;subdividing the spectrogram based on natural divisions of frequency to humans;quantizing time-frequency-energy information obtained from the spectrogram;computing feature vectors based on the quantized time-frequency-energy information; andclassifying an emotion of the speech signal sample based on the computed feature vectors.2. The method according to claim 1 , wherein the step of subdividing the spectrogram comprises subdividing the spectrogram based on the Bark scale.3. The method according to further comprising the step of employing majority voting on the feature vectors to predict an emotion associated with the speech signal sample.4. The method according to further comprising the step of employing weighted-majority voting on the feature vectors to predict an emotion associated with the speech signal sample.5. The method according to claim 1 , wherein the time and the frequency information of a speech signal is transformed into a short time Fourier series and quantized by the regressed surfaces of the spectrogram.6. The method according to claim 1 , further comprising storing both the time and the frequency information together.7. 
A system for performing local feature extraction comprising using a processing device to perform the steps of:a processor configured to perform a Short Time Fourier ...
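A minimal version of the front half of this pipeline, assuming a short-time Fourier transform, Bark-scale band edges (the natural divisions of frequency to humans mentioned above, usable here up to 3.7 kHz, so sampling rates of at least 7.4 kHz), and plain per-band energy as the quantized time-frequency-energy information. The hop size, FFT length, and energy pooling are assumptions; the patent's surface regression and voting steps are not shown.

```python
import numpy as np

def bark_band_features(signal, fs, n_fft=256, hop=128):
    """STFT magnitude per frame, pooled into Bark-scale band energies."""
    bark_edges = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                  1480, 1720, 2000, 2320, 2700, 3150, 3700]  # Hz
    window = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + n_fft] * window))
        feats = []
        for lo, hi in zip(bark_edges[:-1], bark_edges[1:]):
            band = spectrum[(freqs >= lo) & (freqs < hi)]
            feats.append(float(np.sum(band ** 2)))  # energy in this Bark band
        frames.append(feats)
    return np.array(frames)  # shape: (num_frames, num_bark_bands)
```

Each row is a per-frame feature vector; a classifier plus the (weighted-)majority voting described in the claims would then predict the emotion for the whole sample.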

07-11-2013 publication date

Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech and Speaker Recognition

Number: US20130297299A1

The speech feature extraction algorithm is based on a hierarchical combination of auditory similarity and pooling functions. Computationally efficient features referred to as “Sparse Auditory Reproducing Kernel” (SPARK) coefficients are extracted under the hypothesis that the noise-robust information in speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first involves computing kernel based similarity between the speech signal and the time-shifted gammatone functions, followed by feature pruning using a simple pooling technique (“MAX” operation). Different hyper-parameters and kernel functions may be used to enhance the performance of a SPARK based speech recognizer. 1. A method of processing time domain speech signal digitally represented as a vector of a first dimension , comprising:storing the time domain speech signal in the memory of said processor;representing a set of gammatone basis functions as a set of gammatone basis vectors of said first dimension and storing said gammatone basis vectors in the memory of a processor;using the processor to apply a reproducing kernel function to transform the stored gammatone basis vectors and the stored speech signal to a higher dimensional space;using the processor to compute a set of similarity vectors in said higher dimensional space based on the stored gammatone basis vectors and the stored speech signal;using the processor to apply an inverse function to transform the set of similarity vectors in said higher dimensional space to a set of similarity vectors of the first dimension; andusing the processor to select one of said set of similarity vectors of the first dimension as a processed representation of said speech signal.2. The method of wherein the transformation from higher dimensional space to the first dimension effects a nonlinear transformation.3. 
The method of wherein the step ...
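The two-stage structure, kernel similarity between the signal and time-shifted gammatone functions followed by "MAX" pooling, can be sketched as below. The RBF kernel stands in for the reproducing kernel (the patent leaves the kernel choice as a hyper-parameter), and the gammatone parameterization, shift set, and normalization are illustrative assumptions.

```python
import numpy as np

def gammatone(t, fc, b=1.019 * 24.7):
    """4th-order gammatone impulse response (unit phase, unnormalized)."""
    return (t ** 3) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def spark_features(frame, fs, centers, shifts, gamma=1.0):
    """One SPARK-style coefficient per center frequency: kernel similarity
    against time-shifted gammatone basis vectors, then MAX pooling over
    the shifts."""
    t = np.arange(len(frame)) / fs
    feats = []
    for fc in centers:
        sims = []
        for shift in shifts:
            basis = np.roll(gammatone(t, fc), shift)
            basis = basis / (np.linalg.norm(basis) + 1e-12)
            # RBF kernel similarity in place of the reproducing kernel
            sims.append(float(np.exp(-gamma * np.sum((frame - basis) ** 2))))
        feats.append(max(sims))  # "MAX" pooling over time shifts
    return np.array(feats)
```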

07-11-2013 publication date

APPARATUS AND METHOD FOR SPEECH RECOGNITION

Number: US20130297304A1
Author: Kim Sang Hun, KIM Seung Hi

Disclosed is an apparatus for speech recognition and automatic translation operated in a PC or a mobile device. The apparatus for speech recognition according to the present invention includes a display unit that displays a screen for selecting a domain as a unit for a speech recognition region previously sorted for speech recognition to a user; a user input unit that receives a selection of a domain from the user; and a communication unit that transmits the user selection information for the domain. According to the present invention, the apparatus for speech recognition using an intuitive and simple user interface is provided to a user to enable the user to easily select/correct a designation domain of a speech recognition system and improve accuracy and performance of speech recognition and automatic translation by the designated system for speech recognition. 1. An apparatus for speech recognition , comprising:a display unit that displays a screen for selecting a domain for speech recognition to a user;a user input unit that receives a selection of a domain from the user; anda communication unit that transmits the user selection information for the domain.2. The apparatus of claim 1 , wherein the display unit displays a domain selected by the user or a domain previously selected and deselected by the user.3. The apparatus of claim 1 , wherein the display unit classifies and displays a domain representing the domain into a layer according to a speech recognition level.4. The apparatus of claim 3 , wherein the display unit displays a domain for the domain selected by the user among the domains classified and displayed into a layer.5. The apparatus of claim 3 , wherein the layer according to the speech recognition level classifies a general region providing a basic speech recognition region according to a generation situation of speech and the generation situation is re-classified according to generation places.6. 
The apparatus of claim 3 , wherein the display unit ...

07-11-2013 publication date

GENERATING ACOUSTIC MODELS

Number: US20130297310A1
Assignee:

This document describes methods, systems, techniques, and computer program products for generating and/or modifying acoustic models. Acoustic models and/or transformations for a target language/dialect can be generated and/or modified using acoustic models and/or transformations from a source language/dialect. 1. A computer-implemented method comprising:receiving, at a computer system, a request to generate or modify a target acoustic model for a target language;accessing, by the computer system, a source acoustic model for a source language, wherein the source acoustic model includes information that maps acoustic features of the source language to phonemes in a transformed feature space;aligning, using the source acoustic model in the transformed feature space, untransformed voice data in the target language with phonemes in a corresponding textual transcript to obtain aligned voice data, wherein the untransformed voice data is in an untransformed feature space;transforming the aligned voice data according to a particular transform operation using the source acoustic model to obtain transformed voice data;adapting the source acoustic model to the target language using the untransformed voice data in the target language to obtain an adapted acoustic model; andtraining, by the computer system, a target acoustic model for the target language using the transformed voice data and the adapted acoustic model; andproviding the target acoustic model in association with the target language.2. The computer-implemented method of claim 1 , wherein the transformed feature space of the source acoustic model is a Constrained Maximum Likelihood Linear Regression (CMLLR) feature space that is generated from a CMLLR transform operation.3. 
The computer-implemented method of claim 1 , wherein the source acoustic model is generated from performance of a Linear Discriminant Analysis (LDA) transform operation claim 1 , Vocal Tract Length Normalization (VTLN) transform operation claim 1 , ...

07-11-2013 publication date

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND INFORMATION PROCESSING PROGRAM

Number: US20130297311A1
Assignee: SONY CORPORATION

An information processing apparatus including: a high-quality-voice determining section configured to determine a voice, which can be determined to have been collected under a good condition, as a good-condition voice included in mixed voices pertaining to a group of voices collected under different conditions; and a voice recognizing section configured to carry out voice recognition processing by making use of a predetermined parameter on the good-condition voice determined by the high-quality-voice determining section, modify the value of the predetermined parameter on the basis of a result of the voice recognition processing carried out on the good-condition voice, and carry out the voice recognition processing by making use of the predetermined parameter having the modified value on a voice included in the mixed voices as a voice other than the good-condition voice.

07-11-2013 publication date

Systems and Methods for Off-Board Voice-Automated Web Searching

Number: US20130297312A1
Author: Schalk Thomas Barton
Assignee:

A system for surfing the web includes a mobile system for processing and transmitting through a wireless link a voice stream spoken by a user of the mobile system and a data center for processing the voice stream received into voice web search information. The continuous voice stream includes a web search request. The data center performs automated voice recognition processing on the voice web search information to recognize components of the web search request, confirms the recognized components of the web search request through interactive speech exchanges with the user through the wireless link and the mobile system, selectively allows human data center operator intervention to assist in identifying the selected recognized web search components having a recognition confidence below a selected threshold value, and downloads web search results pertaining to the web search request for transmission to the mobile system derived from the confirmed recognized web search components. 1. A method of entering a web search string, which comprises: receiving at a mobile processing system a continuous voice stream spoken by a user of the mobile processing system, the continuous voice stream including a web search request being at least one of a search phrase and a keyword; processing the continuous voice stream into web search information; transmitting the processed web search information through a wireless link to a remote data center; analyzing the processed web search information with a voice recognition system at the remote data center to recognize components of the web search request spoken by the user; generating at the remote data center a list of hypothetical recognized components of the web search request listed by confidence levels as calculated for each component of the web search request analyzed by the voice recognition system; displaying the list of hypothetical recognized components and confidence levels at the remote data center for selective checking by a
...

07-11-2013 publication date

ACOUSTIC MODEL ADAPTATION USING GEOGRAPHIC INFORMATION

Number: US20130297313A1
Assignee:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location. 1. A system comprising: one or more computers; and one or more computer-readable media coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising: receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location. 2. The system of claim 1, wherein adapting one or more acoustic models further comprises adapting one or more acoustic models before receiving the audio signal that corresponds to the utterance. 3. The system of claim 1, wherein adapting one or more acoustic models further comprises adapting one or more acoustic models after receiving the audio signal that corresponds to the utterance. 4. The system of claim 1, wherein: the operations further comprise receiving geotagged audio signals that correspond to audio recorded by multiple mobile devices in multiple geographic locations; and adapting one or more acoustic models for the geographic location further comprises adapting one or more acoustic models for the geographic location using a subset of the geotagged audio signals. 5.
The system of claim 4 ...
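The idea in claim 4 — adapting the acoustic model with a subset of geotagged audio recorded near the device's location — can be sketched as a distance filter over geotagged recordings. This is a minimal sketch; the haversine radius, record format, and field names are illustrative assumptions, not details from the publication.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def select_adaptation_subset(geotagged_audio, device_lat, device_lon, radius_km=50.0):
    """Keep only the geotagged recordings within radius_km of the device."""
    return [rec for rec in geotagged_audio
            if haversine_km(rec["lat"], rec["lon"], device_lat, device_lon) <= radius_km]

# Illustrative geotagged corpus: one clip near Boston, one near Seattle.
corpus = [
    {"id": "bos-1", "lat": 42.36, "lon": -71.06},
    {"id": "sea-1", "lat": 47.61, "lon": -122.33},
]
subset = select_adaptation_subset(corpus, 42.30, -71.10)
print([rec["id"] for rec in subset])  # only the nearby recording remains
```

The selected subset would then feed whatever model-adaptation procedure the recognizer uses; only the geographic filtering step is shown here.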

More
14-11-2013 publication date

METHOD AND APPARATUS FOR ADAPTIVELY DETECTING A VOICE ACTIVITY IN AN INPUT AUDIO SIGNAL

Number: US20130304464A1
Author: Wang Zhe
Assignee:

The disclosure provides a method and an apparatus for adaptively detecting a voice activity in an input audio signal composed of frames. The method comprises the steps of: determining a noise characteristic of the input signal based on a received frame of the input audio signal; deriving a voice activity detection (VAD) parameter based on the noise characteristic of the input audio signal; and comparing the derived VAD parameter with a threshold value to provide a voice activity detection decision.

1. A method for adaptively detecting a voice activity in an input audio signal, wherein the input audio signal is composed of frames, the method comprising: determining a noise characteristic of the input audio signal based on a received frame of the input audio signal; deriving a voice activity detection (VAD) parameter according to the noise characteristic of the input audio signal; and comparing the derived VAD parameter with a threshold value to provide a voice activity detection decision (VADD).
2. The method according to claim 1, wherein the noise characteristic of the input audio signal is one of: a long term signal to noise ratio, a background noise variation, or a long term signal to noise ratio and a background noise variation.
3. The method according to claim 1, wherein deriving the VAD parameter according to the noise characteristic of the input audio signal comprises: dividing the received frame of the input audio signal into one or more frequency sub-bands; obtaining a signal to noise ratio for each of the sub-bands; calculating a sub-band specific parameter of each sub-band based on the signal to noise ratio of the sub-band using an adaptive function, wherein at least one parameter of the adaptive function is selected dependent on the noise characteristic of the input audio signal; and deriving a modified segmental signal to noise ratio as the VAD parameter by adding the calculated sub-band specific parameter of each sub-band.
4. The method according to claim 3, ...
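The scheme in claim 3 — per-sub-band SNRs passed through an adaptive function and summed into a modified segmental SNR that is compared with a threshold — can be sketched roughly as follows. The sub-band split, the adaptive function (an SNR exponent switched on the long-term SNR), and the threshold value are illustrative assumptions, not the parameters actually claimed.

```python
import math

def subband_snrs(signal_power, noise_power):
    """Per-sub-band SNR in dB; inputs are lists of per-band powers."""
    return [10 * math.log10(max(s, 1e-12) / max(n, 1e-12))
            for s, n in zip(signal_power, noise_power)]

def vad_decision(signal_power, noise_power, long_term_snr_db, threshold=40.0):
    """Toy modified segmental SNR VAD.

    Each sub-band SNR is passed through an adaptive function whose
    exponent depends on the long-term SNR (an assumed adaptation rule),
    and the per-band contributions are summed into the VAD parameter.
    """
    alpha = 1.2 if long_term_snr_db > 20.0 else 1.0  # assumed adaptation rule
    vad_param = sum(max(snr, 0.0) ** alpha           # sub-band specific parameter
                    for snr in subband_snrs(signal_power, noise_power))
    return vad_param > threshold                     # VADD: True = voice active

# Speech-like frame: strong energy over the noise floor in every band.
print(vad_decision([4.0, 9.0, 2.5, 1.2], [0.01, 0.02, 0.01, 0.01], 25.0))
# Noise-only frame: band powers near the noise estimate.
print(vad_decision([0.012, 0.02, 0.011, 0.01], [0.01, 0.02, 0.01, 0.01], 25.0))
```

The point of the adaptive exponent is that in clean conditions high-SNR bands can be emphasized more aggressively, while in noisy conditions the function stays flatter to avoid false triggers.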

More
14-11-2013 publication date

INFORMATION PROCESSING METHOD AND APPARATUS, COMPUTER PROGRAM AND RECORDING MEDIUM

Number: US20130304469A1
Assignee:

Among multiple documents presented to a user, a document of high interest and a document of low interest are specified, a word group in the high interest document is compared with a word group in the low interest document, and a string of word groups associated with weight values is generated as a user feature vector. A word group included in each of multiple data items targeted for assigning priorities is extracted, and data feature vectors specific to each data item are generated based on the word groups extracted. A degree of similarity between each data feature vector of the multiple data items and the user feature vector is obtained, and according to the degree of similarity, priorities are assigned to the multiple data items to be presented to the user. Therefore, it is possible to extract the user's feature information on which the user's interests and tastes are reflected more effectively.

1. An information processing method in an information processing apparatus, comprising the steps of: generating a user feature vector specific to a user; extracting a word group included in each of multiple data items targeted for assigning priorities and generating a data feature vector specific to each data item, based on the word group extracted; obtaining a degree of similarity between each of the data feature vectors of the multiple data items and the user feature vector; and assigning priorities to the multiple data items to be presented to the user, according to the degree of similarity obtained; the step of generating the user feature vector including a step of specifying a document of high interest in which the user expresses interest and a document of low interest in which the user expresses no interest, according to the user's operation among multiple documents presented to the user, a word group included in the document of high interest and a word group included in the document of low interest being compared with each other, a weight value of a word included commonly in both documents being set ...
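The ranking step — comparing each data item's feature vector with the user feature vector and ordering items by the degree of similarity — is illustrated below using cosine similarity over sparse word-to-weight dictionaries. The weight values and the choice of cosine similarity are assumptions for illustration; the publication does not fix a particular measure here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse word->weight vectors."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_items(user_vector, items):
    """Order data items by similarity of their feature vector to the user's."""
    return sorted(items,
                  key=lambda it: cosine_similarity(user_vector, it["vector"]),
                  reverse=True)

# User showed interest in documents about speech; weights are illustrative.
user = {"speech": 2.0, "recognition": 1.5, "acoustic": 1.0}
items = [
    {"id": "a", "vector": {"speech": 1.0, "recognition": 1.0}},
    {"id": "b", "vector": {"weather": 1.0, "forecast": 1.0}},
]
print([it["id"] for it in rank_items(user, items)])  # item "a" ranks first
```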

More
14-11-2013 publication date

SYSTEM AND METHOD FOR PROCESSING MULTI-MODAL DEVICE INTERACTIONS IN A NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT

Number: US20130304473A1
Assignee: VoiceBox Technologies, Inc.

A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.

1.-18. (canceled)
19. A computer-implemented method of facilitating natural language utterance processing via multiple input modes, the method being implemented on a computer that includes one or more physical processors executing one or more computer program modules that perform the method, the method comprising: receiving, via a first input mode, a first input; receiving, via a second input mode that is different from the first input mode, a second input that relates to the first input; determining a request based on the first input or the second input; determining, based on the first input and the second input, context information for the request; and processing the request based on the context information.
20. The method of claim 19, wherein determining the request comprises determining an action, a query, a command, or a task based on the first input or the second input.
21. The method of claim 19, wherein receiving the first input comprises receiving a natural language utterance via a voice input mode, and wherein receiving the second input comprises receiving a non-voice ...
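The claimed flow — combining context extracted from a non-voice interaction with a natural language utterance to determine intent, then routing a request to a device — can be sketched with plain dictionaries. The context fields, the toy utterance parsing, and the routing table are all assumptions for illustration, not VoiceBox's actual implementation.

```python
def determine_intent(utterance, nonvoice_context):
    """Merge utterance words with non-voice device context into a request."""
    request = {"action": None, "target": nonvoice_context.get("device")}
    if "turn" in utterance and "off" in utterance:
        request["action"] = "power_off"          # toy utterance parsing
    if "this" in utterance:                      # deixis resolved by touch context
        request["target"] = nonvoice_context.get("touched_device", request["target"])
    return request

def route_request(request, devices):
    """Send the request to the device named by the combined context."""
    return devices[request["target"]](request["action"])

devices = {"lamp": lambda action: f"lamp: {action}"}
ctx = {"touched_device": "lamp"}                 # user tapped the lamp's icon
print(route_request(determine_intent(["turn", "this", "off"], ctx), devices))
```

The example shows the key point of the multi-modal combination: the word "this" alone cannot identify a device, but the non-voice interaction (a tap) supplies the missing referent.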

More
21-11-2013 publication date

APPARATUS FOR CORRECTING ERROR IN SPEECH RECOGNITION

Number: US20130311182A1
Assignee:

An apparatus for correcting errors in speech recognition is provided. The apparatus includes a feature vector extracting unit extracting feature vectors from a received speech. A speech recognizing unit recognizes the received speech as a word sequence on the basis of the extracted feature vectors. A phoneme weighted finite state transducer (WFST)-based converting unit converts the word sequence recognized by the speech recognizing unit into a phoneme WFST. A speech recognition error correcting unit corrects errors in the converted phoneme WFST. The speech recognition error correcting unit includes a WFST synthesizing unit modeling a phoneme WFST transferred from the phoneme WFST-based converting unit as pronunciation variation on the basis of a Kullback-Leibler (KL) distance matrix.

2. The apparatus according to claim 1, wherein the speech recognition error correcting unit further comprises a phoneme pronunciation variation model creating unit creating a phoneme confusion matrix by using the calculated KL distance.
4. The apparatus according to claim 1, further comprising an acoustic model which is referred to when the speech recognizing unit separates the word sequence from the received speech, wherein the acoustic model has information on a likelihood probability of the feature vectors for a phoneme of a recognition unit.
5. The apparatus according to claim 1, further comprising a pronunciation dictionary which is referred to when the speech recognizing unit recognizes the word sequence of the received speech, wherein the pronunciation dictionary comprises information that standard pronunciation marks of words are listed in a sequence of a recognition unit.
6. The apparatus according to claim 5, wherein the WFST synthesizing unit further comprises a pronunciation dictionary WFST converting again a phoneme sequence into words existing in the pronunciation dictionary, wherein the phoneme sequence is a pronunciation sequence ...
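The KL distance matrix that the WFST synthesizing unit uses to model pronunciation variation can be illustrated with a symmetric Kullback-Leibler distance between discrete per-phoneme distributions. The toy three-bin distributions below are assumptions; a real system would derive them from acoustic model statistics.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def kl_distance(p, q):
    """Symmetric KL distance, a common basis for phoneme confusion matrices."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Toy per-phoneme distributions over 3 acoustic bins (illustrative values).
phonemes = {"p": [0.7, 0.2, 0.1], "b": [0.6, 0.3, 0.1], "s": [0.1, 0.2, 0.7]}
matrix = {a: {b: round(kl_distance(pa, pb), 3)
              for b, pb in phonemes.items()}
          for a, pa in phonemes.items()}
# "p" and "b" are acoustically close, so their distance is much smaller
# than the distance from "p" to "s" — exactly the confusability signal
# a pronunciation-variation model needs.
print(matrix["p"]["p"], matrix["p"]["b"] < matrix["p"]["s"])
```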

More
28-11-2013 publication date

METHOD AND SYSTEM FOR ANALYZING DIGITAL SOUND AUDIO SIGNAL ASSOCIATED WITH BABY CRY

Number: US20130317815A1
Assignee: NATIONAL TAIWAN NORMAL UNIVERSITY

A method for analyzing a digital audio signal associated with a baby cry, comprising the steps of: (a) processing the digital audio signal using a spectral analysis to generate spectral data; (b) processing the digital audio signal using a time-frequency analysis to generate a time-frequency characteristic; (c) categorizing the baby cry into one of a basic type and a special type based on the spectral data; (d) if the baby cry is of the basic type, determining a basic need based on the time-frequency characteristic and a predetermined lookup table; and (e) if the baby cry is of the special type, determining a special need by inputting the time-frequency characteristic into a pre-trained artificial neural network.

1. A method for analyzing a digital audio signal that is associated with a baby cry, the method comprising the steps of: (a) processing the digital audio signal using a spectral analysis so as to generate spectral data associated with the digital audio signal; (b) processing the digital audio signal using a time-frequency analysis so as to generate a time-frequency characteristic associated with the digital audio signal; (c) categorizing the baby cry into one of a basic type and a special type with reference to the spectral data associated with the digital audio signal; (d) if the baby cry is categorized into the basic type, determining a basic need with reference to the time-frequency characteristic associated with the digital audio signal and a predetermined lookup table that indicates corresponding relationships between a plurality of time-frequency characteristic candidates and a plurality of basic need candidates; and (e) if the baby cry is categorized into the special type, determining a special need by inputting the time-frequency characteristic associated with the digital audio signal into an artificial neural network so as to generate an output of the special need, the artificial neural network being pre-trained using a plurality of predetermined ...
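The decision flow of the method — categorize the cry as basic or special from the spectral data, then consult a lookup table for basic needs or a pre-trained network for special needs — can be sketched as below. The categorization rule, the table entries, and the stand-in classifier are all illustrative assumptions, not the publication's actual criteria.

```python
# Illustrative lookup table: time-frequency characteristic -> basic need.
BASIC_NEEDS = {"rising_pitch": "hunger", "flat_pitch": "sleepiness"}

def special_need_net(characteristic):
    """Stand-in for the pre-trained artificial neural network."""
    return "discomfort"  # a real model would map features to a need

def analyze_cry(low_band_energy_ratio, characteristic):
    """Route a cry to the lookup table or to the network.

    A cry is treated as 'basic' when most spectral energy sits in the
    low band — an assumed rule standing in for the patent's criterion.
    """
    if low_band_energy_ratio > 0.5:                   # basic type
        return BASIC_NEEDS.get(characteristic, "unknown")
    return special_need_net(characteristic)           # special type

print(analyze_cry(0.8, "rising_pitch"))  # basic type -> table lookup
print(analyze_cry(0.2, "rising_pitch"))  # special type -> network
```

The two-stage design keeps the common cases cheap (a table lookup) and reserves the learned model for the harder, rarer cases.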

More
28-11-2013 publication date

METHOD FOR RECOGNIZING AND INTERPRETING PATTERNS IN NOISY DATA SEQUENCES

Number: US20130317816A1
Author: Potter Jerry Lee
Assignee:

This invention maps possibly noisy digital input from any of a number of different hardware or software sources such as keyboards, automatic speech recognition systems, cell phones, smart phones or the web onto an interpretation consisting of an action and one or more physical objects, such as robots, machinery, vehicles, etc., or digital objects such as data files, tables and databases. Tables and lists of (i) homonyms and misrecognitions, (ii) thematic relation patterns, and (iii) lexicons are used to generate alternative forms of the input, which are scored to determine the best interpretation of the noisy input. The actions may be executed internally or output to any device which contains a digital component such as, but not limited to, a computer, a robot, a cell phone, a smart phone or the web. This invention may be implemented on sequential and parallel compute engines and systems.

1. A method for correctly interpreting a sequence of possibly noisy input data using a digital system, comprising: (a) a digital system consisting of a plurality of digital devices and devices containing digital components such as, but not limited to, computers, personal computers, laptops, personal digital assistants, cell phones, remote controls, inventory control devices, home appliances, automobiles, robots, factory machines, construction machines, farming machines, airplanes, or remotely piloted aircraft. Such devices may be interconnected by wired or wireless means to form the digital system and may operate in sequential or parallel mode; and (i) a plurality of top level input sequences of possibly noisy data. Said top level input sequences of data comprise a plurality of data types, both noisy and not. Said top level sequence comprises a plurality of character, integer, real, single precision, double precision or quadruple precision data. Said top level input sequences may be organized into scalars, strings, arrays or any combination thereof; and (ii) a plurality of homonym ...
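The core mechanism described above — expanding a noisy token sequence through a homonym/misrecognition table and scoring each alternative form against known thematic patterns — can be sketched as follows. The tables and the binary match score are illustrative assumptions; the invention's actual scoring is richer.

```python
from itertools import product

# Illustrative homonym / misrecognition table.
HOMONYMS = {"two": ["two", "to", "too"], "sea": ["sea", "see"]}

# Illustrative thematic patterns mapped to (action, object) interpretations.
PATTERNS = {("move", "to", "dock"): ("move", "dock")}

def interpret(tokens):
    """Generate alternative forms of the input and pick the best-scoring one."""
    alternatives = product(*(HOMONYMS.get(t, [t]) for t in tokens))
    best, best_score = None, -1
    for alt in alternatives:
        score = 1 if alt in PATTERNS else 0  # toy score: pattern match or not
        if score > best_score:
            best, best_score = alt, score
    return PATTERNS.get(best)

# ASR misrecognized "to" as "two"; the homonym table recovers the intent.
print(interpret(["move", "two", "dock"]))  # -> ('move', 'dock')
```

Because the alternatives are generated as a Cartesian product over per-token candidate lists, the search parallelizes naturally, which matches the sequential/parallel framing of the claim.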

More
28-11-2013 publication date

MODEL ADAPTATION DEVICE, MODEL ADAPTATION METHOD, AND PROGRAM FOR MODEL ADAPTATION

Number: US20130317822A1
Author: Koshinaka Takafumi
Assignee:

A model adaptation device includes a recognition unit which creates a recognition result of recognizing data that complies with a target domain, which is an assumed condition of recognition target data, based on at least two models and a candidate of a weighting factor indicating a weight of each model in the recognition process. A weighting factor determination unit determines the weighting factor so as to assign a smaller weight to a model having higher reliability. A model update unit updates at least one model out of the models, using the recognition result as the truth label.

1. A model adaptation device comprising: a recognition unit for creating a recognition result of recognizing data that complies with a target domain which is an assumed condition of recognition target data, based on at least two models and a candidate of a weighting factor indicating a weight of each model on a recognition process; a model update unit for updating at least one model out of the models, using the recognition result as a truth label; and a weighting factor determination unit for determining the weighting factor, wherein the weighting factor determination unit determines the weighting factor so as to assign a larger weight to a model having higher reliability, wherein the recognition unit creates the recognition result based on the weighting factor determined by the weighting factor determination unit, and wherein the model update unit updates the model, using the recognition result created based on the weighting factor as the truth label.
2. The model adaptation device according to claim 1, wherein the weighting factor determination unit determines the weighting factor so as to maximize a conditional probability of the recognition result created by the recognition unit, when the data of the target domain is given.
3. The model adaptation device according to claim 1, wherein the recognition unit creates the recognition result of the data of the target domain, for each ...
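A minimal sketch of the two ingredients the claims describe: a weighted combination of two model scores for each hypothesis, and a grid search over weighting-factor candidates that keeps the weight whose best hypothesis scores highest (standing in for the conditional-probability criterion of claim 2). The toy log-scores and the candidate grid are assumptions for illustration.

```python
def combined_score(hypothesis, model_scores, w):
    """Log-linear combination of two model scores with weight w in [0, 1]."""
    s1, s2 = model_scores[hypothesis]
    return w * s1 + (1 - w) * s2

def recognize(model_scores, w):
    """Pick the hypothesis maximizing the weighted combination."""
    return max(model_scores, key=lambda h: combined_score(h, model_scores, w))

def choose_weight(model_scores, candidates):
    """Grid-search the weighting factor whose winning hypothesis scores best
    (a stand-in for maximizing the conditional probability in claim 2)."""
    return max(candidates,
               key=lambda w: combined_score(recognize(model_scores, w),
                                            model_scores, w))

# Toy (model 1, model 2) log-scores for two competing hypotheses.
scores = {"hello world": (-2.0, -1.0), "yellow whirled": (-1.5, -4.0)}
w = choose_weight(scores, [0.0, 0.25, 0.5, 0.75, 1.0])
print(w, recognize(scores, w))
```

The winning recognition result would then serve as the truth label for updating one of the models, closing the adaptation loop.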

More
05-12-2013 publication date

Speech recognition adaptation systems based on adaptation data

Number: US20130325447A1
Assignee: ELWHA LLC

The instant application includes computationally-implemented systems and methods that include acquiring indication of a speech-facilitated transaction between a particular party and a target device, receiving adaptation data correlated to the particular party, the receiving facilitated by a particular device associated with the particular party, processing audio data from the particular party at least partly using the received adaptation data correlated to the particular party, and updating the adaptation data based at least in part on a result of the processed audio data, such that the updated adaptation data is configured to be transmitted to the particular device. In addition to the foregoing, other aspects are described in the claims, drawings, and text.

More
05-12-2013 publication date

Speech recognition adaptation systems based on adaptation data

Number: US20130325459A1
Assignee: ELWHA LLC

Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.

More