
Total found: 9290. Showing: 100.
19-04-2012 publication date

Automatically providing a user with substitutes for potentially ambiguous user-defined speech commands

Number: US20120095765A1
Assignee: Nuance Communications Inc

A method for alleviating ambiguity issues of new user-defined speech commands. An original command for a user-defined speech command can be received. It can then be determined if the original command is likely to be confused with a set of existing speech commands. When confusion is unlikely, the original command can be automatically stored. When confusion is likely, a substitute command that is unlikely to be confused with existing commands can be automatically determined. The substitute can be presented as an alternative to the original command and can be selectively stored as the user-defined speech command.
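The check-then-substitute flow in this abstract can be sketched as below. Everything concrete here is an assumption for illustration: difflib string similarity stands in for acoustic confusability, and the 0.75 cutoff and the prefixing scheme for substitutes are invented, not taken from the patent.

```python
from difflib import SequenceMatcher

CONFUSION_THRESHOLD = 0.75  # assumed cutoff; the patent does not specify one

def too_similar(a, b):
    # Surface-string similarity as a stand-in for acoustic confusability.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= CONFUSION_THRESHOLD

def register_command(original, existing):
    """Store the original command if unambiguous; otherwise propose a substitute."""
    conflicts = [c for c in existing if too_similar(original, c)]
    if not conflicts:
        existing.add(original)
        return original
    # Derive a substitute unlikely to be confused, e.g. by qualifying the phrase.
    substitute = "my " + original
    while any(too_similar(substitute, c) for c in existing):
        substitute = "really " + substitute
    existing.add(substitute)
    return substitute
```

The returned phrase is what would be presented to the user as the alternative to selectively store.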

26-04-2012 publication date

Multi-state barge-in models for spoken dialog systems

Number: US20120101820A1
Author: Andrej Ljolje
Assignee: AT&T INTELLECTUAL PROPERTY I LP

A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
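A heavily simplified sketch of the barge-in decision: a frame classifier (standing in for the one-state non-speech HMMs and three-state speech HMMs above) flags speech frames, and the prompt is terminated once enough consecutive speech frames accumulate. The frame representation and the five-frame threshold are illustrative assumptions.

```python
def detect_barge_in(frames, classify_frame, min_speech_frames=5):
    """Return the frame index at which the prompt should stop, or None.

    classify_frame(frame) -> bool stands in for the HMM-based
    speech/non-speech scoring of the accumulated audio input.
    """
    consecutive = 0
    for i, frame in enumerate(frames):
        if classify_frame(frame):
            consecutive += 1
            if consecutive >= min_speech_frames:
                return i  # enough evidence of barge-in speech: terminate prompt
        else:
            consecutive = 0  # non-speech resets the run
    return None
```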

05-07-2012 publication date

Dialect translator for a speech application environment extended for interactive text exchanges

Number: US20120173225A1
Assignee: Nuance Communications Inc

The present solution includes a real-time automated communication method. In the method, a real-time communication session can be established between a text exchange client and a speech application. A translation table can be identified that includes multiple entries, each entry including a text exchange item and a corresponding conversational translation item. A text exchange message can be received that was entered into a text exchange client. Content in the text exchange message that matches a text exchange item in the translation table can be substituted with a corresponding conversational item. The translated text exchange message can be sent as input to a voice server. Output from the voice server can be used by the speech application, which performs an automatic programmatic action based upon the output.
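The substitution step can be illustrated with a plain dictionary lookup; the toy entries below are invented stand-ins for the patent's translation table.

```python
import re

# Toy translation table: text-exchange item -> conversational translation item.
TRANSLATION_TABLE = {
    "brb": "I will be right back",
    "thx": "thank you",
    "u": "you",
}

def translate_text_exchange(message):
    """Replace chat shorthand with conversational equivalents before the
    message is sent as input to the voice server."""
    def repl(match):
        word = match.group(0)
        return TRANSLATION_TABLE.get(word.lower(), word)  # pass unknown words through
    return re.sub(r"[A-Za-z]+", repl, message)
```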

05-07-2012 publication date

Apparatus and method for voice command recognition based on a combination of dialog models

Number: US20120173244A1
Assignee: SAMSUNG ELECTRONICS CO LTD

Provided are a voice command recognition apparatus and method capable of figuring out the intention of a voice command input through a voice dialog interface by combining a rule-based dialog model and a statistical dialog model. The voice command recognition apparatus includes a command intention determining unit configured to correct an error in recognizing a voice command of a user, and an application processing unit configured to check whether the final command intention determined in the command intention determining unit comprises the input factors for execution of an application.
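One simple way to combine a rule-based and a statistical dialog model, offered purely as an illustration (the patent does not disclose this interpolation), is to interpolate per-intent scores from both models:

```python
def combined_intent(rule_scores, stat_scores, rule_weight=0.5):
    """Interpolate rule-based and statistical intent scores; return the best intent.

    Both arguments map intent name -> score; the linear weighting is an
    assumed combination scheme.
    """
    candidates = set(rule_scores) | set(stat_scores)
    def score(intent):
        return (rule_weight * rule_scores.get(intent, 0.0)
                + (1 - rule_weight) * stat_scores.get(intent, 0.0))
    return max(candidates, key=score)
```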

23-08-2012 publication date

Hearing assistance system for providing consistent human speech

Number: US20120215532A1
Assignee: Apple Inc

Broadly speaking, the embodiments disclosed herein describe an apparatus, system, and method that allows a user of a hearing assistance system to perceive consistent human speech. The consistent human speech can be based upon user specific preferences.

18-10-2012 publication date

Apparatus and method for processing voice command

Number: US20120265536A1
Assignee: Hyundai Motor Co

Disclosed is a technique for processing voice commands. In particular, the disclosed technique increases the voice recognition rate without requiring a process of inputting separate voice commands: the voice command table is updated based on interaction with the user, storing similar commands input by the user once those commands have been confirmed by the user as similar commands.
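A minimal sketch of the confirm-and-learn loop, assuming string similarity as a stand-in for recognition similarity and a `confirm` callback as the user-interaction step (both invented for illustration):

```python
from difflib import get_close_matches

class VoiceCommandTable:
    """Maps spoken phrases to actions; learns confirmed variants of known commands."""

    def __init__(self):
        self.table = {}  # phrase -> action

    def add(self, phrase, action):
        self.table[phrase] = action

    def recognize(self, phrase, confirm):
        """Return the action for `phrase`, learning a confirmed near-match."""
        if phrase in self.table:
            return self.table[phrase]
        match = get_close_matches(phrase, list(self.table), n=1, cutoff=0.6)
        if match and confirm(match[0]):  # ask the user whether the similar command was meant
            self.table[phrase] = self.table[match[0]]  # store the confirmed variant
            return self.table[phrase]
        return None
```

After one confirmation, the variant phrase is recognized directly on subsequent inputs.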

29-11-2012 publication date

Number-assistant voice input system, number-assistant voice input method for voice input system and number-assistant voice correcting method for voice input system

Number: US20120303368A1
Author: Ting Ma
Assignee: Mitac International Corp

The present invention discloses a number-assistant voice input system, a number-assistant voice input method for a voice input system and a number-assistant voice correcting method for a voice input system, which apply software to drive a voice input system of an electronic device to provide a voice input logic circuit module. The voice input logic circuit module defines the pronunciation of numbers 1 to 26 as the paths to respectively input letters A to Z in the voice input system and allows users to selectively input or correct a letter by reading a number from 1 to 26 instead of a letter from A to Z.
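The number-to-letter mapping itself is a one-line offset calculation:

```python
def number_to_letter(n):
    """Map a spoken number 1-26 to the letter A-Z it stands for."""
    if not 1 <= n <= 26:
        raise ValueError("spoken number must be between 1 and 26")
    return chr(ord("A") + n - 1)

def spell_by_numbers(numbers):
    """Compose a word from a sequence of spoken numbers, e.g. 3-1-2 spells CAB."""
    return "".join(number_to_letter(n) for n in numbers)
```

Saying "three, one, two" would thus spell CAB, sidestepping the acoustic confusability of letter names.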

28-02-2013 publication date

Truly handsfree speech recognition in high noise environments

Number: US20130054235A1
Assignee: Sensory Inc

Embodiments of the present invention improve content manipulation systems and methods using speech recognition. In one embodiment, the present invention includes a method comprising configuring a recognizer to recognize utterances in the presence of a background audio signal having particular audio characteristics. A composite signal comprising a first audio signal and a spoken utterance of a user is received by the recognizer, where the first audio signal comprises the particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal. The spoken utterance is recognized in the presence of the first audio signal when the spoken utterance is one of the predetermined utterances. An operation is performed on the first audio signal.
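The patent desensitizes the recognizer's models to the background's audio characteristics; the sketch below shows only the simpler intuition of removing a known, time-aligned background signal before matching, which is an illustrative simplification rather than the claimed mechanism.

```python
def isolate_utterance(composite, background):
    """Subtract a known background signal sample-by-sample.

    Assumes both sequences are time-aligned sample lists of equal length;
    real systems would instead adapt the recognizer's models to the
    background's characteristics.
    """
    if len(composite) != len(background):
        raise ValueError("signals must be time-aligned and equal length")
    return [c - b for c, b in zip(composite, background)]
```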

04-04-2013 publication date

System and Method of Semi-Supervised Learning for Spoken Language Understanding Using Semantic Role Labeling

Number: US20130085756A1
Assignee: AT&T Corp.

A system and method are disclosed for providing semi-supervised learning for a spoken language understanding module using semantic role labeling. The method embodiment relates to a method of generating a spoken language understanding module. Steps in the method comprise selecting at least one predicate/argument pair as an intent from a set of the most frequent predicate/argument pairs for a domain, labeling training data using mapping rules associated with the selected at least one predicate/argument pair, training a call-type classification model using the labeled training data, re-labeling the training data using the call-type classification model, and iteratively repeating several of the above steps until training set labels converge. 1. A method comprising: selecting an intent from a list of predicate/argument pairs associated with a spoken dialog system; labeling training data using mapping rules associated with the intent, wherein the mapping rules specify rules for selecting a call-type label for an utterance; and, while the training data and a classification model associated with the call-type label have a divergence above a threshold, iteratively: training the classification model using the training data; and re-labeling the training data using the classification model. 2. The method of claim 1, further comprising assigning the verbs “be” and “have” as special predicates. 3. The method of claim 1, further comprising distinguishing verbs from utterances which do not have a predicate by assigning the verbs to a special class. 4. The method of claim 1, wherein the method is semi-supervised. 5. The method of claim 1, further comprising capturing infrequent call types using an active-learning approach. 6. The method of claim 1, wherein the selecting of the intent is performed independent of a domain. 7. The method of claim 1, wherein the mapping rules specify that the call-type is represented by multiple predicate/argument pairs. 8. A system comprising: a processor; and ...
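The train/re-label loop that runs until the labels converge can be sketched generically; `train` and `predict` are assumed interfaces for the call-type classifier, and the convergence test on exact label equality is a simplification of the patent's divergence threshold.

```python
def semi_supervised_labels(data, initial_labels, train, predict, max_iters=20):
    """Alternate training and re-labeling until the label assignment converges.

    train(data, labels) fits a classifier; predict(model, x) labels one item.
    Both are assumed interfaces standing in for the call-type classifier.
    """
    labels = list(initial_labels)
    for _ in range(max_iters):
        model = train(data, labels)               # fit on current labels
        new_labels = [predict(model, x) for x in data]
        if new_labels == labels:                  # labels stopped changing: converged
            break
        labels = new_labels                       # re-label and iterate
    return labels
```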

02-05-2013 publication date

ENABLING SPEECH WITHIN A MULTIMODAL PROGRAM USING MARKUP

Number: US20130110517A1
Assignee: NUANCE COMMUNICATIONS, INC.

A method for speech enabling an application can include the step of specifying a speech input within a speech-enabled markup. The speech-enabled markup can also specify an application operation that is to be executed responsive to the detection of the speech input. After the speech input has been defined within the speech-enabled markup, the application can be instantiated. The specified speech input can then be detected and the application operation can be responsively executed in accordance with the specified speech-enabled markup. 1. A method for speech enabling an application comprising the steps of: specifying a speech input with a speech-enabled markup; defining within said speech-enabled markup at least one operation of an application that is to be executed upon a detection of said specified speech input; after said defining step, instantiating said application; detecting said specified speech input; and executing said application operation responsive to said detecting step. 2. The method of claim 1, wherein said application is a multimodal Web browser. 3. The method of claim 1, further comprising the steps of: providing a speech-enabled markup interpreter within an operating system upon which said application executes, wherein said speech-enabled markup interpreter is used to detect said speech input and responsively initiate said application operation. 4. The method of claim 3, further comprising the steps of: rendering a Web page within said application, wherein said Web page includes speech-enabled markup for at least one element of said Web page, and wherein said speech-enabled markup interpreter speech-enables said Web page element. 5. The method of claim 1, further comprising the steps of: associating said speech-enabled markup with a graphical user interface element of said application; determining that said graphical user interface element receives focus; and responsive to said determination, activating said speech-enabled markup so that said application ...

02-05-2013 publication date

Active Input Elicitation by Intelligent Automated Assistant

Number: US20130110518A1
Assignee: Apple Inc.

Methods, systems, and computer readable storage medium related to operating an intelligent automated assistant are disclosed. A user request is received through a conversation interface of the intelligent automated assistant, the user request including at least a speech input received from a user. One or more candidate domains relevant to the user request are identified from a plurality of predefined domains, where each predefined domain presents a respective area of service offered by the intelligent automated assistant, and the identifying is based on respective degrees of match between words derived from the user request and words representing vocabulary and entities associated with each predefined domain. Feedback is provided to the user through the conversation interface of the intelligent automated assistant, where the feedback presents a paraphrase of the user request and elicits additional input from the user to specify one or more parameters associated with a particular candidate domain. 1. A method for operating an intelligent automated assistant, comprising: at an ...: receiving a user request through a conversation interface of the intelligent automated assistant, the user request comprising at least a speech input received from a user; identifying one or more candidate domains relevant to the user request from a plurality of predefined domains, wherein each predefined domain presents a respective area of service offered by the intelligent automated assistant, and wherein the identifying is based on respective degrees of match between words derived from the user request and words representing vocabulary and entities associated with each predefined domain; and providing feedback to the user through the conversation interface of the intelligent automated assistant, wherein the feedback presents a paraphrase of the user request and elicits additional input from the user to specify one or more parameters associated with a particular candidate domain.
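The degrees-of-match step can be illustrated as a word-overlap ranking; using raw overlap counts between request words and per-domain vocabulary sets is an assumption, not the patent's scoring.

```python
def rank_domains(request_words, domains):
    """Order candidate domains by word overlap with each domain's vocabulary.

    `domains` maps a domain name to a set of vocabulary/entity words
    (an assumed data shape for illustration).
    """
    words = {w.lower() for w in request_words}
    # Higher overlap count = better degree of match, so sort descending.
    return sorted(domains, key=lambda name: len(words & domains[name]), reverse=True)
```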

02-05-2013 publication date

Intent Deduction Based on Previous User Interactions with Voice Assistant

Number: US20130110520A1
Assignee: Apple Inc.

Methods, systems, and computer readable storage medium related to operating an intelligent digital assistant are disclosed. A text string is obtained from a speech input received from a user. Information is derived from a communication event that occurred at the electronic device prior to receipt of the speech input. The text string is interpreted to derive a plurality of candidate interpretations of user intent. One of the candidate user intents is selected based on the information relating to the communication event. 1. A method for operating an automated assistant, comprising: at an electronic device comprising a processor and memory storing instructions for execution by the processor: obtaining a text string from a speech input received from a user; deriving information from a communication event that occurred at the electronic device prior to receipt of the speech input; interpreting the text string to derive a plurality of candidate interpretations of user intent; and selecting one of the candidate user intents based on the information relating to the communication event. 2. The method of claim 1, wherein the information includes a name of a person that is associated with the communication event. 3. The method of claim 2, wherein the text string includes a pronoun, and wherein selecting one of the candidate user intents comprises determining that the pronoun refers to the person. 4. The method of claim 3, wherein selecting the candidate user intent includes determining whether the candidate user intent satisfies a predetermined confidence threshold. 5. The method of claim 1, wherein the communication event is selected from the group consisting of: a telephone call; an email; and a text message. 6. A system for operating an intelligent automated assistant, comprising: one or more processors; and obtaining a text string from a speech input received from a user; deriving information from a communication event that occurred at the ...
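A toy version of selecting among candidate interpretations using the prior communication event; the `(intent, referent)` candidate shape and the event dictionary are invented for illustration.

```python
def select_intent(candidates, last_event):
    """Choose the candidate whose referent matches the person in the last event.

    `candidates` is an ordered list of (intent, referent) pairs, best first;
    `last_event` is a dict describing the most recent call/email/text.
    """
    person = last_event.get("person")
    for intent, referent in candidates:
        if referent == person:
            return intent           # e.g. "reply" resolves a pronoun to this person
    return candidates[0][0]         # no match: keep the top-ranked interpretation
```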

16-05-2013 publication date

REAL-TIME DISPLAY OF SYSTEM INSTRUCTIONS

Number: US20130124208A1
Assignee: Intellisist, Inc.

A system and method for reviewing inputted voice instructions in a vehicle-based telematics control unit. The system includes a microphone, a speech recognition processor, and an output device. The microphone receives voice instructions from a user. Coupled to the microphone is the speech recognition processor that generates a voice signal by performing speech recognition processing of the received voice instructions. The output device outputs the generated voice signal to the user. The system also includes a user interface for allowing the user to approve the outputted voice signal, and a communication component for wirelessly sending the generated voice signal to a server over a wireless network upon approval by the user. 1. A method for reviewing inputted voice instructions in a vehicle-based telematics control unit, the method comprising: recording voice instructions from a user; generating a voice signal by performing speech recognition of the recorded voice instructions; and outputting the generated voice signal over an output device associated with the telematics control unit for review. 2. The method of claim 1, further comprising wirelessly sending at least one of the generated voice signal or the inputted voice instructions to a server over a wireless network upon approval by a user. 3. The method of claim 1, further comprising: generating a digest including the generated voice signals; sending the digest to a human operator system; and connecting the human operator system to the telematics control unit. 4. The method of claim 1, wherein outputting comprises generating and displaying text based on the generated voice signal. 5. The method of claim 1, wherein outputting comprises generating and outputting voice based on the generated voice signal. 6. A system for reviewing inputted voice instructions in a vehicle-based telematics control unit, the system comprising: a microphone for receiving voice instructions from a user; a speech recognition processor ...

16-05-2013 publication date

SYSTEM AND METHOD FOR ENHANCED COMMUNICATIONS VIA SMALL DATA RATE COMMUNICATION SYSTEMS

Number: US20130124211A1
Author: McDonough John G.
Assignee: SHORTHAND MOBILE, INC.

A system and method for interacting with an interactive communication system include processing a profile associated with an interactive communication system; generating a user interface based on the processing of the profile to solicit a user response correlating to a response required by the interactive communication system; receiving the user response via the user interface; updating the user interface using the profile based on the user response; and sending a signal to the interactive communication system based on one or more user responses. 1. A method for interacting with an interactive communication system comprising: processing a profile associated with an interactive communication system; generating a user interface based on the processing of the profile to solicit a user response correlating to a response required by the interactive communication system; receiving the user response via the user interface; updating the user interface using the profile based on the user response; and sending a signal to the interactive communication system based on one or more user responses. This application is a continuation of U.S. patent application Ser. No. 12/122,619, filed May 16, 2008, which claims the benefit of priority of U.S. Provisional Pat. App. No. 60/938,969, filed May 18, 2007, entitled “System and Method For Communicating With Text Messaging Systems” and U.S. Provisional Pat. App. No. 60/938,965, filed May 18, 2007, entitled “System and Method for Communicating with Interactive Service Systems,” all of which are hereby incorporated by reference. 1. Field of the Invention. The present invention relates to communication with interactive service systems, such as service systems that use short message service (SMS), interactive voice response (IVR) systems, and websites or other data systems. 2. Related Art. Many companies currently use interactive service systems, such as text messaging systems and IVR systems, for various tasks as a first line of customer support and ...

23-05-2013 publication date

System and method for crowd-sourced data labeling

Number: US20130132080A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until at least one of an accuracy threshold is reached and m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses.
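The incremental stopping rule (accuracy threshold or m responses, whichever comes first) can be sketched with majority agreement as the accuracy proxy; the agreement measure and the three-response minimum are assumptions, since the patent only says the threshold is based on characteristics of the responses.

```python
from collections import Counter

def crowd_label(response_stream, m, accuracy_threshold, min_responses=3):
    """Accumulate crowd responses until agreement is high enough or m arrive."""
    responses = []
    for response in response_stream:
        responses.append(response)
        answer, count = Counter(responses).most_common(1)[0]
        agreement = count / len(responses)   # share of workers backing the leader
        if len(responses) >= min_responses and agreement >= accuracy_threshold:
            return answer                    # accuracy threshold reached early
        if len(responses) >= m:
            return answer                    # hard cap of m responses reached
    if responses:
        return Counter(responses).most_common(1)[0][0]
    return None
```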

23-05-2013 publication date

GENERIC FRAMEWORK FOR LARGE-MARGIN MCE TRAINING IN SPEECH RECOGNITION

Number: US20130132083A1
Assignee: MICROSOFT CORPORATION

A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met. 1. A method of training an acoustic model in a speech recognition system, comprising: utilizing a training corpus, having training tokens, to calculate an initial acoustic model; computing, using the initial acoustic model, a plurality of scores for each training token with regard to a correct class and a plurality of competing classes; calculating a sample-adaptive window bandwidth for each training token; determining a value for a loss function based on the computed scores and the calculated sample-adaptive window bandwidth for each training token; updating parameters in the current acoustic model to create a revised acoustic model based upon the loss value; and outputting the revised acoustic model. 2. The method of and further comprising: deriving the loss function from a Bayesian viewpoint. 3. The method of wherein deriving the loss function from a Bayesian viewpoint further comprises utilizing a margin-free Bayes risk function. 4. The method of wherein deriving the loss function from a Bayesian viewpoint further ...
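A sketch of a margin-shifted, sigmoid-smoothed misclassification loss in the spirit of large-margin MCE; the exact functional form, and the omission of the sample-adaptive window bandwidth, are simplifications rather than the patent's loss.

```python
import math

def mce_loss(correct_score, competitor_scores, margin=1.0, alpha=1.0):
    """Smoothed misclassification loss with a margin term.

    The margin shifts the decision boundary away from correct tokens;
    alpha controls how sharply the sigmoid approximates 0/1 error.
    """
    competitor = max(competitor_scores)
    d = competitor - correct_score + margin     # misclassification measure + margin
    return 1.0 / (1.0 + math.exp(-alpha * d))   # sigmoid smoothing of the error count
```

Tokens scored well above all competitors (by more than the margin) contribute near-zero loss; tokens near or past the boundary contribute loss approaching one.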

23-05-2013 publication date

METHODS AND SYSTEMS FOR ADAPTING GRAMMARS IN HYBRID SPEECH RECOGNITION ENGINES FOR ENHANCING LOCAL SR PERFORMANCE

Number: US20130132086A1
Author: Feng Zhe, Weng Fuliang, Xu Kui
Assignee:

A speech recognition method includes providing a processor communicatively coupled to each of a local speech recognition engine and a server-based speech recognition engine. A first speech input is inputted into the server-based speech recognition engine. A first recognition result from the server-based speech recognition engine is received at the processor. The first recognition result is based on the first speech input. The first recognition result is stored in a memory device in association with the first speech input. A second speech input is inputted into the local speech recognition engine. The first recognition result is retrieved from the memory device. A second recognition result is produced by the local speech recognition engine. The second recognition result is based on the second speech input and is dependent upon the retrieved first recognition result. 1. A speech recognition method, comprising the steps of: providing a processor communicatively coupled to each of a local speech recognition engine and a server-based speech recognition engine; inputting a first speech input into the server-based speech recognition engine; receiving at the processor a first recognition result from the server-based speech recognition engine, the first recognition result being based on the first speech input; storing the first recognition result in a memory device, the first recognition result being stored in association with the first speech input; inputting a second speech input into the local speech recognition engine; retrieving the first recognition result from the memory device; and producing a second recognition result by the local speech recognition engine, the second recognition result being based on the second speech input and being dependent upon the retrieved first recognition result. 2. The method of claim 1, comprising the further step of receiving at the processor a confidence score from the server-based speech recognition engine, the confidence score ...
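The cache-then-bias interaction between the two engines can be sketched as follows; the callable interfaces for the server engine and the local n-best list are assumed, and matching cached results by exact string equality is a simplification of the claimed dependency.

```python
class HybridRecognizer:
    """Local recognizer whose hypotheses are re-scored using cached server results."""

    def __init__(self, server_recognize, local_hypotheses):
        self.server_recognize = server_recognize   # assumed server-side engine callable
        self.local_hypotheses = local_hypotheses   # assumed local n-best callable
        self.cache = []                            # past server recognition results

    def recognize_on_server(self, speech):
        result = self.server_recognize(speech)
        self.cache.append(result)                  # store for later local recognition
        return result

    def recognize_locally(self, speech):
        # Prefer a local hypothesis that matches something the server produced before.
        hypotheses = self.local_hypotheses(speech)
        for hyp in hypotheses:
            if hyp in self.cache:
                return hyp
        return hypotheses[0]                       # otherwise take the local best
```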

23-05-2013 publication date

Systems and Techniques for Producing Spoken Voice Prompts

Number: US20130132096A1
Assignee: Eliza Corporation

Methods and systems are described in which spoken voice prompts can be produced in a manner such that they will most likely have the desired effect, for example to indicate empathy, or produce a desired follow-up action from a call recipient. The prompts can be produced with specific optimized speech parameters, including duration, gender of speaker, and pitch, so as to encourage participation and promote comprehension among a wide range of patients or listeners. Upon hearing such voice prompts, patients/listeners can know immediately when they are being asked questions that they are expected to answer, and when they are being given information, as well as which information is considered sensitive. 1. A method of producing spoken voice prompts for telephony-based informational interaction, the method comprising: for one or more voice prompts, determining words that receive an optimized speech parameter, based on context and/or meaning of the text of the one or more voice prompts; recording the one or more voice prompts, producing one or more spoken voice prompts; and conveying the one or more spoken voice prompts to a listener over a telephone system. 2. The method of claim 1, further comprising determining the number of words that receive an optimized speech parameter based on context and/or meaning of the one or more voice prompts. 3. The method of claim 1, wherein the optimized speech parameter comprises one or more pitch accents. 4. The method of claim 3, wherein the one or more pitch accents yield a pause lengthening pattern. 5. The method of claim 3, wherein the one or more pitch accents comprise a phrase-final lengthening pattern. 6. The method of claim 3, further comprising one or more boundary tones, wherein the one or more pitch accents and boundary tones comprise a defined intonation pattern. 7. The method of claim 6, wherein the defined intonation pattern comprises specific rises or falls of the fundamental frequency of a spoken prompt. 8. The ...

30-05-2013 publication date

VOICE-SCREEN ARS SERVICE SYSTEM, METHOD FOR PROVIDING SAME, AND COMPUTER-READABLE RECORDING MEDIUM

Number: US20130138443A1
Author: Kim David, KIM Yong Jin
Assignee: CALL GATE CO., LTD.

A method for providing a voice-screen ARS service on a terminal, according to an embodiment of the present invention, uses an application installed on the terminal to connect to an IVR system of a client company via a voice call and connects a data call to a VARS service server. Menu information including a plurality of menu items related to a client is received through a data call and displayed on a screen, and voice information related to the menu is received through a voice call and output as audio. Accordingly, when a user uses the ARS, both voice and on-screen information services are provided simultaneously, which reduces the limitations and inaccuracies of the provided voice information and increases user convenience. 1. A method for providing a voice-screen ARS (automatic response system) service in a terminal, the method comprising the steps of: providing a connection means for allowing a connection to be made to a client company IVR (interactive voice response) system; when a user makes a request for a connection to the client company IVR system by using the connection means, connecting a voice call to the client company IVR system through an Internet network or a mobile communication network and connecting a data call to a VARS (visual ARS) service server; receiving menu information including a plurality of menu items from the VARS service server through the data call and displaying the received menu information including the menu items; transmitting information on a menu item selected by the user from among the displayed menu items to the VARS service server through the data call and to the client company IVR system through the voice call at the same time; receiving screen information corresponding to the selected menu item from the VARS service server, and displaying the received screen information; and receiving voice information corresponding to the selected menu item from the client company IVR system, and outputting the received voice information. 2. ( ...

30-05-2013 publication date

MODIFICATION OF OPERATIONAL DATA OF AN INTERACTION AND/OR INSTRUCTION DETERMINATION PROCESS

Number: US20130138444A1
Author: George Michael
Assignee: SANOFI-AVENTIS DEUTSCHLAND GMBH

It is inter alia disclosed to perform at least one of operating an interaction process with a user of the medical apparatus and determining, based on a representation of at least one instruction given by the user, at least one instruction operable by the medical apparatus. Therein, the at least one of the operating and the determining at least partially depends on operational data. It is further disclosed to receive modification information for modifying at least a part of the operational data, wherein the modification information is at least partially determined based on an analysis of a representation of at least one instruction given by the user. 1.-15. (canceled) 16. A medical apparatus, comprising: a processor configured to perform at least one of operating an interaction process with a user of said medical apparatus and determining, based on a respective representation of at least one instruction given by said user, at least one instruction operable by said medical apparatus, wherein said at least one of said operating an interaction process and said determining at least one instruction at least partially depends on operational data; a communication unit configured to receive modification information for modifying at least a part of said operational data, said modification information at least partially determined based on an analysis of a respective representation of at least one instruction given by a user. 17. The medical apparatus according to claim 16, wherein said at least one instruction is given acoustically by said user and wherein said determining is at least partially based on speech recognition of said respective representation of said at least one instruction given by said user. 18. The medical apparatus according to claim 17, wherein said speech recognition at least partially depends on said operational data, and wherein at least a part of said modification information is determined to improve said speech recognition with respect to said ...

13-06-2013 publication date

SYSTEM AND METHOD FOR STANDARDIZED SPEECH RECOGNITION

Номер: US20130151252A1
Принадлежит: AT&T Intellectual Property I, L.P.

Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.

1. A method comprising: receiving speech from a user; determining, via a processor, to apply one of supervised training and unsupervised training; and when supervised training is selected: determining whether available data are sufficient to build a new speech recognition model; when the available data is sufficient to build the new speech recognition model, building the new speech recognition model using the available data; and when the available data is not sufficient to build the new speech recognition model: selecting an existing speech recognition model; and generating an adapted speech recognition model based on transformations generated from the existing speech recognition model based on the speech and associated transcriptions.

2. The method of claim 1, wherein the new speech recognition model, the existing speech recognition model and the adapted speech recognition model are standardized speech models.

3. The method of claim 1, wherein one of the new speech recognition ...
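The retrieval order the abstract describes (user-specific supervised model, else unsupervised model, else generic model) amounts to a simple fallback chain. A minimal sketch in Python; the dictionary stores and model names are hypothetical stand-ins, not the patent's API:

```python
def retrieve_model(user_id, supervised, unsupervised, generic):
    """Fallback chain from the abstract: user-specific supervised model,
    else unsupervised model, else the generic model."""
    if user_id in supervised:       # user-specific supervised model available
        return supervised[user_id]
    if user_id in unsupervised:     # fall back to the unsupervised model
        return unsupervised[user_id]
    return generic                  # last resort: generic model

# Usage: plain dictionaries stand in for the model stores.
model = retrieve_model("alice", {}, {"alice": "unsup-alice"}, "generic")
```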

Publication date: 13-06-2013

System and Method for Targeted Tuning of a Speech Recognition System

Number: US20130151253A1

A system and method of targeted tuning of a speech recognition system are disclosed. A particular method includes detecting that a frequency of occurrence of a particular type of utterance satisfies a threshold. The method further includes tuning a speech recognition system with respect to the particular type of utterance. 1. A method comprising:detecting that a frequency of occurrence of a particular type of utterance satisfies a threshold; andin response to detecting that the frequency satisfies the threshold, tuning a speech recognition system with respect to the particular type of utterance.2. The method of claim 1 , further comprising determining the frequency based on a group of received utterances.3. The method of claim 1 , wherein the threshold is determined by a system administrator.4. The method of claim 1 , wherein the threshold is user programmable.5. The method of claim 1 , wherein tuning the speech recognition system includes inputting a collection of utterances of the particular type of utterance into a learning module of the speech recognition system.6. The method of claim 5 , wherein inputting the collection of utterances includes playing one or more files that represent recordings of the particular type of utterance.7. The method of claim 1 , wherein system recognition of the particular type of utterance is dependent on a particular speaker.8. The method of claim 1 , wherein system recognition of the particular type of utterance is independent of a particular speaker.9. The method of claim 1 , wherein the utterance is one of a single word spoken by a speaker claim 1 , a phrase spoken by the speaker claim 1 , or a sentence spoken by the speaker.10. The method of claim 1 , wherein the utterance corresponds to a request that indicates an action to be taken on an object.11. The method of claim 10 , wherein the request is one of a request to pay a bill claim 10 , a request for an account balance claim 10 , a request to change services claim 10 , a ...
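The trigger condition in the claims (a particular type of utterance whose frequency of occurrence satisfies a threshold) can be illustrated with a small counting sketch; the function name and data shapes are assumptions for illustration:

```python
from collections import Counter

def utterance_types_to_tune(utterances, threshold):
    """Return the utterance types whose frequency of occurrence satisfies
    the threshold, i.e. the types the recognizer should be tuned for."""
    counts = Counter(utterances)
    return {utt for utt, n in counts.items() if n >= threshold}

# Usage: "pay bill" occurs three times and crosses the threshold.
flagged = utterance_types_to_tune(
    ["pay bill", "balance", "pay bill", "pay bill"], threshold=3)
```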

Publication date: 27-06-2013

Discriminative Training of Document Transcription System

Number: US20130166297A1
Assignee: MULTIMODAL TECHNOLOGIES, LLC

A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript. 1. In a system including a first document containing at least some information in common with a spoken audio stream , a method comprising steps of:(A) identifying text in the first document representing a concept having a plurality of spoken forms;(B) replacing the identified text with a context-free grammar specifying the plurality of spoken forms of the concept to produce a second document;(C) generating a first language model based on the second document;(D) using the first language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a third document;(E) filtering text from the third document by reference to the second document to produce a filtered document in which text filtered from the third document is marked as unreliable; and (F)(1) applying a first speech recognition process to the spoken audio stream using a set of base acoustic models and a grammar network based on the filtered document to produce a first set of recognition structures;', '(F)(2) applying a second speech recognition process to the spoken audio stream ...

Publication date: 11-07-2013

SPEECH RECOGNITION APPARATUS

Number: US20130179154A1
Author: OKUNO Hiroyuki
Assignee: Denso Corporation

A speech recognition apparatus includes a first recognition dictionary, a speech input unit, a speech recognition unit, a speech transmission unit, a recognition result receipt unit, and a control unit. The speech recognition unit recognizes a speech based on a first recognition dictionary, and outputs a first recognition result. A server recognizes the speech based on a second recognition dictionary, and outputs a second recognition result. The control unit determines a likelihood level of a selected candidate obtained based on the first recognition result, and accordingly controls an output unit to output at least one of the first recognition result and the second recognition result. When the likelihood level of the selected candidate is equal to or higher than a threshold level, the control unit controls the output unit to output the first recognition result irrespective of whether the second recognition result is received from the server. 1. A speech recognition apparatus comprising:a first recognition dictionary that stores a plurality of first phoneme strings, which are respectively converted from a plurality of text data;a speech input unit that inputs a speech made by a user;a speech recognition unit that recognizes the speech based on the first recognition dictionary and outputs a first recognition result;a speech transmission unit that transmits the speech to a server, the server including a second recognition dictionary that stores a plurality of second phoneme strings respectively converted from the plurality of text data, the server recognizing the speech based on the second recognition dictionary and outputting a second recognition result;a recognition result receipt unit that receives the second recognition result from the server; anda control unit that determines a likelihood level of a selected candidate obtained based on the first recognition result, and controls an output unit to output at least one of the first recognition result and the second 
...
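The control unit's decision rule described above (output the local result when its likelihood reaches the threshold, regardless of whether the server has answered) can be sketched as a toy model with hypothetical names, where a callable stands in for the round trip to the server:

```python
def choose_result(local_result, local_likelihood, threshold, fetch_server_result):
    """Output the local (first) recognition result when its likelihood
    reaches the threshold; otherwise wait for the server's (second) result.
    fetch_server_result is a callable standing in for the network round trip."""
    if local_likelihood >= threshold:
        return local_result          # confident enough: do not wait for the server
    return fetch_server_result()     # low confidence: defer to the server

# Usage: high local likelihood, so the server is never consulted.
res = choose_result("navi", 0.92, 0.8, lambda: "navigation")
```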

Publication date: 11-07-2013

METHOD AND APPARATUS FOR EXECUTING A USER FUNCTION USING VOICE RECOGNITION

Number: US20130179173A1
Author: Lee Dongyeol, PARK Sehwan
Assignee: SAMSUNG ELECTRONICS CO., LTD.

A method and an apparatus for executing a user function using voice recognition. The method includes displaying a user function execution screen; confirming a function to be executed according to voice input; displaying a voice command corresponding to the confirmed function on the user function execution screen; recognizing a voice input by a user, while a voice recognition execution request is continuously received; and executing the function associated with the input voice command, when the recognized voice input is at least one of the displayed voice command. 1. A method for executing a user function by an electronic device using voice recognition , the method comprising:displaying a user function execution screen;confirming a function to be executed according to voice input;displaying a voice command corresponding to the confirmed function on the user function execution screen;recognizing a voice input by a user, while a voice recognition execution request is continuously received; andexecuting the function associated with the input voice command, when the recognized voice input is at least one of the displayed voice command.2. The method of claim 1 , wherein the voice command is displayed around an image component of the user function execution screen or in a blanket of the user function execution screen.3. The method of claim 1 , wherein the voice command is displayed around an image component associated with a function corresponding to the voice command.4. The method of claim 1 , wherein the voice command is displayed around a mounted location of a key input unit generating a key input event claim 1 , when a function executed according to the voice input is a function executed by the key input event.5. The method of claim 1 , further comprises determining whether the voice input by the user corresponds to at least one of the displayed voice command.6. 
The method of claim 1 , wherein the function includes one of a function executed when a touch event and a ...

Publication date: 18-07-2013

USER SPEECH INTERFACES FOR INTERACTIVE MEDIA GUIDANCE APPLICATIONS

Number: US20130185080A1
Assignee: UNITED VIDEO PROPERTIES, INC.

A user speech interface for interactive media guidance applications, such as television program guides, guides for audio services, guides for video-on-demand (VOD) services, guides for personal video recorders (PVRs), or other suitable guidance applications is provided. Voice commands may be received from a user and guidance activities may be performed in response to the voice commands.

1-153. (canceled)

154. A system for generating a customized voice control media interface, comprising: processing circuitry configured to: receive a voice command entered by a user at a first device, wherein the voice command comprises a media control request; identify the user based on data stored in memory at the processing circuitry associating the user with the first device; generate a customized feature based on the identified user; and execute the media control requested by the user.

155. The system of claim 154, wherein: the media control request comprises a request to store media content; and the customized feature comprises storing the media content to a file associated with the identified user.

156. The system of claim 154, wherein: the media control request comprises a request to play media content; and the customized feature comprises presenting media content from a media content source associated with the identified user.

157. The system of claim 154, wherein the customized feature comprises a targeted advertisement selected based on the identified user.

158. The system of claim 154, wherein the customized feature is a favorites list that comprises preferred media content or media sources associated with the identified user.

159. The system of claim 154, wherein the first device comprises a display and speaker configured to present media content to the user.

160. The system of claim 159, wherein: the media control request includes a request for information identifying media content that is available on the first device; and the first device is configured to generate ...

Publication date: 25-07-2013

COMPUTERIZED INFORMATION AND DISPLAY APPARATUS

Number: US20130188055A1
Author: Gazdzinski Robert F.
Assignee: WEST VIEW RESEARCH, LLC

Apparatus useful for obtaining and displaying information. In one embodiment, the apparatus includes a network interface, display device, and speech recognition apparatus configured to receive user speech input and enable performance of various tasks via a remote entity, such as obtaining desired information relating to maps or directions, or any number of other topics. The downloaded data may also, in one variant, be displayed with contextually related advertising or other content.

1-40. (canceled)

41. Computerized information and display apparatus, comprising: a network interface; processing apparatus in data communication with the network interface; a display device; and a storage apparatus comprising at least one computer program, said at least one program being configured to, when executed: obtain digitized speech generated based on speech received from a user, the digitized speech relating to a query for desired information which the user wishes to find; and cause, based at least in part on the digitized speech, access of a remote network entity to cause retrieval of the desired information; wherein the apparatus is further configured to display advertising content on the display device, the content received via the network interface and selected based at least in part on the digitized speech.

42. The apparatus of claim 41, wherein the received content is selected from a plurality of advertising content that is contextually related to the desired information.

43. The apparatus of claim 42, wherein the desired information comprises information relating to an entity or location.

44. The apparatus of claim 43, wherein the desired information comprises information relating to a business entity or organization, and the contextual relationship comprises a contextual relationship between the selected content and an industry or type of the business entity or organization.

45. The apparatus of claim 41, wherein the desired action comprises obtaining information ...

Publication date: 25-07-2013

Automatic Door

Number: US20130191123A1
Author: Clough Bradford A.
Assignee: Altorr Corporation

In some implementations a storage device having a voice-recognition engine stored thereon is coupled to a microcontroller, and a device-controller for an automatic door is operably coupled to the microcontroller.

1. An apparatus comprising: a command receiver that is operable to detect a command to open a door; a door opener that is operably coupled to the command receiver and that is operable to initiate opening of the door when the command receiver detects the command to open the door; an obstacle detector that is operably coupled to the command receiver and that is operable to be initiated when the command receiver detects the command to open the door, the obstacle detector also operable to perform an obstacle detection process while the door is opening, the obstacle detector also operable to evaluate an obstacle warning parameter when an obstacle is being detected; and a device controller that is operably coupled to the door opener and the obstacle detector, the device controller being operable to halt the door opening when the obstacle warning parameter is set to NO, the device controller also operable to initialize a warning counter to a maximum number of iterations of a warning when the obstacle warning parameter is set to YES, the device controller also operable to perform a loop for the maximum number of iterations indicated by the warning counter when the warning counter is initialized, the loop providing an obstacle warning and polling for a response, the device controller also operable to perform a predetermined default action when no response to the obstacle warning is received, the device controller also operable to perform a door command in accordance with the response when a response to the obstacle warning is received.

2. The apparatus of claim 1, wherein the loop further comprises: responsive to the warning counter being initialized, decrementing the warning counter by 1; providing the obstacle warning; starting a timer; performing a door command in accordance with ...
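The warning loop recited in the claims (initialize a counter to a maximum number of iterations, warn, poll for a response, fall back to a predetermined default action when no response ever arrives) might be modeled like this; all names are hypothetical:

```python
def handle_obstacle(max_warnings, poll_response, warn, default_action, do_command):
    """Warn up to max_warnings times, polling for a response after each
    warning; execute the responded door command, or the predetermined
    default action when no response ever arrives."""
    counter = max_warnings           # warning counter initialized to the maximum
    while counter > 0:
        counter -= 1                 # decrement the warning counter by 1
        warn()                       # provide the obstacle warning
        response = poll_response()   # poll for a response
        if response is not None:
            return do_command(response)   # door command per the response
    return default_action()          # no response: predetermined default action

# Usage: no answer to the first warning, then the user says "close".
result = handle_obstacle(
    3, iter([None, "close"]).__next__,
    lambda: None, lambda: "halt", lambda cmd: cmd)
```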

Publication date: 15-08-2013

SYSTEM AND METHOD FOR PROVIDING A NATURAL LANGUAGE VOICE USER INTERFACE IN AN INTEGRATED VOICE NAVIGATION SERVICES ENVIRONMENT

Number: US20130211710A1
Assignee: VoiceBox Technologies, Inc.

A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment. 1. A method for providing a natural language voice user interface , comprising:receiving a natural language utterance from an input device associated with a navigation device, wherein the natural language utterance relates to navigation;determining a current location of the computing device;selecting, from among a plurality of sets of location-specific grammar information, a set of location-specific grammar information based on proximity between the current location and a location associated with the set of location-specific grammar information;generating a recognition grammar with the set of location-specific grammar information;generating one or more interpretations of the natural language utterance using the recognition grammar;determining, from the one or more interpretations, a destination having a first full or partial address;determining a route from the current location associated with the navigation device to the first full or partial address of the destination;receiving subsequent natural language utterances from the input device;determining a second full or partial address from the subsequent natural language utterances; andupdating the destination with the second full or partial address.2. 
The method of ...
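The grammar-selection step in the claim (choosing, from several location-specific grammar sets, the one whose associated location is nearest the device's current position) can be sketched with a toy squared-distance comparison; the data layout and names are assumptions for illustration:

```python
def select_grammar(current, grammar_sets):
    """Pick the location-specific grammar set whose anchor location is
    nearest the current (x, y) position (squared Euclidean distance)."""
    def dist2(loc):
        return (loc[0] - current[0]) ** 2 + (loc[1] - current[1]) ** 2
    return min(grammar_sets, key=lambda g: dist2(g["location"]))["grammar"]

# Usage: (2.0, 1.0) is closer to the "harbor" anchor than to "downtown".
grammar = select_grammar(
    (2.0, 1.0),
    [{"location": (0.0, 0.0), "grammar": "downtown"},
     {"location": (2.0, 2.0), "grammar": "harbor"}])
```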

Publication date: 22-08-2013

Sound Recognition Operation Apparatus and Sound Recognition Operation Method

Number: US20130218562A1
Author: Igarashi Yoshihiro
Assignee: KABUSHIKI KAISHA TOSHIBA

According to one embodiment, a sound recognition operation apparatus includes a sound detection module, a keyword detection module, an audio mute module, and a transmission module. The sound detection module is configured to detect sound. The keyword detection module is configured to detect a particular keyword using voice recognition when the sound detection module detects sound. The audio mute module is configured to transmit an operation signal for muting audio sound when the keyword detection module detects the keyword. The transmission module is configured to recognize the voice command after the keyword is detected by the keyword detection module, and transmit an operation signal corresponding to the voice command.

1. (canceled)

2. An electronic device comprising: a word recognizer configured to recognize a predetermined word by voice recognition; a command recognizer configured to recognize a voice command if the predetermined word is recognized; and a transmitter configured to transmit a signal corresponding to the recognized voice command.

3. The electronic device of claim 2, wherein the word recognizer is configured to recognize the predetermined word indicating an electronic device to be controlled.

4. The electronic device of claim 3, wherein the predetermined word comprises a word of “television”.

5. The electronic device of claim 2, wherein the predetermined word comprises a predetermined specific keyword.

6. The electronic device of claim 2, wherein the command recognizer is configured to recognize a voice command for controlling an electronic device.

7. The electronic device of claim 6, wherein the electronic device comprises a television broadcast receiving apparatus.

8. The electronic device of claim 2, further comprising a microphone configured to receive the predetermined word and the voice command.

9. The electronic device of claim 2, further comprising a notifier configured to notify one of a set state and an operation state of the electronic ...

Publication date: 22-08-2013

System and Method for Providing a Natural Language Interface to a Database

Number: US20130218564A1
Assignee: AT&T INTELLECTUAL PROPERTY II, L.P.

A system and method for providing a natural language interface to a database or the Internet. The method provides a response from a database to a natural language query. The method comprises receiving a user query, extracting key data from the user query, submitting the extracted key data to a data base search engine to retrieve a top n pages from the data base, processing of the top n pages through a natural language dialog engine and providing a response based on processing the top n pages. 1. A method comprising:extracting, via a processor, key data from a user query;submitting the key data to a search engine to perform a search and to retrieve a set of top n pages from a database, wherein in response to a restriction to access a restricted page of the set of top n pages, the processor provides data to the database to overcome the restriction independent of a user navigation to the restricted page;providing, at a first time, a response to the user query;after providing the response at the first time, continuing, without further user input, to search for information associated with the user query using a machine learning process to expand the search; andpresenting an option to a device associated with a user, at a second time which is later than the first time, to view the related information separate from the response.2. The method of claim 1 , wherein the response is text-based and audible.3. The method of claim 1 , wherein the user query is one of a natural language speech query and a text-based query.4. The method of claim 1 , wherein the key data comprises one of keywords and key phrases.5. The method of claim 1 , wherein submitting of the key data to the search engine further comprises submitting the key data to a plurality of search engines.6. The method of claim 1 , wherein the user query is received via a speech recognizer.7. The method of claim 1 , wherein the response is a natural language response provided via synthetic speech.8. 
The method of claim 1 ...

Publication date: 22-08-2013

Management and Prioritization of Processing Multiple Requests

Number: US20130218574A1
Assignee: MICROSOFT CORPORATION

Systems and methods are described for systems that utilize an interaction manager to manage interactions—also known as requests or dialogues—from one or more applications. The interactions are managed properly even if multiple applications use different grammars. The interaction manager maintains a priority for each of the interactions, such as via an interaction list, where the priority of the interactions corresponds to an order in which the interactions are to be processed. Interactions are normally processed in the order in which they are received. However, the systems and method described herein may provide a grace period after processing a first interaction and before processing a second interaction. If a third interaction that is chained to the first interaction is received during this grace period, then the third interaction may be processed before the second interaction.

1. A system, comprising: memory; one or more processors; and an interaction manager maintained in the memory and executable by the one or more processors to: assign processing priorities to a plurality of requests wherein each request corresponds to a priority in which the request is to be processed, such that a first request having a higher priority is processed before a second request having a lower priority; and provide a grace period after processing the first request, wherein in response to determining that the system receives a third request chained to the first request during the grace period, the system is configured to process the third request prior to processing the second request.

2. The system as recited in claim 1, wherein the interaction manager interrupts a request currently being processed when a received request is assigned a priority higher than the interrupted request, the interrupted request resuming processing after the received request is processed.

3. The system as recited in claim 1, wherein the interaction manager interrupts a request currently being ...
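One way to picture the grace-period rule (a request chained to the just-finished request jumps ahead of the already-queued next request) is the following simplified ordering model; it ignores timing and priorities beyond queue order, and all names are hypothetical:

```python
def processing_order(queue, chained):
    """Return the order requests are processed: queue order, except that a
    request chained to the one just finished (arriving during its grace
    period) cuts ahead of the rest of the queue."""
    order, pending = [], list(queue)
    while pending:
        current = pending.pop(0)
        order.append(current)
        # grace period after `current`: its chained follow-ups jump the queue
        pending = list(chained.get(current, [])) + pending
    return order

# Usage: a third request chained to "first" arrives during its grace period.
order = processing_order(["first", "second"], {"first": ["third (chained)"]})
```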

Publication date: 29-08-2013

SPOKEN CONTROL FOR USER CONSTRUCTION OF COMPLEX BEHAVIORS

Number: US20130226580A1
Assignee: Fluential, LLC

A device interface system is presented. Contemplated device interfaces allow for construction of complex device behaviors by aggregating device functions. The behaviors are triggered based on conditions derived from environmental data about the device.

1. A device interface comprising: a dialog interface module disposed within a device and configured to accept a signal comprising a representation of a spoken utterance; a data source connection configured to acquire environment data from a plurality of data sources and representative of a device environment; a device function database storing primitive device functions indexed by device state attributes; an interaction history database storing previous interactions indexed by environment data attributes; and a triggering module coupled with the dialog interface, data source interface, and the interaction history database and configured to: obtain previous interaction from the interaction history database by submitting a query to the interaction history database, the query instantiated based on environment data attributes derived from the environment data; derive a device state from at least one of the environment data and previous interactions; obtain a set of primitive device functions from the device function database based on the device state; instantiate a future device behavior constructed from the set of primitive functions and the dialog signal; create a trigger as a function of the future device behavior and the device state; and configure the device to exhibit the future device behavior upon satisfaction of the trigger.

2. The interface of claim 1, wherein the device state comprises a current device state.

3. The interface of claim 1, wherein the device state comprises a previous device state.

4. The interface of claim 1, wherein the device state comprises a future device state.

5. The interface of claim 1, wherein the device state comprises a functional state of the device.

6. The interface of ...

Publication date: 05-09-2013

Context Sensitive Overlays In Voice Controlled Headset Computer Displays

Number: US20130231937A1
Assignee: Kopin Corporation

In headset computers that leverage voice commands, often the user does not know what voice commands are available. In one embodiment, a method includes providing a user interface in a headset computer and, in response to user utterance of a cue toggle command, displaying at least one cue in the user interface. Each cue can correspond to a voice command associated with code to execute. In response to user utterance of the voice command, the method can also include executing the code associated with the voice command. The user can therefore ascertain what voice commands are available. 1. A method comprising:providing a user interface in a headset computer;in response to user utterance of a cue toggle command, displaying at least one cue, each cue corresponding to a voice command associated with code to execute, in the user interface; andin response to user utterance of the voice command, executing the code associated with the voice command.2. The method of claim 1 , further comprising:displaying the interface without the cue at least one of prior to the cue toggle command and after a subsequent cue toggle command.3. The method of claim 1 , wherein displaying the cue includes displaying words that activate the voice command.4. The method of claim 1 , wherein displaying the cue includes displaying the cue in the user interface corresponding to the voice command associated with the control claim 1 , the control displayed in the user interface.5. The method of claim 1 , wherein displaying the cue includes displaying the cue in the user interface corresponding to the voice command associated with the control claim 1 , the control hidden from the user interface.6. The method of claim 1 , wherein displaying the cue includes displaying the cue in the user interface corresponding to the voice command associated with the control claim 1 , the control being a global headset control.7. The method of claim 1 , wherein the cue is loaded from a control claim 1 , the control ...

Publication date: 12-09-2013

System and Method for Automatically Generating a Dialog Manager

Number: US20130238333A1
Assignee: AT&T Intellectual Property I, L.P.

Disclosed herein are systems, methods, and computer-readable storage media for automatically generating a dialog manager for use in a spoken dialog system. A system practicing the method receives a set of user interactions having features, identifies an initial policy, evaluates all of the features in a linear evaluation step of the algorithm to identify a set of most important features, performs a cubic policy improvement step on the identified set of most important features, repeats the previous two steps one or more times, and generates a dialog manager for use in a spoken dialog system based on the resulting policy and/or set of most important features. Evaluating all of the features can include estimating a weight for each feature which indicates how much each feature contributes to at least one of the identified policies. The system can ignore features not in the set of most important features. 1. A method comprising:identifying, via a processor, features from a set of user interactions;identifying a policy for using the features in developing a dialog manager;performing, based on the policy, a linear evaluation on the features, to yield a set of features;repeating a cubic policy process on the set of features until the set of features results in a reduced set of features having a quantity below a threshold; andgenerating the dialog manager using a modified set of user interactions, the modified set of user interactions being selected based on the reduced set of features.2. The method of claim 1 , wherein the cubic policy process comprises a least-squares policy iteration algorithm.3. The method of claim 1 , wherein the linear evaluation comprises estimating a weight for each feature in the features.4. The method of claim 3 , wherein the weight of each feature indicates how much each feature contributes to the policy.5. 
The method of claim 1, further comprising ignoring, during generation of the dialog manager, features which are not in the ...
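The evaluate-then-prune loop in the claims can be sketched numerically; here ordinary least squares stands in for the linear evaluation step and weight-magnitude pruning stands in for the cubic policy-improvement step (the function name, `keep_frac`, and the stopping rule are assumptions, not the patent's method):

```python
import numpy as np

def reduce_features(X, y, threshold, keep_frac=0.5):
    """Repeatedly evaluate all active features with a cheap linear fit and
    keep only the highest-weight ones until the surviving set is no larger
    than `threshold` (a toy stand-in for the evaluate/improve loop)."""
    active = np.arange(X.shape[1])
    while len(active) > threshold:
        # linear evaluation: one least-squares weight per active feature
        w, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        keep = max(int(len(active) * keep_frac), threshold)
        # prune: retain the features whose weights contribute most
        active = active[np.argsort(-np.abs(w))[:keep]]
    return active
```

A dialog manager would then be trained on interactions described only by the surviving features.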

12-09-2013 publication date

ENDPOINT DETECTION APPARATUS FOR SOUND SOURCE AND METHOD THEREOF

Number: US20130238335A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

An apparatus, and a method thereof, for detecting endpoints of sound signals when sound sources vocalized from a remote site are processed, even if a plurality of speakers exist and an interference sound is input from a direction different from that of one speaker. In an environment in which a plurality of sound sources exist, the existence and the length of the sound source being input in each direction are determined and the endpoint is found, thereby improving the performance of the post-processing; speech input from a direction other than that of a speaker vocalizing at an area remote from the sound source collecting unit is distinguished while the speech from the speaker is being recorded, thereby enabling remote sound source recognition without restriction on the installation region of a microphone. 1. An apparatus for detecting endpoints of a plurality of sound signals from a plurality of sound sources , the apparatus comprising:a plurality of microphones configured to receive the plurality of sound source signals from the plurality of sound sources;a sound source position detecting unit configured to detect positions of the plurality of sound sources from the sound source signals received through the plurality of microphones;a sound source position change determination unit configured to determine a change in position of the sound source according to each direction by reading the positions of the plurality of sound sources detected through the sound source position detecting unit;a sound source maintenance time calculating unit configured to calculate a sound source maintenance time of the sound source at a predetermined position by reading the positions of the plurality of sound sources detected through the sound source position detecting unit; andan endpoint determination unit configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by
...
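The maintenance-time test can be sketched as a small frame-level routine: a source direction whose run of active frames is shorter than a threshold is discarded, otherwise its first and last frames become the endpoints (the per-frame set-of-directions input and `min_frames` are assumptions):

```python
def detect_endpoints(frames, min_frames=3):
    """frames: one set of active direction labels per frame.
    Returns direction -> list of (start, end) frame intervals, keeping
    only sources that persisted at least `min_frames` frames."""
    intervals = {}
    open_runs = {}  # direction -> frame index where its run started
    for t, active in enumerate(frames):
        for d in active:
            open_runs.setdefault(d, t)
        for d in list(open_runs):
            if d not in active:  # run ended: apply the maintenance-time test
                start = open_runs.pop(d)
                if t - start >= min_frames:
                    intervals.setdefault(d, []).append((start, t - 1))
    for d, start in open_runs.items():  # close runs still open at the end
        if len(frames) - start >= min_frames:
            intervals.setdefault(d, []).append((start, len(frames) - 1))
    return intervals
```

A short burst from an interfering direction (direction "B" below) is rejected while the speaker's direction keeps its endpoints.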

19-09-2013 publication date

ELECTRONIC DEVICE AND METHOD FOR CONTROLLING POWER USING VOICE RECOGNITION

Number: US20130246071A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

An electronic apparatus and a power controlling method are provided. The electronic apparatus includes: a voice input unit which receives an audio input in a stand-by mode of the electronic apparatus; a voice sensing unit which determines whether the received audio input is a user voice, and if the user voice is input, outputs a power control signal; and a power control voice recognition unit which, if the power control signal is received from the voice recognition unit, turns on and performs voice recognition regarding the input user voice. 1. An electronic apparatus , comprising:a voice input unit which receives an audio input in a stand-by mode of the electronic apparatus;a voice sensing unit which determines whether the received audio input is a user voice , and outputs a first power control signal in response to determining that the received audio input is the user voice; anda power control voice recognition unit which, in response to receiving the first power control signal from the voice recognition unit, turns on and performs voice recognition regarding the received audio input.2. The apparatus as claimed in claim 1 , wherein the power control voice recognition unit determines whether the received audio input is to control power of the electronic apparatus.3. The apparatus as claimed in claim 2 , further comprising:a main control unit which controls the electronic apparatus,wherein the power control voice recognition unit transmits a second power control signal to the main control unit in response to determining that the received audio input is to control the power of the electronic apparatus, andwherein the main control unit converts a mode of the electronic apparatus from the stand-by mode into an operation mode in response to receiving the second power control signal from the power control voice recognition unit.4. The apparatus as claimed in claim 3 , wherein the power control voice recognition unit turns off after a predetermined time elapses upon ...
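The two-stage flow — an always-on sensing stage that gates a heavier recognition stage — can be sketched as a small state machine (the energy threshold, the "power on" phrase, and the transcript argument standing in for real decoding are all assumptions):

```python
def is_voice(samples, energy_threshold=0.01):
    """Crude stand-by voice sensor: mean squared amplitude above a threshold
    (the threshold value is an assumption)."""
    return sum(s * s for s in samples) / len(samples) > energy_threshold

class PowerController:
    """Sketch: the sensing stage always runs; the recognition stage turns on
    only after a voice is sensed (the first power control signal)."""
    def __init__(self):
        self.mode = "stand-by"
        self.recognizer_on = False

    def feed(self, samples, transcript_if_on=None):
        if not self.recognizer_on:
            if is_voice(samples):
                self.recognizer_on = True  # first power control signal
        else:
            # hypothetical recognition result; a real device decodes audio
            if transcript_if_on == "power on":
                self.mode = "operation"    # second power control signal
            self.recognizer_on = False     # recognizer powers back down
        return self.mode
```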

19-09-2013 publication date

System and Method for Customized Voice Response

Number: US20130246072A1
Author: Duffield Nicholas
Assignee: AT&T Intellectual Property I, L.P.

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages. 1. A method comprising:collecting a user-specific services list associated with a user about to use an interactive voice response system;for each service in the user-specific services list, generating country-specific weights;selecting an interactive voice response system language model based on an aggregation of the country-specific weights; andrecognizing speech received from the user via the interactive voice response system based on the interactive voice response system language model.2. The method of claim 1 , wherein the interactive voice response system changes a user interface based on the interactive voice response language model.3. The method of claim 1 , wherein the interactive voice response system selects language options for a splash screen based on the country-specific weights.4. The method of claim 1 , wherein the interactive voice response system tunes a voice recognition algorithm based on the country-specific weights.5. The method of claim 1 , wherein the ...
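The weight aggregation can be sketched directly: per-service country weights are summed into a summary weight and the top-weighted country selects the language model (the data layout and function name are assumptions):

```python
def pick_language_model(services, service_weights):
    """services: service ids associated with the caller.
    service_weights: service id -> {country: weight} (hypothetical layout).
    Sums the per-service country weights and returns the top country,
    which would key the interactive voice response language model."""
    totals = {}
    for s in services:
        for country, w in service_weights.get(s, {}).items():
            totals[country] = totals.get(country, 0.0) + w
    return max(totals, key=totals.get) if totals else None
```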

26-09-2013 publication date

Speech Conversation Support Apparatus, Method, and Program

Number: US20130253924A1
Assignee: KABUSHIKI KAISHA TOSHIBA

According to one embodiment, a speech conversation support apparatus includes a division unit, an analysis unit, a detection unit, an estimation unit and an output unit. The division unit divides a speech data item including a word item and a sound item into a plurality of divided speech data items. The analysis unit obtains an analysis result. The detection unit detects, for each divided speech data item, at least one clue expression indicating one of an instruction by a user and a state of the user. The estimation unit estimates, if the clue expression is detected, a playback data item from at least one divided speech data item corresponding to a speech uttered before the clue expression is detected. The output unit outputs the playback data item. 1. A speech conversation support apparatus , comprising:a division unit configured to divide a speech data item, including a word item and a sound item, into a plurality of divided speech data items, in accordance with at least one of a first characteristic of the word item and a second characteristic of the sound item;an analysis unit configured to obtain an analysis result on the at least one of the first characteristic and the second characteristic, for each divided speech data item;a first detection unit configured to detect, for each divided speech data item, at least one clue expression indicating one of an instruction by a user and a state of the user in accordance with at least one of an utterance by the user and an action by the user;an estimation unit configured to estimate, if the clue expression is detected, at least one playback data item from at least one divided speech data item corresponding to a speech uttered before the clue expression is detected, based on the analysis result; andan output unit configured to output the playback data item.2. The apparatus according to claim 1 , further comprising an indication unit configured to generate, if the clue expression detected by the first detection ...
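The clue-expression trigger can be sketched over transcribed chunks: when a chunk contains a clue word, the chunk spoken just before it becomes the playback data (the clue list and the single-chunk playback window are assumptions):

```python
CLUE_WORDS = {"what?", "pardon?", "say again"}  # assumed clue expressions

def support_playback(segments):
    """segments: transcribed utterance chunks in time order. When a chunk
    contains a clue expression, return the chunk spoken just before it
    as the playback data; otherwise return nothing."""
    for i, seg in enumerate(segments):
        if any(clue in seg.lower() for clue in CLUE_WORDS):
            return segments[max(0, i - 1):i]
    return []
```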

26-09-2013 publication date

CONVERSATION SUPPORTING DEVICE, CONVERSATION SUPPORTING METHOD AND CONVERSATION SUPPORTING PROGRAM

Number: US20130253932A1
Assignee: KABUSHIKI KAISHA TOSHIBA

A conversation supporting device of an embodiment of the present disclosure has an information storage unit, a recognition resource constructing unit, and a voice recognition unit. Here, the information storage unit stores the information disclosed by a speaker. The recognition resource constructing unit uses the disclosed information to construct the recognition resource including a voice model and a language model for recognition of voice data. The voice recognition unit uses the recognition resource to recognize the voice data. 1. A conversation supporting device comprising:a storage unit configured to store information disclosed by a speaker;a recognition resource constructing unit configured to use the disclosed information in constructing a recognition resource for voice recognition using one of an acoustic model and a language model; anda voice recognition unit configured to use the recognition resource to generate text data corresponding to the voice data.2. The conversation supporting device of claim 1 , further comprising:a voice information storage unit configured to store the voice data correlated to identification information, the identification information including an identity of a speaker of a talk contained in the voice data, and a time information of the talk contained in the voice data; anda conversation interval determination unit configured to use the voice data, the identification information, and the time information to determine a conversation interval in the voice data when the voice data contains a plurality of talks from a plurality of speakers;wherein the recognition resource constructing unit is further configured to use the information disclosed by the plurality of speakers who spoke during the conversation interval to construct the recognition resource, andthe voice recognition unit is further configured to recognize the voice data corresponding to the conversation interval determined by the conversation interval determination unit.3.
...
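One way to read "constructing a recognition resource from disclosed information" is a language model whose probabilities are boosted for words the speaker disclosed; a toy unigram version (the `boost` count and function name are assumptions, not the patent's construction):

```python
from collections import Counter

def build_boosted_lm(base_corpus, disclosed_terms, boost=5):
    """Unigram counts from a base corpus, with words the speaker disclosed
    (names, topics) given `boost` extra counts so the recognizer favors
    them; returns word -> probability."""
    counts = Counter(base_corpus)
    for term in disclosed_terms:
        counts[term] += boost
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```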

03-10-2013 publication date

Voice-Enabled Touchscreen User Interface

Number: US20130257780A1
Author: Baron Charles

An electronic device may receive a touch selection of an element on a touch screen. In response, the electronic device may enter a listening mode for a voice command spoken by a user of the device. The voice command may specify a function which the user wishes to apply to the selected element. Optionally, the listening mode may be limited to a defined time period based on the touch selection. Such voice commands in combination with touch selections may facilitate user interactions with the electronic device. 1. A method for controlling an electronic device , comprising:receiving a touch selection of a selectable element displayed on a touch screen of the electronic device;in response to receiving the touch selection, enabling the electronic device to listen for a voice command directed to the selectable element; andin response to receiving the voice command, applying a function associated with the voice command to the selectable element.2. The method of wherein the selectable element is one of a plurality of selectable elements represented on the touch screen.3. The method of including:receiving a second touch selection of a second selectable element of the plurality of selectable elements;in response to receiving the second touch selection, enabling the electronic device to listen for a second voice command directed to the second selectable element; andin response to receiving the second voice command, applying a function associated with the second voice command to the second selectable element.4. The method of including receiving the voice command using a microphone of the electronic device.5. The method of claim 1 including, prior to enabling the electronic device to listen for the voice command, determining that an ambient sound level does not exceed a maximum noise level.6. The method of claim 1 including, prior to enabling the electronic device to listen for the voice command, determining that an ambient sound type is not similar to spoken ...
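The touch-then-listen window can be sketched with an injected clock (the 3-second window and the class/method names are assumptions):

```python
class TouchVoiceUI:
    """Sketch of touch-then-speak: a touch on an element opens a listening
    window of `window_s` seconds; a voice command arriving inside the
    window is applied to that element, otherwise it is ignored."""
    def __init__(self, window_s=3.0, clock=None):
        self.window_s = window_s
        self.clock = clock or (lambda: 0.0)  # injectable for testing
        self.target = None
        self.deadline = None

    def touch(self, element):
        self.target = element
        self.deadline = self.clock() + self.window_s

    def voice(self, command):
        if self.target is None or self.clock() > self.deadline:
            self.target = None  # window expired: drop the selection
            return None
        applied = (command, self.target)
        self.target = None
        return applied
```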

03-10-2013 publication date

SPOKEN DIALOG SYSTEM USING PROMINENCE

Number: US20130262117A1
Author: Heckmann Martin
Assignee: HONDA RESEARCH INSTITUTE EUROPE GMBH

The invention presents a method for analyzing speech in a spoken dialog system, comprising the steps of: accepting an utterance by at least one means for accepting acoustical signals, in particular a microphone, analyzing the utterance and obtaining prosodic cues from the utterance using at least one processing engine, wherein the utterance is evaluated based on the prosodic cues to determine a prominence of parts of the utterance, and wherein the utterance is analyzed to detect at least one marker feature, e.g. a negative statement, indicative of the utterance containing at least one part to replace at least one part in a previous utterance, the part to be replaced in the previous utterance being determined based on the prominence determined for the parts of the previous utterance and the replacement parts being determined based on the prominence of the parts in the utterance, and wherein the previous utterance is evaluated with the replacement part(s). 1. A method for analyzing speech in a spoken dialog system , comprising the steps of:accepting an utterance by at least one means for accepting acoustical signals, in particular a microphone,analyzing the utterance and obtaining prosodic cues from the utterance using at least one processing engine,wherein the utterance is evaluated based on the prosodic cues to determine a prominence of parts of the utterance, and wherein the utterance is analyzed to detect at least one marker feature, e.g. a negative statement, indicative of the utterance containing at least one part to replace at least one part in a previous utterance, the part to be replaced in the previous utterance being determined based on the prominence determined for the parts of the previous utterance and the replacement parts being determined based on the prominence of the parts in the utterance, and wherein the previous utterance is evaluated with the replacement part(s).2. The method of claim 1 , wherein the utterance is a correction of the previous ...
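A one-word simplification of the prominence-guided replacement: the most prominent word of the correction overwrites the most prominent word of the previous utterance (per-word prominence scores are assumed to come from the prosodic analysis):

```python
def apply_correction(prev_words, prev_prom, corr_words, corr_prom):
    """prev/corr are word lists with per-word prominence scores in [0, 1].
    Replaces the most prominent word of the previous utterance with the
    most prominent word of the correction utterance."""
    i = max(range(len(prev_words)), key=lambda k: prev_prom[k])
    j = max(range(len(corr_words)), key=lambda k: corr_prom[k])
    out = list(prev_words)
    out[i] = corr_words[j]
    return out
```

After a negative statement such as "no, ten" is detected, the re-evaluated utterance would be the returned word list.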

17-10-2013 publication date

Automatic Updating of Confidence Scoring Functionality for Speech Recognition Systems

Number: US20130275135A1

Automatically adjusting confidence scoring functionality is described for a speech recognition engine. Operation of the speech recognition system is revised so as to change an associated receiver operating characteristic (ROC) curve describing performance of the speech recognition system with respect to rates of false acceptance (FA) versus correct acceptance (CA). Then a confidence scoring functionality related to recognition reliability for a given input utterance is automatically adjusted such that where the ROC curve is better for a given operating point after revising the operation of the speech recognition system, the adjusting reflects a double gain constraint to maintain FA and CA rates at least as good as before revising operation of the speech recognition system. 1. A method for automatically adjusting operation of a speech recognition system comprising:revising operation of the speech recognition system so as to change an associated receiver operating characteristic (ROC) curve describing performance of the speech recognition system with respect to rates of false acceptance (FA) versus correct acceptance (CA); andautomatically adjusting a confidence scoring functionality related to recognition reliability for a given input utterance such that where the ROC curve is better for a given operating point after revising the operation of the speech recognition system, the adjusting reflects a double gain constraint to maintain FA and CA rates at least as good as before revising operation of the speech recognition system.2. A method according to claim 1 , wherein where the ROC curve is not better for a given operating point after revising the operation of the speech recognition system, the adjusting minimizes worsening of the FA and CA rates.3.
A method according to claim 1 , wherein automatically adjusting the confidence scoring functionality includes establishing a mapping to maintain equivalence of the confidence scoring functionality before and after ...
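One threshold-preserving adjustment consistent with the stated goal is to remap the revised recognizer's confidence scores onto the old score scale by matching empirical percentiles, so a threshold tuned on the old system keeps roughly its acceptance rates. This is a sketch of score normalization, not the patent's double-gain procedure:

```python
import bisect

def build_score_mapping(old_scores, new_scores):
    """Monotone mapping from the revised recognizer's scores onto the old
    scale by matching empirical percentiles (histogram-equalization style)."""
    old_sorted = sorted(old_scores)
    new_sorted = sorted(new_scores)

    def remap(s):
        # fraction of new scores at or below s ...
        r = bisect.bisect_right(new_sorted, s) / len(new_sorted)
        # ... mapped to the old score at the same percentile
        idx = min(int(r * len(old_sorted)), len(old_sorted) - 1)
        return old_sorted[idx]

    return remap
```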

31-10-2013 publication date

Sampling Training Data for an Automatic Speech Recognition System Based on a Benchmark Classification Distribution

Number: US20130289989A1

A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system. 1. A method comprising:obtaining a benchmark classification distribution;selecting, by a computing device, training text strings, wherein the training text strings are associated with respective classifications, and wherein the training text strings are selected such that the respective classifications of the selected training text strings are in proportion to the benchmark classification distribution; andtraining an automatic speech recognition (ASR) system using the training text strings.2. The method of claim 1 , wherein the benchmark classification distribution is a distribution of topic classifications.3. The method of claim 1 , wherein obtaining the benchmark classification distribution comprises:transcribing benchmark utterances to respective benchmark text strings; anddetermining the benchmark classification distribution from the benchmark text strings.4. The method of claim 3 , wherein the benchmark utterances were made by users in a category of users claim 3 , and wherein the ASR system is configured to transcribe new utterances made by users in the category of users.5. 
The method of claim 4 , wherein the benchmark utterances were made by a single user ...
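The benchmark-matched sampling can be sketched directly (class-keyed text pools and with-replacement draws are assumptions):

```python
import random

def sample_to_distribution(texts_by_class, benchmark_dist, n, seed=0):
    """Draw n training strings so that class proportions follow
    benchmark_dist (class -> probability); sampling is with replacement.
    The resulting corpus would then be used to train the ASR system."""
    rng = random.Random(seed)
    sample = []
    for cls, p in benchmark_dist.items():
        pool = texts_by_class.get(cls, [])
        for _ in range(round(p * n)):
            sample.append(rng.choice(pool))
    return sample
```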

07-11-2013 publication date

APPARATUS AND METHOD FOR SPEECH RECOGNITION

Number: US20130297304A1
Author: Kim Sang Hun, KIM Seung Hi

Disclosed is an apparatus for speech recognition and automatic translation operated in a PC or a mobile device. The apparatus for speech recognition according to the present invention includes a display unit that displays a screen for selecting a domain as a unit for a speech recognition region previously sorted for speech recognition to a user; a user input unit that receives a selection of a domain from the user; and a communication unit that transmits the user selection information for the domain. According to the present invention, the apparatus for speech recognition using an intuitive and simple user interface is provided to a user to enable the user to easily select/correct a designation domain of a speech recognition system and improve accuracy and performance of speech recognition and automatic translation by the designated system for speech recognition. 1. An apparatus for speech recognition , comprising:a display unit that displays a screen for selecting a domain for speech recognition to a user;a user input unit that receives a selection of a domain from the user; anda communication unit that transmits the user selection information for the domain.2. The apparatus of claim 1 , wherein the display unit displays a domain selected by the user or a domain previously selected and deselected by the user.3. The apparatus of claim 1 , wherein the display unit classifies and displays a domain representing the domain into a layer according to a speech recognition level.4. The apparatus of claim 3 , wherein the display unit displays a domain for the domain selected by the user among the domains classified and displayed into a layer.5. The apparatus of claim 3 , wherein the layer according to the speech recognition level classifies a general region providing a basic speech recognition region according to a generation situation of speech and the generation situation is re-classified according to generation places.6. 
The apparatus of claim 3 , wherein the display unit ...

07-11-2013 publication date

GENERATING ACOUSTIC MODELS

Number: US20130297310A1

This document describes methods, systems, techniques, and computer program products for generating and/or modifying acoustic models. Acoustic models and/or transformations for a target language/dialect can be generated and/or modified using acoustic models and/or transformations from a source language/dialect. 1. A computer-implemented method comprising:receiving, at a computer system, a request to generate or modify a target acoustic model for a target language;accessing, by the computer system, a source acoustic model for a source language, wherein the source acoustic model includes information that maps acoustic features of the source language to phonemes in a transformed feature space;aligning, using the source acoustic model in the transformed feature space, untransformed voice data in the target language with phonemes in a corresponding textual transcript to obtain aligned voice data, wherein the untransformed voice data is in an untransformed feature space;transforming the aligned voice data according to a particular transform operation using the source acoustic model to obtain transformed voice data;adapting the source acoustic model to the target language using the untransformed voice data in the target language to obtain an adapted acoustic model; andtraining, by the computer system, a target acoustic model for the target language using the transformed voice data and the adapted acoustic model; andproviding the target acoustic model in association with the target language.2. The computer-implemented method of claim 1 , wherein the transformed feature space of the source acoustic model is a Constrained Maximum Likelihood Linear Regression (CMLLR) feature space that is generated from a CMLLR transform operation.3. 
The computer-implemented method of claim 1 , wherein the source acoustic model is generated from performance of a Linear Discriminant Analysis (LDA) transform operation, Vocal Tract Length Normalization (VTLN) transform operation, ...
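The feature-space transforms the claims compose (CMLLR, LDA, VTLN) are likelihood-based; as a rough stand-in, an affine feature transform between a source and a target feature space can be estimated by least squares (function name and setup are assumptions; real CMLLR maximizes GMM likelihood rather than a paired-feature fit):

```python
import numpy as np

def estimate_affine_transform(src_feats, tgt_feats):
    """Estimate x -> A @ x + b mapping source-space feature vectors onto
    target-space ones by least squares over paired rows."""
    n, d = src_feats.shape
    X = np.hstack([src_feats, np.ones((n, 1))])        # append bias column
    W, *_ = np.linalg.lstsq(X, tgt_feats, rcond=None)  # shape (d+1, d)
    A, b = W[:d].T, W[d]
    return A, b
```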

07-11-2013 publication date

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND INFORMATION PROCESSING PROGRAM

Number: US20130297311A1
Assignee: SONY CORPORATION

An information processing apparatus including: a high-quality-voice determining section configured to determine a voice, which can be determined to have been collected under a good condition, as a good-condition voice included in mixed voices pertaining to a group of voices collected under different conditions; and a voice recognizing section configured to carry out voice recognition processing by making use of a predetermined parameter on the good-condition voice determined by the high-quality-voice determining section, modify the value of the predetermined parameter on the basis of a result of the voice recognition processing carried out on the good-condition voice, and carry out the voice recognition processing by making use of the predetermined parameter having the modified value on a voice included in the mixed voices as a voice other than the good-condition voice.

07-11-2013 publication date

ACOUSTIC MODEL ADAPTATION USING GEOGRAPHIC INFORMATION

Number: US20130297313A1

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location. 1. A system comprising:one or more computers; andone or more computer-readable media coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:receiving an audio signal that corresponds to an utterance recorded by a mobile device,determining a geographic location associated with the mobile device,adapting one or more acoustic models for the geographic location, andperforming speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.2. The system of claim 1 , wherein adapting one or more acoustic models further comprises adapting one or more acoustic models before receiving the audio signal that corresponds to the utterance.3. The system of claim 1 , wherein adapting one or more acoustic models further comprises adapting one or more acoustic models after receiving the audio signal that corresponds to the utterance.4. The system of claim 1 , wherein:the operations further comprise receiving geotagged audio signals that correspond to audio recorded by multiple mobile devices in multiple geographic locations; andadapting one or more acoustic models for the geographic location further comprises adapting one or more acoustic models for the geographic location using a subset of the geotagged audio signals.5.
The system of claim 4 ...
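Choosing among geographically adapted models can be sketched as nearest-centroid selection by great-circle (haversine) distance; the `(lat, lon, name)` model layout is an assumption:

```python
import math

def nearest_model(lat, lon, models):
    """models: list of (lat, lon, name) centroids for acoustic models
    adapted to geographic regions; returns the nearest model's name."""
    def haversine_km(la1, lo1, la2, lo2):
        la1, lo1, la2, lo2 = map(math.radians, (la1, lo1, la2, lo2))
        a = (math.sin((la2 - la1) / 2) ** 2
             + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
        return 2 * 6371 * math.asin(math.sqrt(a))  # Earth radius ~6371 km
    return min(models, key=lambda m: haversine_km(lat, lon, m[0], m[1]))[2]
```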

14-11-2013 publication date

SYSTEM AND METHOD FOR PROCESSING MULTI-MODAL DEVICE INTERACTIONS IN A NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT

Number: US20130304473A1
Assignee: VoiceBox Technologies, Inc.

A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction. 1-18. (canceled) 19. A computer-implemented method of facilitating natural language utterance processing via multiple input modes , the method being implemented on a computer that includes one or more physical processors executing one or more computer program modules that perform the method , the method comprising:receiving, via a first input mode, a first input;receiving, via a second input mode that is different from the first input mode, a second input that relates to the first input;determining a request based on the first input or the second input;determining, based on the first input and the second input, context information for the request; andprocessing the request based on the context information.20. The method of claim 19 , wherein determining the request comprises determining an action, a query, a command, or a task based on the first input or the second input.21.
The method of claim 19 , wherein receiving the first input comprises receiving a natural language utterance via a voice input mode, and wherein receiving the second input comprises receiving a non-voice ...
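Combining the two modes can be sketched with toy keyword logic: a deictic utterance ("this") borrows its target from the non-voice context (all names and the keyword rule are assumptions, not the patent's intent engine):

```python
def determine_intent(utterance, touch_context):
    """Combine a natural language utterance with non-voice context, e.g.
    the item the user tapped, to resolve an underspecified request."""
    words = utterance.split()
    target = touch_context if "this" in words and touch_context else None
    return {"action": words[0], "target": target}
```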

05-12-2013 publication date

METHOD OF PROVIDING VOICE RECOGNITION SERVICE AND ELECTRONIC DEVICE THEREFOR

Number: US20130325460A1
Author: Cho Young-Ik, Kim Joo-Hyun

A method and an electronic device provide a voice recognition service. The method includes displaying one or more application programs according to a voice command input through a microphone, determining an additional service to be driven in a selected application program in consideration of the voice command when any one of the one or more application programs is selected, and displaying the additional service. 1. A method of providing a voice recognition service , the method comprising:displaying one or more application programs that are executable according to a voice command input through a microphone;determining an additional service to be driven in a selected application program based on the voice command when any one of the one or more application programs is selected; anddisplaying the additional service.2. The method of claim 1 , wherein the determination of the additional service comprises:verifying whether there is an additional service corresponding to the voice command in the selected application program; anddetermining the additional service to be driven in the selected application program in consideration of the voice command when there is the additional service corresponding to the voice command.3. The method of claim 2 , further comprising displaying a home picture of the selected application program when there is no additional service corresponding to the voice command.4. The method of claim 3 , further comprising mapping the voice command with the additional service when any one additional service is selected on the home picture of the application.5. The method of claim 1 , further comprising determining the number of application programs which are executable according to the voice command, wherein the displaying of the application programs comprises displaying a plurality of application programs when there are the plurality of application programs which are executable according to the voice command.6. The method of claim 5 , ...
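The command-to-app resolution can be sketched over a hypothetical registry; an app with no service matching the command would fall back to its home screen:

```python
def executable_apps(command, registry):
    """registry: app -> iterable of additional-service names it offers.
    Returns the apps with at least one service matching the command."""
    return [app for app, svcs in registry.items()
            if any(command in s for s in svcs)]

def additional_service(command, app, registry):
    """Service to drive in the selected app, or None to show its home screen."""
    for s in registry.get(app, ()):
        if command in s:
            return s
    return None
```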

05-12-2013 publication date

METHODS AND APPARATUS FOR PERFORMING TRANSFORMATION TECHNIQUES FOR DATA CLUSTERING AND/OR CLASSIFICATION

Number: US20130325471A1
Assignee: NUANCE COMMUNICATIONS, INC.

Some aspects include transforming data, at least a portion of which has been processed to determine at least one representative vector associated with each of a plurality of classifications associated with the data to obtain a plurality of representative vectors. Techniques comprise determining a first transformation based, at least in part, on the plurality of representative vectors, applying at least the first transformation to the data to obtain transformed data, and fitting a plurality of clusters to the transformed data to obtain a plurality of established clusters. Some aspects include classifying input data by transforming the input data using at least the first transformation and comparing the transformed input data to the established clusters. 1-33. (canceled) 34. A method of classifying input data as belonging to one of a plurality of classifications , the plurality of classifications associated with a respective plurality of clusters that were fit to training data , the method comprising:obtaining a first transformation used to transform the training data when the plurality of clusters were fit to the training data, the first transformation based, at least in part, on a plurality of representative vectors determined from the training data, the plurality of representative vectors including at least one representative vector determined for each of the plurality of classifications;transforming the input data using at least the first transformation to obtain transformed input data;comparing the transformed input data to the plurality of clusters to determine which cluster of the plurality of clusters the input data should be associated with; andclassifying the input data according to a classification of the plurality of classifications associated with the determined cluster.35.
The method of claim 34 , wherein each of the plurality of representative vectors was determined based on its location with respect to training data associated with other classifications ...
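The classification scheme these claims describe can be sketched in miniature: summarize each class by a representative vector (here simply the per-class mean of its training vectors) and assign new input to the class whose representative is nearest. This is a hypothetical toy illustration with invented two-class data, not the patent's actual transformation-and-clustering pipeline.

```python
# Toy sketch: per-class representative vectors + nearest-representative
# classification. The class labels and vectors below are invented.
from statistics import mean

def representative_vectors(training):
    """training: dict mapping class label -> list of feature vectors."""
    reps = {}
    for label, vectors in training.items():
        dims = zip(*vectors)  # group values by coordinate
        reps[label] = tuple(mean(coord) for coord in dims)
    return reps

def classify(x, reps):
    """Return the label whose representative vector is closest to x."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(reps, key=lambda label: dist2(x, reps[label]))

training = {
    "speech":  [(0.9, 0.1), (1.1, 0.0), (1.0, 0.2)],
    "silence": [(0.0, 0.9), (0.1, 1.1), (0.2, 1.0)],
}
reps = representative_vectors(training)
print(classify((0.8, 0.3), reps))  # nearest to the "speech" centroid
```

A fuller implementation would first derive the transformation from the representative vectors (for example an LDA-style projection) and fit the clusters in the transformed space, as the claims describe.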

More
05-12-2013 publication date

METHODS AND APPARATUS FOR PERFORMING TRANSFORMATION TECHNIQUES FOR DATA CLUSTERING AND/OR CLASSIFICATION

Number: US20130325472A1
Assignee: NUANCE COMMUNICATIONS, INC.

Some aspects include transforming data, at least a portion of which has been processed to determine frequency information associated with features in the data. Techniques include determining a first transformation based, at least in part, on the frequency information, applying at least the first transformation to the data to obtain transformed data, and fitting a plurality of clusters to the transformed data to obtain a plurality of established clusters. Some aspects include classifying input data by transforming the input data using at least the first transformation and comparing the transformed input data to the established clusters. 1-33. (canceled) 34. A method of classifying input data as belonging to one of a plurality of classifications , the plurality of classifications associated with a respective plurality of clusters that were fit to training data , the method comprising:obtaining a first transformation used to transform the training data when the plurality of clusters were fit to the training data, the first transformation based, at least in part, on frequency information associated with features that were represented in the training data;transforming the input data using at least the first transformation to obtain transformed input data;comparing the transformed input data to the plurality of clusters to determine which cluster of the plurality of clusters the input data should be associated with; andclassifying the input data according to a classification of the plurality of classifications associated with the cluster that the input data was determined to be associated with.35. The method of claim 34 , wherein the frequency information included feature counts corresponding to a number of times given features occurred in at least a portion of the training data.36. The method of claim 35 , wherein the at least a portion of the training data included a plurality of observations, and wherein each of the plurality of observations was associated ...
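The frequency-information idea in these claims can be illustrated with an IDF-style weighting: features counted often across the training observations get small weights, rare features large ones, before any clustering is done. This is a hedged sketch with invented toy observations; the patent does not specify the transformation at this level of detail.

```python
# Toy sketch: derive feature weights from document frequency, then
# apply them as a transformation of a bag-of-features observation.
import math
from collections import Counter

def frequency_weights(observations):
    """observations: list of feature lists; returns feature -> weight."""
    n = len(observations)
    counts = Counter(f for obs in observations for f in set(obs))
    return {f: math.log(n / c) for f, c in counts.items()}

def transform(obs, weights):
    """Map an observation to a frequency-weighted feature vector (dict)."""
    raw = Counter(obs)
    return {f: raw[f] * weights.get(f, 0.0) for f in raw}

observations = [["the", "call"], ["the", "stop"], ["the", "call", "call"]]
w = frequency_weights(observations)
print(transform(["the", "call"], w))  # "the" occurs everywhere -> weight 0
```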

More
05-12-2013 publication date

DIALOGUE MODELS FOR VEHICLE OCCUPANTS

Number: US20130325483A1
Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC

Methods and apparatus for creating and managing multiple dialogue models in a statistical dialogue modeling system capable of learning, and conducting human-machine dialogues based on selected models. Dialogue models are selected according to feature vectors that describe characteristics of the dialogue participants and their current situation. Mobile apparatus in motor vehicles can provide optimized dialogue service to occupants of the motor vehicles according to vehicle location and route, in addition to personal characteristics of the occupants, whether driver or passenger. When networked via a remote dialogue server, a large pool of dialogue participants is available for automatic building of dialogue models suitable for handling a variety of situations and participants. 1. A method for operating a device to conduct a dialogue with a human dialogue participant in an environment , the method comprising:obtaining a parameter related to at least one feature selected from a group consisting of: a feature of the dialogue participant; and a feature of the environment;selecting a specific dialogue model from a plurality of dialogue models, such that the specific dialogue model is associated with the parameter;generating, by the device, at least one output dialogue action based on the specific dialogue model; andpresenting, by the device, the at least one output dialogue action to the human dialogue participant.2. The method of claim 1 , further comprising constructing a feature vector claim 1 , wherein the feature vector is derived at least in part from the parameter.3. The method of claim 2 , further comprising determining a cluster of human dialogue participants.4. The method of claim 3 , further comprising selecting a dialogue model for a given cluster.5. The method of claim 1 , further comprising:grouping a plurality of human dialogue participants into a plurality of clusters; andcreating a dialogue model for each cluster of the plurality of clusters.6. 
The method ...
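Selecting a dialogue model from a parameter describing the participant and environment reduces, in the simplest case, to a keyed lookup with a generic fallback. The feature tuples and model names below are invented for illustration; a real system would cluster participants and learn a model per cluster, as the claims describe.

```python
# Toy sketch: pick a dialogue model keyed on (participant, environment)
# features, falling back to a generic model. All names are hypothetical.
def select_dialogue_model(parameter, models, default="generic"):
    """parameter: e.g. ("driver", "highway"); models: dict keyed by feature."""
    return models.get(parameter, models.get(default))

models = {
    ("driver", "highway"): "terse-prompts",
    ("passenger", "city"): "verbose-prompts",
    "generic": "fallback-model",
}
print(select_dialogue_model(("driver", "highway"), models))  # terse-prompts
print(select_dialogue_model(("driver", "parking"), models))  # fallback-model
```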

More
12-12-2013 publication date

VOICE ACTIVATED SEARCH AND CONTROL FOR APPLICATIONS

Number: US20130332168A1
Assignee:

A method for voice activated search and control comprises converting, using an electronic device, multiple first speech signals into one or more first words. The one or more first words are used for determining a first phrase contextually related to an application space. The first phrase is used for performing a first action within the application space. Multiple second speech signals are converted, using the electronic device, into one or more second words. The one or more second words are used for determining a second phrase contextually related to the application space. The second phrase is used for performing a second action that is associated with a result of the first action within the application space. 1. A method for voice activated search and control , comprising:converting, using an electronic device, a first plurality of speech signals into one or more first words;using the one or more first words for determining a first phrase contextually related to an application space;using the first phrase for performing a first action within the application space;converting, using the electronic device, a plurality of second speech signals into one or more second words;using the one or more second words for determining a second phrase contextually related to the application space; andusing the second phrase for performing a second action that is associated with a result of the first action within the application space.2. The method of claim 1 , further comprising:receiving the first plurality and the second plurality of speech signals using the electronic device.3. The method of claim 2 , wherein the first phrase and the second phrase are application specific phrases within the application space.4. The method of claim 3 , wherein the first action comprises a first search related to the application space.5. The method of claim 4 , wherein the second action comprises a second search within results of the first search.6. 
The method of claim 5 , wherein the application ...
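The two-step behaviour described here (a second voice action operating on the result of the first) can be sketched as a chained search: the first phrase searches the application space, the second searches within the first result set. The library items and phrases below are invented.

```python
# Toy sketch: chained voice search within an application space.
def search(items, phrase):
    """Case-insensitive substring match over a list of item names."""
    return [item for item in items if phrase in item.lower()]

library = ["Jazz Classics", "Jazz Piano Trio", "Rock Anthems", "Piano Etudes"]
first = search(library, "jazz")   # first action: search the app space
second = search(first, "piano")   # second action: search within result 1
print(second)  # only items matching both phrases survive
```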

More
19-12-2013 publication date

DISPLAY APPARATUS, METHOD FOR CONTROLLING THE DISPLAY APPARATUS, SERVER AND METHOD FOR CONTROLLING THE SERVER

Number: US20130339031A1
Assignee:

A display apparatus is disclosed. The display apparatus includes a voice collecting unit which collects a user's voice; a first communication unit which transmits the user's voice to a first server, and receives text information corresponding to the user's voice from the first server; a second communication unit which transmits the received text information to a second server, and receives response information corresponding to the text information; an output unit which outputs a response message corresponding to the user's voice based on the response information; and a control unit which controls the output unit to output a response message differentiated from a response message corresponding to a previously collected user's voice, when a user's voice having a same utterance intention is re-collected. 1. A display apparatus comprising:a voice collector configured to collect a voice of a user;a first communicator which transmits the voice to a first server, and receives text information corresponding to the voice from the first server;a second communicator which transmits the received text information to a second server, and receives response information corresponding to the text information;an outputter which outputs a response message corresponding to the voice based on the response information; anda controller configured to control the outputter to output a second response message differentiated from a first response message corresponding to a previously collected user's voice, when a user's voice having a same utterance intention as the previously collected user's voice is re-collected.2. The display apparatus according to claim 1 , wherein the second server analyzes the text information to determine an utterance intention included in the voice, and transmits the response information corresponding to the determined utterance intention to the display apparatus.3. The display apparatus according to claim 2 , wherein the second server generates second ...
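The differentiated-response behaviour can be sketched with a per-intent counter: the first time an utterance intention is seen, one message is returned; when the same intention is re-collected, a different message is produced. The intent strings and message wording below are invented, not the patent's actual response generation.

```python
# Toy sketch: respond differently when the same utterance intention
# is collected again.
from collections import defaultdict

class ResponseServer:
    def __init__(self):
        self.seen = defaultdict(int)

    def respond(self, intent):
        self.seen[intent] += 1
        if self.seen[intent] == 1:
            return f"Performing: {intent}"
        # Second and later collections get a differentiated message.
        return f"Performing: {intent} (again, attempt {self.seen[intent]})"

server = ResponseServer()
print(server.respond("volume up"))  # first response
print(server.respond("volume up"))  # differentiated second response
```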

More
19-12-2013 publication date

DYNAMICALLY EXTENDING THE SPEECH PROMPTS OF A MULTIMODAL APPLICATION

Number: US20130339033A1
Assignee: NUANCE COMMUNICATIONS, INC.

A prompt generation engine operates to dynamically extend prompts of a multimodal application. The prompt generation engine receives a media file having a metadata container. The prompt generation engine operates on a multimodal device that supports a voice mode and a non-voice mode for interacting with the multimodal device. The prompt generation engine retrieves from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application. The prompt generation engine modifies the multimodal application to include the speech prompt. 1-18. (canceled) 19. A method of dynamically extending the speech prompts of a multimodal application , the method comprising:receiving, by a prompt generation engine, a media file having a metadata container;retrieving, by the prompt generation engine from the metadata container, a speech prompt related to content stored in the media file for inclusion in the multimodal application; andmodifying, by the prompt generation engine, the multimodal application to include the speech prompt.20. The method of claim 19 , wherein retrieving, by the prompt generation engine, from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application further comprises retrieving a text string prompt for execution by a text to speech engine.21. The method of claim 19 , wherein retrieving, by the prompt generation engine, from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application further comprises retrieving an audio prompt to be played by the multimodal device.22.
The method of claim 19 , wherein retrieving, by the prompt generation engine, from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application further comprises identifying a tag for prompts in the metadata container ...

More
02-01-2014 publication date

SYSTEM AND METHOD FOR STANDARDIZED SPEECH RECOGNITION INFRASTRUCTURE

Number: US20140006024A1
Assignee: AT&T Intellectual Property I, L.P.

Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure. 1. A method comprising:receiving speech from a user on a device; and communicating the speech to a separate device and away from the device, wherein the separate device: determines to apply one of supervised training and unsupervised training; and when supervised training is selected: determines whether available data are sufficient to build a new speech recognition model; when the available data is sufficient to build the new speech recognition model, builds the new speech recognition model using the available data; when the available data is not sufficient to build the new speech recognition model: selects an existing speech recognition model; and generates an adapted speech recognition model based on transformations generated from the existing speech recognition model based on the speech and associated transcriptions.2.
The method of claim 1 , wherein the new speech recognition model, the existing speech recognition model, and the adapted speech recognition ...
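The retrieval order described in the abstract (user-specific supervised model, then unsupervised model, then generic model) is a straightforward fallback chain. The user ids and model names below are invented for illustration.

```python
# Toy sketch: model retrieval with supervised -> unsupervised -> generic
# fallback, as the abstract describes.
def retrieve_model(user, supervised, unsupervised, generic="generic-model"):
    """supervised/unsupervised: dicts mapping user id -> model name."""
    if user in supervised:
        return supervised[user]
    if user in unsupervised:
        return unsupervised[user]
    return generic

supervised = {"alice": "alice-supervised"}
unsupervised = {"bob": "bob-unsupervised"}
print(retrieve_model("alice", supervised, unsupervised))  # alice-supervised
print(retrieve_model("bob", supervised, unsupervised))    # bob-unsupervised
print(retrieve_model("carol", supervised, unsupervised))  # generic-model
```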

More
02-01-2014 publication date

MOBILE TERMINAL AND METHOD FOR RECOGNIZING VOICE THEREOF

Number: US20140006027A1
Assignee:

The present disclosure relates to a mobile terminal and a voice recognition method thereof. The voice recognition method may include receiving a user's voice; providing the received voice to a first voice recognition engine provided in the server and a second voice recognition engine provided in the mobile terminal; acquiring first voice recognition data as a result of recognizing the received voice by the first voice recognition engine; acquiring second voice recognition data as a result of recognizing the received voice by the second voice recognition engine; estimating a function corresponding to the user's intention based on at least one of the first and the second voice recognition data; calculating a similarity between the first and the second voice recognition data when personal information is required for the estimated function; and selecting either one of the first and the second voice recognition data based on the calculated similarity. 1. A voice recognition method of a mobile terminal in connection with a server , the method comprising:receiving a user's voice;providing the received voice to a first voice recognition engine provided in the server and a second voice recognition engine provided in the mobile terminal;acquiring first voice recognition data as a result of recognizing the received voice by the first voice recognition engine;acquiring second voice recognition data as a result of recognizing the received voice by the second voice recognition engine;estimating a function corresponding to the user's intention based on at least one of the first and the second voice recognition data;calculating a similarity between the first and the second voice recognition data when personal information is required for the estimated function; andselecting either one of the first and the second voice recognition data based on the calculated similarity.2. The method of claim 1 , further comprising:ignoring the second voice recognition data when personal ...
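The arbitration between the server-side and on-device recognition results can be sketched with a string similarity: when the estimated function needs personal information and the two hypotheses agree closely, the on-device result is kept (so personal data need not leave the terminal). `difflib.SequenceMatcher` is used here as a stand-in similarity measure; the patent does not specify one, and the threshold and example utterances are invented.

```python
# Toy sketch: select between server and local ASR hypotheses based on
# their similarity when personal information is involved.
from difflib import SequenceMatcher

def select_result(server_text, local_text, needs_personal_info,
                  threshold=0.8):
    if not needs_personal_info:
        return server_text  # assume the server engine is preferred otherwise
    similarity = SequenceMatcher(None, server_text, local_text).ratio()
    return local_text if similarity >= threshold else server_text

# Hypotheses agree closely -> keep the on-device result.
print(select_result("call john smith", "call john smyth", True))
# Hypotheses disagree -> fall back to the server result.
print(select_result("call john smith", "play some jazz", True))
```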

More
02-01-2014 publication date

METHOD AND APPARATUS FOR PROCESSING MULTIPLE INPUTS

Number: US20140006033A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

A method of processing multiple inputs in an apparatus having interfaces for interaction with an outside is provided. The method includes detecting a first user input from one of the interfaces, performing a function in response to the first user input, detecting a second user input or a system input from another one of the interfaces, and changing attributes of the function in response to the second user input or the system input. 1. A method of processing multiple inputs in an apparatus having interfaces for interaction with an outside , the method comprising:detecting a first input from one of the interfaces;performing a function in response to the first input;detecting a second input from another one of the interfaces; andchanging attributes of the function in response to the second input.2. The method of claim 1 , wherein detecting the first input includes detecting a user's gesture from a touch screen that is one of the interfaces, wherein performing the function includes performing a function related with a graphic work in response to the user's gesture, and wherein detecting the second input includes detecting voice information from a microphone that is one of the interfaces.3. The method of claim 2 , wherein the graphic work is one of handwriting, drawing, painting and erasing.4. The method of claim 3 , wherein changing the attributes of the function includes:recognizing the user's intention by analyzing the voice information;determining whether correction is needed for an already performed graphic work;calculating a correction-required portion in the already performed graphic work if correction is needed; andreflecting the user's intention on the correction-required portion.5. The method of claim 4 , wherein determining whether the correction is needed includes determining that correction is needed in the already performed graphic work if the user's gesture and the voice information are simultaneously detected.6. The ...

More
02-01-2014 publication date

CALL REGISTRATION DEVICE FOR ELEVATOR

Number: US20140006034A1
Author: Takeuchi Nobukazu
Assignee: Mitsubishi Electric Corporation

A call registration device for an elevator includes a voice input section which receives input of a user's voice, a voice recognition section which stores beforehand a predetermined call registration command representing a destination floor and a predetermined start command used for starting voice recognition of a call registration and which differs from the call registration command, the voice recognition section also performing voice recognition by judging whether the inputted voice is the call registration command or the start command, a call registration control section which, if the inputted voice was the call registration command, outputs a call registration request to the destination floor of the relevant call registration command, and a start command changing part which changes a start command which becomes effective by a predetermined condition. If the inputted voice is a start command made effective, the voice recognition section starts voice recognition of a call registration. 1-3. (canceled) 4.
A call registration device for an elevator which performs a call registration by voice recognition using a voice input section which receives the input of a voice uttered by a user of the elevator , comprising:a voice recognition section which stores beforehand voice data of a predetermined call registration command representing a destination floor or direction of the call registration and voice data of a predetermined start command which is used for starting voice recognition of a call registration command, and which is different from the call registration command and does not include the call registration command, wherein the voice recognition section also performs the voice recognition by judging whether or not a voice inputted to the voice input section matches either the voice data of the stored call registration command or the voice data of the start command;a call registration control section which, in a case the voice inputted to the voice input section was ...
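The two-command flow described here reduces to a small state machine: a start command (distinct from any call registration command) arms recognition, after which a call registration command registers the destination floor. The wake phrase and floor commands below are invented examples, not the device's actual command set.

```python
# Toy sketch: start command arms recognition; a subsequent call
# registration command registers the destination floor.
class CallRegistration:
    START_COMMANDS = {"elevator"}                    # assumed wake phrase
    CALL_COMMANDS = {"third floor": 3, "fifth floor": 5}

    def __init__(self):
        self.armed = False
        self.calls = []

    def hear(self, utterance):
        if utterance in self.START_COMMANDS:
            self.armed = True                        # start voice recognition
        elif self.armed and utterance in self.CALL_COMMANDS:
            self.calls.append(self.CALL_COMMANDS[utterance])
            self.armed = False                       # one registration per start

reg = CallRegistration()
reg.hear("third floor")   # ignored: no start command yet
reg.hear("elevator")      # arms recognition
reg.hear("third floor")   # registers floor 3
print(reg.calls)
```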

More
09-01-2014 publication date

SPEECH-RECOGNITION SYSTEM, STORAGE MEDIUM, AND METHOD OF SPEECH RECOGNITION

Number: US20140012578A1
Author: MORIOKA Kiyotaka
Assignee: SEIKO EPSON CORPORATION

A speech recognition system that recognizes speech data is provided. The speech recognition system includes a speech recognition part that performs speech recognition of the speech data, and calculates a likelihood of the speech data with respect to a registered word that is pre-registered, a reliability judgment part that performs reliability judgment on the speech recognition based on the likelihood, and a judgment reference change processing part that changes a judgment reference for the reliability judgment, according to an utterance speed of the speech data. 1. A speech recognition system that recognizes speech data , the speech recognition system comprising:a speech recognition part that performs speech recognition of the speech data, and calculates a likelihood of the speech data with respect to a registered word that is pre-registered;a reliability judgment part that performs reliability judgment on the speech recognition based on the likelihood; anda judgment reference change processing part that changes a judgment reference for the reliability judgment, according to an utterance speed of the speech data.2. The speech recognition system according to claim 1 , wherein the reliability judgment part performs the reliability judgment for judging the reliability of the speech recognition based on a comparison result between a likelihood difference judgment threshold and a likelihood difference that is a difference in the likelihood among a plurality of the registered words obtained as a result of the speech recognition, and the judgment reference change processing part changes the likelihood difference judgment threshold to be used for the reliability judgment to have a greater value, as the utterance speed becomes slower.3.
The speech recognition system according to claim 2 , wherein the likelihood difference judgment threshold is set corresponding to an acoustic model of each of the registered words, and the reliability judgment part uses the likelihood difference ...
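The judgment-reference change can be sketched numerically: the likelihood-difference threshold grows as the utterance speed drops, making the reliability test stricter for slow speech. The inverse-proportional scaling rule below is an invented concrete instance of "a greater value, as the utterance speed becomes slower"; the patent does not specify the scaling function.

```python
# Toy sketch: speed-dependent likelihood-difference threshold.
def likelihood_threshold(base, utterance_speed, reference_speed=1.0):
    """Scale the base threshold up when speed drops below the reference."""
    if utterance_speed >= reference_speed:
        return base
    return base * (reference_speed / utterance_speed)

def is_reliable(best_likelihood, second_likelihood, base, speed):
    """Reliable if the top-two likelihood gap clears the scaled threshold."""
    diff = best_likelihood - second_likelihood
    return diff >= likelihood_threshold(base, speed)

print(is_reliable(10.0, 7.0, base=2.0, speed=1.0))  # diff 3.0 >= 2.0
print(is_reliable(10.0, 7.0, base=2.0, speed=0.5))  # threshold grows to 4.0
```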

More
09-01-2014 publication date

DISPLAY APPARATUS, INTERACTIVE SYSTEM, AND RESPONSE INFORMATION PROVIDING METHOD

Number: US20140012585A1
Assignee:

A display apparatus includes a voice collecting device which collects a user voice, a communication device which performs communication with an interactive server, and a control device which, when response information corresponding to the user voice sent to the interactive server is received from the interactive server, controls to perform a feature corresponding to the response information, and the control device controls the communication device to receive replacement response information, related to the user voice, through a web search and a social network service (SNS). 1. A display apparatus comprising:a voice collecting device which collects a user voice;a communication device which performs communication with an interactive server; anda control device which, when response information corresponding to the user voice sent to the interactive server is received from the interactive server, controls to perform a feature corresponding to the response information,wherein the control device controls the communication device to receive replacement response information, related to the user voice, through a web search and a social network service (SNS).2. The display apparatus of claim 1 , further comprising:an output device;wherein, when an utterance element included within the user voice, with the non-provisionable message for the response information corresponding to the user voice, is received from the interactive server, the control device controls the output device to receive and output replacement response information related to the utterance element through the web search and the social network service.3. The display apparatus of claim 2 , wherein, when a user command for performing the web search is received, the control device receives and outputs a result of the web search based on the utterance element.4. The display apparatus of claim 2 , wherein, when a user command for the social network service is received, the control device generates a response request ...

More
16-01-2014 publication date

METHOD FOR CORRECTING VOICE RECOGNITION ERROR AND BROADCAST RECEIVING APPARATUS APPLYING THE SAME

Number: US20140019127A1
Assignee:

A method for correcting a voice recognition error and a broadcast receiving apparatus applying the same are provided. The method for correcting the voice recognition error includes, receiving a user's spoken command, recognizing the user's spoken command and determining text corresponding to the user's spoken command, if a user command to correct the determined text is input, displaying a text correction user interface in which a morpheme of the determined text and an indicator are associated with each other, and correcting the morpheme of the determined text by selecting the associated indicator of the text correction UI. Accordingly, the broadcast receiving apparatus exactly corrects the misrecognized word with a word desired by the user. 1. A method for correcting a voice recognition error of a broadcast receiving apparatus , the method comprising:receiving a user's spoken command;recognizing the user's spoken command; anddetermining text corresponding to the user's spoken command;if a user command to correct the determined text is input, displaying a text correction user interface (UI) in which a morpheme of the determined text and an indicator are associated with each other; andcorrecting the morpheme of the determined text by selecting the associated indicator of the text correction UI.2. The method as claimed in claim 1 , wherein the morpheme is an initial morpheme and the correcting comprises:if the initial morpheme is selected using the associated indicator, displaying a candidate morpheme corresponding to the initial morpheme; andif the candidate morpheme is selected, correcting the initial morpheme by replacing the initial morpheme with the selected candidate morpheme.3. The method as claimed in claim 2 , wherein the displaying the candidate morpheme comprises displaying a second indicator corresponding to the candidate morpheme on one side of the candidate morpheme.4. 
The method as claimed in claim 3 , wherein the correcting the target morpheme comprises ...
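The indicator-based correction UI can be sketched as a mapping from numeric indicators to the morphemes of the recognized text; selecting an indicator replaces that morpheme with a chosen candidate. The recognized command and candidate below are invented examples.

```python
# Toy sketch: associate each morpheme with an indicator, then correct
# the selected morpheme by replacing it with a candidate.
def build_correction_ui(morphemes):
    """Associate each morpheme with a numeric indicator (1-based)."""
    return {i + 1: m for i, m in enumerate(morphemes)}

def correct(morphemes, indicator, candidate):
    """Replace the morpheme selected via its indicator with the candidate."""
    corrected = list(morphemes)
    corrected[indicator - 1] = candidate
    return corrected

recognized = ["play", "beetles", "songs"]   # "beetles" was misrecognized
ui = build_correction_ui(recognized)
print(ui)
print(correct(recognized, 2, "beatles"))    # indicator 2 -> replace morpheme
```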

More
16-01-2014 publication date

DATA PROCESSING METHOD, PRESENTATION METHOD, AND CORRESPONDING APPARATUSES

Number: US20140019133A1
Assignee:

A data processing method includes obtaining text information corresponding to a presented content, the presented content comprising a plurality of areas; performing text analysis on the text information to obtain a first keyword sequence, the first keyword sequence including area keywords associated with at least one area of the plurality of areas; obtaining speech information related to the presented content, the speech information at least comprising a current speech segment; and using a first model network to perform analysis on the current speech segment to determine the area corresponding to the current speech segment, wherein the first model network comprises the first keyword sequence. 1. A data processing method , comprising:obtaining text information corresponding to a presented content, the presented content comprising a plurality of areas;performing text analysis on the text information to obtain a first keyword sequence, the first keyword sequence including area keywords associated with at least one area of the plurality of areas;obtaining speech information related to the presented content, the speech information at least comprising a current speech segment;using a first model network to perform analysis on the current speech segment to determine the area corresponding to the current speech segment, wherein the first model network comprises the first keyword sequence.2. The method according to claim 1 , wherein the using a first model network to perform analysis on the current speech segment to determine the area corresponding to the current speech segment comprises:obtaining a confidence degree of at least one area keyword in the first keyword sequence, wherein the higher similarity an area keyword has with respect to the current speech segment, the higher confidence degree is obtained for the area keyword;determining, if a first condition is met, that the area corresponding to the current speech segment is an area associated with an area keyword ...

More
16-01-2014 publication date

METHOD FOR PROVIDING CONTENTS INFORMATION AND BROADCAST RECEIVING APPARATUS

Number: US20140019141A1
Assignee:

A method of providing contents information and broadcast receiving apparatus are provided. The method of providing contents information includes requesting, according to user input, a contents providing server to perform a contents search; receiving contents data on contents searched in response to the contents search request from the contents providing server; converting the contents data into audio data using a Text-To-Speech technology; and processing the audio data and outputting the processed audio data, according to at least one characteristic of the searched contents and/or user input. 1. A method of providing contents information of a broadcast receiving apparatus , the method comprising:requesting, according to user input, a contents providing server to perform a contents search;receiving contents data on contents searched in response to the contents search request, from the contents providing server;converting the contents data into audio data using Text-To-Speech (TTS) technology; andprocessing the audio data and outputting the processed audio data, according to at least one characteristic of the searched contents and user input.2. The method according to claim 1 , wherein the converting comprises:parsing metadata of the contents data to output text data; andconverting the text data into the audio data using the TTS technology.3. The method according to claim 2 , further comprising determining a genre of the contents from the metadata, wherein the processing the audio data and the outputting the processed audio data comprises processing the audio data in an audio setting corresponding to the genre of the contents, and outputting the processed audio data.4. The method according to claim 1 , further comprising generating a contents list using the contents data and displaying the generated contents list, wherein, if one of the contents contained in the generated contents list is selected by user manipulation, the outputting the processed ...

More
23-01-2014 publication date

IMAGE PICKUP DEVICE AND METHOD OF PICKING UP IMAGE USING THE SAME

Number: US20140022404A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

An image pickup device includes an image processing unit which processes an image input through the plurality of image pickup units, a plurality of microphones which are spaced apart from each other, an audio processing unit which senses a voice of a photographer using the plurality of microphones, and a control unit which, when the voice of a photographer is sensed through the audio processing unit, controls the image processing unit to combine an image of an image pickup unit corresponding to a location of the photographer with an image of an image pickup unit currently performing photographing. 1. An image pickup device including a plurality of image pickup units , comprising:an image processing unit which processes an image input through the plurality of image pickup units;a plurality of microphones which are spaced apart from each other;an audio processing unit which senses a voice of a photographer using the plurality of microphones; anda control unit which, if the voice of the photographer is sensed through the audio processing unit, controls the image pickup unit corresponding to a location of a photographer to operate and controls the image processing unit to combine an image of an image pickup unit corresponding to a location of a photographer with an image of an image pickup unit currently performing photographing.2. The device as claimed in claim 1 , wherein the audio processing unit senses the voice of the photographer using a phase difference of the voice sensed through the plurality of microphones.3. The device as claimed in claim 1 , wherein the audio processing unit compares the voice input from a microphone corresponding to an image pickup unit currently performing photographing with the voice input from another microphone, and removes noise from the voice input from a microphone corresponding to the image pickup unit currently performing photographing.4. The device as claimed in claim 1 , wherein the audio processing unit converts the ...

Publication date: 06-02-2014

INDEXING DIGITIZED SPEECH WITH WORDS REPRESENTED IN THE DIGITIZED SPEECH

Number: US20140039899A1
Assignee: NUANCE COMMUNICATIONS, INC.

Indexing digitized speech with words represented in the digitized speech, with a multimodal digital audio editor operating on a multimodal device supporting modes of user interaction, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an ASR engine, including providing by the multimodal digital audio editor to the ASR engine digitized speech for recognition; receiving in the multimodal digital audio editor from the ASR engine recognized user speech including a recognized word, also including information indicating where, in the digitized speech, representation of the recognized word begins; and inserting by the multimodal digital audio editor the recognized word, in association with the information indicating where, in the digitized speech, representation of the recognized word begins, into a speech recognition grammar, the speech recognition grammar voice enabling user interface commands of the multimodal digital audio editor. 1. 
A method for use with a multimodal digital audio editor operating on a multimodal device supporting multiple modes of user interaction with the multimodal digital audio editor , the modes of user interaction including a voice mode and one or more non-voice modes , the multimodal digital audio editor operatively coupled to an automatic speech recognition (ASR) engine , the method comprising:receiving, in the multimodal digital audio editor, recognized speech that the ASR engine generated from digitized speech that includes a recognized word and information indicating where, in the digitized speech, representation of the recognized word appears;inserting, by the multimodal digital audio editor, the recognized word into a speech recognition grammar; andinserting, by the multimodal digital audio editor, into the speech recognition grammar in association with the recognized word, the information indicating where, in the digitized speech, representation ...
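The indexing idea above — pairing each recognized word with the offset where its representation begins in the digitized speech, so that voice commands in the grammar can address positions in the audio — can be sketched as a plain lookup table. The ASR output tuples and millisecond offsets below are hypothetical, invented for illustration:

```python
def build_word_index(recognized):
    """Map each recognized word to the millisecond offsets where its
    representation begins in the digitized speech (a word may occur
    more than once)."""
    index = {}
    for word, start_ms in recognized:
        index.setdefault(word.lower(), []).append(start_ms)
    return index

# Hypothetical ASR output: (recognized word, start offset in ms).
asr_result = [("hello", 0), ("world", 420), ("hello", 1310)]
index = build_word_index(asr_result)

# A voice-enabled "seek to word" editor command can now jump straight
# to any occurrence of the spoken word.
print(index["hello"])  # [0, 1310]
print(index["world"])  # [420]
```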

Publication date: 13-02-2014

METHOD AND SYSTEM FOR ACOUSTIC DATA SELECTION FOR TRAINING THE PARAMETERS OF AN ACOUSTIC MODEL

Number: US20140046662A1
Assignee: Interactive Intelligence, Inc.

A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model. 1. A method for training models in speech recognition systems through the selection of acoustic data comprising the steps of: a. training an acoustic model; b. performing a forced Viterbi alignment; c. calculating a total likelihood score; d. performing a phoneme recognition; e. retaining selected audio files; and f. training a new acoustic model. 2. The method of claim 1, wherein step (a) further comprises the steps of: a. analyzing a speech corpus comprised of audio files; b. calculating a maximum likelihood criterion; and c. estimating the parameters of said acoustic model of the probability distribution. 3. The acoustic model of claim 1, wherein said model comprises a Hidden Markov Model and a Gaussian Mixture Model. 4. The method of claim 1, wherein step (b) further comprises the steps of: a. obtaining a total likelihood score for each audio file; and b. determining an average frame likelihood score. 5. The method of claim 4, wherein step (a) further comprises the step of using the mathematical equation α = p(x₁|q₁) ∏_{t=2}^{T} P(q_t|q_{t−1}) p(x_t|q_t) to obtain a total likelihood score of an audio file. 6. The method of claim 5, wherein audio file r ∈ {1, …, R}. 8.
The method of claim 1 , wherein step (c) further comprises the ...
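The per-file score of claims 4–6 — a total alignment likelihood, normalized by frame count into an average frame likelihood, with files below a threshold rejected from the training subset — can be sketched in the log domain. The file names, per-frame log-likelihoods, and threshold below are illustrative, not taken from the patent:

```python
def average_frame_log_likelihood(frame_log_likes):
    """Total alignment log-likelihood is the sum of per-frame terms
    (log p(x_t|q_t) + log P(q_t|q_{t-1})); dividing by the frame count
    gives a length-independent quality score."""
    return sum(frame_log_likes) / len(frame_log_likes)

def select_files(corpus, threshold):
    """Keep only files whose average frame log-likelihood meets the
    threshold; the rest are rejected from the subset corpus."""
    return [name for name, frames in corpus
            if average_frame_log_likelihood(frames) >= threshold]

# Hypothetical per-frame log-likelihoods for three audio files.
corpus = [
    ("good.wav",  [-2.0, -1.5, -1.8]),
    ("noisy.wav", [-9.0, -8.5, -9.5]),
    ("ok.wav",    [-3.0, -2.5]),
]
print(select_files(corpus, threshold=-4.0))  # ['good.wav', 'ok.wav']
```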

Publication date: 27-02-2014

SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION REQUEST DEVICE, SPEECH RECOGNITION METHOD, SPEECH RECOGNITION PROGRAM, AND RECORDING MEDIUM

Number: US20140058729A1
Author: Nagatomo Kentaro
Assignee: NEC Corporation

Provided is a speech recognition system, including: a first information processing device including a speech recognition processing unit for receiving data to be used for speech recognition transmitted via a network, carrying out speech recognition processing, and returning resultant data; and a second information processing device connected to the first information processing device via the network. The second information processing device performs conversion of the data into data having a format that disables a content thereof from being perceived and also enables the speech recognition processing unit to perform the speech recognition processing. Thereafter, the second information processing device transmits the data to be used for the speech recognition by the speech recognition processing unit and constructs resultant data returned from the first information processing device into a content of a valid and perceivable recognition result. 1. A speech recognition system , comprising:a first information processing device comprising a speech recognition processing unit for receiving data to be used for speech recognition transmitted via a network, carrying out speech recognition processing, and returning resultant data; anda second information processing device which is connected to the first information processing device via the network, which transmits the data to be used for the speech recognition by the speech recognition processing unit after performing mapping thereof by using a mapping function unknown to the first information processing device, and constructing a speech recognition result by modifying the resultant data returned from the first information processing device into the same result as a result of performing the speech recognition without using the mapping function.2. 
A speech recognition system , comprising a plurality of information processing devices that are connected to one another via a network and comprise a speech recognition processing ...
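As a loose illustration of the scheme — the second device applies a mapping unknown to the recognition server, then inverts it on the returned result to recover a valid, perceivable recognition result — here is a toy version using a seeded vocabulary shuffle as the secret mapping. The patent does not specify this particular mapping; it stands in for any invertible function known only to the requesting device:

```python
import random

def make_secret_mapping(vocab, seed):
    """Secret, invertible relabeling of the vocabulary; only the
    requesting device knows the seed, so the server never learns which
    real word each opaque label stands for."""
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    forward = dict(zip(vocab, shuffled))
    inverse = {v: k for k, v in forward.items()}
    return forward, inverse

vocab = ["yes", "no", "stop", "go"]
forward, inverse = make_secret_mapping(vocab, seed=42)

# Client side: map the command words before transmission.
sent = [forward[w] for w in ["stop", "go"]]
# Server side: returns a "recognition result" over the opaque labels.
server_result = sent[0]
# Client side: invert the mapping to construct the valid result.
print(inverse[server_result])  # stop
```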

Publication date: 27-02-2014

Method and System for Selectively Biased Linear Discriminant Analysis in Automatic Speech Recognition Systems

Number: US20140058731A1
Assignee: Interactive Intelligence, Inc.

A system and method are presented for selectively biased linear discriminant analysis in automatic speech recognition systems. Linear Discriminant Analysis (LDA) may be used to improve the discrimination between the hidden Markov model (HMM) tied-states in the acoustic feature space. The between-class and within-class covariance matrices may be biased based on the observed recognition errors of the tied-states, such as shared HMM states of the context dependent tri-phone acoustic model. The recognition errors may be obtained from a trained maximum-likelihood acoustic model utilizing the tied-states which may then be used as classes in the analysis. 1. A method for training an acoustic model using the maximum likelihood criteria , comprising the steps of:a. performing a forced alignment of speech training data;b. processing the training data and obtaining estimated scatter matrices, wherein said scatter matrices may comprise one or more of a between class scatter matrix and a within-class scatter matrix, from which mean vectors may be estimated;c. biasing the between class scatter matrix and the within-class scatter matrix;d. diagonalizing the between class scatter matrix and the within class scatter matrix and estimating eigen-vectors to produce transformed scatter matrices;e. obtaining new discriminative features using the estimated vectors, wherein said vectors correspond to the highest discrimination in the new space;f. training a new acoustic model based on said new discriminative features; andg. saving said acoustic model.2. The method of claim 1 , wherein step (a) further comprises the step of using the current maximum likelihood acoustic model on the entire speech training data with a Hidden Markov Model—Gaussian Mixture Model.3. The training data of claim 2 , wherein said data may consist of phonemes and triphones wherein:a. a triphone's Hidden Markov Model states may be mapped to tied states;b. each feature frame may have a tied state class label; andc. ...
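A simplified sketch of the biasing idea — scaling each tied-state class's contribution to the between-class and within-class scatter matrices by a weight such as its observed error rate, before solving the LDA eigenproblem — might look like the following. The weighting scheme and the synthetic data are illustrative, not the patent's exact formulation:

```python
import numpy as np

def biased_lda(X, y, class_weights, n_dims):
    """Plain LDA with each class's scatter contribution scaled by a
    weight (e.g. its observed recognition-error rate); a simplified
    stand-in for the selective biasing described above."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        w = class_weights.get(c, 1.0)
        mc = Xc.mean(axis=0)
        Sw += w * (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all).reshape(-1, 1)
        Sb += w * len(Xc) * (diff @ diff.T)
    # Generalized eigenproblem Sb v = lambda Sw v via pinv(Sw) @ Sb;
    # keep the eigenvectors with the largest discrimination.
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)
    return vecs.real[:, order[:n_dims]]  # projection matrix (d x n_dims)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(3, 1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
W = biased_lda(X, y, class_weights={0: 1.0, 1: 2.0}, n_dims=1)
print(W.shape)  # (3, 1)
```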

Publication date: 06-03-2014

MODEL LEARNING DEVICE, MODEL GENERATION METHOD, AND COMPUTER PROGRAM PRODUCT

Number: US20140067393A1
Author: Masuko Takashi
Assignee: KABUSHIKI KAISHA TOSHIBA

According to an embodiment, a model learning device learns a model having a full covariance matrix shared among a plurality of Gaussian distributions. The device includes a first calculator to calculate, from training data, frequencies of occurrence and sufficient statistics of the Gaussian distributions contained in the model; and a second calculator to select, on the basis of the frequencies of occurrence and the sufficient statistics, a sharing structure in which a covariance matrix is shared among Gaussian distributions, and calculate the full covariance matrix shared in the selected sharing structure. 1. A model learning device for learning a model having a full covariance matrix shared among a plurality of Gaussian distributions , the device comprising:a first calculator configured to calculate, from training data, frequencies of occurrence and sufficient statistics of the Gaussian distributions contained in the model; anda second calculator configured to select, on the basis of the frequencies of occurrence and the sufficient statistics, a sharing structure in which a covariance matrix is shared among Gaussian distributions, and calculate the full covariance matrix shared in the selected sharing structure.2. The device according to claim 1 , wherein the second calculator selects the sharing structure on the basis of an expected value of log likelihood calculated by using the frequencies of occurrence and the sufficient statistics.3. The device according to claim 1 , wherein the second calculator includes:a cluster selector configured to select a cluster with maximum likelihood for each of the Gaussian distributions; anda shared full covariance matrix updating unit configured to update the shared full covariance matrix on the basis of mean vectors, the frequencies of occurrence and the sufficient statistics of the Gaussian distributions belonging to the cluster.4. The device according to claim 1 , wherein the second calculator calculates the shared full ...
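The pooling step — combining the occupancy counts and first/second-moment sufficient statistics of all Gaussians assigned to one sharing cluster into a single full covariance matrix, while each Gaussian keeps its own mean — can be sketched as follows. The statistics format (a dict of `n`, `sum_x`, `sum_xx`) is an assumption made for illustration:

```python
import numpy as np

def shared_full_covariance(stats):
    """Pool the sufficient statistics (occupancy count n, first moment
    sum_x, second moment sum_xx) of all Gaussians in one sharing
    cluster into a single shared full covariance matrix."""
    n_total = sum(s["n"] for s in stats)
    d = stats[0]["sum_x"].shape[0]
    cov = np.zeros((d, d))
    for s in stats:
        mean = s["sum_x"] / s["n"]  # per-Gaussian mean is kept
        cov += s["sum_xx"] - s["n"] * np.outer(mean, mean)
    return cov / n_total

# Hypothetical sufficient statistics for three Gaussians in a cluster.
rng = np.random.default_rng(1)
stats = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    stats.append({"n": len(X), "sum_x": X.sum(axis=0), "sum_xx": X.T @ X})
cov = shared_full_covariance(stats)
print(cov.shape)  # (2, 2)
```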

Publication date: 13-03-2014

PHONETIC PRONUNCIATION

Number: US20140074470A1
Assignee: GOOGLE INC.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improved pronunciation. One of the methods includes receiving data that represents an audible pronunciation of the name of an individual from a user device. The method includes identifying one or more other users that are members of a social circle of which the individual is a member. The method includes identifying one or more devices associated with the other users. The method also includes providing information that identifies the individual and the data representing the audible pronunciation to the one or more identified devices. 1. A method performed by data processing apparatus, the method comprising: receiving data that represents an audible pronunciation of the name of an individual from a user device; identifying one or more other users that have a predetermined association with the individual; identifying one or more devices associated with the other users; and providing information that identifies the individual and the data representing the audible pronunciation to the one or more identified devices. 2. The method of claim 1, wherein the one or more devices are capable of audibly reproducing the pronunciation. 3. The method of claim 1, wherein the user device is a smart phone registered on a social networking site associated with the social circle. 4. The method of claim 1, wherein the pronunciation is associated with a contact entry associated with the user on at least one of the one or more user devices. 5. The method of claim 1, further comprising: generating voice recognition data from the data representing the audible pronunciation. 6.
The method of claim 5 , further comprising:receiving, by one of the one or more devices, the voice recognition data;identifying a contact entry associated with the individual using the identifying information;associating the voice recognition data with the contact entry; andupdating a new pronunciation on the device using the ...

Publication date: 13-03-2014

Navigation apparatus

Number: US20140074473A1
Assignee: Mitsubishi Electric Corp

A navigation apparatus capable of providing a user not only with guidance, but also with all of the guidance, operational procedure, operation screen and recognition vocabulary, that is, with an operational transition that is defined by the guidance, operational procedure, operation screen and recognition vocabulary, while altering the operational transition in accordance with the recognition vocabulary comprehension level of the user. Thus, it can increase the possibility for a user with a low recognition vocabulary comprehension level to achieve a task, or for a user with a high recognition vocabulary comprehension level to improve the comfortableness of the operation, thereby being able to provide all the users with the optimum operational transition.

Publication date: 13-03-2014

SPEECH RECOGNITION RESULT SHAPING APPARATUS, SPEECH RECOGNITION RESULT SHAPING METHOD, AND NON-TRANSITORY STORAGE MEDIUM STORING PROGRAM

Number: US20140074475A1
Assignee: NEC Corporation

There is provided a speech recognition result forming apparatus including a recognition result output unit that refers to character string data, which is a speech recognition result, and removes a word string of a recognition error included in the character string data from the character string data and also, when attached word strings are located before and/or after the word string of the recognition error, generates preformatted character string data by removing at least one of the attached word strings from the character string data or replacing at least one of the attached word strings with other data items and outputs the preformatted character string data. 1. A speech recognition result forming apparatus comprising: a recognition result output unit that refers to character string data, which is a speech recognition result, and removes a word string of a recognition error included in the character string data from the character string data and also, when attached word strings are located before and/or after the word string of the recognition error, generates preformatted character string data by removing at least one of the attached word strings from the character string data or replacing at least one of the attached word strings with other data items and outputs the preformatted character string data. 2.
The speech recognition result forming apparatus according to claim 1 ,wherein, when the word string of the recognition error is an independent word, the recognition result output unit outputs the preformatted character string data generated by removing the attached word string, which is located after the word string of the recognition error, from the character string data or replacing the attached word string with other data items, andwhen the word string of the recognition error is an attached word, the recognition result output unit outputs the preformatted character string data generated by removing the attached word strings, which are located before ...
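A toy version of the shaping rule — drop the misrecognized word string, plus any attached (function) words located immediately before and/or after it — might look like this. The attached-word list, token sequence, and error index are invented for illustration:

```python
# Illustrative attached (function) words; a real system would use a
# language-specific list of particles, articles, prepositions, etc.
ATTACHED = {"the", "a", "an", "of", "to", "in"}

def shape_result(tokens, error_index):
    """Drop the misrecognized word, plus any attached words immediately
    before and after it, from the recognition result."""
    drop = {error_index}
    if error_index > 0 and tokens[error_index - 1] in ATTACHED:
        drop.add(error_index - 1)
    if error_index + 1 < len(tokens) and tokens[error_index + 1] in ATTACHED:
        drop.add(error_index + 1)
    return [t for i, t in enumerate(tokens) if i not in drop]

tokens = ["send", "the", "xyzzy", "to", "alice"]
print(shape_result(tokens, error_index=2))  # ['send', 'alice']
```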

Publication date: 13-03-2014

Context-Sensitive Handling of Interruptions by Intelligent Digital Assistant

Number: US20140074483A1
Author: Van Os Marcel
Assignee: Apple Inc.

Methods and systems related to intelligent interruption handling by digital assistants are disclosed. In some embodiments, a first information provision process is initiated in response to a first speech input. The first information provision process comprises preparing a first response and a second response to the first speech input. After or concurrent with the provision of the first response to the user, but before provision of the second response to the user, an event operable to initiate a second information provision process is detected. The second information provision process is initiated in response to detecting the event. The second information provision process comprises preparing a third response to the event. A relative urgency between the second response and the third response is determined. One of the second response and the third response is provided to the user in an order based on the determined relative urgency. 1. A method of operating a digital assistant , comprising: receiving a first speech input from a user;', 'initiating a first information provision process in response to receipt of the first speech input, the first information provision process comprising preparing at least a first response and a second response to the first speech input;', 'providing the first response to the user;', 'after or concurrent with the provision of the first response to the user, but before provision of the second response to the user, detecting an event operable to initiate a second information provision process;', 'initiating the second information provision process in response to detecting the event, the second information provision process comprising preparing at least a third response to the event;', 'determining a relative urgency between the second response and the third response; and', 'providing one of the second response and the third response to the user in an order based on the determined relative urgency., 'at a device having one or more 
processors ...
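The urgency-ordered delivery of pending responses can be sketched with a priority queue. The urgency scale (lower value = more urgent) and the response texts below are invented for illustration:

```python
import heapq

def deliver_in_urgency_order(responses):
    """Queue pending responses and hand them to the user most-urgent
    first; ties fall back to arrival order."""
    heap = [(urgency, i, text) for i, (urgency, text) in enumerate(responses)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical pending responses from two interleaved
# information provision processes.
pending = [(2, "Here are nearby restaurants."),
           (0, "Reminder: your flight boards in 10 minutes."),
           (1, "New message from Bob.")]
print(deliver_in_urgency_order(pending)[0])
# Reminder: your flight boards in 10 minutes.
```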

Publication date: 27-03-2014

APPARATUS AND METHODS FOR MANAGING RESOURCES FOR A SYSTEM USING VOICE RECOGNITION

Number: US20140088962A1
Author: Corfield Charles
Assignee: nVoq Incorporated

The technology of the present application provides a method and apparatus to managing resources for a system using voice recognition. The method and apparatus includes maintaining a database of historical data regarding a plurality of users. The historical database maintains data regarding the training resources required for users to achieve an accuracy score using voice recognition. A resource calculation module determines from the historical data an expected amount of training resources necessary to train a new user to the accuracy score. 1. A method performed on at least one processor for determining training time for a predetermined percentage of new users to achieve a predetermined accuracy score , the method comprising the steps of: receiving at least one audio file from each of the plurality of users;', 'transcribing at least one transcribed file for each of the at least one audio file from each of the plurality of users using the speech to text engine;', 'correcting the at least one transcribed file; and', 'training the user profile using the corrections;, 'training a plurality of users for a speech recognition engine wherein the training comprises the steps ofstoring historical training data for the plurality of users trained to use the speech recognition engine wherein the data includes a first set of data for each user indicative of an amount of training resources used and a second set of data for each user indicative of an accuracy score for the user;determining from the stored historical data an expected amount of training resources needed to train a new user to the accuracy score for the speech recognition engine; andreceiving historical training data for each new user trained to the accuracy score, wherein the received historical training data is stored as stored historical training data for the plurality of users such that a company can determine resources necessary to training new users of the speech recognition engine.2. 
The method of wherein the ...
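One plausible reading of the resource calculation — find the amount of training that sufficed for a predetermined percentage of past users to reach the accuracy score — is an empirical percentile over the stored historical data. The hour counts below are hypothetical:

```python
import math

def expected_training_hours(history, target_fraction):
    """Smallest number of training hours that was enough for at least
    target_fraction of past users to reach the accuracy score."""
    hours = sorted(history)
    k = math.ceil(target_fraction * len(hours))
    return hours[k - 1]

# Hypothetical hours-to-target for ten previously trained users.
history = [2, 3, 3, 4, 4, 5, 5, 6, 8, 12]
print(expected_training_hours(history, 0.9))  # 8
```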

Publication date: 03-04-2014

SYSTEMS AND METHODS FOR PROVIDING A VOICE AGENT USER INTERFACE

Number: US20140095173A1
Assignee: NUANCE COMMUNICATIONS, INC.

Some embodiments provide techniques performed by at least one voice agent. The techniques include receiving voice input; identifying at least one application program as relating to the received voice input; and displaying at least one selectable visual representation that, when selected, causes focus of the computing device to be directed to the at least one application program identified as relating to the received voice input. 1. A computing device , comprising: receive voice input;', 'identify at least one application program as relating to the received voice input; and', 'display at least one selectable visual representation that, when selected, causes focus of the computing device to be directed to the at least one application program identified as relating to the received voice input., 'at least one processor programmed to implement at least one voice agent, wherein the at least one voice agent is configured to2. The computing device of claim 1 , wherein the voice input specifies at least one action claim 1 , and wherein the at least one voice agent is further configured to:identify the at least one application program as relating to the received voice input, at least in part, by determining that the at least one action may be performed, at least in part, by using the at least one application program.3. The computing device of claim 1 , wherein the voice input specifies at least one action claim 1 , and wherein the at least one voice agent is further configured to:perform at least a portion of the at least one action; anddisplay the at least one selectable visual representation before the at least one voice agent completes performance of the at least the portion of the at least one action.4. 
The computing device of claim 1 , wherein the at least one application program comprises a plurality of application programs claim 1 , and wherein the at least one voice agent is configured to display a plurality of selectable visual representations claim 1 , each one of ...

Publication date: 03-04-2014

ELECTRONIC DEVICE, SERVER AND CONTROL METHOD THEREOF

Number: US20140095174A1
Assignee:

Provided are a display apparatus, a control method thereof, a server, and a control method thereof. The display apparatus includes: a processor which processes a signal; a display which displays an image based on the processed signal; a command receiver which receives a voice command; a communicator which communicates with a first server; a storage; and a controller which receives, from the first server, a voice recognition command list comprising a voice recognition command and control command information corresponding to the voice recognition command, and stores the received voice recognition command list in the storage, the voice recognition command being among user's voice commands which have successfully been recognized a predetermined number of times or more, determines whether the voice command corresponds to the voice recognition command included in the voice recognition command list, and if so, controls the processor to operate based on the control command information, and if not, transmits the voice command to the first server, receives corresponding control command information from the first server, and controls the processor to operate based on the received control command information. 1. 
A display apparatus comprising:a processor which processes a signal;a display which displays an image based on the processed signal;a command receiver which receives a voice command from a user;a communicator which communicates with a first server;a storage; anda controller which receives, from the first server, a voice recognition command list comprising a plurality of voice recognition commands and control command information corresponding to the voice recognition commands, and stores the received voice recognition command list in the storage, the voice recognition commands being among user's voice commands which have successfully been recognized a predetermined number of times or more,wherein, in response to receiving a voice command, determines whether the received ...
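The local-list-first dispatch described above — check the stored voice recognition command list, and only on a miss ask the first server, caching what it returns — can be sketched as a cache with a server fallback. The command strings and control codes are invented for illustration:

```python
def handle_voice_command(command, local_list, query_server):
    """Look the spoken command up in the locally stored recognition
    list first; fall back to the server only on a miss, then cache the
    returned control command information."""
    if command in local_list:
        return local_list[command], "local"
    control = query_server(command)
    local_list[command] = control  # frequently recognized commands stay local
    return control, "server"

local_list = {"volume up": "CTRL_VOL_UP"}
fake_server = {"channel up": "CTRL_CH_UP"}.get  # stand-in for the first server
print(handle_voice_command("volume up", local_list, fake_server))
# ('CTRL_VOL_UP', 'local')
print(handle_voice_command("channel up", local_list, fake_server))
# ('CTRL_CH_UP', 'server')
```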

Publication date: 03-04-2014

IMAGE PROCESSING APPARATUS AND CONTROL METHOD THEREOF AND IMAGE PROCESSING SYSTEM

Number: US20140095175A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

An image processing apparatus including: an image processor which processes a broadcasting signal to display an image based on the processed broadcasting signal; a communication unit which is connected to a server; a voice input unit which receives a user's speech; a voice processor which processes a performance of a preset corresponding operation according to a voice command corresponding to the speech; and a controller which processes the voice command corresponding to the speech through one of the voice processor and the server if the speech is input through the voice input unit. If the voice command includes a keyword relating to a call sign of a broadcasting channel, the controller controls one of the voice processor and the server to select a recommended call sign corresponding to the keyword according to a predetermined selection condition, and performs a corresponding operation under the voice command with respect to the broadcasting channel of the recommended call sign. 1. An image processing apparatus comprising: an image processor which processes a broadcasting signal received from an outside, to display an image based on the processed broadcasting signal; a communicator which is operable to communicate with a server; a voice receiver which receives a user's speech; a voice processor which is operable to process a performance of an operation according to a voice command corresponding to the user's speech; and a controller which processes the voice command corresponding to the user's speech through one of the voice processor and the server if the speech is received through the voice receiver, wherein if the voice command comprises a keyword relating to a desired call sign of a broadcasting channel, the controller controls one of the voice processor and the server to search a plurality of call signs corresponding to the keyword according to a predetermined selection condition, as recommended call signs, controls to display a user interface (UI) image being provided to select ...

Publication date: 03-04-2014

ELECTRONIC DEVICE, SERVER AND CONTROL METHOD THEREOF

Number: US20140095176A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

Provided are a display apparatus, a control method thereof, a server, and a control method thereof. The display apparatus includes: a processor which processes a signal; a display which displays an image based on the processed signal; a first command receiver which receives a voice command; a storage which stores a plurality of voice commands said by a user; a second command receiver which receives a user's manipulation command; and a controller which, upon receiving the voice command, displays a list of the stored plurality of voice commands, selects one of the plurality of voice commands of the list according to the received user's manipulation command and controls the processor to process based on the selected voice command. 1. A display apparatus comprising: a processor which processes a signal; a display which displays an image based on the processed signal; a first command receiver which receives a voice command from a user; a storage which stores a plurality of voice commands said by a user; a second command receiver which receives a user's manipulation command; and a controller which, upon receiving the voice command, displays a list of the stored plurality of voice commands, selects one of the plurality of voice commands of the list according to the received user's manipulation command and controls the processor to process based on the selected voice command. 2. The display apparatus according to claim 1, wherein the controller controls to store the voice commands per user and to show the stored voice commands per user. 3. The display apparatus according to claim 1, wherein the controller controls to register identification symbols to the stored voice commands and if the user says the registered identification symbol, determines that the corresponding voice command has been received. 4.
The display apparatus according to claim 1 , wherein if the user says a location where one voice command is arranged in the displayed list of voice commands claim 1 , the ...

Publication date: 10-04-2014

SMART SWITCH WITH VOICE OPERATED FUNCTION AND SMART CONTROL SYSTEM USING THE SAME

Number: US20140100854A1
Assignee:

A smart switch applied to a smart control system in a smart house includes a storage, a voice input unit configured to receive vocal commands and convert the vocal commands to electronic data, and a remote control unit. A processor unit which includes a voice identifying module, a determining module, and a control module is also included. The smart switch recognizes a voice command and sends a remote control command to the target electronic devices, thereby controlling the electronic devices to execute an operation. A smart control system is also provided. 1. A smart switch applied to a smart home system, the smart switch comprising: at least one socket configured to connect an electronic device; a plug configured to connect the smart switch to a power supply; a storage configured to store a one-to-one relationship between unique identification codes of the electronic devices and names of the electronic devices, and one-to-one relationships between voice commands and remote control commands; a voice input unit configured to receive sounds made by a user, and convert the sounds of the user to electronic data; a remote controlling unit; and a processor unit comprising a voice identifying module, a determining module, and a control module; wherein the voice identifying module identifies the electronic data, and extracts a voice command and the name of the target electronic device; wherein the determining module obtains the unique identification code of the target electronic device according to the one-to-one relationship between unique identification codes of the electronic devices and names of the electronic devices, and obtains the remote control command corresponding to the extracted voice command according to the one-to-one relationships between the voice commands and the remote control commands; and wherein the control module controls the remote controlling unit to send the remote control command to the target electronic device. 2.
The smart switch as described in claim 1 , ...
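The two lookups performed by the determining module — spoken device name to unique identification code, and spoken phrase to remote control command — reduce to dictionary accesses. The device names and codes below are invented for illustration:

```python
def dispatch(spoken_name, spoken_command, device_ids, command_map):
    """Resolve the spoken device name to its unique identification
    code and the spoken phrase to the remote control command to send."""
    device_id = device_ids[spoken_name]
    control = command_map[spoken_command]
    return device_id, control

# Hypothetical stored one-to-one relationships.
device_ids = {"living room lamp": "DEV-01", "fan": "DEV-02"}
command_map = {"turn on": "RC_ON", "turn off": "RC_OFF"}
print(dispatch("fan", "turn on", device_ids, command_map))
# ('DEV-02', 'RC_ON')
```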

Publication date: 06-01-2022

SYSTEM AND METHOD OF FINDING AND ENGAGING WITH HISTORICAL MARKERS

Number: US20220003568A1
Author: Benge James
Assignee:

The present invention relates to an application or a system and a method for finding/exploring areas, and specifically historical markers. The application alerts travelers to road signs that list historical information. In addition to identifying them on a map, user input/output is adapted to accept user input via spoken user commands, and for playing audio output to the user, the audio output comprising historical marker content and acknowledgement of audio commands. The functionality of the present invention includes a registration process, a database of markers, and locating and navigating the user to the nearest marker (i.e., navigating to a location). The user can add information to the selected marker. 1. A system for exploring one or more historical markers, comprising: a user interface for displaying the one or more historical markers present in a user-defined radius as per the current location of a user and for allowing the user to select a historical marker from the one or more historical markers present in the user-defined radius; and a memory, the memory comprising: a registration unit for enabling the user to register in the system; a selection unit for setting the user-defined radius as per the current location of the user to further identify the one or more historical markers in the user-defined radius; a navigation unit configured to navigate to the selected historical marker; and an output unit configured to read description text of the selected historical marker. 2. The system of further comprising a means for enabling the user to suggest that a new historical marker be located at a suggested coordinate. 3. The system of wherein the user interface receives input through one or more voice commands. 4. The system of wherein the user interface receives input through clicks and taps. 5. The system of further comprising a voice command unit associated with the user interface to process the one or more voice commands as an audio input and to ...
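The radius search behind the user interface — show markers inside the user-defined radius around the current location, nearest first — can be sketched with a great-circle distance test. The marker coordinates and radius below are invented for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def markers_in_radius(user, markers, radius_km):
    """Markers inside the user-defined radius, nearest first."""
    hits = [(haversine_km(user[0], user[1], m["lat"], m["lon"]), m["name"])
            for m in markers]
    return [name for d, name in sorted(hits) if d <= radius_km]

# Hypothetical markers near a user at (35.0, -85.0).
markers = [{"name": "Old Mill", "lat": 35.01, "lon": -85.00},
           {"name": "Battlefield", "lat": 36.20, "lon": -86.50}]
print(markers_in_radius((35.0, -85.0), markers, radius_km=10))
# ['Old Mill']
```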

More details
06-01-2022 publication date

OPTIMIZATION APPARATUS, OPTIMIZATION METHOD, AND PROGRAM

Number: US20220005471A1

To perform optimization processing of parameters with various structures without having to manually redesign processing contents of encoding and decoding. An evaluation step of obtaining an evaluated value representing an evaluation result of signal processing using a first signal processing parameter value that is a signal processing parameter; a coding step of converting, based on at least a definition file that defines an attribute of the signal processing parameter, the first signal processing parameter value into a first external parameter value that is an external parameter; a generation step of generating a second external parameter value that is the external parameter of which a value differs from the first external parameter value based on the evaluated value and the first external parameter value; and a decoding step of converting, based on the definition file, the second external parameter value into a second signal processing parameter value that is the signal processing parameter are executed. 1. An optimization apparatus , comprising processing circuitry configured to implement:an evaluating unit which obtains an evaluated value representing an evaluation result of signal processing using a first signal processing parameter value that is a signal processing parameter;a coding unit which converts, based on at least a definition file that defines an attribute of the signal processing parameter, the first signal processing parameter value into a first external parameter value that is an external parameter;a generating unit which generates a second external parameter value that is the external parameter of which a value differs from the first external parameter value based on the evaluated value and the first external parameter value; anda decoding unit which converts, based on the definition file, the second external parameter value into a second signal processing parameter value that is the signal processing parameter.2. The optimization apparatus ...
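The encode/evaluate/generate/decode loop in this entry can be sketched as a simple hill climb in normalized "external parameter" space, where a definition table (standing in for the definition file) declares each parameter's range and scale. The definition contents and the quadratic evaluation are toy assumptions, not the patent's formats:

```python
import math, random

# Hypothetical definition "file": each parameter's attribute (range, scale).
definition = {
    "gain":   {"min": 0.1, "max": 10.0, "scale": "log"},
    "offset": {"min": -1.0, "max": 1.0, "scale": "linear"},
}

def encode(params):
    # Signal-processing parameter -> normalized external parameter in [0, 1].
    ext = {}
    for name, v in params.items():
        d = definition[name]
        if d["scale"] == "log":
            ext[name] = (math.log(v) - math.log(d["min"])) / (math.log(d["max"]) - math.log(d["min"]))
        else:
            ext[name] = (v - d["min"]) / (d["max"] - d["min"])
    return ext

def decode(ext):
    # Normalized external parameter -> signal-processing parameter.
    params = {}
    for name, u in ext.items():
        d = definition[name]
        if d["scale"] == "log":
            params[name] = math.exp(math.log(d["min"]) + u * (math.log(d["max"]) - math.log(d["min"])))
        else:
            params[name] = d["min"] + u * (d["max"] - d["min"])
    return params

def evaluate(params):
    # Stand-in evaluation: lower is better, optimum at gain=1.0, offset=0.0.
    return (params["gain"] - 1.0) ** 2 + params["offset"] ** 2

random.seed(0)
best = {"gain": 5.0, "offset": 0.5}
best_score = evaluate(best)
for _ in range(200):
    ext = encode(best)                      # first external parameter value
    # Generate a second external value near the first (mutation in [0, 1] space).
    cand = {k: min(1.0, max(0.0, u + random.gauss(0, 0.1))) for k, u in ext.items()}
    params = decode(cand)                   # back to signal-processing parameters
    score = evaluate(params)
    if score < best_score:
        best, best_score = params, score
```

Because the optimizer only ever sees values in [0, 1], changing a parameter's range or scale requires editing only the definition table, which is the point the abstract makes about avoiding manual redesign of encoding and decoding.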

More details
05-01-2017 publication date

TESTING WORDS IN A PRONUNCIATION LEXICON

Number: US20170004823A1
Assignee:

A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system is provided. The method includes: obtaining test sentences which can be accepted by a language model used in the ASR system. The test sentences cover words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data, or a plurality of texts generated by recognizing the variations of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to each word defined in the pronunciation lexicon; and determining whether or not all or parts of words in a test sentence are present in a path of the word graph derived from the test sentence. 1. A method performed in one or more computers, for testing words defined in a pronunciation lexicon used in an automatic speech recognition system, wherein the method comprises the following steps: obtaining a plurality of test sentences which can be accepted by a language model used in the automatic speech recognition system, wherein the test sentences cover the words defined in the pronunciation lexicon; obtaining variations of speech data corresponding to each of the test sentences; obtaining a plurality of texts by recognizing the variations of speech data, or a plurality of texts generated by recognizing the variations of speech data; constructing a word graph, using the plurality of texts, for each of the test sentences, wherein each word in the word graph corresponds to each of the words defined in the pronunciation lexicon; and determining whether or not all or parts of words in a test sentence of the test sentences are present in a path of the word graph derived from the test sentence. 2. The method according to claim 1, wherein the generated test ...
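The word-graph check in this entry can be sketched minimally: build edges from consecutive words across the recognition variants, then test whether a sentence's word sequence forms a path. This is an illustrative simplification (real word graphs carry scores and time alignments):

```python
def build_word_graph(recognized_texts):
    # Edges between consecutive words across all recognition variants.
    edges = set()
    for text in recognized_texts:
        words = text.split()
        for a, b in zip(words, words[1:]):
            edges.add((a, b))
    return edges

def sentence_on_path(sentence, edges):
    # True when every consecutive word pair of the test sentence is an edge,
    # i.e. the whole sentence is present as a path in the word graph.
    words = sentence.split()
    return all((a, b) in edges for a, b in zip(words, words[1:]))

variants = [
    "turn on the light",
    "turn off the light",
    "turn on the lamp",
]
graph = build_word_graph(variants)
```

A test sentence whose words all survive recognition lies on a path; a sentence with a word the recognizer never produced (suggesting a lexicon problem) does not.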

More details
05-01-2017 publication date

METHOD FOR CONTROLLING OPERATION OF AN AGRICULTURAL MACHINE AND SYSTEM THEREOF

Number: US20170004830A1
Assignee:

A method for controlling operation of an agricultural machine and a system thereof are disclosed. The method may comprise providing a portable device that has an input device, a processing unit, a storage unit, an output device, and a transceiver device configured for wireless data transmission; receiving a voice control command over a microphone device of the input device of the portable device; determining command text data from the voice control command by processing the voice control command by a speech recognition application running on the processing unit of the portable device; providing machine control signals assigned to a machine control function in a control device of an agricultural machine located remotely from the portable device; and controlling the operation of the agricultural machine according to the machine control signals. 1. A method for controlling operation of an agricultural machine, comprising: providing a portable device, the portable device comprising an input device, a processing unit, a storage unit, an output device, and a transceiver device configured for wireless data transmission; receiving a voice control command over a microphone device of the input device of the portable device; determining command text data from the voice control command by processing the voice control command by a speech recognition application running on the processing unit of the portable device; providing machine control signals assigned to a machine control function in a control device of an agricultural machine located remotely from the portable device, the control device of the agricultural machine comprising a processing unit and a transceiver device configured for wireless data transmission, including: determining control function data indicating the machine control function from the command text data, and processing the control function data for generating the machine control signals; and controlling the operation of the agricultural machine according to the machine control signals.
...
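The command-text-to-control-signal step in this entry can be sketched as a small parser: recognized text is mapped to control function data (function name plus argument), which the control device turns into a control signal frame. The command phrases and frame layout are hypothetical:

```python
import re

def parse_command_text(text):
    # Hypothetical mapping from recognized command text to control function
    # data: (function name, numeric argument or None).
    s = text.lower()
    m = re.match(r"set speed to (\d+)", s)
    if m:
        return ("set_speed", int(m.group(1)))
    if "raise the header" in s:
        return ("raise_header", None)
    return ("unknown", None)

def to_control_signal(function_data):
    # Control device turns function data into a machine control signal frame.
    name, arg = function_data
    return {"function": name, "value": arg}

signal = to_control_signal(parse_command_text("Set speed to 8"))
```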

More details
04-01-2018 publication date

STATE MACHINE BASED CONTEXT-SENSITIVE SYSTEM FOR MANAGING MULTI-ROUND DIALOG

Number: US20180004729A1
Author: Qiu Nan, Wang Haofen
Assignee:

The present invention discloses a state machine based context-sensitive multi-round dialog management system, comprising: an input module, for receiving multi-modal input information from a user; an intention identification engine module, for identifying intention information in the multi-modal input information; an intention module, for bringing multiple intention information identified by the intention identification engine module into one-to-one correspondence with multiple intention sub-modules at back ends; a state machine module, comprising a plurality of state machines for managing a relevant context in the dialog management system and providing support for an output result; an instruction parsing engine module, comprising a plurality of instruction parsing engine sub-modules for parsing corresponding intention information and acquiring the parsed multiple intention information; and an output module, for acquiring policy information according to the results from the parsing engine module and the intention identification module, and transmitting the policy information to the state machine module. 1. 
A state machine based context-sensitive multi-round dialog management system , comprising:an input module, for receiving multi-modal input information from a user;an intention identification engine module, for identifying intention information in the multi-modal input information;an intention module, for bringing multiple intention information identified by the intention identification engine module into one-to-one correspondence with multiple intention sub-modules at back ends;a state machine module, comprising a plurality of state machines for managing a relevant context in the dialog management system and providing support for an output result;an instruction parsing engine module, comprising a plurality of instruction parsing engine sub-modules for parsing corresponding intention information and acquiring the parsed multiple intention information; andan output ...
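The state-machine idea in this entry can be sketched with a transition table keyed on (state, intent): the same intent can produce different behavior in different dialog rounds, and slots gathered along the way form the managed context. States, intents, and replies below are invented for illustration:

```python
class DialogStateMachine:
    # Minimal context-sensitive multi-round dialog manager: each
    # (state, intent) pair maps to a next state and a reply.
    TRANSITIONS = {
        ("idle", "book_flight"): ("need_city", "Which city?"),
        ("need_city", "give_city"): ("need_date", "What date?"),
        ("need_date", "give_date"): ("confirm", "Booking confirmed."),
    }

    def __init__(self):
        self.state = "idle"
        self.context = {}  # slots gathered across rounds

    def handle(self, intent, slot=None):
        key = (self.state, intent)
        if key not in self.TRANSITIONS:
            return "Sorry, I did not understand."
        self.state, reply = self.TRANSITIONS[key]
        if slot is not None:
            self.context[intent] = slot
        return reply

dm = DialogStateMachine()
r1 = dm.handle("book_flight")
r2 = dm.handle("give_city", "Paris")
r3 = dm.handle("give_date", "Friday")
```

The patent's system adds intention identification and instruction parsing engines in front of the state machine; this sketch covers only the state/context bookkeeping.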

More details
02-01-2020 publication date

QUERY EXPANSION USING A GRAPH OF QUESTION AND ANSWER VOCABULARY

Number: US20200004875A1
Assignee:

A method and system are provided for query expansion. The method may include: providing a graph of question and answer word nodes generated from a set of training data for a given domain in the form of training question and answer texts, wherein the answer word nodes are disjoint words that do not occur in both a training question and an associated training answer and wherein edges are provided between a disjoint pair of a training question word and an associated training disjoint answer word, including providing weightings for the nodes and edges based on frequency data; and receiving a user query input, activating input nodes in the graph for words in the user query input, and applying spreading activation through the graph using the weightings to result in a top n most highly activated nodes that are used as candidate words for expansion of the user query input. 2. The method as claimed in claim 1, wherein applying spreading activation propagates a signal in all directions across question word nodes and answer word nodes that are directly or indirectly connected to the input nodes. 3. The method as claimed in claim 1, further comprising capturing user feedback on results of a user query input into a search engine using the candidate words for expansion and using the user feedback to update the graph. 4. The method as claimed in claim 1, wherein a user query input is a question input and activating input nodes activates question word nodes. 5. The method as claimed in claim 4, wherein a user query input includes an answer input in addition to the question input and activating input nodes activates answer word nodes in addition to question word nodes. 6. The method as claimed in claim 5, wherein a user query input includes an answer input in addition to a question input to refine search results to a style of answer and the candidate words are used for refinement of the search results to an answer style. 7. The method as claimed in claim 1, wherein ...
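Spreading activation over a weighted word graph, as described in this entry, can be sketched as follows: seed nodes start at activation 1.0 and pass an attenuated signal to neighbours for a fixed number of rounds, after which the most activated non-seed nodes become expansion candidates. The graph, decay factor, and round count are illustrative choices:

```python
def spread_activation(graph, seeds, decay=0.5, top_n=3):
    # graph: node -> list of (neighbour, edge_weight); seeds get activation
    # 1.0, which propagates one hop per round attenuated by decay * weight.
    activation = {n: 0.0 for n in graph}
    for s in seeds:
        activation[s] = 1.0
    frontier = dict.fromkeys(seeds, 1.0)
    for _ in range(2):  # two propagation rounds
        nxt = {}
        for node, a in frontier.items():
            for nb, w in graph.get(node, []):
                gain = a * decay * w
                activation[nb] = activation.get(nb, 0.0) + gain
                nxt[nb] = nxt.get(nb, 0.0) + gain
        frontier = nxt
    ranked = sorted((n for n in activation if n not in seeds),
                    key=lambda n: activation[n], reverse=True)
    return ranked[:top_n]

# Toy question-word -> answer-word graph with frequency-based weights.
graph = {
    "engine": [("motor", 0.9), ("fuel", 0.4)],
    "motor": [("repair", 0.8)],
}
candidates = spread_activation(graph, ["engine"])
```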

More details
02-01-2020 publication date

BRAND SAFETY IN VIDEO CONTENT

Number: US20200005046A1
Assignee:

Disclosed herein are techniques for determining brand safety of a video including image frames and audio content. In some embodiments, frame-level features, scene-level features, and video-level features are extracted by a set of frame-level models, a set of scene-level models, and a set of video-level models, respectively. Outputs from lower-level models are used as inputs for higher-level models. A brand safety score indicating whether it is safe to associate a brand with the video is determined based on the outputs from the set of video-level models. In some embodiments, commercial content associated with the brand is inserted into the video that is determined to be safe for the brand. 1. A method comprising, by one or more processing devices: obtaining a video, the video including a plurality of scenes, each scene including a plurality of video frames; extracting, using a first set of models implemented by the one or more processing devices, frame-level features from each of two or more video frames in each scene of the plurality of scenes; generating, using a second set of models implemented by the one or more processing devices, scene-level features for each scene of the plurality of scenes based on the frame-level features extracted from the two or more video frames in each scene of the plurality of scenes; generating, using a third set of models implemented by the one or more processing devices, video-level features in the video based on the scene-level features generated for each scene of the plurality of scenes; and determining a brand safety score for the video based on the video-level features, the brand safety score indicating whether it is safe to associate a brand with the video. 2. The method of claim 1, further comprising: determining that it is safe to associate the brand with the video based on determining that the brand safety score is greater than a threshold value; and inserting content associated with the brand into the video. 3. The method of claim 1 ...
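The frame-to-scene-to-video aggregation can be sketched numerically. Here a scene's score is its worst frame and the video score is the mean over scenes; these aggregation choices (and the threshold) are illustrative stand-ins for the patent's learned models:

```python
def scene_score(frame_scores):
    # Scene-level feature from frame-level outputs: a scene is only as
    # safe as its worst frame.
    return min(frame_scores)

def video_safety_score(scenes):
    # Video-level score from scene-level features: mean of per-scene minima.
    per_scene = [scene_score(frames) for frames in scenes]
    return sum(per_scene) / len(per_scene)

def is_brand_safe(scenes, threshold=0.7):
    # Safe to associate the brand when the score exceeds the threshold.
    return video_safety_score(scenes) > threshold

scenes = [
    [0.95, 0.90, 0.92],  # frame-level safety scores, scene 1
    [0.80, 0.85, 0.75],  # frame-level safety scores, scene 2
]
score = video_safety_score(scenes)
```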

More details
07-01-2021 publication date

SYSTEM AND METHOD FOR AUTOMATED AGENT ASSISTANCE NEXT ACTIONS WITHIN A CLOUD-BASED CONTACT CENTER

Number: US20210004821A1
Assignee:

Methods to reduce agent effort and improve customer experience quality through artificial intelligence. The Agent Assist tool provides contact centers with an innovative tool designed to reduce agent effort, improve quality, and reduce costs by minimizing search and data entry tasks. The Agent Assist tool is natively built and fully unified within the agent interface while keeping all data internally protected from third-party sharing. 1. A method, comprising: executing an automation infrastructure within a cloud-based contact center that includes a communication manager, speech-to-text converter, a natural language processor, and an inference processor exposed by application programming interfaces; and executing an agent assist functionality within the automation infrastructure that performs operations comprising: receiving a communication from a customer; automatically analyzing the communication to determine a subject for the customer's communication; automatically parsing a knowledgebase for at least one responsive answer to the subject associated with the customer's communication; providing the solution to an agent as a clickable link within a unified interface during the communication with the customer; and receiving selection of the clickable link to perform a subsequent action in response to the selection. 2. The method of claim 1, wherein the communication is in textual form, the method further comprising: displaying text input by the customer in a first field of a unified interface; parsing the text input by the customer for key terms; querying the knowledgebase using the key terms; and displaying responsive results from the knowledgebase as the solution in a second field in the unified interface. 3. The method of claim 2, wherein the method is performed in real-time. 4. The method of claim 2, further comprising: querying a customer relationship management (CRM) platform/a customer service management (CSM) platform using the key terms; and displaying ...
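The parse-for-key-terms-then-query-knowledgebase flow in claim 2 can be sketched with naive keyword overlap ranking; the stopword list, article data, and scoring are illustrative, not the product's actual pipeline:

```python
STOPWORDS = {"i", "my", "the", "a", "is", "not", "to", "how", "do",
             "can", "you", "help", "me", "with"}

def key_terms(text):
    # Crude key-term extraction: lowercase alphabetic tokens minus stopwords.
    return [w for w in text.lower().split() if w not in STOPWORDS and w.isalpha()]

def query_knowledgebase(kb, terms):
    # Rank articles by how many key terms they share with the customer text.
    scored = []
    for article in kb:
        overlap = len(set(terms) &
                      set(key_terms(article["title"] + " " + article["body"])))
        if overlap:
            scored.append((overlap, article))
    return [a for n, a in sorted(scored, key=lambda x: -x[0])]

kb = [
    {"title": "Reset your password", "body": "Steps to reset a forgotten password."},
    {"title": "Update billing address", "body": "Change the address on your invoice."},
]
answers = query_knowledgebase(kb, key_terms("I can not reset my password"))
```

The top-ranked article would then be surfaced to the agent as the clickable link in the unified interface.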

More details
07-01-2021 publication date

SYSTEM AND METHOD FOR PRE-POPULATING FORMS USING AGENT ASSIST WITHIN A CLOUD-BASED CONTACT CENTER

Number: US20210004836A1
Assignee:

Methods to reduce agent effort and improve customer experience quality through artificial intelligence. The Agent Assist tool provides contact centers with an innovative tool designed to reduce agent effort, improve quality, and reduce costs by minimizing search and data entry tasks. The Agent Assist tool is natively built and fully unified within the agent interface while keeping all data internally protected from third-party sharing. 1. A method, comprising: receiving a communication from a customer; displaying text associated with the communication in a first field of a unified user interface; automatically analyzing the text to determine key terms associated with the communication; and automatically populating the key terms into a web-based form. 2. The method of claim 1, further comprising: querying one of a knowledgebase, a customer relationship management (CRM) platform/a customer service management (CSM) platform, and a database of customer-agent transcripts using the key terms; and displaying responsive results to the querying in a second field in the unified interface. 3. The method of claim 2, further comprising automatically populating query results into the web form. 4. The method of claim 3, further comprising providing an editor in the call notes user interface to receive edits from the agent of the automatically populated key terms and query results. 5. The method of claim 3, wherein the populated key terms and query results are responsive inputs to the web form. 6. The method of claim 1, further comprising highlighting the key terms in the unified interface. 7. The method of claim 1, further comprising: receiving the communication as speech; converting the speech to text; and parsing the text for key terms. 8. The method of claim 1, wherein the communication is a multi-channel communication and received as one of an SMS text, voice call, e-mail, chat, interactive voice response (IVR)/intelligent virtual agent (IVA) systems ...

More details
03-01-2019 publication date

TOPIC SHIFT DETECTOR

Number: US20190005123A1
Assignee:

Aspects detect or recognize shifts in topics in computer implemented speech recognition processes as a function of mapping keywords to non-verbal cues. An initial topic is mapped to one or more keywords extracted from a first spoken query within a user keyword ontology mapping. A query spoken subsequent in time to the first query is identified and distinguished by recognizing one or more non-verbal cues associated with the audio data input that include a time elapsed between the queries, and in some aspects a user's facial expression or motion activity. Aspects determine whether the second spoken query is directed to the initial topic or to a new topic that is different from the initial topic, as a function of mappings of the keyword(s) extracted from the first query to one or more keywords extracted from the second query and to the non-verbal cue(s) within the user ontology mapping. 1. A computer-implemented method for detecting shifts in topics in computer implemented speech recognition processes as a function of mapping keywords to non-verbal cues , the method comprising executing on a computer processor:identifying an initial topic of a first spoken query within an audio data input from a user that is mapped to at least one keyword extracted from the first spoken query within a user keyword ontology mapping;identifying a second spoken query within the audio data input that is subsequent in time to the first spoken query and is distinguished from the first query by recognizing at least one non-verbal cue associated with the audio data input, wherein the at least one non-verbal cue comprises a time elapsed between the first spoken query and the second spoken query, and a user's motion activity relative to a programmable device comprising the processor during a time between the first spoken query and the second spoken query;determining whether the second spoken query is directed to the initial topic or to a new topic that is different from the initial topic, as a 
...
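The topic-shift decision in this entry combines a non-verbal cue (time elapsed between queries) with keyword mapping. A minimal sketch, with thresholds and keyword lists chosen for illustration only:

```python
def detect_topic_shift(prev_keywords, new_keywords, elapsed_s,
                       max_pause_s=10.0, min_overlap=0.3):
    # Non-verbal cue: a long pause between queries suggests a new topic.
    if elapsed_s > max_pause_s:
        return True
    # Otherwise compare keyword overlap between the two queries.
    overlap = len(set(prev_keywords) & set(new_keywords)) / max(len(set(new_keywords)), 1)
    return overlap < min_overlap

same = detect_topic_shift(["weather", "paris"], ["weather", "tomorrow"], 3.0)
shifted_by_pause = detect_topic_shift(["weather", "paris"], ["weather", "tomorrow"], 30.0)
shifted_by_words = detect_topic_shift(["weather", "paris"], ["pizza", "near", "me"], 3.0)
```

The patent also folds in facial expression and device-motion cues; those would enter the same decision as additional conditions alongside the pause test.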

More details
03-01-2019 publication date

COMPUTER SYSTEM, DIALOGUE CONTROL METHOD, AND COMPUTER

Number: US20190005311A1
Assignee:

A computer system that performs in dialogue with a user and provides a prescribed service, comprising: an imaging device; a computer; and a generation device generating dialogue content on a basis of an algorithm for generating dialogue content. The computer couples to a database that stores an authentication image used for an authentication process that uses an image. The computer calculates a distance between the user and the imaging device; executes an attribute estimation process in a case where the distance is larger than a threshold, selects the algorithm on the basis of results of the attribute estimation process, and issues a notification of the selected algorithm to the generation device. 1. A computer system that performs in dialogue with a user and provides a prescribed service , comprising:an imaging device being configured to obtain an image;a computer being configured to select an algorithm for generating dialogue content to be outputted to the user; anda generation device being configured to generate dialogue content on the basis of the algorithm,the computer having an arithmetic device, a storage device coupled to the arithmetic device, and an interface coupled to the arithmetic device, and coupling, through the interface, to a database that stores an authentication image used for an authentication process that uses an image obtained by the imaging device,the arithmetic device being configured to:calculate a distance between the user and the imaging device;execute an attribute estimation process that estimates an attribute that characterizes the user using the image obtained by the imaging device in a case where the distance is larger than a first threshold, and select the algorithm on the basis of results of the attribute estimation process;execute an authentication process that identifies the user on the basis of the image obtained by the imaging device and the database in a case where the distance is less than or equal to the first threshold, and 
...
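The distance-threshold branching in this entry can be sketched as follows: estimate the user's distance from the apparent face size in the camera image (a standard pinhole-camera relation), then pick attribute estimation for far users and image authentication for near ones. The focal length, face height, and threshold values are hypothetical:

```python
def estimate_distance_m(focal_px, face_height_m, face_px):
    # Pinhole-camera estimate of user distance from apparent face size:
    # distance = focal_length * real_height / pixel_height.
    return focal_px * face_height_m / face_px

def select_dialogue_algorithm(distance_m, threshold_m=2.0):
    # Far user: the image is too small for reliable authentication, so fall
    # back to attribute estimation; near user: authenticate and personalize.
    if distance_m > threshold_m:
        return "attribute_estimation"
    return "image_authentication"

d = estimate_distance_m(focal_px=800.0, face_height_m=0.24, face_px=60.0)
algo = select_dialogue_algorithm(d)
```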

More details
07-01-2021 publication date

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM

Number: US20210005177A1
Assignee:

Implemented are an apparatus and a method for detecting misrecognition of a user speech on the basis of a subsequent interaction. The apparatus includes a voice recognition section that executes a voice recognition process on a user speech and a learning processing section that executes a process of updating a degree of confidence on the basis of an interaction made between a user and the information processing apparatus after the user speech. The degree of confidence is an evaluation value indicating the reliability of a voice recognition result of the user speech. The voice recognition section generates data on degrees of confidence in recognition of the user speech, in which plural user speech candidates based on the voice recognition result of the user speech are associated with the degrees of confidence, which are evaluation values each indicating the reliability of the corresponding user speech candidate. The learning processing section updates the degree-of-confidence values in the data on the degrees of confidence in recognition of the user speech, by analyzing context consistency or subject consistency in the interaction made between the user and the information processing apparatus after the user speech. 1. An information processing apparatus comprising: a voice recognition section that executes a voice recognition process on a user speech; and a learning processing section that executes a process of updating a degree of confidence, on a basis of an interaction made between a user and the information processing apparatus after the user speech, the degree of confidence being an evaluation value indicating reliability of a voice recognition result of the user speech. 2. The information processing apparatus according to claim 1, wherein the learning processing section executes the process of updating the degree of confidence, by analyzing context consistency or subject consistency in the interaction made between the user and the information processing apparatus ...
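The confidence update from subsequent interaction can be sketched with a crude subject-consistency signal: hypotheses that share content words with the user's next utterance gain confidence, and the distribution is renormalized. The candidates, boost factor, and overlap measure are illustrative assumptions:

```python
def update_confidence(candidates, followup_words, boost=0.2):
    # candidates: {hypothesis: confidence}. A hypothesis consistent with
    # the follow-up turn (shared content words) gains confidence; the
    # distribution is then renormalized to sum to 1.
    updated = {}
    for hyp, conf in candidates.items():
        overlap = set(hyp.split()) & set(followup_words)
        updated[hyp] = conf + boost * len(overlap)
    total = sum(updated.values())
    return {h: c / total for h, c in updated.items()}

candidates = {"play jazz": 0.45, "play chess": 0.55}
# The user's next utterance stays on the music subject.
updated = update_confidence(candidates, ["jazz", "music", "volume"])
```

After the update, the initially lower-ranked "play jazz" overtakes "play chess", mirroring how the patent's learning section revises recognition confidence from dialog context.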

More details
07-01-2021 publication date

METHOD AND APPARATUS FOR PROCESSING AUDIO DATA

Number: US20210005179A1
Author: Tian Chao
Assignee:

A method and an apparatus for processing audio data are provided. The method includes: acquiring a first piece of audio data; and processing the first piece of audio data based on an antialias filter, to generate a second piece of audio data, a sampling rate of the second piece of audio data being smaller than a sampling rate of the first piece of audio data; the antialias filter being generated by: inputting training voice data in a training sample into an initial antialias filter; inputting an output of an initial antialias filter into a training speech recognition model, and generating a training speech recognition result; and adjusting the initial antialias filter based on the training speech recognition result and a target speech recognition result of the training voice data in the training sample. 1. A method for processing audio data , comprising:acquiring a first piece of audio data, a sampling rate of the first piece of audio data being a first target sampling rate; andprocessing the first piece of audio data based on a pre-generated antialias filter, to generate a second piece of audio data, a sampling rate of the second piece of audio data being a second target sampling rate, and the second target sampling rate being smaller than the first target sampling rate; whereinthe antialias filter is generated by:inputting training voice data in a training sample into an initial antialias filter;inputting an output of the initial antialias filter into a training speech recognition model, and generating a training speech recognition result; andadjusting the initial antialias filter based on the training speech recognition result and a target speech recognition result of the training voice data in the training sample, to generate the antialias filter.2. The method according to claim 1 , wherein a recognition accuracy rate of the training speech recognition model is greater than a preset accuracy rate threshold.3. The method according to claim 1 , wherein the ...
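The antialias-then-downsample processing in this entry can be sketched with a fixed windowed-sinc low-pass FIR standing in for the trained antialias filter (the patent's filter is learned through an ASR loss; the design below is a conventional hand-built substitute for illustration):

```python
import numpy as np

def lowpass_fir(num_taps, cutoff):
    # Windowed-sinc low-pass FIR; cutoff as a fraction of the input
    # sampling rate (0.5 = Nyquist).
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()

def downsample(x, factor, taps=101):
    # Antialias filter first, then keep every `factor`-th sample.
    h = lowpass_fir(taps, cutoff=0.5 / factor * 0.9)  # margin below new Nyquist
    y = np.convolve(x, h, mode="same")
    return y[::factor]

fs = 16000
t = np.arange(fs) / fs
# 300 Hz tone (survives) + 7 kHz tone (would alias to 1 kHz at an 8 kHz rate).
x = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 7000 * t)
y = downsample(x, factor=2)   # 16 kHz -> 8 kHz
```

Without the filter, the 7 kHz component would fold down to 1 kHz in the 8 kHz output and corrupt recognition; the patent's contribution is tuning that filter jointly with the recognizer instead of fixing it by hand.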

More details
07-01-2021 publication date

SERVICE DATA PROCESSING METHOD AND APPARATUS AND RELATED DEVICE

Number: US20210005185A1
Author: Fang Xuewei, MA JINGLIN

In a service data processing method performed by a server, user speech information collected by a first terminal is received. A target service operation code according to the user speech information is obtained. The target service operation code is used for identifying target service operation information. The target service operation code is transmitted from the server to the first terminal, so that the first terminal plays the target service operation code by using a speech. The target service operation code obtained by a second terminal is received. A target execution page corresponding to the target service operation code is searched for. The target execution page is transmitted to the second terminal, so that the second terminal executes a service operation corresponding to the target service operation information in the target execution page. 1. A service data processing method, comprising: receiving, by circuitry of a server, user speech information collected by a first terminal; obtaining, by the circuitry of the server, a target service operation code according to the user speech information, the target service operation code being used for identifying target service operation information; transmitting, by the circuitry of the server, the target service operation code to the first terminal, so that the first terminal plays the target service operation code by using a speech; receiving, by the circuitry of the server, the target service operation code obtained by a second terminal; searching, by the circuitry of the server, for a target execution page corresponding to the target service operation code; and transmitting, by the circuitry of the server, the target execution page to the second terminal, so that the second terminal executes a service operation corresponding to the target service operation information in the target execution page. 2. The method according to claim 1, wherein the obtaining a target service operation code according to the user speech information comprises: performing, by the circuitry of the ...
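The server-side code issuance and page lookup can be sketched as a code-to-operation table: the first terminal's speech yields an operation, the server mints a short speakable code for it, and a second terminal later redeems that code for an execution page. Code format, page paths, and the operation record are hypothetical:

```python
import random
import string

OPERATIONS = {}

def issue_operation_code(operation_info, rng=random.Random(7)):
    # Server side: mint a short speakable digit code that identifies the
    # target service operation information.
    code = "".join(rng.choices(string.digits, k=6))
    OPERATIONS[code] = operation_info
    return code

def execution_page_for(code):
    # Second terminal submits the code; the server looks up the matching
    # execution page, or None when the code is unknown.
    op = OPERATIONS.get(code)
    return None if op is None else f"/execute/{op['action']}"

code = issue_operation_code({"action": "transfer"})
page = execution_page_for(code)
```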

More details
07-01-2021 publication date

DIGITAL ASSISTANT DEVICE COMMAND PERFORMANCE BASED ON CATEGORY

Number: US20210005189A1
Assignee:

One embodiment provides a method, including: receiving, at an information handling device, a user command; identifying, using a processor, a category associated with the user command; determining, based on the identifying, a digital assistant associated with the category; and performing, responsive to determining that the digital assistant is associated with the information handling device, a function corresponding to the user command using the information handling device. Other aspects are described and claimed. 1. A method, comprising: receiving, at an information handling device, a user command; identifying, using a processor, a category associated with the user command; determining, based on the identifying, a digital assistant associated with the category; and performing, responsive to determining that the digital assistant is associated with the information handling device, a function corresponding to the user command using the information handling device. 2. The method of claim 1, wherein the category is associated with a task selected from the group consisting of a media task, a home automation task, a time management task, and a list-making task. 3. The method of claim 1, wherein the identifying comprises accessing a database comprising a list of associations between user commands and categories. 4. The method of claim 1, wherein the determining comprises accessing a database comprising a list of associations between digital assistants and categories. 5. The method of claim 4, wherein the list is adjustable by a user. 6. The method of claim 1, further comprising directing, responsive to determining that the digital assistant is associated with another device, the another device to perform the function. 7. The method of claim 6, wherein the directing comprises automatically directing without receiving additional user input. 8. The method of claim 6, wherein the directing comprises transmitting an indication of the ...
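The two association tables the claims describe (command to category, category to assistant) can be sketched as dictionaries, with the local-versus-remote branch from the performing/directing steps. Commands, categories, and assistant names below are invented:

```python
COMMAND_CATEGORIES = {
    "play": "media", "pause": "media",
    "dim": "home_automation", "lock": "home_automation",
    "remind": "time_management", "timer": "time_management",
    "add": "list_making",
}
CATEGORY_ASSISTANT = {
    "media": "assistant_a",
    "home_automation": "assistant_b",
    "time_management": "assistant_a",
    "list_making": "assistant_c",
}

def route_command(command, local_assistant="assistant_a"):
    # Look up the category, then the assistant for that category; perform
    # locally when that assistant runs on this device, otherwise direct the
    # other device, without requesting additional user input.
    first = command.lower().split()[0]
    category = COMMAND_CATEGORIES.get(first)
    assistant = CATEGORY_ASSISTANT.get(category)
    action = "perform_locally" if assistant == local_assistant else "direct_other_device"
    return category, assistant, action

cat, asst, act = route_command("play some jazz")
```

Per claim 5, a real system would let the user edit the category-to-assistant table rather than hard-code it.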

More details
07-01-2021 publication date

SYSTEM, SERVER, AND METHOD FOR SPEECH RECOGNITION OF HOME APPLIANCE

Number: US20210005191A1
Assignee:

Provided is a system, server, and method for speech recognition capable of collectively setting a plurality of setting items for device control through an utterance of a single sentence provided in the form of natural language. The system includes: a home appliance configured to receive a speech command that is generated through an utterance of a single sentence for control of the home appliance; and a server configured to receive the speech command in the single sentence from the home appliance and interpret the speech command of the single sentence through multiple intent determination. 1. A speech recognition system for a home appliance , comprising:a home appliance configured to receive a speech command that is generated through an utterance of a single sentence for control of the home appliance; anda server configured to receive the speech command in the single sentence from the home appliance and interpret the speech command in the single sentence through multiple intent determination.2. The speech recognition system of claim 1 , wherein the speech command generated through the utterance of the single sentence includes a plurality of intents claim 1 , and the server interprets the speech command on the basis of the plurality of intents.3. The speech recognition system of claim 2 , wherein the server is configured to:generate a plurality of instruction sentence formulas by combining the plurality of intents;generate a plurality of derivative sentences on the basis of the plurality of instruction sentence formulas; andcompare the plurality of derivative sentences with a plurality of pieces of speech command data registered in the server, to find matching speech command data in the comparison.4. 
The speech recognition system of claim 3 , wherein the server is configured to:generate a plurality of scenarios operable by the home appliance on the basis of a function and a specification of the home appliance; andgenerate the plurality of instruction sentence formulas ...
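The multiple-intent determination over a single sentence can be sketched by scanning one utterance for every intent whose trigger phrase it contains; the intents and phrases are hypothetical, and the patent's derivative-sentence matching is far richer than this substring test:

```python
INTENTS = {
    "power_on": ["turn on", "start"],
    "set_mode": ["cotton mode", "quick mode"],
    "set_timer": ["in two hours", "tonight"],
}

def extract_intents(sentence):
    # Multiple intent determination: one natural-language sentence can set
    # several appliance settings at once.
    s = sentence.lower()
    return [name for name, phrases in INTENTS.items()
            if any(p in s for p in phrases)]

found = extract_intents("Turn on the washer in cotton mode in two hours")
```

Each extracted intent would then map to one setting item, so a single sentence configures the appliance collectively.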

Read more
07-01-2021 publication date

Detecting Self-Generated Wake Expressions

Number: US20210005197A1
Assignee:

A speech-based audio device may be configured to detect a user-uttered wake expression. For example, the audio device may generate a parameter indicating whether output audio is currently being produced by an audio speaker, whether the output audio contains speech, whether the output audio contains a predefined expression, loudness of the output audio, loudness of input audio, and/or an echo characteristic. Based on the parameter, the audio device may determine whether an occurrence of the predefined expression in the input audio is a result of an utterance of the predefined expression by a user.

1-20. (canceled)
21. A device comprising: a first microphone; a second microphone; one or more processors; and one or more non-transitory storage media storing computer-executable instructions that, when executed by the one or more processors, cause the system to: generate, at a first time and using the first microphone, first audio data corresponding to sound; generate, at a second time and using the second microphone, second audio data corresponding to the sound; determine a difference between the first time and the second time; generate, based at least in part on the difference, beamforming data using the first audio data and the second audio data; and perform speech recognition on the beamforming data.
22. The device of claim 21, wherein generating the beamforming data comprises generating directional audio data that emphasizes a first portion of at least one of the first audio data or the second audio data with respect to a second portion of at least one of the first audio data or the second audio data.
23. The device of claim 21, wherein the first microphone and the second microphone are directed upward from a top portion of the device.
24. The device of claim 21, wherein the first microphone and the second microphone comprise at least a portion of a circular arrangement of microphones.
25. The device of claim 21, wherein generating the beamforming data ...
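Claim 21's use of the arrival-time difference between two microphones is the basis of classic delay-and-sum beamforming: estimate the inter-microphone lag, align one signal by that lag, and average. The sketch below is illustrative only, not the patented implementation; all names are assumptions.

```python
def estimate_delay(sig_a, sig_b, max_lag):
    """Return the integer lag (in samples) maximizing cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            sig_a[i] * sig_b[i + lag]
            for i in range(len(sig_a))
            if 0 <= i + lag < len(sig_b)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_and_sum(sig_a, sig_b, lag):
    """Align sig_b by `lag` samples and average with sig_a.

    Averaging the aligned signals emphasizes sound arriving from the
    direction implied by the lag and attenuates sound from elsewhere.
    """
    out = []
    for i in range(len(sig_a)):
        j = i + lag
        b = sig_b[j] if 0 <= j < len(sig_b) else 0.0
        out.append((sig_a[i] + b) / 2.0)
    return out
```

The resulting directional signal would then be fed to speech recognition, as in the final step of claim 21.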

Read more
07-01-2021 publication date

Detecting Self-Generated Wake Expressions

Number: US20210005198A1
Assignee:

A speech-based audio device may be configured to detect a user-uttered wake expression. For example, the audio device may generate a parameter indicating whether output audio is currently being produced by an audio speaker, whether the output audio contains speech, whether the output audio contains a predefined expression, loudness of the output audio, loudness of input audio, and/or an echo characteristic. Based on the parameter, the audio device may determine whether an occurrence of the predefined expression in the input audio is a result of an utterance of the predefined expression by a user.

1-20. (canceled)
21. A system comprising: one or more microphones; one or more audio speakers; one or more processors; and non-transitory computer-readable media storing instructions that, when executed by the one or more processors, cause the system to: generate, using the one or more microphones, first audio data; determine one or more parameters associated with the first audio data; analyze, using the one or more parameters, the first audio data to generate text data corresponding to the first audio data; and cause, using the one or more audio speakers and based at least partly on the text data, output of second audio data.
22. The system of claim 21, wherein a first parameter of the one or more parameters corresponds to an audio input characteristic and a second parameter of the one or more parameters corresponds to a device operation characteristic.
23. The system of claim 22, wherein the audio input characteristic comprises an echo characteristic associated with the first audio data or a loudness characteristic associated with the first audio data.
24. The system of claim 22, wherein the device operation characteristic comprises a presence of the one or more audio speakers, a loudness characteristic of sound generated by the one or more audio speakers, or an amount of echo reduction performed by the one or more processors.
25. The system of claim 21, ...
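The parameter-based decision the abstract describes (speaker activity, whether the output audio contains the wake expression, relative input/output loudness) can be illustrated with a toy heuristic. This is a hypothetical sketch of one way such parameters might be combined, not the claimed method; all names and thresholds are assumptions.

```python
def is_self_generated(speaker_active, output_contains_wake,
                      output_loudness, input_loudness):
    """Guess whether a detected wake expression came from the device itself.

    speaker_active       -- is the device currently playing output audio?
    output_contains_wake -- does the output audio contain the wake expression?
    output_loudness      -- loudness estimate of the output audio
    input_loudness       -- loudness estimate of the captured input audio
    """
    if not speaker_active:
        return False  # nothing is playing, so the wake word must be the user's
    if output_contains_wake and input_loudness <= output_loudness:
        return True   # device likely heard its own playback as an echo
    return False
```

A device using such a test would simply ignore wake-word detections for which it returns True instead of interrupting playback.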

Read more
07-01-2021 publication date

THIRD PARTY ACCOUNT LINKING FOR VOICE USER INTERFACE

Number: US20210005199A1
Assignee:

Methods and systems for adding functionality to an account of a language processing system, where the functionality is associated with a second account of a first application system, are described herein. In a non-limiting embodiment, an individual may log into a first account of a language processing system and log into a second account of a first application system. While logged into both the first account and the second account, a button included within a webpage provided by the first application may be invoked. A request capable of being serviced using the first functionality may be received by the language processing system from a device associated with the first account. The language processing system may send first account data and the second account data to the first application system to facilitate an action associated with the request, thereby enabling the first functionality for the first account.

1-20. (canceled)
21. A computer-implemented method comprising: receiving input data corresponding to an input to a first device presenting a user interface on a display; processing the input data to determine an intent to perform an action; determining the action corresponds to associating a first account of a natural language processing system with a second account of a first application; determining first data corresponding to the first account and the second account; sending the first data to the first application; and causing output data from the first application to be output.
22. The computer-implemented method of claim 21, wherein the output data further comprises data from the first account.
23. The computer-implemented method of claim 21, further comprising: sending, between the natural language processing system and the first application, an identifier corresponding to an interaction taking place using the first application and the first device.
24. The computer-implemented method of claim 21, further comprising: updating a component of the natural language ...
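The linking flow in claim 21 (store the pairing of the two accounts, then forward both identifiers to the application when a request arrives) might be sketched as below. All class, method, and field names here are hypothetical illustrations, not from the patent.

```python
class AccountLinker:
    """Toy model of a language processing system's account-link store."""

    def __init__(self):
        # language-system account id -> linked application account id
        self._links = {}

    def link(self, lp_account, app_account):
        """Record that the two accounts are associated."""
        self._links[lp_account] = app_account

    def service_request(self, lp_account, request):
        """Build the payload that would be sent to the application system.

        Includes both account identifiers so the application can act on
        the request; fails if no link has been established.
        """
        app_account = self._links.get(lp_account)
        if app_account is None:
            return {"error": "account not linked"}
        return {
            "first_account": lp_account,
            "second_account": app_account,
            "request": request,
        }
```

In practice such a pairing would typically be established via an authorization flow rather than a direct dictionary write; the sketch only shows the data that claim 21 says is determined and sent.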

Read more