
Search form

Supports entering multiple search phrases (one per line). Searches with morphology support for Russian and English.

Total found: 14367. Displayed: 200.
20-04-2010 publication date

REPRESENTATION OF DIGITAL MEDIA BASED ON MATRIX INVARIANTS

Number: RU2387006C2

The invention relates to signal representation technology. The technical result is an extension of functional capabilities. A system for forming a compact description of digital media comprises an acquisition module configured to obtain the digital media; a segmentation module configured to partition said media into a plurality of regions; a computation module configured to form feature vectors for each region of said plurality, the feature vectors being computed from matrix invariants that include the singular value decomposition; and an output module configured to form an output using a combination of the computed feature vectors, the output forming a vector of hash values for that digital media, where the vector of hash values is a compact representation of the digital media, thereby identifying the digital media on the basis of said compact ...
06-06-2017 publication date

METHOD FOR ESTIMATING THE INSTANTANEOUS FREQUENCY OF A SPEECH SIGNAL AT LOCAL MAXIMUM POINTS

Number: RU2621647C1

The invention relates to the field of speech analysis, in particular to a method for estimating the instantaneous frequency of speech signals at local maximum points. The technical result is a reduction in the number of computations needed to estimate the instantaneous frequency at local extremum points. The sampling frequency f and the audio signal amplitudes are entered into computer memory, either through a microphone and a standard input program or by reading an audio file. The local maximum points of the speech signal amplitude are located. The value Val is computed, where n is the position of the local maximum. The instantaneous frequency at that point is computed by the formula w = f*arccos(Val). The position of the local maximum and the resulting instantaneous frequency estimate are output in the format n, w. 2 figures.
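The method above reduces to locating amplitude maxima and applying w = f*arccos(Val). Below is a minimal Python sketch under that reading; the expression for Val is not reproduced in this excerpt, so it is left as an input, and the sine signal is purely illustrative:

```python
import math

def local_maxima(x):
    """Indices n where the sampled amplitude has a local maximum."""
    return [n for n in range(1, len(x) - 1) if x[n - 1] < x[n] >= x[n + 1]]

def instantaneous_frequency(f, val):
    """w = f * arccos(Val) from the abstract; Val's expression is not
    given in this excerpt, so it is taken here as an input in [-1, 1]."""
    return f * math.acos(val)

# Illustrative signal: 1 s of a 5 Hz sine sampled at f = 100 Hz.
f = 100.0
x = [math.sin(2 * math.pi * 5 * n / f) for n in range(100)]
peaks = local_maxima(x)  # positions n of the local maxima
```

For each position n, the pair (n, w) would then be emitted once Val is computed at that point.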
26-08-2021 publication date

Number: RU2020108161A3
Author:
Assignee:
21-05-2019 publication date

SPEECH RE-RECOGNITION WITH EXTERNAL DATA SOURCES

Number: RU2688277C1
Assignee: GOOGLE LLC (US)

The invention relates to means for obtaining a transcription of a spoken utterance. The technical result is improved transcription accuracy. An initial candidate transcription of the utterance is obtained using an automated speech recognizer. Based on a language model that the automated speech recognizer does not use when generating the initial candidate transcription, one or more terms are identified that are phonetically similar to one or more terms already present in the initial candidate transcription. Determining whether terms are phonetically similar involves computing a similarity measure and comparing it against a threshold, or determining whether the similarity measure exceeds the similarity measures of other term pairs. One or more additional candidate transcriptions are generated from the identified terms, and a transcription is selected from the candidates. 3 independent and 17 dependent claims ...
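The phonetic-similarity test described above (compute a similarity measure, compare it to a threshold) can be sketched as below; the patent does not fix a particular measure, so a normalized edit distance over phoneme symbols stands in purely for illustration:

```python
def similarity(a, b):
    """Illustrative stand-in measure: 1 - normalized Levenshtein distance
    over two phoneme sequences (the patent does not specify the measure)."""
    la, lb = len(a), len(b)
    if max(la, lb) == 0:
        return 1.0
    # DP table: first row/column hold edit distances against the empty prefix.
    d = [[i + j if i * j == 0 else 0 for j in range(lb + 1)] for i in range(la + 1)]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return 1 - d[la][lb] / max(la, lb)

def phonetically_similar(a, b, threshold=0.8):
    """First variant from the claim: compare the measure to a threshold."""
    return similarity(a, b) >= threshold
```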
01-09-2020 publication date

METHOD AND SYSTEM FOR GENERATING A TEXT REPRESENTATION OF A FRAGMENT OF A USER'S SPOKEN SPEECH

Number: RU2731334C1

The group of inventions relates to the field of natural language processing. The technical result is the generation of a text representation of a fragment of a user's spoken speech that takes into account user characteristics and the acoustic properties of the speech fragment. The method includes: receiving an indication of the fragment of the user's spoken speech; generating at least two hypotheses; forming, by an electronic device, a set of paired hypotheses from these at least two hypotheses, one pair containing the first hypothesis paired with the second hypothesis; determining a pairwise score for one pair from the set of paired hypotheses; forming a set of speech-fragment features indicating one or more characteristics associated with that fragment of the user's spoken speech; ranking the first and second hypotheses based on at least the pairwise score and the feature set of that speech fragment, and selecting the first hypothesis as the text representation of that fragment of the user's spoken speech ...
27-08-2012 publication date

APPARATUS AND METHODS FOR PROCESSING AN AUDIO SIGNAL TO IMPROVE SPEECH INTELLIGIBILITY USING A FEATURE EXTRACTION FUNCTION

Number: RU2011105976A
Assignee:

... 1. An apparatus for processing an audio signal to obtain control information for a speech-intelligibility-enhancement filter, comprising: a feature extractor for obtaining a time sequence of short-time spectral representations of the audio signal and for extracting at least one feature in each frequency band of a plurality of frequency bands for the plurality of short-time spectral representations, the at least one feature representing the spectral shape of a short-time spectral representation in a frequency band of the plurality of frequency bands; and a feature combiner for combining the at least one feature for each band using combination parameters to obtain the control information for the speech-intelligibility-enhancement filter for a time portion of the audio signal. 2. The apparatus of claim 1, wherein the feature extractor extracts at least one additional feature representing ...
27-09-2018 publication date

Learning model construction device, abnormality detection device, abnormality detection system, and server

Number: DE102018204135A1
Assignee:

To provide a learning model construction device, abnormality detection device, abnormality detection system, and server for performing abnormality detection using sound information from the surroundings of a manufacturing facility. A learning model construction device 200 includes a voice acquisition unit 220 that acquires, via a microphone 100, voice data including the voice of an operator located in the vicinity of a manufacturing facility; a label acquisition unit 230 that acquires, as a label, a degree of abnormality relating to a production line that includes the manufacturing facility; and a learning unit 240 that constructs a learning model for the degree of abnormality by performing supervised learning with a set of voice data and labels as training data.
03-12-1970 publication date

Speech recognition device

Number: DE0002021126A1
Assignee:
22-02-2017 publication date

Method and system for providing alerts for radio communications

Number: GB0002541562A
Assignee:

A method and system for providing alerts for radio communications are provided. One or more keywords are generated based on one or more contextual parameters associated with a radio device. An audio stream is received at the radio device from a radio transmitter. One or more of the one or more keywords are detected in the audio stream, and an alert for the audio stream is provided to a user of the radio device.

05-04-2017 publication date

Customer service appraisal device, customer service appraisal system, and customer service appraisal method

Number: GB0002542959A
Assignee:

A customer service appraisal device for appraising the service a person provides a customer comprises: a speech input unit (31) into which a person's speech is input as a speech signal; a keyword detector (32) for detecting one or more predetermined customer service keywords from the person's speech by acquiring the speech signal; a speech feature acquisition unit (34) for acquiring a speech feature amount for the customer service keywords detected by the keyword detector (32); and a customer service score calculation unit (36) for calculating an appraisal score as an index indicating whether the service a person provided a customer is good or bad on the basis of a detection amount for customer service keywords detected by the keyword detector (32) and the speech feature amount.

22-03-2017 publication date

A spoken dialogue system, a spoken dialogue method and a method of adapting a spoken dialogue system

Number: GB0201701918D0
Author:
Assignee:

30-08-2017 publication date

Speech processing systems

Number: GB0201711344D0
Author:
Assignee:

25-03-1992 publication date

Speech categorization system

Number: GB0002248137A
Assignee:

A speech categorization system includes first and second timers which generate first and second measured durations indicative of duration of selected higher and lower amplitude segments included in a voice message. A higher amplitude segment is classified in a first category when the first and second measured durations corresponding to the higher amplitude segment and an adjacent lower amplitude segment satisfy a classification test, and a counter counts the number of the higher amplitude segments classified in the first category. Accented syllables in the higher amplitude segment are recognized to aid classification.

09-04-2008 publication date

Improving pattern recognition accuracy with distortions

Number: GB0002418764B
Assignee: FLUENCY VOICE TECHNOLOGY LTD

15-10-2008 publication date

Systems and methods for active listening/observing and event detection

Number: GB2448408A
Assignee:

The present invention relates to the detection of events, especially in a medical/hospital setting, primarily by the use of a speech recognition system 220. An events knowledge database is used in a decision making process to record and respond to conversations 211 amongst caregivers such as doctors, nurses or surgeons. If a certain event is detected, a response 250 such as the summoning of an emergency rapid response team may be initiated. The recording of information may also be used to assist in the diagnosis of problems earlier than previously possible. The speech recognition software may differentiate between different individuals when monitoring a conversation. In addition to the monitoring of conversations, the system may monitor the output from devices that gauge or monitor the patient's physiology, such as blood pressure, heart rate, temperature etc. 210 to aid in early diagnosis.

02-11-2016 publication date

Speech synthesis using dynamical modelling with global variance

Number: GB0002537907A
Assignee:

A text-to-speech (TTS) system is trained according to a linear dynamic model (LDM) whereby text is converted to a sequence of linguistic units (eg. phonemes, sub-phonemes), each state of which is looked up in an acoustic model table to produce a sequence of speech vectors which is adjusted to increase the variance of the speech vectors vi(d) based on a predefined global variance v before being output as speech. A predefined number T of hidden vectors xt evolve according to a state equation involving an observation matrix H, state transformation matrix F, covariance matrices Q & R and mean vectors m. Second order LDMs may be constrained to be critically damped towards a target q, and speech parameter trajectories Y may be calculated according to a steepest ascent method.
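The symbols quoted in the abstract (observation matrix H, state transformation matrix F, covariances Q and R, mean vectors m) match the standard linear dynamic model; as a hedged reconstruction, since the excerpt does not spell the equations out, the state and observation equations would read:

```latex
\begin{aligned}
x_{t+1} &= F\,x_t + w_t, & w_t &\sim \mathcal{N}(m_w, Q)\\
y_t     &= H\,x_t + v_t, & v_t &\sim \mathcal{N}(m_v, R)
\end{aligned}
```

with T hidden vectors x_t and speech vectors y_t; the subscripted means m_w and m_v are an assumption about how the abstract's "mean vectors m" enter the model.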

28-11-2001 publication date

A maximum entropy and maximum likelihood criteria for feature selection from multivariate data

Number: GB2362744A
Assignee:

Improvements in speech recognition systems are achieved by considering projections of the high dimensional data on lower dimensional subspaces, subsequently by estimating the univariate probability densities via known univariate techniques, and then by reconstructing the density in the original higher dimensional space from the collection of univariate densities so obtained. The reconstructed density is by no means unique unless further restrictions on the estimated density are imposed. The variety of choices of candidate univariate densities as well as the choices of subspaces on which to project the data including their number further add to this non-uniqueness. Probability density functions are then considered that maximize certain optimality criterion as a solution to this problem. Specifically, those probability density functions that either maximize the entropy functional, or alternatively, the likelihood associated with the data are considered.

17-03-1965 publication date

Selective signal amplitude detector

Number: GB0000986520A
Author:
Assignee:

... 986, 520. Indicating extremes of amplitude. INTERNATIONAL BUSINESS MACHINES CORPORATION. Dec. 4, 1962 [Dec. 7, 1961], No. 45737/62. Heading G1U. [Also in Division G4] In speech recognition apparatus an indication of local amplitude extreme in a number of frequency channels is provided by applying each channel to a transistor adapted to respond to a local maximum, the output of any transistor which responds serving also to inhibit response of at least adjacent transistors. The speech signal from microphone 7 is applied to band-pass filters 8a-8n the outputs of which pass to detectors 10a-10n giving D.C. amplitudes proportional to the square of the input amplitudes. These signals are therefore measures of the power in the corresponding frequency band. The curve of Fig. 2 represents the power in the different frequency bands, each maximum being accompanied by smaller peaks on each side, relating to harmonics of the maximum frequency. The circuit is designed to respond only to the main maxima ...

24-07-2019 publication date

Computer-implemented phoneme-grapheme matching

Number: GB0002546536B

27-06-2012 publication date

Identifying a speaker via mouth movement and generating a still image

Number: GB0002486793A
Assignee:

... photographing apparatus recognizes the shape of a speaker's mouth in order to identify a speaker area. That is, a person in a group could be identified as the speaker by the movement of their mouth and lips. A still image of the speaker area is then generated (P1-P3). The method can selectively perform image signal processing (ISP) with respect to the detected speaker area in a moving image. The ISP may include out-of-focusing, macro zooming, and macro focusing. The apparatus may enhance a moving image by generating a still image including the speaker area and using the still image as a bookmark. The speaker's voice may also be used to identify the speaker area.

14-05-2008 publication date

System and methods for active listening/observing and event detection

Number: GB0000806201D0
Author:
Assignee:

11-07-2018 publication date

Automated speech pronunciation attribution

Number: GB0002558353A
Assignee:

A Method, system, and apparatus. Candidate user profiles are determined as being associated with a shared device (e.g. a device that acts as a voice assistant, digital assistant etc.) 310. Pronunciation attributes associated with candidate profiles, associated with the shared device, are identified 320. A spoken utterance (e.g. a key word or user name) is received at the shared device 330. A pronunciation attribute (e.g. accent, dialect) is determined (e.g. using a speech recognition process) from the received audio data corresponding to the spoken utterance 340. The pronunciation attribute is compared to at least one of the candidate pronunciation attributes 350. A particular pronunciation attribute is selected based on the comparison 360. An audio based reply with the particular pronunciation attribute, selected from one of the candidate pronunciation attributes, is output 370. Candidate user profiles associated with the shared device may be determined through a relationship between each ...

26-08-2020 publication date

Creating modular conversations using implicit routing

Number: GB0002581660A
Assignee:

A computer implemented method of routing a verbal input to one of a plurality of handlers, comprising using one or more processors adapted to execute a code, the code is adapted for receiving a verbal input from a user, applying a plurality of verbal content identifiers to the verbal input, each of the verbal content identifiers is adapted to evaluate an association of the verbal input with a respective one of a plurality of handlers by computing a match confidence value for one or more features, such as an intent expressed by the user and/or an entity indicated by the user, extracted from the verbal input and routing the verbal input to a selected one of the handlers based on the match confidence value computed by the plurality of verbal content identifiers. The selected handler is adapted to initiate one or more actions in response to the verbal input.
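The routing step described above (each identifier scores its handler's association with the input; the input goes to the handler with the highest match confidence) can be sketched as follows; the handler names and scoring functions are hypothetical:

```python
def route(verbal_input, identifiers):
    """`identifiers` maps a handler name to a content identifier that
    returns a match confidence value for the input; the input is routed
    to the handler whose identifier reports the highest confidence."""
    scores = {name: identify(verbal_input) for name, identify in identifiers.items()}
    return max(scores, key=scores.get)

# Hypothetical handlers scoring on crude keyword intent features.
handlers = {
    "media":   lambda u: 0.9 if "music" in u else 0.1,
    "weather": lambda u: 0.8 if "forecast" in u else 0.1,
}
```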

19-01-2022 publication date

End of speech detection using one or more neural networks

Number: GB0002597126A
Assignee:

An Automatic Speech Recognition/voice transcription system indicates an End of Speech segment based on one or more characters predicted to be within the segment, especially a particular percentage of blank (non-speech) characters within a sliding window 352, fig. 3B (e.g. 95% within 500 ms). Audio data is input to a Connectionist Temporal Classification (CTC) neural network model 304 to generate character probabilities based on extracted features (e.g. mel-spectrogram 204, fig. 2). The Start (11) and End (12) Of Speech segments are then detected 310 via a greedy (e.g. ArgMax) decoder 308.
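The sliding-window blank test quoted in the abstract (e.g. 95% blank characters within the window) can be sketched as below; the frame labels, the window length in frames, and the return convention are illustrative assumptions:

```python
def end_of_speech(labels, window=50, blank="_", ratio=0.95):
    """Declare end of speech at the first frame where the fraction of
    CTC blank (non-speech) symbols inside the sliding window reaches
    `ratio` (the 95% figure follows the abstract; a 50-frame window
    stands in for the quoted 500 ms)."""
    for start in range(len(labels) - window + 1):
        if labels[start:start + window].count(blank) / window >= ratio:
            return start + window  # frame index at which EOS is flagged
    return None  # speech has not ended within this label stream
```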

30-06-2021 publication date

End of speech detection using one or more neural networks

Number: GB202107009D0
Author:
Assignee:

25-04-2018 publication date

A speech processing system and a method of processing a speech signal

Number: GB0201804073D0
Author:
Assignee:

22-02-2023 publication date

Processing method and device

Number: GB0002610013A
Assignee:

Obtaining input information from an input member of an electronic apparatus. The input information includes a behaviour parameter in a process of inputting a target word. The method further includes determining a display parameter of the target word based on the behaviour parameter to display the target word on a target display according to the display parameter. The display parameter represents feature information when inputting the target word. The input information may include trajectory information from the input, audio input information, or posture input information.

13-09-2023 publication date

Cloud service platform system for speakers

Number: GB0002616512A
Assignee:

A cloud service platform system for smart speakers comprising a speech input module, network connection and player receives positioning data from working speakers, marks two speakers within a threshold distance of each other as “suspected same group” and sends a “suspected same group acknowledgment message” to the corresponding speakers. If the feedback result is “yes”, data for the same group of speakers is unified and transmitted to any speaker within the group, and any speaker transmits data within the group. If a “NO” result is fed back, play data is transmitted successively according to a weight priority and volume is controlled. This allows a single application to manage conflicts between nearby speakers.
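The pairing step above (mark two speakers within a threshold distance as a "suspected same group") is a pairwise distance test. A minimal sketch, assuming straight-line distance over 2-D positions, since the abstract does not specify the metric:

```python
import math

def suspected_groups(positions, threshold):
    """Return every pair of speaker ids whose reported positions lie
    within `threshold` of each other (Euclidean distance assumed)."""
    ids = sorted(positions)
    return [(a, b)
            for i, a in enumerate(ids) for b in ids[i + 1:]
            if math.dist(positions[a], positions[b]) <= threshold]
```

Each returned pair would then receive the "suspected same group acknowledgment message" described above.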

15-01-2011 publication date

SPEECH RECOGNITION

Number: AT0000494610T
Assignee:

15-12-2011 publication date

COMMUNICATIONS DEVICE WITH SPEAKER-INDEPENDENT SPEECH RECOGNITION

Number: AT0000536611T
Assignee:

15-12-2008 publication date

METHOD FOR SENDING END-OF-SPEECH MARKERS IN A SPEECH RECOGNITION SYSTEM

Number: AT0000415773T
Assignee:

15-09-2008 publication date

PHONETICS-BASED SPEECH RECOGNITION SYSTEM AND METHOD

Number: AT0000405919T
Assignee:

15-06-2011 publication date

METHOD FOR SPEECH RECOGNITION

Number: AT0000510278T
Assignee:

15-01-1988 publication date

METHOD FOR EXCITATION ANALYSIS FOR AUTOMATIC SPEECH RECOGNITION.

Number: AT0000031989T
Assignee:

15-09-1989 publication date

SPEECH RECOGNITION SYSTEM.

Number: AT0000045831T
Assignee:

15-02-2006 publication date

SPEECH RECOGNITION

Number: AT0000316678T
Assignee:

15-02-2004 publication date

METHOD AND DEVICE FOR SEARCHING AN EXCITATION CODEBOOK IN A CELP CODER

Number: AT0000259532T
Assignee:

04-02-2021 publication date

A SYSTEM AND A METHOD FOR CROSS-LINGUISTIC AUTOMATIC SPEECH RECOGNITION

Number: AU2020103587A4
Assignee:

The present disclosure relates to a system and a method for cross linguistic automatic speech recognition. The method includes receiving corpus of speech from various sources including phone file and input speech signal of various languages using input unit, training corpus of speech and thereby creating a dictionary file by employing a training module, extracting phones from the dictionary file and extracting unique word from the transcription of phone file and input speech signal by means of a dynamic feature extraction unit, making an utterance by deploying an acoustic model, wherein the acoustic model is enclosed by arithmetical demonstrations for single discrete significance, wherein each of the arithmetical demonstrations is assigned with a tag related to a phoneme, decoding various languages in a particular language, and generating and classifying of robust spontaneous speech model for multilingual speech system by employing machine learning model for generating transcription in different ...

16-01-2020 publication date

METHOD, APPARATUS AND SYSTEM FOR SPEAKER VERIFICATION

Number: AU2019279933A1
Assignee: IP& Pty Ltd

The present disclosure relates to a method, apparatus, and system for speaker verification. The method includes: acquiring an audio recording; extracting speech signals from the audio recording; extracting features of the extracted speech signals; and determining whether the extracted speech signals represent speech by a predetermined speaker based on the extracted features and a speaker model trained with reference voice data of the predetermined speaker.

29-07-2004 publication date

COMPREHENSIVE SPOKEN LANGUAGE LEARNING SYSTEM

Number: AU2003300143A1
Assignee:

26-07-1999 publication date

A vocoder-based voice recognizer

Number: AU0008355398A
Assignee:

20-07-2017 publication date

SOCIAL MESSAGING USER INTERFACE

Number: AU2017204474A1

Hubs for social interaction via electronic devices are described. In one aspect, a data processing device includes a display screen displaying a social interaction hub, the social interaction hub including a collection of records. Each record includes a counterparty identifier identifying a counterparty of a past social interaction event, a mode indicium identifying a mode by which the past social interaction event with the counterparty occurred, and a collection of mode indicia each identifying a mode by which a future, outgoing social interaction event with the counterparty can occur. The counterparty identifier, the mode indicium, and the collection of mode indicia are associated with one another in the records of the social interaction hub.

08-05-2001 publication date

Small vocabulary speaker dependent speech recognition

Number: AU0001049601A
Assignee:

03-07-2001 publication date

Voice-controlled animation system

Number: AU0002276601A
Assignee:

11-12-2007 publication date

SYSTEM AND METHOD OF SPOKEN LANGUAGE UNDERSTANDING IN HUMAN COMPUTER DIALOGS

Number: CA0002413658C
Assignee: AT&T CORP.

A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self contained clauses, identifying a dialog act in each of the self contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify.

16-06-2004 publication date

VOICE RECOGNITION SYSTEM

Number: CA0002419526A1
Author: TASCHEREAU, JOHN
Assignee:

A method of matching an utterance comprising a word to a listing in a directory using an automated speech recognition system by forming a word list comprising a selection of words from the listings in the directory; using the automated speech recognition system to determine the best possible matches of the word in the utterance to the words in the word list; creating a grammar of listings in the directory that contain at least one of the best possible matches; and using the automated speech recognition system to match the utterance to a listing within the grammar.

03-04-2021 publication date

PHONEME SOUND BASED CONTROLLER

Number: CA3095032A1
Assignee:

Disclosed herein is a phoneme sound based controller apparatus including: a sound input for receiving a sound signal; a phoneme sound detection module connected to the sound input to determine if at least one phoneme is detected in the sound signal; a dictionary containing at least one word, the word including at least one syllable, the syllable including the at least one phoneme; a grammar containing at least one rule, the at least one rule containing the at least one word, the at least one rule further containing at least one control action. At least one control action is taken if the at least one phoneme is detected in the sound input signal by the phoneme sound detection module. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
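The dictionary/grammar/action chain described above can be sketched as follows; the phoneme symbols, words, and action names are hypothetical, and detection is reduced to an ordered-subsequence test on an already-detected phoneme stream:

```python
# Hypothetical dictionary (word -> syllable phonemes) and grammar
# (rule: word -> control action), mirroring the claimed structure.
DICTIONARY = {"stop": ["S", "T", "AA", "P"], "go": ["G", "OW"]}
GRAMMAR = {"stop": "halt_motor", "go": "start_motor"}

def control_action(detected_phonemes):
    """Return the control action for the first dictionary word whose
    phonemes all occur, in order, in the detected phoneme stream."""
    for word, phonemes in DICTIONARY.items():
        stream = iter(detected_phonemes)
        if all(p in stream for p in phonemes):  # ordered-subsequence test
            return GRAMMAR[word]
    return None  # no rule fired, so no action is taken
```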

15-03-2017 publication date

AUTOMATIC VOICE RECOGNITION WITH DETECTION OF AT LEAST ONE CONTEXTUAL ELEMENT, APPLICATION TO STEERING AND MAINTENANCE OF AN AIRCRAFT

Number: CA0002942116A1
Assignee:

This automatic speech recognition device (30) comprises an audio signal acquisition unit (32), a detection device (36) for detecting the state of at least one contextual element, and a linguistic decoder (38) for determining the spoken instruction corresponding to the audio signal. The linguistic decoder (38) comprises at least one acoustic model (42) defining an acoustic probability law and at least two syntactic models (44) each defining a syntactic probability law. The linguistic decoder (38) also comprises a spoken-instruction construction algorithm (46) employing the acoustic model (42) and a plurality of active syntactic models taken from among the syntactic models (44), and a contextualization processor (48) for selecting, as a function of the state of the or each contextual element detected by the detection device (36), at least one syntactic model selected from among the plurality of syntactic models ...

06-08-1996 publication date

SPEECH CODING APPARATUS HAVING SPEAKER DEPENDENT PROTOTYPES GENERATED FROM A NONUSER REFERENCE DATA

Number: CA0002077728C

A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value are stored. The closeness of the feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature value signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature ...

25-09-1995 publication date

SIGNAL BIAS REMOVAL FOR ROBUST TELEPHONE SPEECH RECOGNITION

Number: CA0002141010A1
Assignee:

A signal bias removal (SBR) method based on the maximum likelihood estimation of the bias for minimizing undesirable effects in speech recognition systems is described. The technique is readily applicable in various architectures including discrete (vector-quantization based), semicontinuous and continuous-density Hidden Markov Model (HMM) systems. For example, the SBR method can be integrated into a discrete density HMM and applied to telephone speech recognition where the contamination due to extraneous signal components is unknown. To enable real-time implementation, a sequential method for the estimation of the bias (SSBR) is disclosed.

29-06-1995 publication date

DISTRIBUTED VOICE RECOGNITION SYSTEM

Number: CA0002179759A1
Assignee:

A voice recognition system has a feature extraction apparatus (22) located in a remote station (40). The feature extraction apparatus (22) extracts features from an input speech frame and then provides the extracted features to a central processing station (42). In the central processing station (42), the features are provided to a word decoder (48) which determines the syntax of the input speech frame.

24-08-1995 publication date

Method and Apparatus for Group Encoding Signals

Number: CA0002157024A1
Assignee:

06-08-1996 publication date

SPEECH RECOGNITION APPARATUS HAVING A SPEECH CODER OUTPUTTING ACOUSTIC PROTOTYPE RANKS

Number: CA0002073991C

A speech coding and speech recognition apparatus. The value of at least one feature of an utterance is measured over each of a series of successive time intervals to produce a series of feature vector signals. The closeness of the feature value of each feature vector signal to the parameter value of each of a set of prototype vector signals is determined to obtain prototype match scores for each vector signal and each prototype vector signal. For each feature vector signal, first-rank and second-rank scores are associated with the prototype vector signals having the best and second best prototype match scores, respectively. For each feature vector signal, at least the identification value and the rank score of the first-ranked and second-ranked prototype vector signals are output as a coded utterance representation signal of the feature vector signal, to produce a series of coded utterance representation signals. For each of a plurality of speech units, a probabilistic model has a plurality ...

Подробнее
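The rank-coding step described above can be sketched as replacing each feature vector by the identities of its best and second-best matching prototypes. The Euclidean match score and the tiny prototype set are assumptions for illustration only.

```python
import numpy as np

def rank_code(feature_vectors, prototypes):
    """Code each feature vector as (first-ranked id, second-ranked id)
    according to its closeness to a set of prototype vectors (sketch)."""
    codes = []
    for v in feature_vectors:
        scores = np.linalg.norm(prototypes - v, axis=1)  # smaller = better match
        order = np.argsort(scores)                       # prototype ids by rank
        codes.append((int(order[0]), int(order[1])))
    return codes

protos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
utterance = np.array([[0.1, 0.0], [0.9, 0.2]])
coded = rank_code(utterance, protos)  # one (first, second) pair per frame
```

The coded utterance keeps only the identities and ranks, which is what makes the representation compact compared to raw match scores.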
12-09-1997 дата публикации

METHOD AND RECOGNIZER FOR RECOGNIZING A SAMPLED SOUND SIGNAL IN NOISE

Номер: CA0002247364A1
Принадлежит:

A sound recognizer uses a feature value normalization process to substantially increase the accuracy of recognizing acoustic signals in noise. The sound recognizer includes a feature vector device (110) which determines a number of feature values for a number of analysis frames, a min/max device (120) which determines a minimum and maximum feature value for each of a number of frequency bands, a normalizer (130) which normalizes each of the feature values with the minimum and maximum feature values resulting in normalized feature vectors, and a comparator (140) which compares the normalized feature vectors with template feature vectors to identify one of the template feature vectors that most resembles the normalized feature vectors.

Подробнее
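The per-band min/max normalization described above can be sketched as follows; the matrix layout (rows = analysis frames, columns = frequency bands) is an assumption.

```python
import numpy as np

def minmax_normalize(features):
    """Normalize each frequency band of a (frames x bands) feature matrix
    to [0, 1] using that band's minimum and maximum over all frames."""
    fmin = features.min(axis=0)
    fmax = features.max(axis=0)
    return (features - fmin) / (fmax - fmin)

frames = np.array([[2.0, 10.0],
                   [4.0, 30.0],
                   [6.0, 20.0]])
normalized = minmax_normalize(frames)
```

The recognizer would then compare these normalized feature vectors against template feature vectors, as the abstract describes.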
03-04-2001 дата публикации

METHOD AND SYSTEM FOR PERFORMING SPEECH RECOGNITION

Номер: CA0002192397C
Принадлежит: AT&T CORP.

Speech recognition processing is compensated for improving robustness of speech recognition in the presence of enhanced speech signals. The compensation overcomes the adverse effects that speech signal enhancement may have on speech recognition performance, where speech signal enhancement causes acoustical mismatches between recognition models trained using unenhanced speech signals and feature data extracted from enhanced speech signals. Compensation is provided at the front end of an automatic speech recognition system by combining linear predictive coding and mel-based cepstral parameter analysis for computing cepstral features of transmitted speech signals used for speech recognition processing, by selectively weighting mel-filter banks when processing frequency domain representations of the enhanced speech signals.

Подробнее
28-05-1998 дата публикации

SPEECH PROCESSING SYSTEM

Номер: CA0002264773A1
Принадлежит:

A speech processing system (10) incorporates an analogue to digital converter (16) to digitise input speech signals for Fourier transformation to produce short-term spectral cross-sections. These cross-sections are compared with one hundred and fifty reference patterns in a store (34), the patterns having respective stored sets of formant frequencies assigned thereto by a human expert. Six stored patterns most closely matching each input cross-section are selected for further processing by dynamic programming, which indicates the pattern which is a best match to the input cross-section by using frequency-scale warping to achieve alignment. The stored formant frequencies of the best matching pattern are modified by the frequency warping, and the results are used as formant frequency estimates for the input cross-section. The frequencies are further refined on the basis of the shape of the input cross-section near to the chosen formants. Formant amplitudes are produced from input cross-section ...

Подробнее
31-08-1967 дата публикации

Spracherkennungsgerät

Номер: CH0000442782A

Подробнее
15-10-2020 дата публикации

Device for acting on at least parts of a body.

Номер: CH0000716065A1
Принадлежит:

The invention relates to a device and a method for acting on at least parts of a user's body, in particular with medical-cosmetic radiation and/or by mechanical influence, comprising means (22, 23) for acting on the human body which have adjustable operating parameters, and a controller (38) connected to the means (22, 23) that sets the operating parameters. According to the invention, a device for acting on at least parts of a user's body, which the user can adapt to his needs even during operation, is created in that at least one microphone is provided through which the user's voice inputs for controlling the operating parameters can be captured, and in that the microphone is connected to an evaluation unit which converts the voice inputs captured by the microphone into control commands for the means (22, 23) for acting on the body.

Подробнее
15-10-2021 дата публикации

Verfahren und Automat zur Spracherkennung deutscher Dialekte.

Номер: CH0000717305A1
Принадлежит:

The invention relates to a method and a transducer automaton for mapping audio distributions corresponding to spoken German dialects to word sequences in written Standard German. The transducer automaton comprises the following: a first finite-state transducer for mapping the audio distributions to context-dependent phones; a second finite-state transducer for mapping the context-dependent phones to context-independent phones; a third finite-state transducer (L') for mapping the context-independent phones to pseudo-tokens in a German dialect language; a fourth finite-state transducer (X) for mapping the pseudo-tokens in a German dialect language to the Standard German words (w1, w2, ...); a fifth finite-state transducer for mapping German words (w1, w2, ...) to word sequences: ...

Подробнее
30-11-2018 дата публикации

Voice model training method, voice recognition method, devices, facility and medium

Номер: CN0108922515A
Автор: TU HONG
Принадлежит:

Подробнее
03-05-2017 дата публикации

Voice wakeup method and voice wakeup device based on artificial intelligence

Номер: CN0106611597A
Автор: TANG LILIANG
Принадлежит:

Подробнее
28-04-1999 дата публикации

Speech processing

Номер: CN0001215491A
Автор: MILNER B P
Принадлежит:

Подробнее
10-09-2019 дата публикации

Voice feature processing method for voiceprint recognition under noise environment

Номер: CN0105679312B
Автор:
Принадлежит:

Подробнее
06-07-2016 дата публикации

Conference recording method and system for video network conference

Номер: CN0105745921A
Автор: WANG XIAOGUANG
Принадлежит:

Подробнее
25-01-2019 дата публикации

Speech recognition method based on multi-channel convolution neural network

Номер: CN0109272988A
Принадлежит:

Подробнее
16-04-2019 дата публикации

Self-adapting method of DNN acoustic model based on personal identity characteristics

Номер: CN0109637526A
Автор: LI YING, YAN BEIBEI, GUO XUDONG
Принадлежит:

Подробнее
10-07-2013 дата публикации

Frequency axis elastic coefficient estimation device, system and method

Номер: CN101809652B
Автор: EMORI TADASHI
Принадлежит:

Подробнее
22-01-2019 дата публикации

Speech recognition method and apparatus

Номер: CN0106782504B
Автор:
Принадлежит:

Подробнее
13-08-2021 дата публикации

Intelligent microphone and signal processing method thereof

Номер: CN113259793A
Принадлежит:

The invention relates to an intelligent microphone and a signal processing method thereof. The intelligent microphone comprises a sound sensor and an AI special sound processor; the sound sensor collects a sound signal and converts the sound signal into an audio signal; the AI special sound processor carries out identification processing on the audio signal and extracts audio features from the audio signal; then whether a control signal is output or not is judged according to the audio characteristics, the control signal is used for awakening a rear-end processor, and the rear-end processor can respond to a sound signal collected by the intelligent microphone after being awakened; the intelligent microphone is arranged in a semiconductor packaging body, and the sound sensor and the AI special sound processor are arranged in a bare chip of the semiconductor packaging body; in the scheme, the intelligent microphone is provided with the AI special sound processor to identify the sound signal ...

Подробнее
29-06-2018 дата публикации

A data processing method and system

Номер: CN0108231064A
Автор:
Принадлежит:

Подробнее
09-07-1976 дата публикации

Speech recognition device - converting analogue speech signal into digital signals with given repetition period uses digital filters

Номер: FR0002294505A1
Автор:
Принадлежит:

Подробнее
07-12-1979 дата публикации

APPARATUS FOR THE FORMATION OF SIGNALS REPRESENTATIVE OF SPEECH

Номер: FR0002425689A1
Автор:
Принадлежит:

Подробнее
21-04-2020 дата публикации

VOICE TRIGGER FOR A DIGITAL ASSISTANT

Номер: KR0102103057B1
Автор:
Принадлежит:

Подробнее
18-11-2020 дата публикации

The Expression System for Speech Production and Intention Using Derencephalus Action

Номер: KR0102180551B1
Автор:
Принадлежит:

Подробнее
13-02-2020 дата публикации

APPARATUS FOR RECORDING VOICE AND METHOD THEREOF

Номер: KR0102077193B1
Автор: SONG IN HYUK, LEE YE CHAN
Принадлежит:

Подробнее
03-05-2019 дата публикации

Номер: KR0101975057B1
Автор:
Принадлежит:

Подробнее
21-07-2020 дата публикации

APPARATUS AND METHOD FOR EVALUATING PRONUNCIATION ACCURACY FOR FOREIGN LANGUAGE EDUCATION

Номер: KR1020200087623A
Автор:
Принадлежит:

Подробнее
26-02-2019 дата публикации

Method of performing speech recognition and electronic device using the same

Номер: KR1020190018886A
Принадлежит:

... An electronic device according to various embodiments comprises a memory, a microphone, and a processor electrically connected to the memory and the microphone. The memory contains instructions that, when executed, cause the processor to: receive a first voice signal from a user through the microphone at a first time; obtain, based on the received first voice signal, a first natural-language expression corresponding to the first voice signal; when the user's request carried by the first voice signal cannot be processed on the basis of the obtained first natural-language expression, store the first natural-language expression in the memory as supplementary data; receive a second voice signal from the user through the microphone at a second time; obtain, based on the received second voice signal, a second natural-language expression corresponding to the second voice signal; when the user's request carried by the second voice signal can be processed on the basis of the obtained second natural-language expression, determine whether the first and second natural-language expressions are similar and whether the difference between the first time and the second time is within a predetermined period; and, based on that determination, when the two expressions are similar and the time difference is within the predetermined period, store the supplementary data in the memory so that the second natural-language expression corresponds to the first natural-language expression. Other embodiments are also possible.

Подробнее
26-10-2017 дата публикации

COMBINED AND LEARNED DEEP NEURAL NETWORK ENSEMBLE BASED ACOUSTIC MODEL FOR VOICE RECOGNITION IN REVERBERANT ENVIRONMENT AND VOICE RECOGNITION METHOD USING SAME

Номер: KR1020170119152A
Принадлежит:

Disclosed are a combined and learned deep neural network ensemble based acoustic model for voice recognition in a reverberant environment and a voice recognition method using the same. The voice recognition method using a combined and learned deep neural network ensemble based acoustic model for voice recognition in a reverberant environment comprises the following steps: extracting a feature vector from an input voice signal; combining the feature vector by using an ensemble of a deep neural network ensemble based acoustic model previously learned with respect to each reverberant environment; and classifying a phoneme and recognizing voice. The deep neural network ensemble based acoustic model which is previously learned can estimate a phoneme probability with respect to each reverberant environment in multiple reverberant environments. COPYRIGHT KIPO 2017 ...

Подробнее
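One plausible reading of the ensemble-combination step above is a weighted average of per-environment phoneme posteriors. The softmax scoring, uniform weights, and toy logits below are assumptions, not the jointly trained combination the abstract describes.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def ensemble_phoneme_posterior(logits_per_model, weights=None):
    """Combine phoneme probability estimates from several
    reverberation-specific acoustic models by weighted averaging."""
    probs = np.array([softmax(l) for l in logits_per_model])
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))  # uniform by default
    return weights @ probs

# Two hypothetical environment-specific models scoring 3 phoneme classes
logits = [np.array([2.0, 1.0, 0.0]), np.array([1.0, 2.0, 0.0])]
posterior = ensemble_phoneme_posterior(logits)
```

The combined vector is still a proper distribution, so phoneme classification can proceed by taking its argmax.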
29-08-2019 дата публикации

Номер: KR1020190100484A
Автор:
Принадлежит:

Подробнее
29-05-2020 дата публикации

VOICE RECOGNIZING METHOD AND VOICE RECOGNIZING APPRATUS

Номер: KR1020200059703A
Автор:
Принадлежит:

Подробнее
01-08-2017 дата публикации

DEEP NEURAL NETWORK-BASED SPEECH RECOGNITION METHOD AND APPARATUS THEREOF

Номер: KR1020170088165A
Принадлежит:

A deep neural network-based speech recognition method according to an aspect of the present invention comprises the steps of: receiving a voice signal; converting the voice signal into a frequency signal; calculating a plurality of max-pooling input node values corresponding to each node of a hidden layer, which is a next node, by using the weighted sum of a vector signal composed of the frequency signal and a weight vector; and determining a largest value among the plurality of max-pooling input node values as a next node value of the hidden layer, wherein the weight vector is calculated by compressing a reference weight vector preset by learning on a time axis. COPYRIGHT KIPO 2017 (AA) Start (BB) End (S510) Input a voice signal (S520) Convert a frequency signal (S530) Calculate a max-pooling input node (S540) Determine a node value of a hidden layer ...

Подробнее
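The max-pooling node computation described above (one weighted sum per candidate weight vector, then keep the largest result) can be sketched as follows; the weight values are illustrative only.

```python
import numpy as np

def maxpool_node_value(freq_vector, weight_vectors):
    """Compute one hidden-node value: weighted sum of the frequency vector
    with each candidate weight vector, then max-pool over the candidates."""
    candidates = weight_vectors @ freq_vector   # one weighted sum per weight vector
    return candidates.max()                     # largest value becomes the node value

x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.1, 0.2, 0.3],   # weighted sum -> 1.4
              [0.3, 0.2, 0.1]])  # weighted sum -> 1.0
node = maxpool_node_value(x, W)
```

In the patented method the candidate weight vectors would come from compressing a learned reference weight vector on the time axis; here they are simply given.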
27-12-2016 дата публикации

METHOD AND DEVICE FOR VOICEPRINT IDENTIFICATION

Номер: KR1020160149132A
Автор: LI CHAO, GUAN YONG
Принадлежит:

The present invention suggests a method and an apparatus for voiceprint identification. The method for voiceprint identification includes: a step of displaying an alarm text to a user, wherein the alarm text is a combination of phrases that the user registers beforehand; a step of obtaining a voice of the user reciting the alarm text; and a step of obtaining a registration model formed beforehand when the voice corresponds to the alarm text, and determining a voiceprint identification result based on the voice and the registration model. The method can guarantee that the user is not required to speak for an excessively long time and assure variability of voice content to prevent record deception. COPYRIGHT KIPO 2017 (S11) Displaying an alarm text to a user, wherein the alarm text is a combination of phrases that the user registers beforehand (S12) Obtaining a voice of the user reciting the alarm text (S13) Obtaining a registration model formed beforehand when the voice corresponds to the ...

Подробнее
16-10-2004 дата публикации

FORMANT TRACKING METHOD AND APPARATUS USING RESIDUAL MODEL FOR TRACKING FORMANTS WITHOUT REDUCING TRACKING SPACE

Номер: KR20040088364A
Принадлежит:

PURPOSE: A method and apparatus for tracking formants of a voice signal using a residual model are provided to carry out formant tracking without reducing a tracking space by using a formant tracking space having different formants in different frames of a voice signal. CONSTITUTION: A formant tracking method defines a formant tracking space including a formant set to be tracked. Formants in the first frame of a voice signal are identified using the entire formant tracking space. Formants in the second frame of the voice signal are identified using the entire formant tracking space. The formants are identified by mapping the formant set into a feature vector and applying the feature vector to a model (306). © KIPO 2005 ...

Подробнее
16-04-2019 дата публикации

Block-based principal component analysis transformation method and device thereof

Номер: TW0201916626A
Принадлежит:

The present invention provides a block-based principal component analysis transformation method and a device thereof. The principal component analysis transformation method includes: obtaining an input signal; dividing the input signal and obtaining a plurality of one-dimension vectors corresponding to the divided input signal, wherein a number of the one-dimension vectors is a division number; after arranging the one-dimension vectors to a two-dimension vector, subtracting an average value of the one-dimension vectors of the division number to obtain a zero-mean vector; calculating a covariance matrix of the zero-mean vector; calculating an eigenvector of the covariance matrix; multiplying the eigenvector by the covariance matrix to obtain a projection coefficient.

Подробнее
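The listed steps map onto a conventional PCA pipeline, sketched below. Note the abstract's final step reads "multiplying the eigenvector by the covariance matrix"; the conventional projection shown here (an assumption) instead multiplies the zero-mean blocks by the principal eigenvector.

```python
import numpy as np

def block_pca_coefficients(signal, block_size):
    """Block-based PCA sketch: split the input into fixed-size blocks,
    subtract the per-dimension average, form the covariance matrix, take
    its principal eigenvector, and project each zero-mean block onto it."""
    n_blocks = len(signal) // block_size
    blocks = np.reshape(signal[:n_blocks * block_size], (n_blocks, block_size))
    zero_mean = blocks - blocks.mean(axis=0)      # zero-mean vectors
    cov = np.cov(zero_mean, rowvar=False)         # covariance across blocks
    eigvals, eigvecs = np.linalg.eigh(cov)
    principal = eigvecs[:, np.argmax(eigvals)]    # eigenvector of largest eigenvalue
    return zero_mean @ principal                  # one projection coefficient per block

rng = np.random.default_rng(0)
sig = rng.standard_normal(64)                     # hypothetical input signal
coeffs = block_pca_coefficients(sig, 8)           # 8 blocks of 8 samples
```

Because the blocks are centered before projection, the resulting coefficients have zero mean by construction.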
31-01-2019 дата публикации

Automatic speech recognition

Номер: IL0000263655D0
Автор: Omry NETZER
Принадлежит:

Подробнее
26-01-2012 дата публикации

Speech recognition circuit and method

Номер: US20120022862A1
Принадлежит: Individual

A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying the scores by adding the score updates to the scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to the scores.

Подробнее
14-06-2012 дата публикации

Method and system for reconstructing speech from an input signal comprising whispers

Номер: US20120150544A1
Принадлежит: NANYANG TECHNOLOGICAL UNIVERSITY

A system for reconstructing speech from an input signal comprising whispers is disclosed. The system comprises an analysis unit configured to analyse the input signal to form a representation of the input signal; an enhancement unit configured to modify the representation of the input signal to adjust a spectrum of the input signal, wherein the adjusting of the spectrum of the input signal comprises modifying a bandwidth of at least one formant in the spectrum to achieve a predetermined spectral energy distribution and amplitude for the at least one formant; and a synthesis unit configured to reconstruct speech from the modified representation of the input signal.

Подробнее
30-08-2012 дата публикации

Network apparatus and methods for user information delivery

Номер: US20120221412A1
Автор: Robert F. Gazdzinski
Принадлежит: Individual

A network apparatus useful for providing directions and other information to a user of a client device in wireless communication therewith. In one embodiment, the apparatus includes one or more wireless interfaces and a network interface for communication with a server. User speech inputs in the form of digitized representations are received by the apparatus and used by the server as the basis for retrieving information including graphical representations of location or entities that the user wishes to find.

Подробнее
29-11-2012 дата публикации

Methods and apparatus for correcting recognition errors

Номер: US20120304057A1
Принадлежит: Nuance Communications Inc

Techniques for error correction using a history list comprising at least one misrecognition and correction information associated with each of the at least one misrecognitions indicating how a user corrected the associated misrecognition. The techniques include converting data input from a user to generate a text segment, determining whether at least a portion of the text segment appears in the history list as one of the at least one misrecognitions, if the at least a portion of the text segment appears in the history list as one of the at least one misrecognitions, obtaining the correction information associated with the at least one misrecognition, and correcting the at least a portion of the text segment based, at least in part, on the correction information.

Подробнее
23-05-2013 дата публикации

Voice Data Retrieval System and Program Product Therefor

Номер: US20130132090A1
Автор: KANDA Naoyuki
Принадлежит: Hitachi, Ltd.

A voice data retrieval system including an inputting device of inputting a keyword, a phoneme converting unit of converting the inputted keyword in a phoneme expression, a voice data retrieving unit of retrieving a portion of a voice data at which the keyword is spoken based on the keyword in the phoneme expression, a comparison keyword creating unit of creating a set of comparison keywords having a possibility of a confusion of a user in listening to the keyword based on a phoneme confusion matrix for each user, and a retrieval result presenting unit of presenting a retrieval result from the voice data retrieving unit and the comparison keyword from the comparison keyword creating unit to a user. 1. A voice data retrieval system comprising: an inputting device of inputting a keyword; a phoneme converting unit of converting the inputted keyword in a phoneme expression; a voice data retrieving unit of retrieving a portion of a voice data at which the keyword is spoken based on the keyword in the phoneme expression; a comparison keyword creating unit of creating a set of comparison keywords separately from the keyword having a possibility of a confusion of a user in listening to the keyword based on the keyword in the phoneme expression; and a retrieval result presenting unit of presenting a retrieval result from the voice data retrieving unit and the comparison keyword from the comparison keyword creating unit to the user. 2. The voice data retrieval system according to claim 1, further comprising: a phoneme confusion matrix for each user; wherein the comparison keyword creating unit creates the comparison keyword based on the phoneme confusion matrix. 3. The voice data retrieval system according to claim 2, further comprising: a language information inputting unit of inputting a piece of information of a language which the user can understand; and a phoneme confusion matrix creating unit of creating the phoneme confusion matrix based on the piece of information provided from ...

Подробнее
30-05-2013 дата публикации

Speech recognition apparatus based on cepstrum feature vector and method thereof

Номер: US20130138437A1

A speech recognition apparatus, includes a reliability estimating unit configured to estimate reliability of a time-frequency segment from an input voice signal; and a reliability reflecting unit configured to reflect the reliability of the time-frequency segment to a normalized cepstrum feature vector extracted from the input speech signal and a cepstrum average vector included for each state of an HMM in decoding. Further, the speech recognition apparatus includes a cepstrum transforming unit configured to transform the cepstrum feature vector and the average vector through a discrete cosine transformation matrix and calculate a transformed cepstrum vector. Furthermore, the speech recognition apparatus includes an output probability calculating unit configured to calculate an output probability value of time-frequency segments of the input speech signal by applying the transformed cepstrum vector to the cepstrum feature vector and the average vector.

Подробнее
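The discrete cosine transformation matrix mentioned above can be built explicitly. The orthonormal DCT-II construction below is a common choice and is shown as an assumption about the matrix's form, not the patented one.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: D[k, m] = sqrt(2/n) * cos(pi*(2m+1)*k/(2n)),
    with the first row scaled by 1/sqrt(2) to make the matrix orthonormal."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    D[0, :] /= np.sqrt(2.0)
    return D

D = dct_matrix(4)
cepstrum = np.array([1.0, 0.5, -0.2, 0.1])   # hypothetical cepstral vector
transformed = D @ cepstrum                    # transformed cepstrum vector
recovered = D.T @ transformed                 # orthonormal, so D.T inverts D
```

Both the cepstrum feature vector and the HMM-state average vector would be passed through the same matrix before the output probability is computed.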
27-06-2013 дата публикации

Frame Erasure Concealment Technique for a Bitstream-Based Feature Extractor

Номер: US20130166294A1
Принадлежит: AT&T Intellectual Property II LP

A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.

Подробнее
25-07-2013 дата публикации

VOICE PROCESSING APPARATUS, METHOD AND PROGRAM

Номер: US20130191124A1
Принадлежит: SONY CORPORATION

Provided is a voice processing apparatus including a feature quantity calculation section extracting a feature quantity from a target frame of an input voice signal, a sound pressure estimation candidate point updating section making each frame of the input voice signal a sound pressure estimation candidate point, retaining the feature quantity of each sound pressure estimation candidate point, and updating the sound pressure estimation candidate point based on the feature quantity of the sound pressure estimation candidate point and the feature quantity of the target frame, a sound pressure estimation section calculating an estimated sound pressure of the input voice signal, based on the feature quantity of the sound pressure estimation candidate point, a gain calculation section calculating a gain applied to the input voice signal based on the estimated sound pressure, and a gain application section performing a gain adjustment of the input voice signal based on the gain. 1. A voice processing apparatus, comprising: a feature quantity calculation section which extracts a feature quantity from a target frame of an input voice signal; a sound pressure estimation candidate point updating section which makes each of a plurality of frames of the input voice signal a sound pressure estimation candidate point, retains the feature quantity of each sound pressure estimation candidate point, and updates the sound pressure estimation candidate point based on the feature quantity of the sound pressure estimation candidate point and the feature quantity of the target frame; a sound pressure estimation section which calculates an estimated sound pressure of the input voice signal, based on the feature quantity of the sound pressure estimation candidate point; a gain calculation section which calculates a gain applied to the input voice signal based on the estimated sound pressure; and a gain application section which performs a gain adjustment of the input voice signal based on the ...

Подробнее
25-07-2013 дата публикации

Computerized information and display apparatus

Номер: US20130191750A1
Автор: Robert F. Gazdzinski
Принадлежит: West View Research LLC

Apparatus useful for obtaining and displaying information. In one embodiment, the apparatus includes a network interface, display device, and speech recognition apparatus configured to receive user speech input and enable performance of various tasks via a remote entity, such as obtaining desired information relating to directions, sports, finance, weather, or any number of other topics. The downloaded information may also, in one variant, be transmitted to a personal user device, such as via a data interface.

Подробнее
15-08-2013 дата публикации

Speech recognition circuit and method

Номер: US20130211835A1
Принадлежит: Zentian Ltd

A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to said scores.

Подробнее
29-08-2013 дата публикации

Method of providing information and mobile terminal thereof

Номер: US20130227471A1
Принадлежит: SAMSUNG ELECTRONICS CO LTD

A method of providing information, performed by a mobile terminal, is provided. The method includes sensing a user's gesture requesting a panel, and causing the panel to appear from at least one from among a left side, a right side, and a bottom side of a screen and then displaying the panel on the screen, according to the user's gesture requesting the panel. The panel includes at least one from among a share panel including a list of identification information of one or more external devices, a capture panel including a list of capture data, and a recommending panel including recommend contents.

Подробнее
07-11-2013 дата публикации

SYSTEM AND METHOD FOR CLASSIFICATION OF EMOTION IN HUMAN SPEECH

Номер: US20130297297A1
Автор: GUVEN Erhan
Принадлежит:

A system performs local feature extraction. The system includes a processing device that performs a Short Time Fourier Transform to obtain a spectrogram for a discrete-time speech signal sample. The spectrogram is subdivided based on natural divisions of frequency to humans. Time-frequency-energy is then quantized using information obtained from the spectrogram. And, feature vectors are determined based on the quantized time-frequency-energy information. 1. A method for performing local feature extraction comprising using a processing device to perform the steps of: performing a Short Time Fourier Transform to obtain a spectrogram for a discrete-time speech signal sample; subdividing the spectrogram based on natural divisions of frequency to humans; quantizing time-frequency-energy information obtained from the spectrogram; computing feature vectors based on the quantized time-frequency-energy information; and classifying an emotion of the speech signal sample based on the computed feature vectors. 2. The method according to claim 1, wherein the step of subdividing the spectrogram comprises subdividing the spectrogram based on the Bark scale. 3. The method according to further comprising the step of employing majority voting on the feature vectors to predict an emotion associated with the speech signal sample. 4. The method according to further comprising the step of employing weighted-majority voting on the feature vectors to predict an emotion associated with the speech signal sample. 5. The method according to claim 1, wherein the time and the frequency information of a speech signal is transformed into a short time Fourier series and quantized by the regressed surfaces of the spectrogram. 6. The method according to claim 1, further comprising storing both the time and the frequency information together. 7. A system for performing local feature extraction comprising using a processing device to perform the steps of: a processor configured to perform a Short Time Fourier ...

Подробнее
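A crude sketch of the pipeline's first two steps above (STFT spectrogram, then grouping bins into perceptual bands): equal-width bin groups stand in for true Bark-scale bands, and the window, hop, and band counts are illustrative assumptions.

```python
import numpy as np

def band_energy_features(signal, frame_len=64, hop=32, n_bands=4):
    """Hann-windowed short-time FFT, then mean energy per contiguous
    group of frequency bins (a stand-in for Bark-scale bands)."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame_len] * window))
        bands = np.array_split(spectrum ** 2, n_bands)   # contiguous bin groups
        frames.append([b.mean() for b in bands])
    return np.array(frames)

t = np.arange(512) / 512.0
tone = np.sin(2 * np.pi * 20 * t)          # low-frequency test tone
feats = band_energy_features(tone)          # (frames x bands) energy matrix
```

For this low-frequency tone the lowest band dominates in every frame, which is the kind of time-frequency-energy pattern the feature vectors would quantize.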
07-11-2013 дата публикации

Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech and Speaker Recognition

Номер: US20130297299A1

The speech feature extraction algorithm is based on a hierarchical combination of auditory similarity and pooling functions. Computationally efficient features referred to as “Sparse Auditory Reproducing Kernel” (SPARK) coefficients are extracted under the hypothesis that the noise-robust information in speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first involves computing kernel based similarity between the speech signal and the time-shifted gammatone functions, followed by feature pruning using a simple pooling technique (“MAX” operation). Different hyper-parameters and kernel functions may be used to enhance the performance of a SPARK based speech recognizer. 1. A method of processing time domain speech signal digitally represented as a vector of a first dimension, comprising: storing the time domain speech signal in the memory of said processor; representing a set of gammatone basis functions as a set of gammatone basis vectors of said first dimension and storing said gammatone basis vectors in the memory of a processor; using the processor to apply a reproducing kernel function to transform the stored gammatone basis vectors and the stored speech signal to a higher dimensional space; using the processor to compute a set of similarity vectors in said higher dimensional space based on the stored gammatone basis vectors and the stored speech signal; using the processor to apply an inverse function to transform the set of similarity vectors in said higher dimensional space to a set of similarity vectors of the first dimension; and using the processor to select one of said set of similarity vectors of the first dimension as a processed representation of said speech signal. 2. The method of wherein the transformation from higher dimensional space to the first dimension effects a nonlinear transformation. 3. The method of wherein the step ...

Подробнее
14-11-2013 дата публикации

INFORMATION PROCESSING METHOD AND APPARATUS, COMPUTER PROGRAM AND RECORDING MEDIUM

Номер: US20130304469A1
Принадлежит:

Among multiple documents presented to a user, a high interest and a low interest document are specified, a word group in the high interest document is compared with a word group in the low interest document, and a string of word groups associated with weight values is generated as a user feature vector. A word group included in each of multiple data items targeted for assigning priorities is extracted, and data feature vectors are generated specific to each data item, based on the word groups extracted. A degree of similarity between each data feature vector of the multiple data items and the user feature vector is obtained, and according to the degree of similarity, priorities are assigned to the multiple data items to be presented to the user. Therefore, it is possible to extract the user's feature information on which the user's interests and tastes are reflected more effectively. 1. An information processing method in an information processing apparatus , comprising the steps of:generating a user feature vector specific to a user;extracting a word group included in each of multiple data items targeted for assigning priorities and generating a data feature vector specific to each data item, based on the word group extracted;obtaining a degree of similarity between each of the data feature vectors of the multiple data items and the user feature vector; andassigning priorities to the multiple data items to be presented to the user, according to the degree of similarity obtained;the step of generating the user feature vector including a step of specifying a document of high interest in which a user expresses interest and a document of low interest in which the user expresses no interest, according to the user's operation among multiple documents presented to the user, a word group included in the document of high interest and a word group included in the document of low interest being compared with each other, a weight value of a word included commonly in both documents being set 
...
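A minimal sketch of the feature-vector scheme described above, assuming a shared vocabulary and a simple +1/−1 weighting in which a word common to both documents cancels to 0; the patent leaves the weighting scheme more general.

```python
import numpy as np

def user_feature_vector(high_words, low_words, vocab):
    # +1 for words in the high-interest document, -1 for the low-interest one;
    # a word appearing in both gets the reduced weight 0 (an assumed scheme).
    return np.array([(w in high_words) - (w in low_words) for w in vocab], dtype=float)

def rank_items(user_vec, vocab, items):
    # Build a bag-of-words data feature vector per item, then order the items
    # by cosine similarity to the user feature vector (highest first).
    def cosine(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    scored = [(cosine(user_vec, np.array([ws.count(w) for w in vocab], dtype=float)), i)
              for i, ws in enumerate(items)]
    return [i for _, i in sorted(scored, reverse=True)]
```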

Подробнее
06-02-2014 дата публикации

System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain

Номер: US20140037095A1
Принадлежит: Intellisis Corp

A system and method may be configured to process an audio signal. The system and method may track pitch, chirp rate, and/or harmonic envelope across the audio signal, may reconstruct sound represented in the audio signal, and/or may segment or classify the audio signal. A transform may be performed on the audio signal to place the audio signal in a frequency chirp domain that enhances the sound parameter tracking, reconstruction, and/or classification.
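One way to realize a frequency-chirp domain, sketched here as a brute-force projection onto chirplet atoms exp(j2π(f·t + ½c·t²)); the grid of frequencies and chirp rates and the rectangular window are assumptions for illustration, not the patent's transform.

```python
import numpy as np

def chirp_transform(x, fs, freqs, chirp_rates):
    # Magnitude of the projection of x onto each chirplet atom; a linearly
    # sweeping tone peaks at its (base frequency, chirp rate) cell.
    t = np.arange(len(x)) / fs
    out = np.empty((len(freqs), len(chirp_rates)))
    for i, f in enumerate(freqs):
        for j, c in enumerate(chirp_rates):
            atom = np.exp(-2j * np.pi * (f * t + 0.5 * c * t ** 2))
            out[i, j] = np.abs(np.sum(x * atom)) / len(x)
    return out
```

Tracking the per-frame argmax of this map follows both the pitch and its rate of change, which is the enhancement the abstract attributes to the frequency-chirp domain.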

Подробнее
27-02-2014 дата публикации

System and Method for Targeted Advertising

Номер: US20140058846A1
Принадлежит: AT&T Intellectual Property I, L.P.

Disclosed is a method of receiving an audio stream containing user speech from a first device, generating text based on the user speech, identifying a key phrase in the text, receiving from an advertiser an advertisement related to the identified key phrase, and displaying the advertisement. The method can include receiving from an advertiser a set of rules associated with the advertisement and displaying the advertisement in accordance with the associated set of rules. The method can display the advertisement on one or both of a first device and a second device. A central server can generate text based on the speech. A key phrase in the text can be identified based on a confidence score threshold. The advertisement can be displayed after the audio stream terminates. 1. A method comprising:receiving speech of a user;generating, via a processor, text based on the speech;identifying a key phrase in the text;receiving, via the processor, data from a user profile comprising one of information describing the user, usage habits of the user, previously recognized key phrases of the user, and demographic information of the user; andreceiving an advertisement related to the key phrase and the data.2. The method of claim 1 , further comprising displaying the advertisement3. The method of claim 2 , wherein displaying the advertisement occurs a period of time after receiving the speech.4. The method of claim 3 , wherein receiving the speech occurs during a call.5. The method of claim 4 , wherein displaying the advertisement occurs a period of time after one of an ending of the call and after identifying the key phrase.6. The method of claim 1 , wherein the speech is received from a first device and wherein displaying the advertisement occurs on a second device different from the first device.7. 
The method of claim 1 , further comprising:receiving from an advertiser a set of rules associated with the advertisement; anddisplaying the advertisement in accordance with the set of ...

Подробнее
13-03-2014 дата публикации

Method and System for Building a Phonotactic Model for Domain Independent Speech Recognition

Номер: US20140074476A1
Автор: Giuseppe Riccardi
Принадлежит: AT&T Intellectual Property II LP

The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
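A toy illustration of a phonotactic model that is updated with detected morphemes, assuming a simple bigram count model over phone symbols; the patent's model-building procedure is more elaborate.

```python
from collections import defaultdict

class PhonotacticModel:
    # Bigram counts over phone symbols, with sentence-boundary markers.
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, phones):
        # Fold a newly detected morpheme's phone sequence back into the model.
        for a, b in zip(["<s>"] + phones, phones + ["</s>"]):
            self.counts[a][b] += 1

    def prob(self, prev, cur):
        # Maximum-likelihood bigram probability P(cur | prev).
        total = sum(self.counts[prev].values())
        return self.counts[prev][cur] / total if total else 0.0
```

The `update`/`prob` split mirrors the patent's loop: recognize with the current model, then store the updated model for the next user interaction.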

Подробнее
13-03-2014 дата публикации

System and Method of Spoken Language Understanding in Human Computer Dialogs

Номер: US20140074477A1
Принадлежит: AT&T INTELLECTUAL PROPERTY II, L.P.

A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify. 1. A method comprising:identifying, in a domain-independent manner, a dialog act for an independent clause of a speech recognizer output;identifying, in a domain-dependent manner, an object within the independent clause; andrecursively generating, via a processor, for each sub-independent clause within the independent clause, a semantic representation using the dialog act and the object of the independent clause.2. The method of claim 1 , wherein the semantic representation is used by a dialog manager in a spoken dialog system to determine a response to a user input.3. The method of claim 1 , further comprising:identifying, in the domain-dependent manner, an action within each of the independent clause, wherein recursively generating the semantic representation further comprises using the action.4. The method of claim 1 , wherein while recursively generating the semantic representation claim 1 , additional objects are extracted from the independent clause.5. The method of claim 1 , wherein identifying the object comprises using a domain specific ...

Подробнее
27-03-2014 дата публикации

VOICE RECOGNITION DEVICE AND METHOD, AND SEMICONDUCTOR INTEGRATED CIRCUIT DEVICE

Номер: US20140088960A1
Автор: NONAKA Tsutomu
Принадлежит: SEIKO EPSON CORPORATION

A semiconductor integrated circuit device for voice recognition includes: a signal processing unit which generates a feature pattern representing a state of distribution of frequency components of an input voice signal; a voice recognition database storage unit which stores a voice recognition database including a standard pattern representing a state of distribution of frequency components of plural phonemes; a conversion list storage unit which stores a conversion list including plural words or sentences to be conversion candidates; a standard pattern extraction unit which extracts a standard pattern corresponding to character data representing the first syllable of each word or sentence included in the conversion list, from the voice recognition database; and a matching detection unit which compares the feature pattern generated from the first syllable of the voice signal with the extracted standard pattern and thus detects the matching of the syllable. 1. A semiconductor integrated circuit device comprising:a signal processing unit which performs Fourier transform on an inputted voice signal, thus extracts frequency components of the voice signal, and generates a feature pattern representing a state of distribution of the frequency components of the voice signal;a voice recognition database storage unit which stores a voice recognition database including a standard pattern representing a state of distribution of frequency components of plural phonemes used in a predetermined language;a conversion list storage unit which stores a conversion list expressed by character data and including plural words or sentences to be conversion candidates;a standard pattern extraction unit which extracts the standard pattern corresponding to the character data representing the first syllable of each word or sentence included in the conversion list, from the voice recognition database; anda matching detection unit which compares the feature pattern generated from the first ...
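The first-syllable matching step can be sketched as below, assuming precomputed standard patterns (frequency-distribution vectors) keyed by syllable and a Euclidean distance test; for simplicity the toy code treats a word's first character as its first syllable.

```python
import numpy as np

def match_first_syllable(feature, conversion_list, standard_patterns, threshold=1.0):
    # Keep the conversion candidates whose first syllable's standard pattern
    # lies within `threshold` of the feature pattern of the spoken first syllable.
    matches = []
    for word in conversion_list:
        ref = standard_patterns[word[0]]   # toy "first syllable" = first character
        if np.linalg.norm(feature - ref) < threshold:
            matches.append(word)
    return matches
```

Comparing only the first syllable keeps the match cheap, which is the point of extracting standard patterns from the conversion list up front.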

Подробнее
27-03-2014 дата публикации

APPARATUS AND METHOD FOR SPEECH RECOGNITION

Номер: US20140088967A1
Принадлежит: KABUSHIKI KAISHA TOSHIBA

Apparatus for speech recognition includes a recognition unit configured to recognize a speech signal and to generate a first recognition result, a transmitting unit that transmits at least one of the speech signal and a recognition feature to a server, a receiving unit that receives a second recognition result from the server, a result generating unit configured to generate a third recognition result, a result storage unit that stores the third recognition result and a dictionary update unit configured to update the client recognition dictionary. 1. An apparatus for speech recognition , comprising:a recognition unit configured to recognize a speech signal by utilizing a client recognition dictionary and to generate a first recognition result, the client recognition dictionary including vocabularies recognizable in the recognition unit;a transmitting unit configured to transmit at least one of the speech signal and a recognition feature extracted from the speech signal to a server before the first recognition result is generated by the recognition unit;a receiving unit configured to receive a second recognition result from the server, the second recognition result being generated by the server;a result generating unit configured to generate a third recognition result, the third recognition result being generated by utilizing the first recognition result when the first recognition result is generated before receiving the second recognition result, otherwise by at least utilizing the second recognition result;a result storage unit configured to store the third recognition result; anda dictionary update unit configured to update the client recognition dictionary by utilizing a history of the third recognition result;the dictionary update unit further configured to update the client recognition dictionary so that the client recognition dictionary includes a first vocabulary prior to a second vocabulary in the case that the history of the third recognition result 
includes ...

Подробнее
10-04-2014 дата публикации

METHOD FOR CUSTOMER FEEDBACK MEASUREMENT IN PUBLIC PLACES UTILIZING SPEECH RECOGNITION TECHNOLOGY

Номер: US20140100851A1
Принадлежит: NUANCE COMMUNICATIONS, INC.

A method, a system and a computer program product for enabling a customer response speech recognition unit to dynamically receive customer feedback. The customer response speech recognition unit is positioned at a customer location. The speech recognition unit is automatically initialized when one or more spoken words are detected. The response statements of customers are dynamically received by the customer response speech recognition unit at the customer location, in real time. The customer response speech recognition unit determines when the one or more spoken words of the customer response statement are associated with a score in a database. An analysis of the words is performed to generate a score that reflects the evaluation of the subject by the customer. The score is dynamically updated as new evaluations are received, and the score is displayed within graphical user interface (GUI) to be viewed by one or more potential customers. 1. In a data processing device having a processor and a voice capture and recognition unit coupled to the processor , a processor-implemented method for enabling dynamic receipt and analysis of captured audio responses utilizing a voice recognition unit , said method comprising:dynamically detecting an audio input that includes speech and speech related sounds;identifying one or more received keywords within the audio input detected;in response to identifying the one or more received keywords, comparing the one or more received keywords to one or more pre-identified keywords in a pre-established database; andwhen the one or more received keywords match or are associated with one or more of the pre-identified keywords within the pre-established database, generating a score for the one or more received keywords, based on a relative score associated with the pre-identified keywords that match the one or more received keywords, wherein the score represents one or more of a positive evaluation, a negative evaluation, and a neutral ...
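The keyword-to-score mapping and the running evaluation can be sketched as follows; the keyword table and the mean-score display are assumptions, since the patent only requires that matched keywords carry relative scores that update a displayed evaluation.

```python
# Hypothetical pre-established keyword database: word -> relative score.
SCORE_DB = {"great": 1, "good": 1, "okay": 0, "bad": -1, "terrible": -1}

def update_score(utterance, running):
    # running = (score_sum, keyword_count); fold every recognized keyword
    # of the customer's response statement into the running evaluation.
    s, n = running
    for w in utterance.lower().split():
        if w in SCORE_DB:
            s, n = s + SCORE_DB[w], n + 1
    return s, n

def displayed_score(running):
    # Score shown in the GUI: mean of matched-keyword scores so far.
    s, n = running
    return s / n if n else 0.0
```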

Подробнее
06-01-2022 дата публикации

SYSTEM AND METHOD OF GENERATING FACIAL EXPRESSION OF A USER FOR VIRTUAL ENVIRONMENT

Номер: US20220005246A1
Принадлежит:

The present invention relates to a method of generating a facial expression of a user for a virtual environment. The method comprises obtaining a video and an associated speech of the user. Further, extracting in real-time at least one of one or more voice features and one or more text features based on the speech. Furthermore, identifying one or more phonemes in the speech. Thereafter, determining one or more facial features relating to the speech of the user using a pre-trained second learning model based on the one or more voice features, the one or more phonemes, the video and one or more previously generated facial features of the user. Finally, generating the facial expression of the user corresponding to the speech for an avatar representing the user in the virtual environment. 1. A method of generating a facial expression of a user for a virtual environment , the method comprises:obtaining, by a computing system, a video and an associated speech of the user;extracting in real-time, by the computing system, at least one of one or more voice features and one or more text features based on the speech of the user;identifying in real-time, by the computing system, one or more phonemes in the speech using a pre-trained first learning model based on at least one of the one or more voice features and the one or more text features;determining in real-time, by the computing system, one or more facial features relating to the speech of the user using a pre-trained second learning model based on the one or more voice features, the one or more phonemes, the video and one or more previously generated facial features of the user; andgenerating in real-time, by the computing system, the facial expression of the user corresponding to the speech for an avatar representing the user in the virtual environment based on the one or more facial features.2. 
The method as claimed in claim 1 , wherein obtaining the video and the associated speech comprises one of:receiving the video ...

Подробнее
06-01-2022 дата публикации

MULTI-LOOK ENHANCEMENT MODELING AND APPLICATION FOR KEYWORD SPOTTING

Номер: US20220005468A1
Автор: Yu Dong, Yu Meng
Принадлежит: Tencent America LLC

A method, computer system, and computer readable medium are provided for activating speech recognition based on keyword spotting (KWS). Waveform data corresponding to one or more speakers is received. One or more direction features are extracted from the received waveform data. One or more keywords are determined from the received waveform data based on the one or more extracted features. Speech recognition is activated based on detecting the determined keyword. 1. A method of activating speech recognition , executable by a processor , comprising:receiving waveform data corresponding to one or more speakers;extracting one or more direction features from the received waveform data;determining one or more keywords from the received waveform data based on the one or more extracted features; andactivating speech recognition based on detecting the determined keywords.2. The method of claim 1 , wherein the received waveform data comprises one or more multi-channel input waveforms and one or more preset look directions.3. The method of claim 2 , further comprising mapping the multi-channel input waveforms to one or more spectrograms by a 1-D convolution layer.4. The method of claim 3 , further comprising extracting a single-channel spectral feature claim 3 , logarithm power spectrum data claim 3 , and one or more multi-channel spatial features based on the spectrograms.5. The method of claim 2 , wherein each extracted direction feature corresponds to one of the one or more preset look directions.6. The method of claim 5 , wherein a set of directions in the horizontal plane is sampled based on the direction features.7. The method of claim 6 , wherein the one or more direction feature vectors are derived from the set of directions in the horizontal plane.8. The method of claim 7 , wherein the directional feature computes the averaged cosine distance between a look direction steering vector between one or more pairs of microphones based on a phase of the look direction ...
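The directional feature of claim 8 — an averaged cosine distance between the look-direction steering phase and the observed inter-microphone phase — can be sketched for a single two-microphone pair; the far-field geometry and per-frame averaging are assumptions.

```python
import numpy as np

def direction_feature(X1, X2, mic_dist, look_angle, freqs, c=343.0):
    # X1, X2: complex STFT bins of one frame for a microphone pair.
    tau = mic_dist * np.cos(look_angle) / c               # far-field inter-mic delay
    steer = np.exp(2j * np.pi * freqs * tau)              # steering-vector phase per bin
    ipd = X2 * np.conj(X1)
    ipd = ipd / (np.abs(ipd) + 1e-12)                     # observed inter-channel phase
    return float(np.mean(np.real(steer * np.conj(ipd))))  # averaged cosine distance
```

A source arriving from the look direction yields a feature near 1; other look directions score lower, which is what lets a model pick the dominant speaker's direction from the sampled set.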

Подробнее
05-01-2017 дата публикации

Speech recognition apparatus, speech recognition method, and electronic device

Номер: US20170004824A1
Принадлежит: SAMSUNG ELECTRONICS CO LTD

A speech recognition apparatus includes a probability calculator configured to calculate phoneme probabilities of an audio signal using an acoustic model; a candidate set extractor configured to extract a candidate set from a recognition target list; and a result returner configured to return a recognition result of the audio signal based on the calculated phoneme probabilities and the extracted candidate set.
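A minimal sketch of combining acoustic-model phoneme probabilities with a candidate set extracted from the recognition target list; the per-position independence and the log-score are simplifying assumptions.

```python
import math

def best_candidate(phoneme_probs, candidates):
    # phoneme_probs: one dict {phoneme: probability} per position, standing in
    # for the acoustic model's output; each candidate is a phoneme sequence.
    def log_score(seq):
        return sum(math.log(phoneme_probs[i].get(p, 1e-9)) for i, p in enumerate(seq))
    return max(candidates, key=log_score)
```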

Подробнее
05-01-2017 дата публикации

SYSTEM AND METHOD FOR DATA-DRIVEN SOCIALLY CUSTOMIZED MODELS FOR LANGUAGE GENERATION

Номер: US20170004825A1
Принадлежит:

Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user's social group. Systems configured according to this disclosure can then use the resulting, personalized, text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user. 1. A method comprising:identifying, via a processor configured to perform speech analysis, an identity of a user based on characteristics of received speech during a dialog between the user and a dialog system, to yield a user identification;generating a personalized natural language generation model based on a stylistic analysis on a literary narrative and the user identification; andapplying the personalized natural language generation model while performing, as part of the dialog, one of automatic speech recognition or natural language generation.2. The method of claim 1 , wherein the stylistic analysis identifies connections between two or more of a personality independent quotation lattice claim 1 , personality independent attributes claim 1 , personality dependent attributes claim 1 , and speakers within the literary narrative.3. The method of claim 2 , wherein stylistic analysis further comprises:identifying the speakers in the literary narrative, to yield identified speakers;attributing quoted utterances in the literary narrative to the identified speakers, to yield a quotation lattice;identifying the personality independent attributes and the personality dependent attributes of the quoted utterances within the quotation lattice; andorganizing the quotation lattice based on the personality ...

Подробнее
05-01-2017 дата публикации

Voice Activity Detection Method and Method Used for Voice Activity Detection and Apparatus Thereof

Номер: US20170004840A1
Принадлежит:

The present document relates to a voice activity detection (VAD) method and methods used for voice activity detection and apparatus thereof, the VAD method includes: obtaining sub-band signals and spectrum amplitudes of a current frame; computing values of an energy feature and a spectral centroid feature of the current frame according to the sub-band signals; computing a signal to noise ratio parameter of the current frame according to a background noise energy estimated from a previous frame, an energy of SNR sub-bands and an energy feature of the current frame; computing a VAD decision result according to a tonality signal flag, a signal to noise ratio parameter, a spectral centroid feature, and a frame energy feature. The methods and apparatus of the present document can improve the accuracy of non-stationary noise (such as office noise) and music detection. 1. A voice activity detection (VAD) method , wherein , the method comprises:obtaining sub-band signals and spectrum amplitudes of a current frame;computing values of an energy feature, a spectral centroid feature and a time-domain stability feature of the current frame by using the sub-band signals; computing values of a spectral flatness feature and a tonality feature according to the spectrum amplitudes;computing a signal to noise ratio (SNR) parameter of the current frame with a background energy estimated from a previous frame, the energy feature and the energy of SNR sub-bands of the current frame;computing a tonality signal flag of the current frame with the energy feature, the spectral centroid feature, the time-domain stability feature, the spectral flatness feature and the tonality feature of the current frame;computing a VAD decision result with the tonality signal flag, the signal to noise ratio parameter, the spectral centroid feature, and the energy feature.
The method of claim 1 , wherein claim 1 , before or after obtaining a VAD decision result claim 1 , the method further comprises:computing ...

Подробнее
05-01-2017 дата публикации

SYSTEMS AND METHODS FOR SOURCE SIGNAL SEPARATION

Номер: US20170004844A1
Принадлежит:

A method includes receiving an input signal comprising an original domain signal and creating a first window data set and a second window data set from the signal, wherein an initiation of the second window data set is offset from an initiation of the first window data set, converting the first window data set and the second window data set to a frequency domain and storing the resulting data as data in a second domain different from the original domain, performing complex spectral phase evolution (CSPE) on the second domain data to estimate component frequencies of the first and second window data sets, using the component frequencies estimated in the CSPE, sampling a set of second-domain high resolution windows to select a mathematical representation comprising a second-domain high resolution window that fits at least one of the amplitude, phase, amplitude modulation and frequency modulation of a component of an underlying signal wherein the component comprises at least one oscillator peak, generating an output signal from the mathematical representation of the original signal as at least one of: an audio file; one or more audio signal components; and one or more speech vectors and outputting the output signal to an external system. 1. A method comprising:receiving a biomedical signal; and computing a plurality of oscillator peaks for the biomedical signal;', 'tracking one or more of the plurality of oscillator peaks to form one or more tracklets; and', 'grouping the one or more tracklets based, at least in part, on at least one characteristic harmonic of the one or more tracklets to create a first fingerprint., 'creating a fingerprint of the biomedical signal wherein said creating comprises the steps of2. The method of claim 1 , further comprising comparing the first fingerprint to a previously computed second fingerprint.3. The method of wherein the comparing further comprises comparing the first fingerprint to the previously computed second fingerprint to ...
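The core CSPE idea — estimating a component's true frequency from the phase evolution between two windows offset by a few samples — can be sketched as follows; the single dominant-peak pick and the complex test signal are simplifications of the patent's multi-component tracking.

```python
import numpy as np

def cspe_frequency(x, fs, offset=1):
    # FFT two windows `offset` samples apart; the phase advance at the dominant
    # bin gives a frequency estimate far finer than the bin spacing.
    n = len(x) - offset
    F1 = np.fft.fft(x[:n])
    F2 = np.fft.fft(x[offset:offset + n])
    k = int(np.argmax(np.abs(F1)))
    phase = np.angle(F2[k] * np.conj(F1[k]))
    return phase * fs / (2 * np.pi * offset)
```

For a pure complex exponential the estimate is exact, because shifting the window multiplies every FFT bin by exp(j·2π·f/fs).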

Подробнее
05-01-2017 дата публикации

INFORMATION PROCESSING DEVICE AND IMAGE FORMING APPARATUS

Номер: US20170004847A1
Автор: SHIOMI Ryo
Принадлежит: KYOCERA Document Solutions Inc.

An information processing device includes a voice recorder, a retrieval section, and an analysis section. The information processing device utilizes a meeting report on a meeting. The voice recorder records utterances during the meeting. The retrieval section retrieves an utterance of a term entered in the meeting report from among the utterances recorded on the voice recorder. The analysis section analyzes a content of the meeting based on the utterance of the term. 1. An information processing device that utilizes a meeting report on a meeting , comprising:a voice recorder configured to record utterances during the meeting;a retrieval section configured to retrieve an utterance of a term entered in the meeting report from among the utterances recorded on the voice recorder; andan analysis section configured to analyze a content of the meeting based on the utterance of the term.2. The information processing device according to claim 1 , whereinthe analysis section analyzes the content of the meeting based on either or both of a number of utterances of the term and a total time length of the utterance of the term.3. The information processing device according to claim 1 , whereinthe analysis section analyzes the content of the meeting through comparison between a number of utterances of the term in a first time zone of the meeting and a number of utterances of the term in a second time zone of the meeting.4. The information processing device according to claim 1 , whereinthe analysis section analyzes the content of the meeting through comparison between a duration length of the meeting and the total time length of the utterance of the term.5. The information processing device according to claim 1 , further comprising:an image scanning section configured to scan an image of either a note on a whiteboard used in the meeting or a meeting memorandum, whereinthe retrieval section retrieves the utterance of the term contained in the image from among the utterances ...

Подробнее
01-01-2015 дата публикации

Systems and methods for feature extraction

Номер: US20150006164A1
Автор: Dipanjan Sen, Wenliang Lu
Принадлежит: Qualcomm Inc

A method for feature extraction by an electronic device is described. The method includes processing speech using a physiological cochlear model. The method also includes analyzing sections of an output of the physiological cochlear model. The method further includes extracting a place-based analysis vector and a time-based analysis vector for each section. The method additionally includes determining one or more features from each analysis vector.

Подробнее
01-01-2015 дата публикации

Information processing device, information processing method and program

Номер: US20150006174A1
Принадлежит: Sony Corp

An information processing system that reads a current playback time of content reproduced by an output device; controls a display to display subtitle information corresponding to the content reproduced by the output device; acquires feature information corresponding to an attribute of the content based on the read current playback time of the content; and controls the display to apply a predetermined effect corresponding to the displayed subtitle information based on the acquired feature information.

Подробнее
07-01-2021 дата публикации

RESPONSE SENTENCE GENERATION DEVICE, RESPONSE SENTENCE GENERATION METHOD, AND PROGRAM

Номер: US20210004543A1

To make it possible to generate a response sentence with respect to an input speech sentence without preparing a large amount of data. 1.-8. (canceled) 9. A computer-implemented method for generating a response to a speech input , the method comprising:identifying a morpheme of one or more clauses of a speech input;determining, based on the morpheme of the one or more clauses of the speech input, a speech type, wherein the speech type specifies a type of the speech input;determining, based on the determined speech type, a response type and a type conversion rule, wherein the type conversion rule is one of a set of type conversion rules, wherein the set of type conversion rules prescribes, for each of a plurality of speech types, a rule for responding to each speech type using one of a plurality of response types;automatically generating, based on the determined response type and a plurality of response sentences in a response sentence database, a response sentence, wherein the response sentence database stores the plurality of response sentences according to one of the plurality of response types; andproviding the response sentence as a response to the speech input.
The computer-implemented method of claim 9 , the method further comprising:extracting, based on the analyzed speech input, a predicate type and modality information of ...

Подробнее
02-01-2020 дата публикации

SYSTEM AND METHOD FOR GENERATING DIALOGUE GRAPHS

Номер: US20200004878A1
Принадлежит:

A method, computer program product, and computing system for automatically generating a dialogue graph is executed on a computing device and includes receiving a plurality of conversation data. A plurality of utterance pairs from the plurality of conversation data may be clustered into a plurality of utterance pair clusters. A dialogue graph may be generated with a plurality of nodes representative of the plurality of utterance pair clusters. 1. A computer-implemented method for automatically generating a dialogue graph , executed on a computing device , comprising:receiving, at the computing device, a plurality of conversation data;clustering a plurality of utterance pairs from the plurality of conversation data into a plurality of utterance pair clusters; andgenerating a dialogue graph with a plurality of nodes representative of the plurality of utterance pair clusters.2. The computer-implemented method of claim 1 , wherein receiving the plurality of conversation data includes one or more of:receiving a plurality of chat transcripts; andconverting one or more audio recordings of one or more conversations into one or more text-based representations of the one or more conversations.3. The computer-implemented method of claim 1 , wherein clustering the plurality of utterance pairs includes:clustering the plurality of conversation data into a plurality of topic clusters.4. The computer-implemented method of claim 3 , wherein clustering the plurality of conversational data into a plurality of topic clusters includes:generating a plurality of feature vectors representative of the plurality of conversation data; andcomparing the plurality of feature vectors representative of the plurality of conversation data.5. The computer-implemented method of claim 3 , wherein clustering the plurality of utterance pairs includes:for at least one topic cluster of the plurality of topic clusters, generating a plurality of feature vectors representative of the plurality of utterance ...
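A toy sketch of turning clustered utterance pairs into a dialogue graph; the cluster-assignment function stands in for the feature-vector clustering the claims describe.

```python
def build_dialogue_graph(conversations, assign_cluster):
    # conversations: lists of utterances; consecutive utterances form pairs.
    # assign_cluster maps an (utterance, reply) pair to a cluster id; each
    # cluster becomes a node, and consecutive pairs in a conversation add edges.
    nodes, edges = set(), set()
    for conv in conversations:
        prev = None
        for pair in zip(conv, conv[1:]):
            cid = assign_cluster(pair)
            nodes.add(cid)
            if prev is not None:
                edges.add((prev, cid))
            prev = cid
    return nodes, edges
```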

13-01-2022 publication date

MULTI-USER INTELLIGENT ASSISTANCE

Number: US20220012470A1
Assignee: Microsoft Technology Licensing, LLC

An intelligent assistant records speech spoken by a first user and determines a self-selection score for the first user. The intelligent assistant sends the self-selection score to another intelligent assistant, and receives a remote-selection score for the first user from the other intelligent assistant. The intelligent assistant compares the self-selection score to the remote-selection score. If the self-selection score is greater than the remote-selection score, the intelligent assistant responds to the first user and blocks subsequent responses to all other users until a disengagement metric of the first user exceeds a blocking threshold. If the self-selection score is less than the remote-selection score, the intelligent assistant does not respond to the first user.

1. An intelligent assistant computer, comprising: a logic machine; and a storage machine holding instructions executable by the logic machine to: recognize another intelligent assistant computer; record speech spoken by a first user; determine a self-selection score for the first user based on the speech spoken by the first user; receive a remote-selection score for the first user from the other intelligent assistant computer; if the self-selection score is greater than the remote-selection score, respond to the first user, determine a disengagement metric of the first user based on recorded speech spoken by the first user, and block subsequent responses to all other users until the disengagement metric of the first user exceeds a blocking threshold; if the self-selection score is less than the remote-selection score, do not respond to the first user; and stop blocking subsequent responses to another user responsive to a new self-selection score for the first user being less than a new remote-selection score for the first user.
2. The intelligent assistant computer of claim 1, wherein the self-selection score is determined based further on a signal-to-noise ratio of recorded speech spoken by the first user. ...
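The arbitration in claim 1 reduces to a small state machine: respond only when the self-selection score beats the remote-selection score, then block every other user until the winner disengages. A minimal sketch, assuming scores arrive as plain floats (the patent derives them from recorded speech) and that the blocking threshold is configurable:

```python
class IntelligentAssistant:
    # Sketch of the claimed arbitration. Scores are opaque floats
    # supplied by the caller; the patent computes them from speech.

    def __init__(self, blocking_threshold=0.5):
        self.blocking_threshold = blocking_threshold
        self.active_user = None  # user we are currently blocked to, if any

    def arbitrate(self, user, self_score, remote_score):
        # While blocked to one user, ignore every other user.
        if self.active_user is not None and user != self.active_user:
            return False
        if self_score > remote_score:
            self.active_user = user  # respond and block everyone else
            return True
        if user == self.active_user:
            # New self-selection score lost to the remote score:
            # stop blocking responses to other users.
            self.active_user = None
        return False

    def update_disengagement(self, user, disengagement):
        # Release the block once the active user's disengagement
        # metric exceeds the blocking threshold.
        if user == self.active_user and disengagement > self.blocking_threshold:
            self.active_user = None
```

A losing new self-selection score for the active user releases the block, matching the stop-blocking clause of the claim.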

07-01-2016 publication date

Speech recognition circuit and method

Number: US20160005397A1
Assignee: Zentian Ltd

A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words. The circuit includes: a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to said scores.

07-01-2016 publication date

METHOD AND SYSTEM FOR EFFICIENT SPOKEN TERM DETECTION USING CONFUSION NETWORKS

Number: US20160005398A1

Systems and methods for spoken term detection are provided. A method for spoken term detection comprises receiving phone level out-of-vocabulary (OOV) keyword queries, converting the phone level OOV keyword queries to words, generating a confusion network (CN) based keyword searching (KWS) index, and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries.

1. A method for spoken term detection, comprising: receiving phone level out-of-vocabulary (OOV) keyword queries; converting the phone level OOV keyword queries to words; generating a confusion network (CN) based keyword searching (KWS) index; and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries, wherein the receiving, converting, generating and using steps are performed by a computer system comprising a memory and at least one processor coupled to the memory.
2. The method according to claim 1, wherein generating the CN based KWS index comprises constructing the CN based KWS index from a plurality of confusion networks by compiling each confusion network into a weighted finite state transducer having the same topology as the confusion network.
3. The method according to claim 2, wherein each weighted finite state transducer includes input labels that are words on each arc in the corresponding confusion network.
4. The method according to claim 2, wherein each weighted finite state transducer includes output labels that encode a start time (T_start) and an end time (T_end) of each arc in the corresponding confusion network as T_start-T_end strings.
5. The method according to claim 2, wherein each weighted finite state transducer includes costs that are negative log CN posteriors for each arc in the confusion network.
6. The method according to claim 2, wherein for each weighted finite state transducer, the method further comprises adding a new start node S with zero-cost epsilon-arcs connecting S to each ...
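Claims 2 through 5 describe the index construction concretely: each confusion network becomes a weighted finite state transducer with the same topology, word input labels, "Tstart-Tend" output labels, and negative-log-posterior costs. A minimal sketch, with the transducer held as plain dicts (a real system would use an FST toolkit such as OpenFst); the bin/arc data layout is an assumption:

```python
import math

def compile_confusion_network(cn):
    # `cn` is a list of bins; each bin is a list of arcs
    # (word, posterior, t_start, t_end). States 0..len(cn) mirror the
    # confusion network's topology, one state boundary per bin.
    arcs = []
    for state, bin_arcs in enumerate(cn):
        for word, posterior, t_start, t_end in bin_arcs:
            arcs.append({
                "src": state,
                "dst": state + 1,
                "in": word,                    # input label: the word (claim 3)
                "out": f"{t_start}-{t_end}",   # output label: Tstart-Tend (claim 4)
                "cost": -math.log(posterior),  # negative log CN posterior (claim 5)
            })
    return {"start": 0, "final": len(cn), "arcs": arcs}

cn = [[("the", 0.8, 0.0, 0.3), ("a", 0.2, 0.0, 0.3)],
      [("cat", 1.0, 0.3, 0.7)]]
fst = compile_confusion_network(cn)
```

Searching the index is then a matter of composing a keyword automaton with this transducer; summing posteriors over matching paths gives the hit score.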

07-01-2021 publication date

FIXED POINT INTEGER IMPLEMENTATIONS FOR NEURAL NETWORKS

Number: US20210004686A1
Author: ROZEN PIOTR, STEMMER Georg
Assignee: Intel Corporation

Techniques related to implementing neural networks for speech recognition systems are discussed. Such techniques may include processing a node of the neural network by determining a score for the node as a product of weights and inputs such that the weights are fixed point integer values, applying a correction to the score based on a correction value associated with at least one of the weights, and generating an output from the node based on the corrected score.

1. A system to process a node of a neural network comprising: a memory to store weights associated with the node of the neural network; and one or more processors coupled to the memory, the one or more processors to: determine a score for the node of the neural network based on inputs to the node and the weights associated with the node, wherein the weights comprise a subset of first weights having associated correction values and a subset of second weights not having associated correction values; apply a plurality of corrections to the score responsive to the correction values associated with the subset of first weights, each correction comprising a product of the associated correction value and a corresponding input of the inputs to the node; and generate an output from the node based on the corrected score.
2. The system of claim 1, wherein the weights comprise fixed point integer values converted from floating point values and the correction values of the subset of first weights are associated with fixed point integer value weights having a non-zero most significant bit.
3. The system of claim 1, wherein the corrected score comprises a 32 bit fixed point integer value.
4. The system of claim 1, wherein the weights have an associated scaling factor, the neural network comprises a neural network layer including the node, and the scaling factor is a maximum scaling factor value that provides a corrections count for the neural network layer that is less than a predetermined ...
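The scheme in claim 1 keeps the fast integer dot product and recovers precision only where quantization hurt: weights whose rounding error is non-negligible carry a correction value, and each correction applied to the score is that value times the corresponding input. A toy sketch with an assumed scale factor and float-valued corrections (the patent works in fixed point throughout):

```python
def quantize(weights, scale=64):
    # Fixed-point integer weights plus per-weight correction values for
    # weights whose rounding error is non-negligible.
    q, corrections = [], {}
    for i, w in enumerate(weights):
        qi = round(w * scale)
        q.append(qi)
        residual = w - qi / scale
        if abs(residual) > 1e-6:
            corrections[i] = residual  # stored separately, applied later
    return q, corrections

def node_score(inputs, q_weights, corrections, scale=64):
    # Integer dot product first, then the claimed corrections: each one
    # is the product of a correction value and its corresponding input.
    score = sum(q * x for q, x in zip(q_weights, inputs)) / scale
    for i, c in corrections.items():
        score += c * inputs[i]
    return score
```

With weights [0.5, 0.3] and inputs [2.0, 4.0], the raw integer score is 140/64 = 2.1875 and the single correction for the 0.3 weight restores the exact 2.2.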

13-01-2022 publication date

CONVERSATION SUPPORT SYSTEM, CONVERSATION SUPPORT METHOD AND COMPUTER READABLE STORAGE MEDIUM

Number: US20220013128A1

A conversation support system is provided at an utterance place where utterance is delivered to a plurality of persons. The persons are each an utterer having a possibility of uttering and/or a performer having a possibility of marking. The conversation support system includes a hardware processor and a marking motion catcher. The hardware processor obtains voice data of an utterance made by an utterer and received by a voice receiver, and manages the voice data on a voice timeline. The marking motion catcher catches a marking motion by which a marker is given to the utterance. The hardware processor manages the marking motion on a marking timeline and links the marking motion with the utterance on a same timeline.

1. A conversation support system provided at an utterance place where utterance is delivered to a plurality of persons each being an utterer having a possibility of uttering and/or a performer having a possibility of marking, comprising: a hardware processor that obtains voice data of an utterance made by an utterer and received by a voice receiver, and manages the voice data on a voice timeline; and a marking motion catcher that catches a marking motion by which a marker is given to the utterance, wherein the hardware processor manages the marking motion on a marking timeline and links the marking motion with the utterance on a same timeline.
2. The conversation support system according to claim 1, wherein a plurality of utterers each having a possibility of uttering are present, and the conversation support system further comprises, as the voice receiver, a plurality of voice receivers.
3. The conversation support system according to claim 1, wherein the hardware processor identifies the utterer of the utterance.
4. The conversation support system according to claim 3, further comprising, as the voice receiver, a plurality of voice receivers respectively provided for a plurality of utterers each having a possibility of uttering ...

07-01-2021 publication date

System and method for automated agent assistance within a cloud-based contact center

Number: US20210004817A1
Assignee: Talkdesk Inc

Methods to reduce agent effort and improve customer experience quality through artificial intelligence. The Agent Assist tool provides contact centers with an innovative tool designed to reduce agent effort, improve quality and reduce costs by minimizing search and data entry tasks. The Agent Assist tool is natively built and fully unified within the agent interface while keeping all data internally protected from third-party sharing.

07-01-2021 publication date

System and method for automated scheduling using agent assist within a cloud-based contact center

Number: US20210004824A1
Assignee: Talkdesk Inc

Methods to reduce agent effort and improve customer experience quality through artificial intelligence. The Agent Assist tool provides contact centers with an innovative tool designed to reduce agent effort, improve quality and reduce costs by minimizing search and data entry tasks. The Agent Assist tool is natively built and fully unified within the agent interface while keeping all data internally protected from third-party sharing.

07-01-2021 publication date

SYSTEM AND METHOD FOR TEXT-ENABLED AUTOMATED AGENT ASSISTANCE WITHIN A CLOUD-BASED CONTACT CENTER

Number: US20210005192A1

Methods to reduce agent effort and improve customer experience quality through artificial intelligence. The Agent Assist tool provides contact centers with an innovative tool designed to reduce agent effort, improve quality and reduce costs by minimizing search and data entry tasks. The Agent Assist tool is natively built and fully unified within the agent interface while keeping all data internally protected from third-party sharing.

1. A method, comprising: executing an automation infrastructure within a cloud-based contact center that includes a communication manager, speech-to-text converter, a natural language processor, and an inference processor exposed by application programming interfaces; and executing an agent assist functionality within the automation infrastructure that performs operations comprising: receiving a text communication from a customer; performing inference processing on the text to determine a customer intent; automatically analyzing the text to determine a subject of the text communication and key terms associated with the subject; automatically parsing a knowledgebase using the key terms for at least one responsive answer associated with the subject; and providing the solution to an agent in a unified interface during the communication with the customer.
2. The method of claim 1, further comprising: querying a customer relationship management (CRM) platform/a customer service management (CSM) platform using the key terms; and displaying responsive results from the CRM/CSM in the second field in the unified interface.
3. The method of claim 1, further comprising: querying a database of customer-agent transcripts using the key terms; and displaying responsive results from the database of customer-agent transcripts in the second field in the unified interface.
4. The method of claim 1, wherein the method is performed in real-time as the customer communication progresses with the agent.
5. The method of claim 1, further comprising concurrently displaying to ...

07-01-2021 publication date

RECORDING MEDIUM RECORDING PROGRAM, INFORMATION PROCESSING APPARATUS, AND INFORMATION PROCESSING METHOD FOR TRANSCRIPTION

Number: US20210005204A1
Author: Sankoda Satoru
Assignee: FUJITSU LIMITED

A method for transcription is performed by a computer. The method includes: accepting input of a voice after causing a display unit to display a sentence including a plurality of words; acquiring first sound information being information concerning sounds corresponding to the sentence; acquiring second sound information being information concerning sounds of the voice accepted in the accepting; specifying a portion in the first sound information having a prescribed similarity to the second sound information; and correcting a character string in the sentence corresponding to the specified portion based on a character string corresponding to the second sound information.

1. A non-transitory computer-readable recording medium having stored therein a program for causing a computer to execute processing comprising: accepting input of sounds after causing a display to display a sentence including a plurality of words; acquiring first sound information being information concerning sounds corresponding to the sentence; acquiring second sound information concerning the accepted sounds; specifying a portion in the first sound information having a prescribed similarity to the second sound information; and correcting a character string in the sentence corresponding to the specified portion based on a character string corresponding to the second sound information.
2. The recording medium according to claim 1, wherein the portion is specified in the specifying based on a similarity of a phoneme sequence included in the first sound information to a phoneme sequence included in the second sound information.
3. The recording medium according to claim 1, wherein the portion is specified in the specifying based on a similarity of a waveform of a voice included in the first sound information to a waveform of a voice included in the second sound information.
4. The recording medium according to claim 1, wherein the portion is specified in the specifying based on a phoneme sequence included in ...
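Claim 2's phoneme-sequence similarity can be approximated at character level with the stdlib's difflib: slide a window over the displayed sentence, score each window against the accepted speech, and splice the spoken words over the best-matching span. The word-level windows, the similarity floor, and the use of characters in place of phonemes are all simplifying assumptions:

```python
import difflib

def correct_sentence(sentence_words, spoken_words, min_ratio=0.5):
    # Score every contiguous window of the displayed sentence against
    # the accepted speech (character-level similarity stands in for the
    # patent's phoneme-sequence similarity).
    target = " ".join(spoken_words)
    best_ratio, best_span = 0.0, None
    for start in range(len(sentence_words)):
        for end in range(start + 1, len(sentence_words) + 1):
            window = " ".join(sentence_words[start:end])
            ratio = difflib.SequenceMatcher(None, window, target).ratio()
            if ratio > best_ratio:
                best_ratio, best_span = ratio, (start, end)
    if best_span is None or best_ratio < min_ratio:
        return sentence_words  # nothing similar enough: leave as displayed
    start, end = best_span
    # Correct the character string at the specified portion.
    return sentence_words[:start] + spoken_words + sentence_words[end:]
```

For the displayed sentence "please weight here for assistance" and the spoken correction "wait", the misrecognized homophone span scores highest and is replaced.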

07-01-2021 publication date

System and method for speech-enabled automated agent assistance within a cloud-based contact center

Number: US20210005206A1
Assignee: Talkdesk Inc

Methods to reduce agent effort and improve customer experience quality through artificial intelligence. The Agent Assist tool provides contact centers with an innovative tool designed to reduce agent effort, improve quality and reduce costs by minimizing search and data entry tasks. The Agent Assist tool is natively built and fully unified within the agent interface while keeping all data internally protected from third-party sharing.

07-01-2021 publication date

NONVERBAL INFORMATION GENERATION APPARATUS, METHOD, AND PROGRAM

Number: US20210005218A1

A nonverbal information generation apparatus includes a display unit that partitions text into predetermined units, displays the text partitioned into the predetermined units, and makes nonverbal information that represents information about behavior of a verbal output agent, or nonverbal information that represents information about behavior of a receiver of verbal information of the verbal output agent, that corresponds to the text when the verbal output agent outputs the verbal information, visible in association with the predetermined units of the text.

1. A nonverbal information generation apparatus comprising: a display device; and a hardware processor that partitions text into predetermined units and causes the display device to display the text partitioned into the predetermined units and make nonverbal information that represents information about behavior of a verbal output agent, or nonverbal information that represents information about behavior of a receiver of verbal information of the verbal output agent, that corresponds to the text when the verbal output agent outputs the verbal information, visible in association with the predetermined units of the text.
2. The nonverbal information generation apparatus according to claim 1, wherein the hardware processor controls the display device so as to cause the display device to display nonverbal information generated on the basis of feature quantities of the text or feature quantities of voice corresponding to the text and a learned nonverbal information generation model, and the text.
3. The nonverbal information generation apparatus according to claim 2, further comprising: an expression device that expresses the behavior, wherein the display device displays in a state in which an instruction to start, stop, fast-forward, or rewind expression of the behavior by the expression device is receivable, and the hardware processor, upon receiving the instruction, controls the expression of the behavior by the expression ...

04-01-2018 publication date

Method and device for information processing

Number: US20180005624A1
Author: Weixing Shi
Assignee: Lenovo Beijing Ltd

An information processing method and an electronic device are provided. The method includes: obtaining audio data collected by a slave device; obtaining contextual data corresponding to the slave device; and obtaining a recognition result of recognizing the audio data based on the contextual data. The contextual data characterizes a voice environment of the audio data collected by the slave device.

04-01-2018 publication date

Electronic apparatus and method for controlling the electronic apparatus

Number: US20180005625A1
Author: Jeong-Ho Han
Assignee: SAMSUNG ELECTRONICS CO LTD

An electronic apparatus is disclosed. The electronic apparatus includes an input unit configured to receive a user input, a storage configured to store a recognition model for recognizing the user input, a sensor configured to sense a surrounding circumstance of the electronic apparatus, and a processor configured to control to recognize the received user input based on the stored recognition model and to perform an operation corresponding to the recognized user input, and update the stored recognition model in response to determining that the performed operation is caused by a misrecognition based on a user input recognized after performing the operation and the sensed surrounding circumstance.

04-01-2018 publication date

OBFUSCATING TRAINING DATA

Number: US20180005626A1

Examples disclosed herein involve obfuscating training data. An example method includes computing a sequence of acoustic features from audio data of training data, the training data comprising the audio data and a corresponding text transcript; mapping the acoustic features to acoustic model states to generate annotated feature vectors, the annotated feature vectors comprising the acoustic features and corresponding context from the text transcript; and providing a randomized sequence of the annotated feature vectors as obfuscated training data to an audio analysis system.

1. A method to obfuscate training data, the method comprising: computing a sequence of acoustic features from audio data of the training data, the training data comprising the audio data and a corresponding text transcript; mapping the acoustic features to acoustic model states to generate annotated feature vectors, the annotated feature vectors comprising the acoustic features and corresponding states, the states corresponding to context from the text transcript; and providing a randomized sequence of the annotated feature vectors as obfuscated training data to an audio analysis system.
2. The method as defined in claim 1, the training data comprising confidential information between an entity and a customer of the entity.
3. The method as defined in claim 1, further comprising: creating a sequence of the annotated feature vectors corresponding to the sequence of acoustic features; and randomizing the sequence of the annotated feature vectors to generate the randomized sequence of the annotated feature vectors.
4. The method as defined in claim 3, wherein randomizing the sequence of annotated feature vectors comprises reorganizing the sequence of annotated feature vectors to generate the randomized sequence of annotated feature vectors.
5. The method as defined in claim 3, wherein randomizing the sequence of annotated feature vectors comprises randomizing a timing of sending each annotated feature ...
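The randomization step is the whole privacy mechanism here: once each feature frame is annotated with its acoustic-model state, shuffling the frame order destroys the recoverable utterance while leaving every (feature, state) training pair intact. A sketch that assumes alignment has already produced one state id per frame (feature extraction and forced alignment are elided):

```python
import random

def obfuscate(feature_frames, aligned_states, seed=None):
    # Annotate: pair each acoustic feature frame with its aligned
    # acoustic-model state id.
    annotated = list(zip(feature_frames, aligned_states))
    # Randomize: shuffle the sequence so the original utterance cannot
    # be reconstructed from the order of the vectors.
    rng = random.Random(seed)
    rng.shuffle(annotated)
    return annotated
```

An acoustic model trained on the shuffled pairs sees exactly the same training examples, since frame-level training does not depend on frame order.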

02-01-2020 publication date

ARTIFICIAL INTELLIGENCE (AI)-BASED VOICE SAMPLING APPARATUS AND METHOD FOR PROVIDING SPEECH STYLE

Number: US20200005763A1
Assignee: LG ELECTRONICS INC.

Disclosed is an artificial intelligence (AI)-based voice sampling apparatus for providing a speech style, including a rhyme encoder configured to receive a user's voice, extract a voice sample, and analyze a vocal feature included in the voice sample; a text encoder configured to receive text for reflecting the vocal feature; a processor configured to classify the vocal feature of the voice sample input to the rhyme encoder according to a label, extract an embedding vector representing the vocal feature from the label, and generate a speech style from the embedding vector and apply the generated speech style to the text; and a rhyme decoder configured to output synthesized voice data in which the speech style is applied to the text by the processor.

1. An artificial intelligence (AI)-based voice sampling apparatus for providing a speech style, the apparatus comprising: a rhyme encoder configured to receive a user's voice to extract a voice sample, and analyze a vocal feature included in the voice sample; a text encoder configured to receive text for reflecting the vocal feature; a processor configured to classify the vocal feature of the voice sample input to the rhyme encoder according to a label, extract an embedding vector representing the vocal feature from the label, and generate a speech style from the embedding vector and apply the generated speech style to the text; and a rhyme decoder configured to output synthesized voice data in which the speech style is applied to the text by the processor.
2. The apparatus of claim 1, wherein the rhyme encoder divides the voice sample by a predetermined label and extracts an embedding vector for the label.
3. The apparatus of claim 1, wherein the rhyme encoder extracts the embedding vector through a vocal feature including at least one of a speech rate, a pronunciation intonation, a pause interval, a pitch, or an intonation of the user included in the voice sample.
4. The apparatus of claim 3, wherein the extracting of the ...

02-01-2020 publication date

Deeplearning method for voice recognition model and voice recognition device based on artificial neural network

Number: US20200005766A1
Author: Dami Kim
Assignee: LG ELECTRONICS INC

A method for training an artificial neural network-based speech recognition model is disclosed. In the method for training an artificial neural network-based speech recognition model, a user's speech is learned by using target data representing features and non-target data representing non-features as random inputs and outputs, and then the user's speech is recognized under a noise situation. A method for training an artificial neural network-based speech recognition model and speech recognition device of the present disclosure can be associated with artificial intelligence modules, drones (unmanned aerial vehicles (UAVs)), robots, augmented reality (AR) devices, virtual reality (VR) devices, devices related to 5G service, etc.

02-01-2020 publication date

SPEECH RECOGNITION METHOD AND SPEECH RECOGNITION DEVICE

Number: US20200005774A1
Author: YUN Hwan Sik

Disclosed are a speech recognition method capable of communicating with other electronic devices and an external server in a 5G communication condition by performing speech recognition by executing an artificial intelligence (AI) algorithm and/or a machine learning algorithm. The speech recognition method may comprise performing speech recognition by using an acoustic model and a language model stored in a speech database, determining whether the speech recognition of the spoken sentence is successful, storing speech recognition failure data when the speech recognition of the spoken sentence fails, analyzing the speech recognition failure data of the spoken sentence and updating the acoustic model or the language model by adding the recognition failure data to a learning database of the acoustic model or the language model when the cause of the speech recognition failure is due to the acoustic model or the language model and machine-learning the acoustic model or the language model. 1. A speech recognition method comprising:receiving a spoken sentence speech spoken by a user;performing speech recognition using an acoustic model and a language model stored in a speech database;determining whether the speech recognition is successful;storing speech recognition failure data when the speech recognition fails;analyzing the speech recognition failure data to determine whether a cause of the speech recognition failure is due to the acoustic model or the language model; andupdating the acoustic model by adding the recognition failure data to a learning database of the acoustic model when the cause of the speech recognition failure is due to the acoustic model and machine-learning the acoustic model based on the added learning database of the acoustic model and updating the language model by adding the recognition failure data to a learning database of the language model when the cause of the speech recognition failure is due to the language model and machine-learning the 
...

02-01-2020 publication date

VOICE RECOGNITION DEVICE AND VOICE RECOGNITION METHOD

Number: US20200005775A1
Assignee: Mitsubishi Electric Corporation

A voice recognition device includes: a first feature vector calculating unit for calculating a first feature vector from voice data input; an acoustic likelihood calculating unit for calculating an acoustic likelihood of the first feature vector by using an acoustic model used for calculating an acoustic likelihood of a feature vector; a second feature vector calculating unit for calculating a second feature vector from the voice data; a noise degree calculating unit for calculating a noise degree of the second feature vector by using a discriminant model used for calculating a noise degree indicating whether a feature vector is noise or voice; a noise likelihood recalculating unit for recalculating an acoustic likelihood of noise on the basis of the acoustic likelihood of the first feature vector and the noise degree of the second feature vector; and a collation unit for performing collation with a pattern of a vocabulary word to be recognized, by using the acoustic likelihood calculated and the acoustic likelihood of noise recalculated, and outputting a recognition result of the voice data.

1. A voice recognition device comprising: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of: calculating a first feature vector from voice data input; calculating acoustic likelihoods of respective phonemes and an acoustic likelihood of noise of the first feature vector, by using an acoustic model used for calculating an acoustic likelihood of a feature vector; calculating a second feature vector from the voice data; calculating a noise degree of the second feature vector, by using a discriminant model used for calculating a noise degree indicating whether a feature vector is noise or voice; recalculating an acoustic likelihood of noise on a basis of a larger value between the acoustic likelihood of noise of the first feature vector, and a likelihood that is calculated by adding a maximum ...

02-01-2020 publication date

Novel and innovative means of providing an anonymized and secure mechanism for speech-to-text conversion. This invention provides a versatile and extensible privacy layer that leverages existing cloud-based Automated Speech Recognition (ASR) services and can accommodate emerging speech-to-text technologies, such as Natural Language Processing (NLP), voice bots and other voice-based artificial intelligence interfaces. This invention also allows the latest and best-of-breed speech technologies to be applied to the legal, medical, financial, and other privacy-sensitive fields without sacrificing

Number: US20200005792A1
Author: Wutscher Ralph T.

Novel and innovative means of providing an anonymized and secure mechanism for speech-to-text conversion. This invention provides a versatile and extensible privacy layer that leverages existing cloud-based Automated Speech Recognition (ASR) services and can accommodate emerging speech-to-text technologies, such as Natural Language Processing (NLP), voice bots and other voice-based artificial intelligence interfaces. This invention also allows the latest and best-of-breed speech technologies to be applied to the legal, medical, financial, and other privacy-sensitive fields without sacrificing security and privacy. 1. A mechanism and method for securely and privately leveraging best-of-breed Automated Speech Recognition (ASR) services while maintaining the confidentiality of the requestor and their data , comprising:Interfacing with publicly available ASR services as selected and determined by the user through a rules-based approach through the ASR services' existing Application Layer Interface (API) or by use of a simple abstraction interface where the ASR does not provide an API;Securely and reliably collecting voice data as streams or files;Deconstructing the speech data into distinct fragments;Using unrelated identification and obfuscation to prevent exploitation by the ASR service providers and/or hackers;Using an encrypted stream router to convey the distinct speech data fragments obtained from the user to multiple ASR with each ASR receiving only a portion of the distinct speech data;Receiving the results from the ASR following each respective ASR's processing of the distinct speech data fragments provided;Analyzing the data received from the ASRs and applying configurable syntax rules to parse and process the speech data;Applying the processed speech data to trigger pre-determined events or entries as specified by the user; andAllowing the user to designate and prioritize speech input streams.2. 
A computer system implementing the mechanism and method in claim ...
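The fragment-routing step in claim 1 (conveying distinct speech fragments to multiple ASR services so that no single provider receives the whole utterance) can be sketched as a round-robin dispatcher; the provider adapters below are hypothetical stand-ins, not a real ASR API:

```python
import itertools

def route_fragments(fragments, asr_services):
    """Send each speech fragment to a different ASR service, round-robin,
    so no single provider ever sees the complete utterance.
    `asr_services` is a list of callables (hypothetical provider adapters);
    transcripts come back in fragment order, ready to be reassembled."""
    providers = itertools.cycle(asr_services)
    return [next(providers)(fragment) for fragment in fragments]

# Toy providers standing in for real ASR back ends:
asr_a = lambda audio: f"A:{audio}"
asr_b = lambda audio: f"B:{audio}"

transcripts = route_fragments(["frag1", "frag2", "frag3"], [asr_a, asr_b])
```

Because each provider sees only every n-th fragment, reassembly on the requestor's side is a simple in-order join of the returned transcripts.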

More
Publication date: 05-01-2017

FACIAL GESTURE RECOGNITION AND VIDEO ANALYSIS TOOL

Number: US20170006258A1
Assignee: Krush Technologies, LLC

Embodiments disclosed herein may be directed to a video communication server. In some embodiments, the video communication server includes: at least one memory including instructions; and at least one processing device configured for executing the instructions, wherein the instructions cause the at least one processing device to perform the operations of: determining a time duration of a video communication connection between a first user of a first user device and a second user of a second user device; analyzing video content transmitted between the first user device and the second user device; determining at least one gesture of at least one of the first user and the second user based on analyzing the video content; and generating a compatibility score of the first user and the second user based at least in part on the determined time duration and the at least one determined gesture.

1. A video communication server comprising:
at least one memory comprising instructions; and
at least one processing device configured for executing the instructions, wherein the instructions cause the at least one processing device to perform the operations of:
determining, using a communication unit comprised in the at least one processing device, a time duration of a video communication connection between a first user of a first user device and a second user of a second user device;
analyzing, using a graphical processing unit (GPU) comprised in the at least one processing device, video content transmitted between the first user device and the second user device;
determining, using a gesture analysis unit comprised in the at least one processing device, at least one gesture of at least one of the first user and the second user based on analyzing the video content; and
generating, using a compatibility unit comprised in the at least one processing device, a compatibility score of the first user and the second user based at least in part on the determined time duration and the at least one determined gesture.
...

More
Publication date: 03-01-2019

METHOD AND DEVICE FOR MANAGING DIALOGUE BASED ON ARTIFICIAL INTELLIGENCE

Number: US20190005948A1
Assignee:

Embodiments of the present disclosure provide a method and a device for managing a dialogue based on artificial intelligence. The method includes the followings. An optimum system action is determined from at least one candidate system action according to a current dialogue status feature, a candidate system action feature and surrounding feedback information of the at least one candidate system action and based on a decision model. Since the current dialogue status corresponding to the current dialogue status feature includes uncertain results of natural language understanding, the at least one candidate system action acquired according to the current dialogue status also includes the uncertain results of natural language understanding. 1. A method for managing a dialogue based on artificial intelligence , comprising:receiving current dialogue information;determining a user intention of the current dialogue information;determining query dimension distribution information and current single-round slot distribution information of the current dialogue information under the user intention;generating current multi-round slot distribution information according to the current single-round slot distribution information of the current dialogue information and historical multi-round slot distribution information of historical dialogue information;generating a current dialogue status according to the user intention, the query dimension distribution information and the current multi-round slot distribution information of the current dialogue information;performing a first feature extraction on the current dialogue status to obtain a current dialogue status feature;determining at least one candidate system action according to the current dialogue status and a pre-configured rule;performing a second feature extraction on the at least one candidate system action to obtain a candidate system action feature of each of the at least one candidate system; andinputting the current ...
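The fusion step, generating the current multi-round slot distribution from the current single-round distribution and the historical multi-round distribution, can be sketched as a weighted interpolation; the linear rule and the value of `alpha` are illustrative assumptions, since the excerpt does not fix a formula:

```python
def fuse_slot_distributions(single_round, historical, alpha=0.7):
    """Blend the current single-round slot distribution with the historical
    multi-round distribution into a current multi-round distribution.
    `alpha` (assumed) weights the newer evidence more heavily."""
    slots = set(single_round) | set(historical)
    return {s: alpha * single_round.get(s, 0.0) + (1 - alpha) * historical.get(s, 0.0)
            for s in slots}

# A slot seen only in history decays; a freshly observed slot dominates:
current = fuse_slot_distributions({"city": 0.9}, {"city": 0.5, "date": 0.8})
```

This keeps the dialogue state soft: uncertain slot values from earlier rounds survive with reduced weight instead of being overwritten.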

More
Publication date: 03-01-2019

SECURE UTTERANCE STORAGE

Number: US20190005952A1
Assignee:

Technologies for secure storage of utterances are disclosed. A computing device captures audio of a human making a verbal utterance. The utterance is provided to a speech-to-text (STT) service that translates the utterance to text. The STT service can also identify various speaker-specific attributes in the utterance. The text and attributes are provided to a text-to-speech (TTS) service that creates speech from the text and a subset of the attributes. The speech is stored in a data store that is less secure than that required for storing the original utterance. The original utterance can then be discarded. The STT service can also translate the speech generated by the TTS service to text. The text generated by the STT service from the speech and the text generated by the STT service from the original utterance are then compared. If the text does not match, the original utterance can be retained.

1. An apparatus, comprising:
at least one non-transitory computer-readable storage medium to store instructions which, in response to being performed by one or more processors, cause the apparatus to:
receive first audio data comprising a first utterance of one or more first words, the first audio data having a plurality of attributes;
perform speech recognition on the first audio data to identify the one or more first words;
perform speech recognition on the first audio data to identify the plurality of attributes;
generate second audio data using at least the one or more first words and a subset of the plurality of attributes, the second audio data comprising a second utterance of the one or more first words; and
store only the second audio data.
2. The apparatus of claim 1, wherein one or more of the plurality of attributes indicate personally identifiable information for a speaker of the first utterance.
3.
The apparatus of claim 2 , wherein the subset of the attributes comprise attributes that do not indicate the personally identifiable information for the ...
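The round-trip in the abstract (recognize the utterance, re-synthesize speech from the text plus non-identifying attributes, verify by recognizing the synthetic speech, and store only the synthetic copy) can be sketched with placeholder `stt`/`tts` callables; the attribute filter and all names are illustrative assumptions:

```python
def store_securely(utterance_audio, stt, tts, data_store):
    """Sketch of the STT -> TTS -> STT verification loop.
    stt(audio) -> (text, attrs); tts(text, attrs) -> audio (placeholders)."""
    text, attributes = stt(utterance_audio)
    # Drop attributes that would identify the speaker (illustrative filter).
    safe_attrs = {k: v for k, v in attributes.items() if k != "voiceprint"}
    synthetic = tts(text, safe_attrs)
    check_text, _ = stt(synthetic)      # re-recognize the synthetic speech
    if check_text == text:
        data_store.append(synthetic)    # original utterance can be discarded
        return True
    return False                        # mismatch: retain the original instead

# Toy stand-ins: "audio" is modeled as a (text, attrs) tuple.
toy_stt = lambda audio: (audio[0], audio[1])
toy_tts = lambda text, attrs: (text, attrs)

store = []
accepted = store_securely(("hello world", {"pitch": "low", "voiceprint": "x9"}),
                          toy_stt, toy_tts, store)
```

Only when the re-recognized text matches does the less-secure store receive data, which is what lets the sensitive original be deleted.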

More
Publication date: 03-01-2019

WAKE-ON-VOICE METHOD, TERMINAL AND STORAGE MEDIUM

Number: US20190005954A1
Assignee:

The present disclosure provides a wake-on-voice method, a terminal and a storage medium. The method includes: acquiring a wake-up voice configured to wake up a smart terminal; performing an analysis on an acoustic feature of the wake-up voice by using a preset acoustic model and a preset wake-up word recognition network of the smart terminal, so as to acquire a confidence coefficient of the acoustic feature of the wake-up voice with respect to an acoustic feature of a preset wake-up word; determining whether the confidence coefficient falls in a preset range of moderate confidence coefficients, if yes, uploading the wake-up voice to a remote server; and determining whether a linguistic feature obtained by analyzing the wake-up voice using a linguistic model matches to a linguistic feature of the preset wake-up word, if yes, receiving an instruction to wake up the smart terminal generated by the remote server. 1. A wake-on-voice method , comprising:acquiring a wake-up voice configured to wake up a smart terminal;performing an analysis on an acoustic feature of the wake-up voice by using a preset acoustic model and a preset wake-up word recognition network of the smart terminal, so as to acquire a confidence coefficient of the acoustic feature of the wake-up voice with respect to an acoustic feature of a preset wake-up word;determining whether the confidence coefficient falls in a preset range of moderate confidence coefficients, and if yes, uploading the wake-up voice to a remote server; anddetermining whether a linguistic feature obtained by analyzing the wake-up voice using a linguistic model in the remote server matches to a linguistic feature of the preset wake-up word, and if yes, receiving an instruction to wake up the smart terminal generated by the remote server.2. 
The wake-on-voice method according to claim 1 , wherein after it is determined that the linguistic feature obtained by analyzing the wake-up voice using the linguistic model in the remote server ...
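The two-stage decision in claim 1 reduces to a small threshold function: wake locally on a clearly high confidence coefficient, reject on a clearly low one, and defer only the moderate band to the remote linguistic check. The threshold values below are illustrative assumptions:

```python
def wake_decision(confidence, wake_remotely, low=0.4, high=0.8):
    """Return True if the terminal should wake up.
    `wake_remotely` stands in for uploading the wake-up voice to the
    remote server and asking its linguistic model for a verdict."""
    if confidence >= high:
        return True                 # confident local wake-up
    if confidence < low:
        return False                # confident local rejection
    return wake_remotely()          # moderate band: defer to the server

decisions = [wake_decision(0.9, lambda: False),   # local accept, server unused
             wake_decision(0.1, lambda: True),    # local reject, server unused
             wake_decision(0.6, lambda: True)]    # moderate: server decides
```

The design point is bandwidth and latency: audio leaves the device only for the ambiguous middle band, not for every utterance.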

More
Publication date: 07-01-2016

Bluetooth headset and voice interaction control thereof

Number: US20160006849A1
Assignee: Zgmicro Wuxi Corp

Techniques for a personalized Bluetooth headset and a voice interaction control method thereof are described. According to one aspect of the present invention, the Bluetooth headset is caused to maintain a voice contact list. Each item in the voice contact list corresponds to a phone number associated with a set of audio data (e.g., a voice or a predefined audio). When a paired mobile device receives a call, the voice contact list is searched per the caller number. A corresponding audio is played back when an item is located in the voice contact list. As such a user of the Bluetooth headset knows who is calling and determines whether the call shall be answered or not.
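The voice-contact-list behavior in the abstract, searching the list by caller number and choosing the audio to play, is essentially a keyed lookup with a fallback; the numbers and file names below are made up:

```python
# Hypothetical voice contact list: caller number -> audio clip to play.
voice_contacts = {
    "+15550100": "mom.wav",
    "+15550101": "boss.wav",
}

def announce_caller(caller_number, default="unknown_caller.wav"):
    """Return the audio the headset should play for an incoming call,
    so the user knows who is calling before deciding to answer."""
    return voice_contacts.get(caller_number, default)
```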

More
Publication date: 04-01-2018

Personal Voice-Based Information Retrieval System

Number: US20180007201A1
Author: Kurganov Alexander
Assignee:

The present invention relates to a system for retrieving information from a network such as the Internet. A user creates a user-defined record in a database that identifies an information source, such as a web site, containing information of interest to the user. This record identifies the location of the information source and also contains a recognition grammar based upon a speech command assigned by the user. Upon receiving the speech command from the user that is described within the recognition grammar, a network interface system accesses the information source and retrieves the information requested by the user. 1. A method , comprising:(a) receiving a speech command from a voice-enabled device, over a network, by a speech-recognition engine coupled to a media server by an interactive voice response application including a user-defined search, the speech-recognition engine adapted to convert the speech command into a data message, the media server adapted to identify and access at least one or more websites containing information of interest to a particular user, the speech-recognition engine adapted to select particular speech-recognition grammar describing the speech command received and assigned to fetching content relating to the data message converted from the speech command and assigned to the user-defined search including a web request, along with a uniform resource locator of an identified web site from the one or more websites containing information of interest to the particular user and responsive to the web request;(b) selecting, by the media server, at least one information-source-retrieval instruction stored for the particular speech-recognition grammar in a database coupled to the media server and adapted to retrieve information from the at least one or more websites;(c) accessing, by a web-browsing server, a portion of the information source to retrieve information relating to the speech command, by using a processor of the web-browsing server, 
...

More
Publication date: 02-01-2020

PERSONALIZED SUPPORT ROUTING BASED ON PARALINGUISTIC INFORMATION

Number: US20200007687A1
Assignee:

Embodiments presented herein provide techniques for inferring the current emotional state of a user based on paralinguistic features derived from audio input from that user. If the emotional state meets triggering conditions, the system provides the user with a prompt which allows the user to connect with a support agent. If the user accepts, the system selects a support agent for the user based on the predicted emotional state and on attributes of the support agent found in an agent profile. The system can also determine a priority level for the user based on the score and based on a profile of the user and determine where to place the user in a queue for the support agent. 1. A computer-implemented method comprising:retrieving a set of audio recordings of each interaction between a support agent and each user in a set of users;extracting, from each audio recording, a set of paralinguistic features associated with each user;generating, based on the set of paralinguistic features, a score for each user that represents an emotional state of the user during the interaction with the support agent;determining an implicit customer-satisfaction level for each user interacting with the support agent based on the score;obtaining a set of explicit customer-satisfaction levels from the set of users corresponding to the interaction with the support agent;determining a customer-satisfaction attribute of the support agent based on each implicit customer-satisfaction level and the set of explicit customer-satisfaction levels; andgenerating a support agent profile for the support agent including the customer-satisfaction attribute.2. The computer-implemented method of claim 1 , further comprises: providing a survey to each user interacting with the support agent.3. The computer-implemented method of claim 2 , wherein the explicit customer-satisfaction level is based on the survey completed by the user after interacting with the support agent.4. 
The computer-implemented method of ...
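The customer-satisfaction attribute in claim 1 combines implicit levels (derived from the paralinguistic scores) with explicit survey levels. A minimal sketch with an equal-weight average follows; the weighting is an assumption, as the claim only requires that both inputs contribute:

```python
def satisfaction_attribute(implicit_scores, explicit_scores, weight=0.5):
    """Fold implicit (paralinguistic) and explicit (survey) satisfaction
    levels into one attribute for the support-agent profile."""
    mean = lambda xs: sum(xs) / len(xs)
    return weight * mean(implicit_scores) + (1 - weight) * mean(explicit_scores)

attr = satisfaction_attribute([0.6, 0.8], [0.9, 0.7])
```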

More
Publication date: 20-01-2022

RATING INTERFACE FOR BEHAVIORAL IMPACT ASSESSMENT DURING INTERPERSONAL INTERACTIONS

Number: US20220021716A1
Assignee:

A rating interface system and method are provided that allow human users to continuously rate the impact they or other human users and/or their avatars are having on themselves or others during interpersonal interactions, such as conversations or group discussions. The system and method provide time stamping of users' ratings data and audio and video data of an interaction, and correlate the ratings data with the audio and video data at selected time intervals for subsequent analysis. 1. A system for providing a rating interface during an interpersonal interaction between at least a first user or an avatar thereof and a second user or an avatar thereof , comprising:an input device for transmitting ratings data input from the first user of an assessment of the second user during the interpersonal interaction, the input device configured to differentiate user inputs as numerical values; anda processor, communicatively coupled to the input device to receive the ratings data, and memory, and machine-readable instructions stored in the memory that, upon execution by the processor cause the system to carry out an operation comprising time stamping the ratings data transmitted from the input device during the interpersonal interaction.2. The system of claim 1 , wherein the processor is operative to discretize the ratings data from the input device into two or more rating bands claim 1 , each rating band corresponding to a range of input numerical values received from the input device during the interpersonal interaction.3. The system of claim 2 , wherein the rating bands comprise a positive rating band corresponding to an input positive assessment claim 2 , a negative rating band corresponding to an input negative assessment claim 2 , and a neutral rating band corresponding to an input neutral assessment.4. The system of claim 1 , further comprising:an audio device for transmitting audio data of the second user or the avatar thereof; anda video device for transmitting ...
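Claims 1-3 describe time-stamping numeric ratings and discretizing them into positive, neutral, and negative bands. A minimal sketch, with the band thresholds and the rating scale as illustrative assumptions:

```python
import time

def discretize(rating, neg_max=-0.2, pos_min=0.2):
    """Map a numeric rating to one of the three rating bands in the claims."""
    if rating >= pos_min:
        return "positive"
    if rating <= neg_max:
        return "negative"
    return "neutral"

def record_rating(rating, log, clock=time.time):
    """Time-stamp each rating so it can later be correlated with the
    audio/video timeline at selected intervals."""
    log.append((clock(), rating, discretize(rating)))

log = []
record_rating(0.5, log, clock=lambda: 12.0)   # fixed clock for reproducibility
```

Storing the timestamp alongside the raw value and its band keeps both the continuous signal and the discretized assessment available for later analysis.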

More
Publication date: 08-01-2015

APPARATUS AND METHOD FOR EXTRACTING FEATURE FOR SPEECH RECOGNITION

Number: US20150012274A1

An apparatus for extracting features for speech recognition in accordance with the present invention includes: a frame forming portion configured to separate input speech signals in frame units having a prescribed size; a static feature extracting portion configured to extract a static feature vector for each frame of the speech signals; a dynamic feature extracting portion configured to extract a dynamic feature vector representing a temporal variance of the extracted static feature vector by use of a basis function or a basis vector; and a feature vector combining portion configured to combine the extracted static feature vector with the extracted dynamic feature vector to configure a feature vector stream. 1. An apparatus for extracting features for speech recognition , comprising:a frame forming portion configured to separate inputted speech signals in frame units having a prescribed size;a static feature extracting portion configured to extract a static feature vector for each frame of the speech signals;a dynamic feature extracting portion configured to extract a dynamic feature vector representing a temporal variance of the extracted static feature vector by use of a basis function or a basis vector; anda feature vector combining portion configured to combine the extracted static feature vector with the extracted dynamic feature vector to configure a feature vector stream.2. The apparatus of claim 1 , wherein the dynamic feature extracting portion is configured to use a cosine basis function as the basis function.3. The apparatus of claim 2 , wherein the dynamic feature extracting portion comprises:a DCT portion configured to perform a DCT (discrete cosine transform) for a time array of the extracted static feature vectors to compute DCT components; anda dynamic feature selecting portion configured to select some of the DCT components having a high correlation with a variance of the speech signal out of the DCT components as the dynamic feature vector.4. 
The ...
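The dynamic-feature extraction in claims 2-3, a DCT over the time array of static feature vectors followed by selection of informative components, can be sketched in pure Python; keeping the leading components rather than the highest-correlation ones is a simplifying assumption:

```python
import math

def dct2(seq):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(seq)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(seq))
            for k in range(n)]

def dynamic_features(static_frames, keep=2):
    """DCT each static-feature dimension's trajectory over time and keep
    the first `keep` components as the dynamic feature vector."""
    trajectories = zip(*static_frames)   # one time series per dimension
    return [dct2(list(t))[:keep] for t in trajectories]

# Two-dimensional static features over four frames: dimension 0 is
# constant, dimension 1 rises linearly.
frames = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
dyn = dynamic_features(frames)
```

A constant trajectory puts all its energy in the DC component, while a changing one spreads energy into higher components, which is exactly why the DCT captures the temporal variance of the static features.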

More
Publication date: 08-01-2015

SPEECH RECOGNITION DEVICE AND METHOD, AND SEMICONDUCTOR INTEGRATED CIRCUIT DEVICE

Number: US20150012275A1
Author: NONAKA Tsutomu
Assignee:

A semiconductor integrated circuit device for speech recognition includes a scenario setting unit that receives a command designating scenario flow information and selects prescribed speech reproduction data in a speech reproduction data storage and a prescribed conversion list, in accordance with the scenario flow information, a standard pattern extraction unit that extracts a standard pattern corresponding to at least part of individual words or sentences included in the prescribed conversion list from a speech recognition database, a speech signal synthesizer that synthesizes an output speech signal, a signal processor that generates a feature pattern representing the distribution state of the frequency component of an input speech signal, and a match detector that compares the feature pattern with the standard pattern and outputs a speech recognition result. 1. A semiconductor integrated circuit device that is used in a speech recognition device that issues a question or a message to a user based on speech reproduction data and performs speech recognition processing on speech of the user , comprising:a scenario setting unit that receives a command designating scenario flow information representing a relationship between a plurality of the speech reproduction data and a plurality of conversion lists, and, in accordance with the scenario flow information, selects prescribed speech reproduction data from among the plurality of speech reproduction data which are stored in a speech reproduction data storage, and selects a prescribed conversion list from among the plurality of conversion lists which are stored in a conversion list storage;a standard pattern extraction unit that extracts a standard pattern corresponding to at least part of individual words or sentences included in the prescribed conversion list, from a speech recognition database containing standard patterns representing a distribution state of frequency components of a plurality of phonemes that are 
...

More
Publication date: 27-01-2022

SYSTEMS AND METHODS FOR PROCESSING SPEECH DIALOGUES

Number: US20220028371A1
Author: Han Kun, Xu Haiyang

The present disclosure is related to systems and methods for processing speech dialogue. The method includes obtaining target speech dialogue data. The method includes obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively. The method includes determining a representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence into a trained speech dialogue coding model. The method includes determining a summary of the target speech dialogue data by inputting the representation vector into a classification model. 1. A method for processing speech dialogue implemented on a computing device having at least one processor and at least one storage device , the method comprising:obtaining target speech dialogue data;obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively;determining a representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence into a trained speech dialogue coding model; anddetermining a summary of the target speech dialogue data by inputting the representation vector into a classification model.2. The method of claim 1 , further comprising:obtaining a sentence text of the summary of the target speech dialogue data; ...

More
Publication date: 27-01-2022

LEARNING DEVICE AND PATTERN RECOGNITION DEVICE

Number: US20220028372A1
Assignee: NEC Corporation

The acoustic feature extraction means extracts an acoustic feature, using predetermined parameters, from an acoustic pattern obtained as a result of processing on an acoustic signal. The language vector calculation means calculates a language vector from a given label that represents an attribute of a source of the acoustic signal and that is associated with the acoustic pattern. The similarity calculation means calculates a similarity between the acoustic feature and the language vector. The parameter update means learns parameters so that the similarity becomes larger, and updates the predetermined parameters to the parameters obtained by learning.

1. A learning device comprising:
an acoustic feature extraction unit that extracts an acoustic feature, using predetermined parameters, from an acoustic pattern obtained as a result of processing on an acoustic signal;
a language vector calculation unit that calculates a language vector from a given label that represents an attribute of a source of the acoustic signal and that is associated with the acoustic pattern;
a similarity calculation unit that calculates a similarity between the acoustic feature and the language vector; and
a parameter update unit that learns parameters so that the similarity becomes larger, and updates the predetermined parameters to the parameters obtained by learning.
2. The learning device according to claim 1,
wherein the given label is defined for each hierarchy of category of the attribute of the source,
wherein the learning device comprises, for each hierarchy of category,
a parameter storage unit that stores the predetermined parameters;
the acoustic feature extraction unit;
the language vector calculation unit;
the similarity calculation unit; and
the parameter update unit;
wherein the acoustic feature extraction unit of the highest hierarchy extracts the acoustic feature from a given acoustic pattern, using parameters stored in the parameter storage unit corresponding to the acoustic feature ...

More
Publication date: 27-01-2022

METHOD FOR SEMANTIC RECOGNITION, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Number: US20220028376A1

The disclosure discloses a method for semantic recognition, an electronic device, and a storage medium. The detailed solution includes: obtaining a speech recognition result of a speech to be processed, in which the speech recognition result includes a newly added recognition result fragment and a historical recognition result fragment; obtaining a semantic vector of each historical object in the historical recognition result fragment, and obtaining a semantic vector of each newly added object by inputting the semantic vector of each historical object and each newly added object in the newly added recognition result fragment into a streaming semantic coding layer; and obtaining a semantic recognition result of the speech by inputting the semantic vector of each historical object and the semantic vector of each newly added object into a streaming semantic vector fusion layer and a semantic understanding multi-task layer sequentially arranged. 1. A method for semantic recognition , comprising:obtaining a speech recognition result of a speech to be processed, wherein the speech recognition result comprises a newly added recognition result fragment and a historical recognition result fragment, and the newly added recognition result fragment is a recognition result fragment corresponding to a newly added speech fragment in the speech;obtaining a semantic vector of each historical object in the historical recognition result fragment, and obtaining a semantic vector of each newly added object by inputting the semantic vector of each historical object and each newly added object in the newly added recognition result fragment into a streaming semantic coding layer; andobtaining a semantic recognition result of the speech by inputting the semantic vector of each historical object and the semantic vector of each newly added object into a streaming semantic vector fusion layer and a semantic understanding multi-task layer sequentially arranged.2. 
The method of claim 1 , wherein ...

More
Publication date: 12-01-2017

METHOD FOR SYSTEM COMBINATION IN AN AUDIO ANALYTICS APPLICATION

Number: US20170011734A1
Assignee:

Exemplary embodiments of the present invention provide a method of system combination in an audio analytics application including providing a plurality of language identification systems in which each of the language identification systems includes a plurality of probabilities. Each probability is associated with the system's ability to detect a particular language. The method of system combination in the audio analytics application includes receiving data at the language identification systems. The received data is different from data used to train the language identification systems. A confidence measure is determined for each of the language identification systems. The confidence measure identifies which language its system predicts for the received data and combining the language identification systems according to the confidence measures. 1. A method of system combination in an audio analytics application , comprising:providing a plurality of language identification systems, wherein each of the language identification systems includes a plurality of probabilities, wherein each probability is associated with the system's ability to detect a particular language;receiving data at the language identification systems, wherein the received data is different from data used to train the language identification systems;determining a confidence measure for each of the language identification systems, wherein the confidence measure identifies which language its system predicts for the received data; andcombining the language identification systems according to the confidence measures.2. The method of claim 1 , wherein the language identification systems have different feature extraction methods from each other.3. The method of claim 1 , wherein the language identification systems have different modeling schemes from each other.4. The method of claim 1 , wherein the language identification systems have different noise removal schemes from each other.5. 
The method of claim ...
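Combining the language-identification systems according to their confidence measures can be sketched as a confidence-weighted average of each system's per-language probabilities; the linear weighting rule is an illustrative choice, as the method leaves the combination rule open:

```python
def combine_systems(predictions, confidences):
    """Fuse per-language probabilities from several language-ID systems,
    weighting each system by its confidence measure on the held-out data.
    Returns the winning language and the fused distribution."""
    total = sum(confidences)
    fused = {}
    for probs, conf in zip(predictions, confidences):
        for lang, p in probs.items():
            fused[lang] = fused.get(lang, 0.0) + (conf / total) * p
    return max(fused, key=fused.get), fused

# System 1 is confident (0.8) and favors English; system 2 is not (0.2):
best, fused = combine_systems(
    [{"en": 0.6, "ru": 0.4}, {"en": 0.2, "ru": 0.8}],
    [0.8, 0.2],
)
```

Because the confidences are estimated on data different from the training data, a system that generalizes poorly is automatically down-weighted in the fusion.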

More
Publication date: 12-01-2017

SPEECH RECOGNITION SYSTEM AND METHOD

Number: US20170011735A1
Author: Kim Dong Hyun, LEE Min Kyu
Assignee:

A system and a method of speech recognition which enable a spoken language to be automatically identified while recognizing speech of a person who vocalize to effectively process multilingual speech recognition without a separate process for user registration or recognized language setting such as use of a button for allowing a user to manually select a language to be vocalized and support speech recognition of each language to be automatically performed even though persons who speak different languages vocalize by using one terminal to increase convenience of the user. 1. A system of speech recognition comprising:a speech processing unit analyzing a speech signal to extract feature data; anda language identification speech recognition unit performing language identification and speech recognition by using the feature data and feeding back identified language information to the speech processing unit,wherein the speech processing unit outputs a result of the speech recognition in the language identification speech recognition unit according to the fed-back identified language information.2. The system of claim 1 , wherein the language identification speech recognition unit identifies a language for the speech signal through analysis of likelihood with respect to the feature data by referring to an acoustic model and a language model.3. The system of claim 1 , wherein the language identification speech recognition unit includesa plurality of language decoders each performing the speech recognition for the feature data in parallel and calculating a language identification score through the analysis of the likelihood every one or more speech signal frames based on the feature data by referring to the acoustic model and the language model of a corresponding language, anda language decision module deciding as the identified language a language corresponding to a selected target language decoder according to a decision rule by referring to the language identification ...

More
Publication date: 12-01-2017

METHOD AND DEVICE FOR RECOGNIZING VOICE

Number: US20170011736A1
Assignee:

A method for recognizing a voice and a device for recognizing a voice are provided. The method includes: collecting voice information input by a user; extracting characteristics from the voice information to obtain characteristic information; decoding the characteristic information according to an acoustic model and a language model obtained in advance to obtain recognized voice information, wherein the acoustic model is obtained by data compression in advance. 1. A method for recognizing a voice , comprising:collecting voice information input by a user;extracting characteristics from the voice information to obtain characteristic information;decoding the characteristic information according to an acoustic model and a language model obtained in advance to obtain recognized voice information, wherein the acoustic model is obtained by data compression in advance.2. The method according to claim 1 , wherein after obtaining characteristic information claim 1 , the method further comprises:filtering the characteristic information to obtain filtered characteristic information, so as to decode the filtered characteristic information.3. The method according to claim 2 , wherein filtering the characteristic information comprises:performing an extraction of frame skipping on the characteristic information.4. The method according to claim 1 , wherein decoding the characteristic information according to an acoustic model and a language model obtained in advance to obtain recognized voice information comprises:performing a data compression on the characteristic information to obtain compressed characteristic information, and calculating the compressed characteristic information according to the acoustic model that is obtained by the data compression in advance to obtain a score of acoustic model;calculating data after acoustic model scoring according to the language model to obtain a score of language model;obtaining the recognized voice information according to the score of ...

More
Publication date: 12-01-2017

Method for Distinguishing Components of an Acoustic Signal

Number: US20170011741A1

A method distinguishes components of a signal by processing the signal to estimate a set of analysis features, wherein each analysis feature defines an element of the signal and has feature values that represent parts of the signal, processing the signal to estimate input features of the signal, and processing the input features using a deep neural network to assign an associative descriptor to each element of the signal, wherein a degree of similarity between the associative descriptors of different elements is related to a degree to which the parts of the signal represented by the elements belong to a single component of the signal. The similarities between associative descriptors are processed to estimate correspondences between the elements of the signal and the components in the signal. Then, the signal is processed using the correspondences to distinguish component parts of the signal.
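The abstract's key idea is that elements of the same component get similar associative descriptors, so components can be recovered by grouping descriptors by similarity. A toy sketch under that assumption, with hard-coded descriptor vectors standing in for the deep network's output:

```python
# Toy grouping of signal elements by cosine similarity of their associative
# descriptors (embeddings). The descriptor values and the 0.9 threshold are
# invented; in the patent a deep neural network produces the descriptors.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Descriptors for four time-frequency elements of the signal.
descriptors = {
    "e1": (1.0, 0.1), "e2": (0.9, 0.2),   # belong to one component
    "e3": (0.1, 1.0), "e4": (0.2, 0.95),  # belong to another
}

def group(descs, threshold=0.9):
    """Greedily merge elements whose descriptors are similar enough."""
    groups = []
    for name, vec in descs.items():
        for g in groups:
            if cosine(vec, descs[g[0]]) > threshold:
                g.append(name)
                break
        else:
            groups.append([name])
    return groups

print(group(descriptors))  # [['e1', 'e2'], ['e3', 'e4']]
```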

More
Publication date: 12-01-2017

VIRTUAL PHOTOREALISTIC DIGITAL ACTOR SYSTEM FOR REMOTE SERVICE OF CUSTOMERS

Number: US20170011745A1
Author: Navaratnam Ratnakumar
Assignee:

A system for remote servicing of customers includes an interactive display unit at the customer location providing two-way audio/visual communication with a remote service/sales agent, wherein communication inputted by the agent is delivered to customers via a virtual Digital Actor on the display. The system also provides for remote customer service using physical mannequins with interactive capability having two-way audio visual communication ability with the remote agent, wherein communication inputted by the remote service or sales agent is delivered to customers using the physical mannequin. A web solution integrates the virtual Digital Actor system into a business website. A smart phone solution provides the remote service to customers via an App. In another embodiment, the Digital Actor is instead displayed as a 3D hologram. The Digital Actor is also used in an e-learning solution, in a movie studio suite, and as a presenter on TV, online, or other broadcasting applications. 1. A system configured to provide service to a customer from a remote service agent comprising:a video camera and a microphone configured to selectively capture video images and sounds, respectively, within a preset customer perimeter, said system configured to transmit said video images and sounds to the remote service agent;a speaker configured to emit sound within said preset customer perimeter;a sensor configured to detect a customer positioned in said preset customer perimeter, and to trigger said system to initiate said selective capture of video images and sounds therein, and said transmissions between said preset customer perimeter and the remote service agent;means for displaying a virtual digital actor to the customer; andwherein said system is configured for an input of the remote service agent to dynamically control a visual appearance of said displayed virtual digital actor on said means for displaying, and to control verbal communication emitted from said speaker, to 
interact ...

More
Publication date: 11-01-2018

Systems and methods for improved user interface

Number: US20180011688A1
Assignee: Baidu USA LLC

Aspects of the present disclosure relate to systems and methods for a voice-centric virtual or soft keyboard (or keypad). Unlike other keyboards, embodiments of the present disclosure prioritize the voice keyboard, meanwhile providing users with a quick and uniform navigation to other keyboards (e.g., alphabet, punctuations, symbols, emoji's, etc.). In addition, in embodiments, common actions, such as delete and return are also easily accessible. In embodiments, the keyboard is also configurable to allow a user to organize buttons according to their desired use and layout. Embodiments of such a keyboard provide a voice-centric, seamless, and powerful interface experience for users.

More
Publication date: 11-01-2018

AUTOMATIC INTERPRETATION METHOD AND APPARATUS

Number: US20180011843A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

Provided is an automated interpretation method, apparatus, and system. The automated interpretation method includes encoding a voice signal in a first language to generate a first feature vector, decoding the first feature vector to generate a first language sentence in the first language, encoding the first language sentence to generate a second feature vector with respect to a second language, decoding the second feature vector to generate a second language sentence in the second language, controlling a generating of a candidate sentence list based on any one or any combination of the first feature vector, the first language sentence, the second feature vector, and the second language sentence, and selecting, from the candidate sentence list, a final second language sentence as a translation of the voice signal. 1. An automated interpretation method comprising:encoding a voice signal in a first language to generate a first feature vector;decoding the first feature vector to generate a first language sentence in the first language;encoding the first language sentence to generate a second feature vector with respect to a second language;decoding the second feature vector to generate a second language sentence in the second language;controlling a generating of a candidate sentence list based on any one or any combination of the first feature vector, the first language sentence, the second feature vector, and the second language sentence; andselecting, from the candidate sentence list, a final second language sentence as a translation of the voice signal.2. The method of claim 1 , wherein the generating of the candidate sentence list includes acquiring a candidate sentence claim 1 , from a database claim 1 , determined to correspond to any one or any combination of the first feature vector claim 1 , the first language sentence claim 1 , the second feature vector claim 1 , and the second language sentence from a database.3. 
The method of claim 2 , wherein the acquiring ...
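The claimed flow (recognize the source-language sentence, generate target-language candidates, then select a final sentence from the candidate list) can be sketched with stand-in lookup tables in place of the encoder/decoder networks; the data and the length-based scoring function are illustrative assumptions:

```python
# Toy sketch of the two-stage interpretation pipeline: speech -> source
# sentence -> candidate translations -> final selected sentence. Dicts
# stand in for the recognizer and translator models.

def interpret(voice_signal, recognizer, translator, score):
    source_sentence = recognizer[voice_signal]  # first-language sentence
    candidates = translator[source_sentence]    # candidate sentence list
    return max(candidates, key=score)           # final second-language sentence

recognizer = {"audio-001": "bonjour"}
translator = {"bonjour": ["hello", "good day"]}
print(interpret("audio-001", recognizer, translator, score=len))  # good day
```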

More
Publication date: 14-01-2021

DYNAMIC AUGMENTED REALITY INTERFACE CREATION

Number: US20210011684A1
Assignee:

A method for dynamic augmented reality interface creation is provided. The method detects an utterance from a user of an augmented reality device and determines an ambiguity level of the utterance. The method generates a set of visual artifacts based on the utterance and the ambiguity level. The visual artifacts are generated within an augmented reality use interface, with each visual artifact corresponding to a selectable function. The method detects an interaction with a first visual artifact corresponding to a first selectable function. The method modifies the augmented reality user interface in response to the interaction with the first visual artifact. 1. A computer-implemented method , comprising:detecting an utterance from a user of an augmented reality device;determining a context of the utterance based on audio and visual input to the augmented reality device;based on the context of the utterance, determining an ambiguity level of the utterance;based on the utterance, the context, and the ambiguity level, generating a set of visual artifacts within an augmented reality user interface, each visual artifact of the set of visual artifacts corresponding to a selectable function;detecting interaction with a first visual artifact of the set of visual artifacts, the first visual artifact corresponding to a first selectable function; andmodifying the augmented reality user interface in response to the interaction with the first visual artifact.2. The computer-implemented method of claim 1 , wherein detecting the utterance from the user further comprises:detecting a vocalization at the augmented reality device; andcomparing the vocalization to one or more vocalization profiles to determine the vocalization is the utterance corresponding to one or more functions.3. 
The computer-implemented method of claim 2 , wherein the one or more vocalization profiles comprises a global profile representing utterances of a plurality of users claim 2 , the global profile defining ...
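One way to picture the ambiguity-driven artifact generation described above: a more ambiguous utterance surfaces more candidate functions as selectable artifacts. The scoring rule, cap, and function names below are purely illustrative and not specified by the patent:

```python
# Illustrative sketch: the ambiguity level of an utterance controls how many
# candidate functions become visual artifacts in the AR user interface.

def generate_artifacts(functions, ambiguity_level, max_artifacts=4):
    """Return one artifact per candidate function, capped by ambiguity."""
    count = min(max(1, ambiguity_level), max_artifacts, len(functions))
    return [{"label": f, "action": f} for f in functions[:count]]

funcs = ["play music", "play video", "pause", "stop"]
print(len(generate_artifacts(funcs, ambiguity_level=3)))  # 3
```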

More
Publication date: 14-01-2016

LOCAL AND REMOTE AGGREGATION OF FEEDBACK DATA FOR SPEECH RECOGNITION

Number: US20160012817A1
Assignee:

A local feedback mechanism for customizing training models based on user data and directed user feedback is provided in speech recognition applications. The feedback data is filtered at different levels to address privacy concerns for local storage and for submittal to a system developer for enhancement of generic training models. 120-. (canceled)21. A computing device for providing speech recognition with local and remote feedback loops , the computing device comprising:a memory configured to store instructions associated with a speech recognition service; collect user data for a user, wherein the user data includes live recordings by the user and textual data from user generated documents;', 'aggregate collected user data through a feedback mechanism;', 'process the aggregated user data locally through the feedback mechanism; and', 'provide the aggregated user data to a remote system developer through the feedback mechanism for updating generic training models., 'an adaptation module configured to, 'one or more processors coupled to the memory, the one or more processor executing the speech recognition application in conjunction with the instructions stored in the memory, wherein the speech recognition service includes22. The computing device of claim 21 , wherein claim 21 , the adaptation module is further configured to:receive updated generic training models from the system developer; andupdate current training models with the received training models.23. The computing device of claim 21 , wherein the adaptation module is further configured to:filter the aggregated data prior to processing the aggregated data to prevent storage of private information, wherein the aggregated data is filtered at distinct levels.24. 
The computing device of claim 21 , wherein the adaptation module is further configured to:filter the aggregated data prior to providing to the system developer to remove private data, wherein the aggregated data is filtered at distinct levels for the ...
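The claims describe filtering the aggregated feedback data "at distinct levels" — once before local storage and more strictly before submission to the system developer. A toy sketch of that idea; the field names and the choice of which fields count as private are assumptions:

```python
# Toy level-dependent privacy filter applied to aggregated feedback data
# before local storage vs. before upload to the system developer.

PRIVATE_FIELDS = {
    "local": {"credit_card"},                       # mild filtering locally
    "developer": {"credit_card", "name", "audio"},  # strict filtering for upload
}

def filter_feedback(record, level):
    blocked = PRIVATE_FIELDS[level]
    return {k: v for k, v in record.items() if k not in blocked}

rec = {"name": "Ann", "credit_card": "4111...", "text": "open my mail"}
print(filter_feedback(rec, "developer"))  # {'text': 'open my mail'}
```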

More
Publication date: 11-01-2018

FOLLOW-UP VOICE QUERY PREDICTION

Number: US20180012594A1
Assignee:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting follow-up queries to an initial transcription of an utterance. In some implementations, one or more follow-up queries that are pre-associated with a transcription of an initial utterance of a user are identified. A new or modified language model in which a respective probability associated with one or more of the follow-up queries is increased with respect to an initial language model is obtained. Subsequent audio data corresponding to a subsequent utterance of the user is then received. The subsequent audio data is processed using the new or modified language model to generate a transcription of the subsequent utterance. The transcription of the subsequent utterance is then provided for output to the user. 1. A computer-implemented method comprising:identifying one or more follow-up queries that are pre-associated with a term that matches a transcription of an initial utterance of a user, wherein the follow-up queries are (i) different than the term, and (ii) are pre-associated with the term based on query log data indicating that other users of a search engine have previously submitted the follow-up queries after submitting an initial query that includes the term;adjusting an initial language model to generate a modified language model, the modified language model specifying a respective probability associated with one or more of the follow-up queries that is increased with respect to the initial language model;receiving subsequent audio data corresponding to a subsequent utterance of the user;processing the subsequent audio data using the modified language model to generate a transcription of the subsequent utterance; andproviding the transcription of the subsequent utterance for output to the user.2. 
The method of claim 1 , further comprising:receiving initial audio data corresponding to an initial utterance of a user; andprocessing the audio data ...
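The core adjustment — increasing the probability of pre-associated follow-up queries relative to the initial language model — can be sketched as a boost-and-renormalize step over a unigram-style query distribution; the probabilities and boost factor are invented:

```python
# Sketch of adjusting an initial language model so that known follow-up
# queries get increased probability, then renormalizing to a distribution.

def boost(lm, follow_ups, factor=2.0):
    adjusted = {q: p * (factor if q in follow_ups else 1.0)
                for q, p in lm.items()}
    total = sum(adjusted.values())
    return {q: p / total for q, p in adjusted.items()}

initial = {"weather today": 0.2, "weather tomorrow": 0.2, "news": 0.6}
modified = boost(initial, {"weather tomorrow"})
print(modified["weather tomorrow"] > initial["weather tomorrow"])  # True
```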

More
Publication date: 11-01-2018

PHONETIC POSTERIORGRAMS FOR MANY-TO-ONE VOICE CONVERSION

Number: US20180012613A1
Assignee:

A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech are converted into a converted speech based on the PPG and the mapping. 1. A computer-implemented method comprising:obtaining a target speech;obtaining a source speech;generating a phonetic posteriorgram (PPG) based on acoustic features of the target speech, the PPG including a set of values corresponding to a range of times and a range of phonetic classes;generating a mapping between the PPG and the acoustic features of the target speech; andconverting the source speech into a converted speech based on the PPG and the mapping.2. The computer-implemented method of claim 1 , wherein the range of phonetic classes correspond to a range of senones.3. The computer-implemented method of claim 1 , wherein the set of values correspond to posterior probabilities of each of the range of phonetic classes for each of the range of times claim 1 , and wherein the PPG comprises a matrix.4. The computer-implemented method of claim 1 , wherein the source speech is different than the target speech.5. The computer-implemented method of claim 1 , wherein generating the PPG includes using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers.6. The computer-implemented method of claim 5 , wherein the SI-ASR system is trained for PPGs generation using a multi-speaker ASR corpus claim 5 , an input being an MFCC feature vector of tframe claim 5 , denoted as X ...
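Per claim 3, the PPG is a matrix of posterior probabilities over a range of times and phonetic classes. A minimal sketch of what that structure looks like, with invented per-frame scores softmaxed into per-class posteriors (a real system derives them from an SI-ASR model):

```python
# Minimal phonetic posteriorgram (PPG) sketch: a (frames x phonetic classes)
# matrix where each row is a posterior distribution over phonetic classes.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

frame_scores = [[2.0, 0.5, 0.1],   # frame 1: raw scores for 3 classes
                [0.2, 1.8, 0.3]]   # frame 2
ppg = [softmax(row) for row in frame_scores]
print(all(abs(sum(row) - 1.0) < 1e-9 for row in ppg))  # True
```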

More
Publication date: 11-01-2018

Method, apparatus for eliminating popping sounds at the beginning of audio, and storage medium

Number: US20180012620A1
Author: Lingcheng KONG
Assignee: Tencent Technology Shenzhen Co Ltd

A method and apparatus for eliminating popping sounds at the beginning of audio includes: examining audio frames within a pre-set time period at the beginning of audio to determine a popping residing section; applying popping elimination to audio frames in the popping residing section; calculating an average value of amplitudes of M audio frames preceding the popping residing section and an average value of amplitudes of K audio frames succeeding the popping residing section; setting the amplitudes of the audio frames in the popping residing section to zero in response to a determination that the two average values are both smaller than a pre-set sound reduction threshold; weakening the amplitudes of the audio frames in the popping residing section in response to a determination that both the two average values are not smaller than a pre-set sound reduction threshold; M and K are integers larger than one.
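The rule in the abstract is concrete enough to sketch directly: average the amplitudes of M frames before and K frames after the popping section; if both averages are below the sound-reduction threshold, zero the section, otherwise attenuate it. M, K, the threshold, and the attenuation factor below are illustrative values:

```python
# Sketch of the popping-elimination rule: compare average amplitudes of the
# M preceding and K succeeding frames against a threshold, then either zero
# or attenuate the popping section accordingly.

def eliminate_pop(frames, start, end, m=2, k=2, threshold=0.1, atten=0.25):
    avg_before = sum(abs(x) for x in frames[start - m:start]) / m
    avg_after = sum(abs(x) for x in frames[end:end + k]) / k
    quiet = avg_before < threshold and avg_after < threshold
    factor = 0.0 if quiet else atten
    return frames[:start] + [x * factor for x in frames[start:end]] + frames[end:]

audio = [0.01, 0.02, 0.9, -0.8, 0.01, 0.02]  # frames 2-3 are the pop
print(eliminate_pop(audio, 2, 4))
```

When the surrounding audio is quiet (both averages under the threshold), the pop is silenced outright; in louder context it is merely weakened, matching the two branches of the claim.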

More
Publication date: 12-01-2017

REAL-TIME IN-STREAM COMPLIANCE MONITORING FACILITY

Number: US20170013127A1
Assignee:

Disclosed herein are systems and methods for redacting sensitive information. Certain methods involve caching an incoming communication in a cache of the machine, optionally, when the communication is a vocal stream, using a speech processor, converting the vocal stream to text before analyzing, and automatically analyzing the communication and identifying sensitive information in accordance with one or more rules and identifying the location of the sensitive information in the communication. The sensitive information is automatically redacted from the communication and the redacted communication is at least one of stored and transmitted. 1. A method , comprising:caching an incoming communication in a cache of the machine;optionally, when the communication is a vocal stream, using a speech processor, converting the vocal stream to text before analyzing;automatically analyzing the communication and identifying sensitive information in accordance with one or more rules and identifying the location of the sensitive information in the communication;automatically redacting the sensitive information from the communication; andat least one of storing and transmitting the redacted communication.2. The method of claim 1 , wherein the communication is at least one of a vocal stream claim 1 , a text message claim 1 , a chat transcript claim 1 , an SMS claim 1 , and an email.3. The method of claim 1 , wherein redacting comprises at least one of removing claim 1 , muting and bleeping.4. The method of claim 1 , wherein the vocal stream is at least one of an incoming telephonic voice communication and a cloud-based vocal stream.5. 
The method of claim 1 , wherein the sensitive information is at least one of all numbers where two or more numbers are found in sequence claim 1 , a credit card number claim 1 , a security code claim 1 , an expiration date claim 1 , a PIN number claim 1 , a date of birth claim 1 , a driver's license data claim 1 , authentication data claim 1 , cardholder ...
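The claims flag "all numbers where two or more numbers are found in sequence" as sensitive. A toy redaction pass built on just that one rule (the full rule set in the patent is broader, and this regex stand-in is an assumption):

```python
# Toy redaction: replace any run of two or more digits (a stand-in for card
# numbers, PINs, expiration dates, etc.) before storing or transmitting.
import re

def redact(text):
    return re.sub(r"\d{2,}", "[REDACTED]", text)

print(redact("My card is 4111111111111111, exp 12/26"))
```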

More
Publication date: 10-01-2019

VOICE RECOGNITION METHOD, RECORDING MEDIUM, VOICE RECOGNITION DEVICE, AND ROBOT

Number: US20190013008A1
Author: Kunitake Yuji, OTA YUSAKU
Assignee:

A voice recognition method is provided that includes extracting a first speech from the sound collected with a microphone connected to a voice processing device, and calculating a recognition result for the first speech and the confidence level of the first speech. The method also includes performing a speech for a repetition request based on the calculated confidence level of the first speech, and extracting with the microphone a second speech obtained through the repetition request. The method further includes calculating a recognition result for the second speech and the confidence level of the second speech, and generating a recognition result from the recognition result for the first speech and the recognition result for the second speech, based on the confidence level of the calculated second speech. 1. A voice recognition method , comprising:receiving, via a microphone, a first speech that a speaker makes intending one word, the first speech including N phonemes, where N is a natural number of 2 or more;calculating occurrence probabilities of all kinds of phonemes for each of the N phonemes included in the first speech;recognizing a phoneme string, in which phonemes each having the highest probability are lined in order, to be a first phoneme string corresponding to the first speech, the phonemes corresponding to the respective N phonemes from a first phoneme to an N-th phoneme included in the first speech;calculating a first value by multiplying together occurrence probabilities that the N phonemes included in the first phoneme string have;when the first value is smaller than a first threshold, outputting a voice to prompt the speaker to repeat the one word, via a loudspeaker;receiving, via the microphone, a second speech that the speaker repeats intending the one word, the second speech including M phonemes, where M is a natural number of 2 or more;calculating occurrence probabilities of all kinds of phonemes for each of the M phonemes included in the ...
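The confidence test in the claims is the product of the per-phoneme occurrence probabilities of the recognized phoneme string, compared against a threshold to decide whether to ask the speaker to repeat. A direct sketch (the probabilities and threshold are illustrative):

```python
# Sketch of the repetition-request test: multiply the per-phoneme
# probabilities of the recognized string; below the threshold, prompt
# the speaker to repeat the word.

def confidence(phoneme_probs):
    value = 1.0
    for p in phoneme_probs:
        value *= p
    return value

def needs_repeat(phoneme_probs, threshold=0.5):
    return confidence(phoneme_probs) < threshold

print(needs_repeat([0.9, 0.9, 0.5]))   # True: 0.405 < 0.5
print(needs_repeat([0.99, 0.99, 0.99]))  # False
```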

More
Publication date: 10-01-2019

SYLLABLE BASED AUTOMATIC SPEECH RECOGNITION

Number: US20190013009A1
Assignee:

Systems, methods, and computer programs are described which utilize the structure of syllables as an organizing element of automated speech recognition processing to overcome variations in pronunciation, to efficiently resolve confusable aspects, to exploit context, and to map the speech to orthography. 1. A data processing method comprising:receiving, at a computing system, a production symbol stream produced from spoken words of a particular language from an acoustic processing system;extracting, from the production symbol stream, a plurality of production patterns;using a stored production to canonical mapping data comprising conditional probabilities for one or more mappings of production patterns to canonical patterns, generating candidate syllables and a probability of each candidate syllable from the plurality of production patterns;using a stored syllable to orthographic pattern mapping comprising conditional probabilities for one or more mappings, generating candidate orthographic patterns and a probability of each candidate orthographic pattern from the candidate syllables;based, at least in part, on the probabilities for each candidate orthographic pattern, generating an orthographic representation of the production symbol stream.2. The data processing method of claim 1 , wherein the production stream is segmented into phonotactic units comprising intervowel consonant (IVC) and vowel neighborhood (VN) units claim 1 , by performing sequentially for each symbol of the production symbol stream:initializing a three-symbol buffer to zero and an IVC accumulator buffer to zero and adding production symbols sequentially to the three-symbol buffer;after adding a symbol to the three-symbol buffer, determining if the middle symbol of the three-symbol buffer is a vowel and that the three symbols therefore comprise a VN, storing the VN;if an added symbol is a consonant, appending that consonant to the IVC accumulator;if the next added symbol is not a consonant, ...
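The phonotactic segmentation in claim 2 accumulates intervowel consonant (IVC) runs and records a vowel neighborhood (VN) around each vowel. A simplified sketch of that scan (it drops the explicit three-symbol buffer and uses plain letters as the symbol inventory, which is an assumption):

```python
# Simplified IVC/VN segmentation: collect consonant runs between vowels
# (IVCs) and the symbol-vowel-symbol window around each interior vowel (VNs).

VOWELS = set("aeiou")

def segment(symbols):
    ivcs, vns = [], []
    consonant_run = []
    for i, s in enumerate(symbols):
        if s in VOWELS:
            if consonant_run:
                ivcs.append("".join(consonant_run))
                consonant_run = []
            if 0 < i < len(symbols) - 1:           # vowel has both neighbors
                vns.append(symbols[i - 1] + s + symbols[i + 1])
        else:
            consonant_run.append(s)
    if consonant_run:
        ivcs.append("".join(consonant_run))
    return ivcs, vns

print(segment("stranded"))  # (['str', 'nd', 'd'], ['ran', 'ded'])
```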

More
Publication date: 10-01-2019

Speech Recognition System, Terminal Device, and Dictionary Management Method

Number: US20190013010A1
Assignee:

To assign an appropriate pronunciation to a word or phrase having a unique pronunciation or a word or phrase having a pronunciation incorrectly used by a user, a terminal device divides a first word or phrase indicated by a first recognition result acquired from a speech recognition server into morphemes and assigns a pronunciation to each of the morphemes, and divides a second word or phrase indicated by a second recognition result acquired from a speech recognition module into morphemes. Further, the terminal device selects, for a morpheme having the same character string as that of any one of the morphemes forming the second word or phrase among the morphemes forming the first word or phrase, a pronunciation of the morpheme indicated by the second recognition result. 1. A terminal device , comprising:a communication control module configured to transmit speech data on a user to a speech recognition server and to receive a first recognition result from the speech recognition server;a storage configured to store a speech recognition dictionary for speech recognition;a speech recognition module configured to perform speech recognition on the speech data through use of the speech recognition dictionary to obtain a second recognition result; anda dictionary management module configured to register a first word or phrase indicated by the first recognition result in the speech recognition dictionary, a morphological analysis module configured to divide the first word or phrase into morphemes and assign a pronunciation to each of the morphemes, and to divide a second word or phrase indicated by the second recognition result into morphemes; and', 'a pronunciation selection module configured to select, for a morpheme having the same character string as a character string of any one of the morphemes forming the second word or phrase among the morphemes forming the first word or phrase, a pronunciation of the morpheme indicated by the second recognition result, and, 
'wherein ...
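The selection rule described above — for each morpheme of the server result, prefer the local recognizer's pronunciation when a morpheme with the same surface string exists locally — can be sketched with morphemes as (surface, pronunciation) pairs; the example data is invented:

```python
# Sketch of the pronunciation-selection rule: local pronunciations override
# server-assigned ones for morphemes with matching character strings.

def merge_pronunciations(server_morphemes, local_morphemes):
    local = {surface: pron for surface, pron in local_morphemes}
    return [(surface, local.get(surface, pron))
            for surface, pron in server_morphemes]

server = [("Tokyo", "toh-kee-oh"), ("tower", "tow-er")]
local = [("Tokyo", "toh-kyoh")]   # user's customary pronunciation
print(merge_pronunciations(server, local))
```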

More
Publication date: 10-01-2019

INITIALIZATION OF CTC SPEECH RECOGNITION WITH STANDARD HMM

Number: US20190013015A1
Assignee:

A method for improved initialization of speech recognition system comprises mapping a trained hidden markov model based recognition node network (HMM) to a Connectionist Temporal Classification (CTC) based node label scheme. The central state of each frame in the HMM are mapped to CTC-labeled output nodes and the non-central states of each frame are mapped to CTC-blank nodes to generate a CTC-labeled HMM and each central state represents a phoneme from human speech detected and extracted by a computing device. Next the CTC-labeled HMM is trained using a cost function, wherein the cost function is not part of a CTC cost function. Finally the CTC-labeled HMM is trained using a CTC cost function to produce a CTC node network. The CTC node network may be iteratively trained by repeating the initialization steps. 1. A method for improved initialization of speech recognition systems , the method comprising;a) mapping a central state of each frame in a trained Hidden Markov Model (HMM) to Connectionist Temporal Classification (CTC) labeled nodes and mapping one or more non-central states of each frame to CTC-blank nodes to generate a CTC-labeled HMM, wherein each central state represents a phoneme;b) training the CTC-labeled HMM using a cost function wherein the cost function is not part of a CTC cost function;c) training the CTC-labeled HMM using a CTC cost function to produce a CTC node Deep Learning Neural network.2. The method of further comprising claim 1 , d) generating new frame and label information from the CTC-node network.3. The method of wherein generating new frame information comprises omitting blank labels that are not used by the CTC node Deep Learning Neural Network.4. The method of wherein generating new label information comprising searching for new phone labels using alternative pronunciation dictionaries for each word.5. 
The method of wherein generating new frame information comprises omitting blank states which fail to satisfy a probability threshold. ...
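Step (a) of the method maps each phoneme's central HMM state to a CTC output label and the non-central states to CTC blanks. A direct sketch of that relabeling, assuming the common three-state-per-phoneme topology (state index 1 is central):

```python
# Sketch of the HMM-to-CTC label mapping: central states become phoneme
# labels, flanking states become the CTC blank symbol.

BLANK = "<b>"

def hmm_to_ctc_labels(state_seq):
    """state_seq: list of (phoneme, state_index) pairs with states 0, 1, 2."""
    return [ph if idx == 1 else BLANK for ph, idx in state_seq]

states = [("k", 0), ("k", 1), ("k", 2), ("ae", 0), ("ae", 1), ("ae", 2)]
print(hmm_to_ctc_labels(states))  # ['<b>', 'k', '<b>', '<b>', 'ae', '<b>']
```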

More
Publication date: 10-01-2019

ANALOG VOICE ACTIVITY DETECTION

Number: US20190013039A1
Assignee:

According to some embodiments, an analog processing portion may receive an audio signal from a microphone. The analog processing portion may then convert the audio signal into sub-band signals and estimate an energy statistic value, such as a Signal-to-Noise Ratio (“SNR”) value, for each sub-band signal. A classification element may classify the estimated energy statistic values with analog processing such that a wakeup signal is generated when voice activity is detected. The wakeup signal may be associated with, for example, a battery-powered, always-listening audio application. 1. A voice activity detection device , comprising:an analog processing portion to receive an audio signal from a microphone, convert the audio signal into sub-band signals, and estimate an energy statistic value for each sub-band signal; anda classification element to classify the estimated energy statistic values with analog processing such that a wakeup signal is generated when voice activity is detected, wherein lookback information is provided when the wakeup signal is generated.2. The device of claim 1 , wherein the lookback information is associated with at least one of: (i) extracted noise estimates prior to voice activity claim 1 , (ii) an audio spectrum when the wakeup signal is generated claim 1 , (iii) a free-running buffer claim 1 , (iv) a threshold detector to trigger a buffer claim 1 , (v) an acoustic activity detector to trigger a buffer claim 1 , and (vi) a periodic sampling of a subset of voice activity.3. The device of claim 1 , wherein the estimated energy statistic value is a Signal-to-Noise Ratio (“SNR”) value.4. The device of claim 1 , wherein the wakeup signal is provided to an audio processor which detects a key phrase in the audio signal and claim 1 , as a result claim 1 , wakes up an application processor.5. The device of claim 1 , wherein the wakeup signal is provided to a beamforming microphone array and digital processor to initiate high-performance audio ...
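The detection logic — estimate an SNR-like energy statistic per sub-band, then classify to decide whether to raise the wakeup signal — can be sketched digitally; the patent performs this in the analog domain, and all constants below (noise floor, SNR threshold, band count) are illustrative:

```python
# Digital sketch of sub-band SNR estimation and a simple classifier that
# raises a wakeup flag when enough sub-bands exceed the SNR threshold.
import math

def snr_db(signal_energy, noise_energy):
    return 10 * math.log10(signal_energy / noise_energy)

def voice_detected(band_energies, noise_floor=1.0, snr_thresh=6.0, min_bands=2):
    hits = sum(1 for e in band_energies if snr_db(e, noise_floor) > snr_thresh)
    return hits >= min_bands

print(voice_detected([8.0, 5.0, 0.5, 9.0]))  # True: wakeup signal
```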

More
Publication date: 09-01-2020

INTELLIGENT ASSISTANT

Number: US20200012906A1
Assignee: Microsoft Technology Licensing, LLC

Examples are disclosed herein that relate to entity tracking. One examples provides a computing device comprising a logic processor and a storage device holding instructions executable by the logic processor to receive image data of an environment including a person, process the image data using a face detection algorithm to produce a first face detection output at a first frequency, determine an identity of the person based on the first face detection output, and process the image data using another algorithm that uses less computational resources of the computing device than the face detection algorithm. The instructions are further executable to track the person within the environment based on the tracking output, and perform one or more of updating the other algorithm using a second face detection output, and updating the face detection algorithm using the tracking output. 1. A computing device , comprising:a logic processor; and receive image data of an environment including a person;', 'process the image data using a face detection algorithm to produce a first face detection output at a first frequency;', 'select at least one tracking algorithm that uses less computational resources of the computing device than the face detection algorithm, and produces a tracking output at a second frequency greater than the first frequency;', 'process the image data using the at least one tracking algorithm; and', 'track the person within the environment based on the tracking output produced by the at least one tracking algorithm., 'a storage device holding instructions executable by the logic processor to2. The computing device of claim 1 , wherein the instructions are executable to select the at least one tracking algorithm based on available computing resources of the computing device.3. The computing device of claim 1 , wherein the instructions are executable to select the at least one tracking algorithm based on a battery life condition associated with the computing ...
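The claims pair an expensive face detector running at a low frequency with a cheaper tracking algorithm running at a higher frequency, the detector output periodically re-anchoring the tracker. A scheduling sketch of that two-rate pattern (frame counts and the 10-frame interval are illustrative):

```python
# Sketch of the two-rate scheme: a costly detect+track pass every Nth frame,
# a cheap tracking-only pass on the rest.

def process_frames(n_frames, detect_every=10):
    log = []
    for frame in range(n_frames):
        if frame % detect_every == 0:
            log.append((frame, "detect+track"))  # slow, accurate pass
        else:
            log.append((frame, "track"))         # fast, cheap pass
    return log

log = process_frames(20)
print(sum(1 for _, kind in log if kind == "detect+track"))  # 2
```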

More
Publication date: 09-01-2020

MODEL TRAINING METHOD, APPARATUS, AND DEVICE, AND DATA SIMILARITY DETERMINING METHOD, APPARATUS, AND DEVICE

Number: US20200012969A1
Author: Jiang Nan, ZHAO Hongwei
Assignee:

A model training method includes: acquiring a plurality of user data pairs, wherein data fields of two sets of user data in each user data pair have an identical part; acquiring a user similarity corresponding to each user data pair, wherein the user similarity is a similarity between users corresponding to the two sets of user data in each user data pair; determining, according to the user similarity corresponding to each user data pair and the plurality of user data pairs, sample data for training a preset classification model; and training the classification model based on the sample data to obtain a similarity classification model. 1. A model training method , comprising:acquiring a plurality of user data pairs, wherein data fields of two sets of user data in each user data pair have an identical part;acquiring a user similarity corresponding to each user data pair, wherein the user similarity is a similarity between users corresponding to the two sets of user data in each user data pair;determining, according to the user similarity corresponding to each user data pair and the plurality of user data pairs, sample data for training a preset classification model; andtraining the classification model based on the sample data to obtain a similarity classification model.2. The method according to claim 1 , wherein the acquiring the user similarity corresponding to each user data pair comprises:acquiring biological features of users corresponding to a first user data pair, wherein the first user data pair is any user data pair in the plurality of user data pairs; anddetermining a user similarity corresponding to the first user data pair according to the biological features of the users corresponding to the first user data pair.3. 
The method according to claim 2 , wherein the biological features comprise a facial image feature; acquiring facial images of the users corresponding to the first user data pair; and performing feature extraction on the facial images to ...
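The claims above turn each user-data pair plus a user similarity into a labeled training sample for the classification model. A minimal sketch of that sample-construction step; the field-agreement featurization and the 0.8 threshold are illustrative assumptions, not the patent's values:

```python
# Each user-data pair becomes (feature vector, label): features encode
# agreement on the shared data fields, and the pair's user similarity is
# thresholded into a binary same-user label for classifier training.

def pair_to_sample(data_a, data_b, user_similarity, threshold=0.8):
    shared = set(data_a) & set(data_b)            # identical data fields
    features = [float(data_a[k] == data_b[k]) for k in sorted(shared)]
    label = 1 if user_similarity >= threshold else 0   # same user or not
    return features, label

sample = pair_to_sample(
    {"name": "A. Smith", "city": "Oslo", "phone": "123"},
    {"name": "A. Smith", "city": "Bergen", "phone": "123"},
    user_similarity=0.93,
)
print(sample)  # ([0.0, 1.0, 1.0], 1) - fields sorted: city, name, phone
```

A set of such samples can then be fed to any off-the-shelf binary classifier to obtain the similarity classification model.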

14-01-2021 publication date

VISION-ASSISTED SPEECH PROCESSING

Number: US20210012769A1
Assignee: SoundHound, Inc.

Systems and methods for processing speech are described. In certain examples, image data is used to generate visual feature tensors and audio data is used to generate audio feature tensors. The visual feature tensors and the audio feature tensors are used by a linguistic model to determine linguistic features that are usable to parse an utterance of a user. The generation of the feature tensors may be jointly configured with the linguistic model. Systems may be provided in a client-server architecture. 1. A client device for processing speech comprising:an audio capture device to capture audio data associated with an utterance from a user;an image capture device to capture frames of image data, the image data featuring an environment of the user;a visual feature extractor to receive the frames of image data from the image capture device and to generate one or more visual feature tensors, the visual feature tensors providing a compressed representation of the frames of image data;an audio feature extractor to receive the audio data from the audio capture device and to generate one or more audio feature tensors; anda transmitter to transmit the visual feature tensors and the audio feature tensors to a server device, the server device being configured to supply at least the visual feature tensors and the audio feature tensors to a linguistic model, the linguistic model being configured to determine linguistic features that are usable to parse the utterance,wherein the visual feature extractor and the audio feature extractor are jointly configured with the linguistic model.2. The client device of claim 1 , wherein one or more of the visual feature extractor and the audio feature extractor comprise a neural network architecture.3. 
The client device of claim 1 , wherein the visual feature tensors comprise a numeric representation of a visual context for the environment, and wherein the transmitter is configured to transmit the audio data to the server device with ...

11-01-2018 publication date

ACCOUNT ADDING METHOD, TERMINAL, SERVER, AND COMPUTER STORAGE MEDIUM

Number: US20180013718A1
Author: XU Dongcheng

An account adding method is performed by a social networking application running at a mobile terminal when communicating with a second terminal (e.g., using a chat session). The method includes: recording voice information from the second terminal using the social networking application; extracting character string information and voiceprint information from the voice information; sending the character string information and the voiceprint information to a server; receiving an account that matches the character string information and the voiceprint information and that is sent by the server; and adding the account to a contact list of the social networking application. For example, the social networking application is started before starting a telephone call with the second terminal and the voice information is recorded during the telephone call. 1. An account adding method performed by a social networking application running at a mobile terminal having one or more processors and memory storing a plurality of programs to be executed by the one or more processors , the method comprising:recording voice information from a second terminal using the social networking application;extracting character string information and voiceprint information from the voice information;sending the character string information and the voiceprint information to a server;receiving an account that matches the character string information and the voiceprint information and that is sent by the server; andadding the account to a contact list of the social networking application.2. The account adding method according to claim 1 , further comprising:generating a chat room with the account by sending a predefined message to the account.3. The account adding method according to claim 1 , wherein the operation of extracting character string information and voiceprint information from the voice information comprises:performing silence suppression on the voice information;performing framing ...
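The claims above name silence suppression and framing as the first steps of extracting voiceprint information from the recorded voice. A minimal sketch of those two steps on a raw sample buffer; the frame length, hop size, and amplitude threshold are assumed values:

```python
# Framing: cut the 1-D signal into fixed-length overlapping frames.
# Silence suppression: drop frames whose mean absolute amplitude is tiny
# (a crude stand-in for the silence-suppression step in the claims).

def frame_signal(samples, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def drop_silence(frames, threshold=0.01):
    """Keep only frames whose mean absolute amplitude exceeds the threshold."""
    return [f for f in frames if sum(abs(x) for x in f) / len(f) > threshold]

signal = [0.0] * 400 + [0.5, -0.5] * 300   # 400 silent samples, then "speech"
frames = frame_signal(signal)
voiced = drop_silence(frames)
print(len(frames), len(voiced))  # 4 frames total, 3 survive suppression
```

Per-frame spectral features for the voiceprint would then be computed only on the surviving frames.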

09-01-2020 publication date

WORD EXTRACTION DEVICE, RELATED CONFERENCE EXTRACTION SYSTEM, AND WORD EXTRACTION METHOD

Number: US20200013389A1
Author: UKAI Satoshi

A word extraction method according to at least one embodiment of the present disclosure includes: converting, with at least one processor operating with a memory device in a device, received speech information into text data; converting the text data into a string of words including a plurality of words; extracting, with the at least one processor operating with the memory device in the device, a keyword included in a keyword database from the plurality of words; and calculating, with the at least one processor operating with the memory device in the device, importance levels of the plurality of words based on timing of utterance of the keyword and timing of utterance of each of the plurality of words. 1. A word extraction device, comprising: at least one processor; and at least one memory device configured to store a plurality of instructions, which, when executed by the at least one processor, causes the at least one processor to operate to: convert received speech information into text data; convert the text data into a string of words including a plurality of words; calculate appearance frequencies of the plurality of words; and extract a keyword included in a keyword database from the plurality of words, obtain word-to-be-weighted information about a word to be weighted based on timing of utterance of the keyword and timing of utterance of each of the plurality of words, and calculate importance levels of the plurality of words based on the appearance frequencies and the word-to-be-weighted information. 2. The word extraction device according to claim 1, wherein the instructions further cause the at least one processor to estimate, based on the speech information, a location at which a speaker has spoken, and to generate location information about the estimated location. 3.
The word extraction device according to claim 2, wherein the instructions cause the at least one processor to estimate, based on the location ...
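The importance calculation above combines each word's appearance frequency with a weighting based on how close its utterances fall to keyword utterances. A toy sketch of that combination; the 5-second window and the weight of 2.0 are assumed parameters, not values from the patent:

```python
# Importance = appearance frequency + bonus for each utterance of the word
# that falls within a time window around any keyword utterance.

def importance_levels(word_times, keyword_times, window=5.0, weight=2.0):
    # word_times: {word: [utterance timestamps in seconds]}
    scores = {}
    for word, times in word_times.items():
        freq = len(times)                         # appearance frequency
        near = sum(1 for t in times
                   if any(abs(t - k) <= window for k in keyword_times))
        scores[word] = freq + weight * near       # frequency + timing bonus
    return scores

scores = importance_levels(
    {"budget": [10.0, 62.0], "hello": [1.0]},
    keyword_times=[60.0],                         # keyword uttered at t=60s
)
print(scores)  # {'budget': 4.0, 'hello': 1.0}
```

"budget" scores 2 (frequency) + 2.0 (one utterance within 5 s of the keyword); "hello" keeps only its bare frequency.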

09-01-2020 publication date

SPEECH WAKEUP METHOD, APPARATUS, AND ELECTRONIC DEVICE

Number: US20200013390A1

A speech wakeup method, apparatus, and electronic device are disclosed in embodiments of this specification. The method includes: inputting speech data to a speech wakeup model trained with general speech data; and outputting, by the speech wakeup model, a result for determining whether to execute speech wakeup, wherein the speech wakeup model includes a Deep Neural Network (DNN) and a Connectionist Temporal Classifier (CTC). 1. A speech wakeup method , comprising:inputting speech data to a speech wakeup model trained with general speech data; andoutputting, by the speech wakeup model, a result for determining whether to execute speech wakeup, wherein the speech wakeup model includes a Deep Neural Network (DNN) and a Connectionist Temporal Classifier (CTC).2. The method of claim 1 , wherein the general speech data comprises a Large Vocabulary Continuous Speech Recognition (LVCSR) corpus.3. The method of claim 1 , further comprising:training the speech wakeup model with the general speech data, wherein the training includes:iteratively optimizing parameters in the speech wakeup model with the general speech data by means of an asynchronous stochastic gradient descent method until the training converges.4. The method of claim 3 , further comprising:acquiring keyword-specific speech data; andtraining the speech wakeup model with the keyword-specific speech data, wherein a learning rate used in the training is less than that used in the training of the speech wakeup model with the general speech data.5. The method of claim 3 , further comprising:cross-verifying the speech wakeup model with a verification data set in the training to determine whether the training converges.6. 
The method of claim 1 , wherein the outputting, by the speech wakeup model, of a result for determining whether to execute speech wakeup comprises: extracting acoustic features from the input speech data; inputting the acoustic features to the DNN included in the speech wakeup model for ...

09-01-2020 publication date

Acoustic information based language modeling system and method

Number: US20200013391A1
Assignee: LG ELECTRONICS INC

Disclosed are a speech data based language modeling system and method. The speech data based language modeling method includes transcription of text data, and generation of a regional dialect corpus based on the text data and regional dialect-containing speech data and generation of an acoustic model and a language model using the regional dialect corpus. The generation of an acoustic model and a language model is performed by machine learning of an artificial intelligence (AI) algorithm using speech data and marking of word spacing of a regional dialect sentence using a speech data tag. A user is able to use a regional dialect speech recognition service which is improved using 5G mobile communication technologies of eMBB, URLLC, or mMTC.

09-01-2020 publication date

EMOTION ESTIMATION SYSTEM AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Number: US20200013428A1
Author: Luo Xuan
Assignee: FUJI XEROX CO., LTD.

An emotion estimation system includes a feature amount extraction unit, a vowel section specification unit, and an estimation unit. The feature amount extraction unit analyzes recorded produced speech to extract a predetermined feature amount. The vowel section specification unit specifies, based on the feature amount extracted by the feature amount extraction unit, a section in which a vowel is produced. The estimation unit estimates, based on the feature amount in a vowel section specified by the vowel section specification unit, an emotion of a speaker. 1. An emotion estimation system comprising:a feature amount extraction unit that analyzes recorded produced speech to extract a predetermined feature amount;a vowel section specification unit that specifies, based on the feature amount extracted by the feature amount extraction unit, a section in which a vowel is produced; andan estimation unit that estimates, based on the feature amount in a vowel section specified by the vowel section specification unit, an emotion of a speaker.2. The emotion estimation system according to claim 1 ,wherein the estimation unit refers to a vowel probability database in which a feature amount pattern is recorded for each type of emotion set in advance for each vowel, to obtain a probability that a feature amount pattern in the vowel section specified by the vowel section specification unit corresponds to each of the types of emotion.3. The emotion estimation system according to claim 2 ,wherein the estimation unit divides the vowel section specified by the vowel section specification unit into a plurality of frames, refers to the vowel probability database to obtain a probability that a feature amount pattern in each of the plurality of frames corresponds to each of the types of emotion, and identifies a type of emotion corresponding to the vowel section, based on the obtained probability for the frame.4. The emotion estimation system according to claim 3 , further comprising:a ...
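The estimation step above divides a vowel section into frames, looks up a per-emotion probability for each frame's feature pattern, and identifies the emotion from the combined per-frame probabilities. A toy sketch of that combination; the hard-coded lookups stand in for the vowel probability database:

```python
# For each frame of a vowel section, look up P(frame pattern | emotion) and
# pick the emotion with the highest joint log-probability across frames.

import math

def estimate_emotion(frame_probs):
    """frame_probs: one {emotion: probability} dict per frame of the vowel section."""
    emotions = frame_probs[0].keys()
    log_scores = {e: sum(math.log(p[e]) for p in frame_probs) for e in emotions}
    return max(log_scores, key=log_scores.get)

frames = [{"joy": 0.6, "anger": 0.2},   # frame-wise database lookups
          {"joy": 0.5, "anger": 0.4}]   # for one vowel section
print(estimate_emotion(frames))  # joy
```

Summing log-probabilities rather than multiplying raw probabilities avoids underflow when a vowel section spans many frames.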

15-01-2015 publication date

COMPUTERIZED INFORMATION APPARATUS

Number: US20150019226A1
Author: Gazdzinski Robert F.

A computerized information apparatus for providing information to a user of a transport device. In one embodiment, the apparatus includes data processing apparatus, speech recognition and synthesis apparatus, and a network interface to enable voice-driven provision of information obtained both locally within the transport device and from a remote source such as a networked server. In one implementation, the information relates to one or more business entities in an area local to the transport device's location. Information can be both displayed and provided to the user audibly in another implementation. 1.-40. (canceled) 41. Computerized information apparatus for use in a personnel transport device, comprising: a wireless network interface; data processing apparatus, the data processing apparatus comprising at least a central processor and a digital signal processor (DSP), at least a portion of the data processing apparatus being in data communication with the wireless interface; a substantially flat-screen display device in data communication with at least a portion of the data processing apparatus; a speech processing apparatus in data communication with at least a portion of the data processing apparatus, the speech processing apparatus configured to receive a representation of a user's voice input, the voice input comprising at least part of a name of a business entity for which the user desires information, and to generate a digital domain output based thereon; a speech synthesis apparatus in data communication with at least a portion of the data processing apparatus and configured to synthesize human-intelligible speech output comprising one or more words; data storage apparatus in data communication with at least a portion of the data processing apparatus; a database of information stored on at least a portion of the data storage apparatus and relating to at least a plurality of business entities, the database further comprising information relating to a location of
...

03-02-2022 publication date

AUDIO-SPEECH DRIVEN ANIMATED TALKING FACE GENERATION USING A CASCADED GENERATIVE ADVERSARIAL NETWORK

Number: US20220036617A1
Assignee: TATA CONSULTANCY SERVICES LIMITED

Conventional state-of-the-art methods are limited in their ability to generate realistic animation from audio on any unknown faces and cannot be easily generalized to different facial characteristics and voice accents. Further, these methods fail to produce realistic facial animation for subjects which are quite different than that of distribution of facial characteristics network has seen during training. Embodiments of the present disclosure provide systems and methods that generate audio-speech driven animated talking face using a cascaded generative adversarial network (CGAN), wherein a first GAN is used to transfer lip motion from canonical face to person-specific face. A second GAN based texture generator network is conditioned on person-specific landmark to generate high-fidelity face corresponding to the motion. Texture generator GAN is made more flexible using meta learning to adapt to unknown subject's traits and orientation of face during inference. Finally, eye-blinks are induced in the final animation face being generated. 1. A processor implemented method for generating audio-speech driven animated talking face using a cascaded generative adversarial network , the method comprising:obtaining, via one or more hardware processors, an audio speech and a set of identity images (SI) of a target individual;extracting, via the one or more hardware processors, one or more DeepSpeech features of the target individual from the audio speech;generating, using the extracted DeepSpeech features, via a first generative adversarial network (FGAN) of a cascaded GAN executed by the one or more hardware processors, a speech-induced motion (SIM) on a sparse representation (SR) of a neutral mean face, wherein the SR of the SIM comprises a plurality of facial landmark points with one or more finer deformations of lips;generating, via the one or more hardware processors, a plurality of eye blink movements from random noise input learnt from a video dataset, wherein the ...

03-02-2022 publication date

METHOD AND ELECTRONIC DEVICE FOR PROVIDING SIGN LANGUAGE

Number: US20220036625A1

A method for providing sign language is disclosed. The method includes receiving, by an electronic device, a natural language information input from at least one source for conversion into sign language. The natural language information input includes at least one sentence. The method further includes predicting, by the electronic device, an emphasis score for each word of the at least one sentence based on acoustic components. The method further includes rephrasing, by the electronic device, the at least one sentence based on the emphasis score of each of the words. The method further includes converting, by the electronic device, the at least one rephrased sentence into the sign language. The method further includes delivering, by the electronic device, the sign language. 1. A method for providing sign language , the method comprising:receiving, by an electronic device, a natural language information input from at least one source for conversion into the sign language, wherein the natural language information input comprises at least one sentence;predicting, by the electronic device, an emphasis score for each word of the at least one sentence based on acoustic components;rephrasing, by the electronic device, the at least one sentence based on the emphasis score of each of the words;converting, by the electronic device, the at least one rephrased sentence into sign language; anddelivering, by the electronic device, the sign language.2. The method as claimed in claim 1 , wherein the method comprises:determining, by the electronic device, a sound direction corresponding to at least one word from the plurality of words of the at least one input sentence; andautomatically displaying, by the electronic device, an indication indicating the sound direction while delivering the at least one word of the at least one rephrased sentence in the sign language.3. 
The method as claimed in claim 1 , wherein predicting, by the electronic device, an emphasis score ...

21-01-2016 publication date

SYSTEMS AND METHODS FOR SPEECH ANALYTICS AND PHRASE SPOTTING USING PHONEME SEQUENCES

Number: US20160019882A1

A contact center system can receive audio messages. The system can review audio messages by identifying phoneme strings within the audio messages associated with a characteristic. A phoneme can be a component of spoken language. Identified phoneme strings are used to analyze subsequent audio messages to determine the presence of the characteristic without requiring human analysis. Thus, the identification of phoneme strings then can be used to determine a characteristic of audio messages without transcribing the messages. 1. A method for determining a characteristic in an audio message , the method comprising:determining a phoneme in the audio message having a predetermined characteristic;identifying a first phoneme string in the audio message, wherein the phoneme string includes the phoneme, and wherein the first phoneme string is associated with the predetermined characteristic; andbased on the identification of the first phoneme string, determining that the first phoneme string indicates the characteristic.2. The method as defined in claim 1 , further comprising:receiving a second message; andidentifying the first phoneme string within the second message.3. The method as defined in claim 2 , further comprising determining statistical information about the first phoneme string.4. The method as defined in claim 3 , wherein the statistical information includes a confidence score that the first phoneme string indicates the characteristic.5. The method as defined in claim 4 , wherein the characteristic may be a sentiment claim 4 , and wherein the sentiment may be positive or negative.6. 
The method as defined in claim 4 , further comprising:receiving a new set of audio messages;identifying a second phoneme string in the new set of audio messages, wherein the second phoneme string includes a second phoneme, and wherein the second phoneme string is associated with a second characteristic;comparing the second phoneme string in the new set of audio messages with at least ...
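The approach above mines phoneme strings from messages known to carry a characteristic, then flags later messages that contain one of those strings, without transcription. A toy sketch of both steps; the n-gram length and the phoneme sequences are illustrative assumptions:

```python
# Mine phoneme n-grams from messages labeled with a characteristic (e.g.
# negative sentiment), then spot the characteristic in new messages by
# checking for any mined n-gram. No transcription to words is needed.

from collections import Counter

def mine_phoneme_strings(messages, labels, n=3):
    """Count phoneme n-grams occurring in messages that have the characteristic."""
    mined = Counter()
    for phonemes, has_trait in zip(messages, labels):
        if has_trait:
            mined.update(tuple(phonemes[i:i + n])
                         for i in range(len(phonemes) - n + 1))
    return mined

def spot(phonemes, mined, n=3):
    """True if any mined n-gram appears in the new message's phoneme string."""
    return any(tuple(phonemes[i:i + n]) in mined
               for i in range(len(phonemes) - n + 1))

mined = mine_phoneme_strings(
    messages=[["k", "ae", "n", "s", "ax", "l"], ["hh", "ax", "l", "ow"]],
    labels=[True, False],          # only the first message shows the trait
)
print(spot(["ae", "n", "s", "el"], mined))  # True: contains ("ae","n","s")
```

The counts kept by the `Counter` could further back a confidence score for each mined string, as in the claims.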

19-01-2017 publication date

SPEECH RECOGNITION APPARATUS AND METHOD

Number: US20170018270A1
Author: MIN Yun Hong
Assignee: SAMSUNG ELECTRONICS CO., LTD.

A speech recognition apparatus includes a converter configured to convert a captured user speech signal into a standardized speech signal format, one or more processing devices configured to apply the standardized speech signal to an acoustic model, and recognize the user speech signal based on a result of application to the acoustic model. 1. A speech recognition apparatus, comprising: a converter configured to convert a captured user speech signal into a standardized speech signal format; and one or more processing devices configured to: apply the standardized speech signal to an acoustic model; and recognize the user speech signal based on a result of application to the acoustic model. 2. The speech recognition apparatus of claim 1, wherein the format of the standardized speech signal includes a format of a speech signal that is generated using text-to-speech (TTS). 3. The speech recognition apparatus of claim 1, wherein the converter includes at least one of the following neural network models: autoencoder, deep autoencoder, denoising autoencoder, recurrent autoencoder, and a restricted Boltzmann machine (RBM) to convert the captured user speech signal into the standardized speech signal format. 4. The speech recognition apparatus of claim 1, wherein the converter is further configured to segment the user speech signal into a plurality of frames, extract k-dimensional feature vectors from each of the frames, and convert the extracted feature vectors into the standardized speech signal format. 5. The speech recognition apparatus of claim 4, wherein the standardized speech signal format includes at least one form of a mel-scale frequency cepstral coefficient (MFCC) feature vector and a filter bank, and contains either or both of the number of frames and information regarding a dimension. 6. The speech recognition apparatus of claim 1, wherein the acoustic model includes at least one of Gaussian ...

03-02-2022 publication date

A METHOD AND A DEVICE FOR PROVIDING A PERFORMANCE INDICATION TO A HEARING AND SPEECH IMPAIRED PERSON LEARNING SPEAKING SKILLS

Number: US20220036751A1
Author: Singh Shomeshwar

The present invention describes a technique for providing a performance indication to a hearing and speech impaired person learning speaking skills. The technique comprises selecting a phoneme from a plurality of phonemes displayed on a display device; receiving a phoneme produced by the hearing and speech impaired person on a microphone; creating a first mathematical representation for the selected phoneme; creating a second mathematical representation for the received phoneme; generating a first visual equivalent representing the selected phoneme based on the first mathematical model; generating a second visual equivalent representing the received phoneme based on the second mathematical model; displaying the first visual equivalent and the second visual equivalent on the display device for the hearing and speech impaired person to compare; comparing the first mathematical representation and second mathematical representation; generating a performance indication based on result of a comparison of the first mathematical representation and second mathematical representation. 1. 
A method for providing a performance indication to a hearing and speech impaired person learning speaking skills, the method comprising: selecting, on a display device, a phoneme; receiving, at a microphone, a phoneme produced by the hearing and speech impaired person; creating a first mathematical model of the selected phoneme; creating a second mathematical model of the received phoneme; generating a first visual equivalent representing the selected phoneme based on the first mathematical model; generating a second visual equivalent representing the received phoneme based on the second mathematical model; displaying the first visual equivalent and the second visual equivalent on the display device for the hearing and speech impaired person to compare; comparing the first mathematical model and the second mathematical model; generating a performance indication based on a result of the comparison of ...

21-01-2016 publication date

VOICE SIGNAL MODULATION SERVICE FOR GEOGRAPHIC AREAS

Number: US20160019912A1

Modulating a voice signal is provided. The voice signal corresponding to a voice communication is received from a sending voice communication device via a network. Voice signal features corresponding to the voice communication are extracted. A set of voice signal filters are selected to modulate the extracted voice signal features corresponding to the voice communication to an average voice signal associated with a geographic area where the voice communication is destined for. The voice signal features corresponding to the voice communication are modulated by applying the selected set of voice signal filters to generate the average voice signal associated with the geographic area where the voice communication is destined for.

03-02-2022 publication date

SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION SYSTEM, AND SPEECH RECOGNITION METHOD

Number: US20220036877A1
Author: BABA Naoya, KOJI Yusuke
Assignee: Mitsubishi Electric Corporation

A speech signal processing unit individually separates uttered speech of a plurality of passengers each seated in one of a plurality of speech recognition target seats in a vehicle. A speech recognition unit performs speech recognition on uttered speech of each of the passengers separated by the speech signal processing unit and calculates a speech recognition score. A score-using determining unit determines a speech recognition result of which of the passengers is to be used from among speech recognition results for the passengers, using the speech recognition score of each of the passengers. 1.-12. (canceled) 13. A speech recognition device comprising: processing circuitry to: individually separate uttered speech of a plurality of passengers each seated in one of a plurality of speech recognition target seats in a vehicle; perform speech recognition on the separated uttered speech of each of the passengers and calculate a speech recognition score; comprehend intention of utterance of each of the passengers and calculate an intention comprehension score using a speech recognition result of each of the passengers; and determine an intention comprehension result of which of the passengers is to be used from among intention comprehension results for the respective passengers, using at least one of the speech recognition score or the intention comprehension score of each of the passengers; calculate a face feature amount for each of the passengers using an image capturing the plurality of passengers; and determine whether or not there is utterance for each of the passengers using the face feature amount from a start time to an end time of the uttered speech of each of the passengers, wherein, in a case where there are identical intention comprehension results that correspond to two or more passengers determined to be speaking, the processing circuitry determines an intention comprehension result of which of the passengers is adopted from among the intention comprehension results ...
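The score-using determination above reduces to adopting the seat whose combined recognition and intention-comprehension scores are highest. A toy sketch of that selection; the equal weighting of the two scores is an assumption, not the patent's rule:

```python
# Pick the seat whose weighted combination of speech-recognition score and
# intention-comprehension score is highest. Weights are assumed to be equal.

def select_passenger(scores, w_rec=0.5, w_int=0.5):
    # scores: {seat: (speech_recognition_score, intention_comprehension_score)}
    return max(scores, key=lambda s: w_rec * scores[s][0] + w_int * scores[s][1])

seat = select_passenger({
    "driver": (0.82, 0.90),       # spoke clearly, intent well understood
    "rear_left": (0.85, 0.40),    # louder, but ambiguous intent
})
print(seat)  # driver: combined 0.86 beats 0.625
```

In the claimed device a face-feature "is speaking" check would first filter the candidate seats before this comparison runs.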

03-02-2022 publication date

SPEECH ASSESSMENT USING DATA FROM EAR-WEARABLE DEVICES

Number: US20220036878A1

A computing system may store user profile information of a user of an ear-wearable device, where the user profile information includes parameters that control operation of the ear-wearable device. The computing system may also obtain audio data from one or more sensors that are included in the ear-wearable device and determine whether to generate speech assessment data based on the user profile information of the user and audio data. In some examples, the computing system may compare one or more acoustic parameters determined based on the audio data with an acoustic criterion determined based on the user profile information of the user. If one or more acoustic parameters satisfy the acoustic criterion, the computing system may generate speech assessment data based on the determination. 1. A method comprising:storing user profile information of a user of an ear-wearable device, wherein the user profile information comprises parameters that control operation of the ear-wearable device;obtaining audio data from one or more sensors that are included in the ear-wearable device;determining whether to generate speech assessment data based on the user profile information of the user and the audio data, wherein the speech assessment data provides information regarding speech of the user; andgenerating the speech assessment data based on the determination to generate the speech assessment data.2. The method of claim 1 , wherein determining whether to generate the speech assessment data based on the user profile information of the user and the audio data further comprises:determining whether to generate speech assessment data based on sensor data or location data.3. 
The method of claim 1 , wherein determining whether to generate the speech assessment data based on the user profile information of the user and the audio data comprises:determining one or more acoustic parameters based on the audio data;determining an acoustic criterion based on the user profile information of the ...

03-02-2022 publication date

METHOD AND APPARATUS FOR MINING FEATURE INFORMATION, AND ELECTRONIC DEVICE

Number: US20220036879A1

A method for mining feature information, an apparatus for mining feature information and an electronic device are disclosed. The method includes: determining a usage scenario of a target device; obtaining raw audio data including real scenario data, speech synthesis data, recorded audio data and other media data; generating target audio data of the usage scenario by simulating the usage scenario based on the raw audio data; and obtaining feature information of the usage scenario by performing feature extraction on the target audio data. 1. A method for mining feature information , comprising:determining a usage scenario of a target device;obtaining raw audio data including real scenario data, speech synthesis data, recorded audio data and other media data;generating target audio data of the usage scenario by simulating the usage scenario based on the raw audio data; andobtaining feature information of the usage scenario by performing feature extraction on the target audio data.2. The method according to claim 1 , wherein generating the target audio data of the usage scenario by simulating the usage scenario based on the raw audio data comprises:obtaining scenario audio data of the usage scenario; andgenerating the target audio data by adding the scenario audio data to the raw audio data.3. The method according to claim 2 , wherein the scenario audio data comprises spatial reverberation data of the usage scenario claim 2 , and obtaining the scenario audio data of the usage scenario comprises:obtaining attribute information of the usage scenario;obtaining state information of the target device in the usage scenario and device information of the target device; andgenerating the spatial reverberation data of the usage scenario based on the attribute information, the state information and the device information.4. 
The method according to claim 2 , wherein the scenario audio data comprises environmental noise data of the usage scenario, and obtaining the scenario ...
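Generating target audio by adding scenario audio to raw audio, as claimed above, is commonly done by scaling the noise to a target signal-to-noise ratio before mixing. A sketch of that step with plain Python lists standing in for audio buffers; the SNR value is an assumed parameter:

```python
# Scale `noise` so the speech-to-noise power ratio equals the target SNR in
# dB, then add it sample-by-sample to the raw audio (noise is cycled if it
# is shorter than the speech buffer).

import math

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech at the requested signal-to-noise ratio."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * noise[i % len(noise)] for i, s in enumerate(speech)]

speech = [0.3, -0.3] * 100        # toy "raw audio" buffer
noise = [0.05, -0.02, 0.04] * 50  # toy "environmental noise" buffer
noisy = mix_at_snr(speech, noise, snr_db=10.0)
print(len(noisy) == len(speech))  # True
```

The same pattern extends to the claimed reverberation step by convolving the raw audio with a room impulse response instead of adding scaled noise.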
