Total found: 5613. Showing 200.

Publication date: 10-01-2017

SPEECH SYNTHESIS DEVICE

Number: RU2606312C2

The invention is aimed at generating a speech signal with specified characteristics. The technical result is increased naturalness, intelligibility, and emotional coloring of the generated speech together with reduced computational complexity, achieved by improving the structure of the speech synthesis system and separating the databases it uses. The text-based speech synthesis device contains a unit that receives a text string and forms from it a sequence of sound-unit identifiers, which is fed to the sound-signal generation unit and then to the playback unit. Additional control commands are fed in parallel to the sound-signal generation unit; they are produced from the generated sequence of sound-unit identifiers by a unit forming language-dependent control commands and by a unit forming control commands that depend on the modeled speaker parameters, namely the voice parameters and/or the speaker's physical and physiological ...

Publication date: 10-05-2001

SPEECH CONVERSION METHOD AND DEVICE FOR IMPLEMENTING IT

Number: RU2166804C2

Use: in digital speech-signal coding facilities, for compact representation of speech for transmission and storage. Essence of the invention: the speech conversion method, based on vector quantization and dequantization, is supplemented with control by a tone/noise signal, which is used as a control parameter (T/N and T/N*, respectively). The speech conversion device, and the practical implementations of the blocks within the finite-state vector quantizer and the finite-state vector dequantizer, provide this additional tone/noise control in the vector quantization and dequantization procedures. This allows the set of reference code vectors to be divided into two subsets corresponding to voiced and unvoiced fragments of the speech signal, which improves the quality of the synthesized speech without increasing the transmission bit rate and gives a more accurate description of the sequence of transitions of the speech ...

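The tone/noise control described in RU2166804C2 amounts to selecting between two codebook subsets with a one-bit flag that both encoder and decoder observe. A minimal Python sketch of that selection step, with invented codebooks and a placeholder voicing decision (the patent's finite-state machinery is not modeled):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical codebook subsets: one for voiced, one for unvoiced frames.
CB_VOICED = rng.normal(0.0, 1.0, size=(64, 10))
CB_UNVOICED = rng.normal(0.0, 0.3, size=(64, 10))

def quantize(frame: np.ndarray, voiced: bool) -> tuple[int, np.ndarray]:
    """Quantize a feature frame against the codebook chosen by the T/N flag."""
    cb = CB_VOICED if voiced else CB_UNVOICED
    idx = int(np.argmin(np.sum((cb - frame) ** 2, axis=1)))
    return idx, cb[idx]

def dequantize(idx: int, voiced: bool) -> np.ndarray:
    """The decoder repeats the selection using the transmitted T/N bit."""
    cb = CB_VOICED if voiced else CB_UNVOICED
    return cb[idx]

frame = rng.normal(size=10)
voiced = True                      # toy stand-in for a real tone/noise detector
idx, approx = quantize(frame, voiced)
assert np.array_equal(dequantize(idx, voiced), approx)
```
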
Publication date: 07-05-2018

METHOD OF DIALOGUE BETWEEN A MACHINE, SUCH AS A HUMANOID ROBOT, AND A HUMAN INTERLOCUTOR, COMPUTER PROGRAM PRODUCT, AND HUMANOID ROBOT FOR IMPLEMENTING SUCH A METHOD

Number: RU2653283C2

The invention relates to a method of dialogue between a machine and a human being. The technical result consists in providing a more natural, non-stereotyped dialogue with the machine, adapted to the interlocutor. This result is achieved as follows: the human interlocutor is identified; an interlocutor profile containing a plurality of dialogue variables is retrieved from a database, at least one value being assigned to at least one of said dialogue variables; at least one phrase from said interlocutor is received and analyzed, and at least one response phrase is formulated and delivered depending on at least said phrase, received and interpreted at the previous step, and on said dialogue variable of said interlocutor profile, wherein the analysis of said phrase from said interlocutor and the formulation of said response phrase are carried out by means of a plurality of phrase models represented by corresponding syntax trees. 2 ...

Publication date: 14-11-2023

INTERACTIVE SPEECH SIMULATION SYSTEM

Number: RU2807436C1

The invention relates to the field of computer engineering for interactive speech simulations. The technical result consists in increasing the speed and accuracy of evaluating user answers during a speech simulation. The technical result is achieved in that the system comprises: an editor for creating simulations, with which simulations representing a dialogue between a character and a user are created, a stop-word dictionary is created, a list of possible user answer variants is specified, simulation settings are specified, main scenarios are added, additional scenarios are added, and trigger phrases of the main scenario are added that activate a transition to an additional scenario; a simulation player, which automatically plays back the dialogue between the character and the user specified in the editor in accordance with the specified simulation settings; a user data collection unit, which automatically collects user data in ...

Publication date: 27-10-1998

METHOD FOR PITCH POSTFILTERING OF SYNTHESIZED SPEECH AND PITCH POSTFILTER

Number: RU2121173C1
Assignee: AudioCodes, Ltd. (IL)

In accordance with the present invention, the synthesized speech is passed through a postfilter that performs its computations on the basis of future and preceding data. The data frames are divided into subframes to assign the computation points. The technical result is a closer correspondence between the synthesized and the original speech. 2 independent and 8 dependent claims, 3 drawings.

Publication date: 27-05-2002

METHOD FOR DETECTING AND CORRECTING FALSE PULSES IN SPEECH TRANSMISSION BY PULSE-CODE MODULATION (PCM)

Number: RU2000116679A

A method for detecting and correcting false pulses in PCM speech transmission, which consists in tracking successive elevated paired (opposite-sign) "jumps" in the level of the first derivative of the speech signal. It is based on storing the segment of the speech signal preceding the current one, measuring the mean modulus of the first derivative of the signal over this segment by accumulating the moduli of the differences of adjacent PCM samples, and setting two identical adaptive thresholds of opposite sign, proportional to the mean modulus of the first derivative; when these thresholds are exceeded by two consecutive current differences of adjacent PCM samples, a code combination containing a false bit is detected. The method is characterized in that the relative level of the two adaptive thresholds of opposite sign is also determined by the nature of the signal (speech, or noise in speech pauses), the speech pauses being tracked by a reduced mean modulus of the signal over the analysis interval, while the relative level of the two adaptive thresholds at the moments ...

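The claim of RU2000116679A is essentially a detector of paired opposite-sign jumps in the first difference of the PCM stream against adaptive thresholds. A rough numpy sketch of just the detection step, with an assumed window length and threshold multiplier and without the correction or pause-tracking logic:

```python
import numpy as np

def detect_false_pulses(x: np.ndarray, win: int = 128, k: float = 4.0) -> list[int]:
    """Flag samples where two consecutive first differences exceed
    opposite-sign adaptive thresholds derived from the recent mean |diff|."""
    d = np.diff(x.astype(float))
    hits = []
    for n in range(win + 1, len(d)):
        mean_abs = np.mean(np.abs(d[n - win:n]))   # mean modulus of the derivative
        thr = k * mean_abs                         # adaptive threshold (k is assumed)
        # a false PCM bit shows up as a pair of jumps of opposite sign
        if d[n - 1] > thr and d[n] < -thr or d[n - 1] < -thr and d[n] > thr:
            hits.append(n)                         # index of the suspect code word
    return hits

rng = np.random.default_rng(1)
speech = np.cumsum(rng.normal(size=4000))          # toy signal
speech[2000] += 500                                # inject a single-sample spike
print(detect_false_pulses(speech)[:5])             # reports the spike near 2000
```
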
Publication date: 20-08-2009

AUDIO SIGNAL SYNTHESIS

Number: RU2008105555A

... 1. A signal synthesis device (20) for synthesizing an audio signal (r'), comprising: a sinusoid synthesis unit (23) for synthesizing the audio signal (r') using at least one frequency parameter (f) representing the frequency of the audio signal and at least one phase parameter (φ') representing the phase of the audio signal, characterized in that it comprises a parameter generation unit (22) for generating the phase parameter (φ') using the frequency parameter (f') and the audio signal (r'). 2. The device according to claim 1, wherein the synthesized audio signal (r') comprises time segments, and wherein the parameter generation unit (22) is configured to generate the current phase parameter (φ') using the previous time segment of the audio signal (r'). 3. The device according to claim 1, wherein the parameter generation unit (22) comprises a phase-finding unit (21') configured to find a set of phase-frequency pairs, each phase-frequency pair representing the phase of a frequency of the audio signal (r').

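Claim 2 of RU2008105555A derives the current phase parameter from the previous time segment instead of transmitting it. The following toy synthesizer shows why that keeps segment joins continuous; the segment length, sample rate, and per-segment frequencies are arbitrary choices for the example:

```python
import numpy as np

SR = 8000
SEG = 160   # 20 ms segments

def synth_segments(freqs_hz):
    """Synthesize one sinusoid per segment, carrying phase across segments
    so the waveform stays continuous (the role of parameter unit 22)."""
    phase = 0.0
    out = []
    for f in freqs_hz:
        t = np.arange(SEG)
        out.append(np.cos(2 * np.pi * f * t / SR + phase))
        # the next segment's phase parameter is derived from this segment's end
        phase = (phase + 2 * np.pi * f * SEG / SR) % (2 * np.pi)
    return np.concatenate(out)

y = synth_segments([200, 220, 240, 210])
print(np.max(np.abs(np.diff(y))))   # small: no jumps at the segment joins
```
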
Publication date: 05-02-1975

Speech synthesis device

Number: SU459797A1

Publication date: 15-08-1977

Speech synthesis device

Number: SU568962A1

Publication date: 15-03-1984

Digital speech synthesizer

Number: SU1080198A2

A DIGITAL SPEECH SYNTHESIZER according to author's certificate No. 993315, characterized in that, in order to improve the quality of the synthesized speech, it additionally comprises a second multiplexer, a second parallel-code shift register, and third and fourth scalar-product computation units, wherein the first input of the second multiplexer is connected to the output of the second scalar-product computation unit; the output of the second multiplexer is connected to the first input of the second parallel-code shift register, whose first output is connected to the second input of the second multiplexer; the second output of the second parallel-code shift register is connected to the first input of the third scalar-product computation unit, whose output is connected to the first input of the fourth scalar-product computation unit, whose output is connected to the input of the digital-to-analog converter; and the output of the clock generator is connected to the second input of the second parallel-code shift register and to the third inputs of the second multiplexer and of the third and fourth ...

Publication date: 15-03-1987

Digital speech synthesizer

Number: SU1297098A1

The invention relates to signal processing techniques for speech informatics and the synthesis of speech messages for telephone network subscribers. The aim of the invention is to improve the quality of speech synthesized under the control of a computer serving a network of digital subscriber speech synthesizers. The device comprises an interface 1, a logic unit 2, multiplexers 3, 4 and 5, memory units 6 and 7, a discrete exciter 8, memory units 9 and 10 of the digital filter, registers 11 and 12, digital multipliers 12 and 15, and a converter 16. The introduction of new elements and the formation of new connections between the elements of the device made it possible to reduce the distortion of the speech signal at the end of the acquisition of its parameter set, recorded in memory unit 7 and formed from a set of control-vector increments transmitted as reduced-bit-width codes by the computer controlling the synthesis through interface 1, logic unit 2 (synchronized by the same computer), and the multiplexer ...

Publication date: 07-11-1980

Speech signal synthesizer

Number: SU777674A1

Publication date: 07-04-1984

Speech synthesizer

Number: SU1084870A1

A SPEECH SYNTHESIZER comprising a tone/noise switch with a control input from a tone/noise signal extractor, a pitch generator with a control input from a pitch-frequency extractor and an additional input, a noise generator, and a three-channel dynamic shaper of the instantaneous spectrum envelope with frequency and amplitude control inputs from a formant-parameter extractor, the outputs of the pitch generator and the noise generator being connected to the tone/noise switch, whose output is connected to the corresponding excitation inputs of the three-channel dynamic shaper of the instantaneous spectrum envelope; characterized in that, in order to improve the quality of the synthesized speech, each channel of the three-channel dynamic shaper of the instantaneous spectrum envelope comprises a series-connected controllable filter and modulator, the outputs of which are connected to the inputs of an adder, and the output of each controllable filter is also connected, through a corresponding resistor, to the additional input of the generator ...

Publication date: 28-03-2013

Correcting unintelligible synthetic speech

Number: DE102012217160A1

A method and a system for speech synthesis. In a text-to-speech system, a text input is received and processed, using a processor of the system, into synthetic speech that is determined to be unintelligible. The text input is reprocessed into subsequent synthetic speech and output to a user via a loudspeaker in order to correct the unintelligible synthetic speech. In one embodiment, the synthetic speech can be determined to be unintelligible by predicting the intelligibility of the synthetic speech and determining that the predicted intelligibility is lower than a minimum threshold. In another embodiment, the synthetic speech can be determined to be unintelligible by outputting the synthetic speech to the user via the loudspeaker and receiving from the user an indication that ...

Publication date: 06-11-2008

Navigation device for a vehicle

Number: DE0019540864B4

Publication date: 25-06-2009

Encoder with smooth tone transitions

Number: DE0069938490T2

Publication date: 02-10-2019

Generative neural-network modeling for transforming speech utterances and augmenting training data

Number: DE102019107928A1

This disclosure provides generative neural-network modeling for transforming speech utterances and augmenting training data. Systems, methods, and devices for speech transformation and for generating synthetic speech using deep generative models are disclosed. A method of the disclosure includes receiving input audio data comprising a plurality of iterations of a speech utterance from a plurality of speakers. The method includes generating an input spectrogram based on the input audio data and transmitting the input spectrogram to a neural network configured to generate an output spectrogram. The method includes receiving the output spectrogram from the neural network and, based on the output spectrogram, generating synthetic audio data comprising the speech utterance.

Publication date: 14-11-2002

System for reading text aloud

Number: DE0069621404T2

Publication date: 10-03-1999

Housing for pager receiver

Number: GB0002293506B
Assignee: NEC CORPORATION

Publication date: 31-05-1973

AUDIO RESPONSE APPARATUS

Number: GB0001318985A

... 1318985 Speech synthesizer NIPPON TELEGRAPH & TELEPHONE PUBLIC CORP 9 Oct 1970 [7 Feb 1970 (2)] 48065/70 Heading H4R In an apparatus for synthesizing speech from samples stored, e.g., on a magnetic drum, which are extracted under the control of a computer to provide continuous speech, e.g. for answering calls in a telephone system, the speech samples are stored in the form of partial autocorrelation coefficients between two closely adjacent time instants of the speech signal, which define the spectrum function of the speech, and autocorrelation coefficients between more remotely separated samples which provide excitation source information. The partial autocorrelation coefficients are obtained by correlation of the differences between estimated values of the signal level and the actual signal level at two instants. The estimated values of the signal level are determined from sampled values of the signal level obtained between the two instants using the least squares method. As described ...

Publication date: 28-07-1982

VOICE SYNTHESIZER

Number: GB0002057823B
Assignee: FEDERAL SCREW WORKS

Publication date: 15-10-2009

AUDIO SIGNAL SYNTHESIS

Number: AT0000443318T

Publication date: 15-02-2010

LANGUAGE DIFFERENTIATION

Number: AT0000456845T
Author: HAERMAE, AKI

Publication date: 15-10-2009

SPEECH SYNTHESIS

Number: AT0000443908T

Publication date: 15-08-2010

METHOD FOR SEPARATING SIGNAL PATHS AND ITS APPLICATION TO ENHANCING ELECTRO-LARYNX SPEECH

Number: AT0000507844A1

In order to improve the speech quality of an electric larynx (EL) speaker, the speech signal of which is digitized by suitable means, the following steps are carried out: a) dividing a single-channel speech signal into a series of frequency channels by transferring it from a time domain into a discrete frequency domain; b) filtering out the modulation frequency of the EL by way of a high-pass or notch filter, in each frequency channel; and c) back-transforming the filtered speech signal from the frequency domain into the time domain and combining it into a single-channel output signal.

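Steps a) to c) of AT507844 map naturally onto an STFT, a per-channel notch filter applied to each channel's envelope, and an inverse STFT. A hedged sketch using scipy; the EL modulation frequency, hop size, and notch Q are assumptions, and a real implementation would filter causally rather than with filtfilt:

```python
import numpy as np
from scipy import signal

def remove_el_hum(x, fs, el_hz=100.0, nperseg=256, hop=32):
    """a) split the single-channel signal into frequency channels (STFT),
    b) notch the EL modulation frequency in each channel's envelope,
    c) transform back and recombine into a single-channel output."""
    _, _, Z = signal.stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    env_rate = fs / hop                              # envelope sample rate
    b, a = signal.iirnotch(el_hz, Q=5.0, fs=env_rate)
    mag = signal.filtfilt(b, a, np.abs(Z), axis=1)   # filter each channel over time
    mag = np.maximum(mag, 0.0)                       # keep magnitudes valid
    _, y = signal.istft(mag * np.exp(1j * np.angle(Z)), fs=fs,
                        nperseg=nperseg, noverlap=nperseg - hop)
    return y

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t) + 0.3 * np.random.randn(fs)  # toy EL-like buzz
y = remove_el_hum(x, fs)
```
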
Publication date: 15-11-2010

METHOD FOR SEPARATING SIGNAL PATHS AND ITS APPLICATION TO ENHANCING ELECTRO-LARYNX SPEECH

Number: AT0000507844B1

In order to improve the speech quality of an electric larynx (EL) speaker, the speech signal of which is digitized by suitable means, the following steps are carried out: a) dividing a single-channel speech signal into a series of frequency channels by transferring it from a time domain into a discrete frequency domain; b) filtering out the modulation frequency of the EL by way of a high-pass or notch filter, in each frequency channel; and c) back-transforming the filtered speech signal from the frequency domain into the time domain and combining it into a single-channel output signal.

Publication date: 04-11-2021

Method for extracting speech from degraded signals by predicting the inputs to a speech vocoder

Number: AU2020242078A1

A parametric resynthesis (PR) method for producing an audible signal. A degraded audio signal is received which includes a distorted target audio signal. A prediction model predicts parameters of the audible signal from the degraded signal; the model was trained to minimize a loss function between the target audio signal and the predicted audible signal. The predicted parameters are provided to a waveform generator, which synthesizes the audible signal.

Publication date: 30-08-2007

Navigation device and method for receiving and playing sound samples

Number: AU2007218375A1

Publication date: 03-03-2003

MASSIVELY ONLINE GAME COMPRISING A VOICE MODULATION AND COMPRESSION SYSTEM

Number: AU2002322947A1
Author: MORGAN, Olivier

Publication date: 17-01-2008

Printed Material Reader

Number: AU2007201706A1

Publication date: 22-02-1977

VOICE SYNTHESIZER

Number: CA0001005913A1
Author: GAGNON, RICHARD T

Publication date: 25-11-1986

WAVE GENERATING METHOD AND APPARATUS USING SAME

Number: CA1214559A

A wave generating method, and a wave generating apparatus using the method, are arranged such that a plurality of wave samples, each generated successively, are respectively weighted, for example by being multiplied by a plurality of wave functions generated corresponding to the plurality of wave samples. The plurality of weighted wave samples are summed to obtain a desired wave. The kind of each of the successively generated wave samples is changed each time the value of the corresponding wave function becomes zero. Accordingly, the apparatus includes wave generators for generating the wave samples successively, wave function generators for generating the wave functions successively, multipliers for multiplying the wave samples by the wave functions respectively, an adder for adding all of the outputs of the multipliers to generate the desired wave, and a wave changing circuit for changing the kind of each of the wave samples when the corresponding ...

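The CA1214559A scheme, weighting successively generated wave samples by wave functions, summing them, and changing the kind of each wave only where its weighting function is zero, resembles windowed overlap-add with switching at the window nulls. A toy rendition with two invented generators and a Hann-shaped weighting function (all parameters are illustrative):

```python
import numpy as np

N = 256                                                    # window length
fade = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)    # weighting "wave function"

def wave(kind: str, n0: int) -> np.ndarray:
    """Two toy wave generators: a sine and a square at 220 Hz."""
    t = np.arange(n0, n0 + N)
    s = np.sin(2 * np.pi * 220 * t / 8000)
    return s if kind == "a" else np.sign(s)

# Overlapping windows at 50% hop; each window may use a different wave kind,
# and because the weighting function is zero at its edges, the change of kind
# happens only where the wave contributes nothing to the sum.
out = np.zeros(3 * N)
for i, kind in enumerate(["a", "b", "a", "b", "a"]):
    n0 = i * N // 2
    out[n0:n0 + N] += fade * wave(kind, n0)
```
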
Publication date: 22-02-1977

VOICE SYNTHESIZER

Number: CA1005913A

Publication date: 29-05-2007

VOICE ACTIVATED LANGUAGE TRANSLATION

Number: CA0002419112C
Author: LORD, JOHN RAYMOND
Assignee: MITEL NETWORKS CORPORATION

A voice activated language translation system that is accessed by telephones, where voice messages of a caller are translated into a selected language and returned to the caller or optionally sent to another caller. A voice recognition system converts the voice messages into text of a first language. The text is then translated into text of the selected language, which is then converted into voice.

Publication date: 20-04-1995

A METHOD FOR TRAINING A TEXT TO SPEECH SYSTEM, THE RESULTING APPARATUS, AND METHOD OF USE THEREOF

Number: CA0002151399A1

A method of training a TTS (104) to assign intonational features, such as intonational phrase boundaries, to input text (110). The method of training involves taking a set of predetermined text (110) and having a human annotate it with intonational feature annotations. The text is passed through the preprocessor (120) and the phrasing module (122), wherein a set of decision nodes is generated by statistically analyzing information based upon the structure of the predetermined text. The statistical representation may then be stored and repeatedly used to generate synthesized speech, through the post processor (124), from new sets of input text without further training.

Publication date: 17-10-1996

WAVEFORM SPEECH SYNTHESIS

Number: CA0002189666A1

Portions of speech waveform are joined by forming extrapolations at the end of one portion and the beginning of the next to create an overlap region with synchronous pitchmarks, and then forming a weighted sum across the overlap to provide a smooth transition.

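The joining method of CA2189666A1 can be approximated by extrapolating each portion with its edge pitch period and cross-fading over the overlap. A simplified numpy sketch; pitch-synchronous alignment of pitchmarks is assumed away, and the period and overlap lengths are supplied explicitly:

```python
import numpy as np

def join(a: np.ndarray, b: np.ndarray, period: int, overlap: int) -> np.ndarray:
    """Join two waveform portions: extrapolate each into the overlap region by
    tiling its edge pitch period, then cross-fade with a weighted sum."""
    reps = int(np.ceil(overlap / period))
    a_ext = np.tile(a[-period:], reps)[:overlap]    # extrapolate 'a' forward
    b_ext = np.tile(b[:period], reps)[-overlap:]    # extrapolate 'b' backward
    w = np.linspace(1.0, 0.0, overlap)              # fade from 'a' to 'b'
    bridge = w * a_ext + (1 - w) * b_ext
    return np.concatenate([a, bridge, b])

sr = 8000
t = np.arange(800) / sr
a = np.sin(2 * np.pi * 200 * t)          # 200 Hz -> 40-sample pitch period
b = 0.8 * np.sin(2 * np.pi * 200 * t)
print(join(a, b, period=40, overlap=80).shape)
```
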
Publication date: 27-02-1997

SPEECH SYNTHESIZER HAVING AN ACOUSTIC ELEMENT DATABASE

Number: CA0002222582A1

A speech synthesis method employs an acoustic element database that is established from phonetic sequences occurring in an interval of a speech signal. In establishing the database, trajectories are determined (220) for each of the phonetic sequences containing a phonetic segment that corresponds to a particular phoneme (210). A tolerance region is then identified based on a concentration of trajectories that correspond to different phoneme sequences (230). The acoustic elements for the database (260) are formed from portions of the phonetic sequences by identifying cut points (250) in the phonetic sequences which correspond to time points along the respective trajectories proximate the tolerance region (240). In this manner, it is possible to concatenate acoustic elements having a common junction phoneme such that perceptible discontinuities at the junction phonemes are minimized. Computationally simple and fast methods for determining the tolerance region are also disclosed.

Publication date: 15-10-1981

SPEECH SYNTHESIZER

Number: CH0000625900A5
Author: MARK VINCENT DORAIS
Assignee: FEDERAL SCREW WORKS

Publication date: 03-11-2020

Speech synthesis and feature extraction model training method, device, medium and equipment

Number: CN0111883107A

Publication date: 25-12-2020

Cooking stove control method, device, stove and storage medium

Number: CN0112128810A

Publication date: 12-06-2020

Method for postal news voice synthesis and terminal thereof

Number: CN0111276126A

Publication date: 16-11-2011

Speech adaptation in speech synthesis

Number: CN0102243870A

The invention relates to speech adaptation in speech synthesis, in particular to a method of and system for speech synthesis. First and second text inputs are received in a text-to-speech system and processed, using a processor of the system, into respective first and second speech outputs corresponding to stored speech from first and second speakers, respectively. The second speech output of the second speaker is adapted to sound like the first speech output of the first speaker.

Publication date: 19-05-2020

Method and system for calling telephone based on order

Number: CN0111178807A

Publication date: 09-02-2018

Smart voice cell phone or smart voice tablet computer

Number: CN0107680595A
Author: YU YANXING

Publication date: 02-11-2005

Speech synthesis apparatus with personalized speech segments

Number: CN0001692403A

Publication date: 10-07-2020

Voice processing method, device, readable medium and electronic device

Number: CN0111402856A

Publication date: 18-09-2020

Method, device, equipment and medium for voice real-time cloning based on small sample

Number: CN0111681635A

Publication date: 30-05-2007

Prosodic control rule generation method and apparatus, and speech synthesis method and apparatus

Number: CN0001971708A
Author: XU DAWEI

Publication date: 11-12-2020

Recording method and device of voice data

Number: CN0106710597B

Publication date: 10-07-2013

Frequency axis elastic coefficient estimation device, system and method

Number: CN101809652B
Author: EMORI TADASHI

Publication date: 16-05-2023

Speech synthesis model training method, speech synthesis method and related device

Number: CN116129863A

The invention provides a speech synthesis model training method, a speech synthesis method and a related device. The speech synthesis model training method comprises the steps of obtaining a sample phoneme sequence and an acoustic feature tag of training sample data; encoding the sample phoneme sequence through an encoder to obtain a first sample phoneme encoding feature; inputting the acoustic feature tag into a reference encoder for feature extraction to obtain a word-level speech feature tag, and performing feature extraction on the speech feature tag to obtain a speech feature tag vector; performing feature fusion on the first sample phoneme encoding feature and the speech feature tag vector to obtain a first sample prosody fusion feature; decoding the first sample prosody fusion feature through a decoder to obtain a sample acoustic feature; and calculating a first loss value between the sample acoustic feature and the acoustic feature tag by using the first loss function, and training ...

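The training step described for CN116129863A (phoneme encoder, reference encoder over the acoustic target, feature fusion, decoder, first loss) can be outlined as a toy PyTorch module. All dimensions, layer types, and the fusion-by-addition below are placeholder assumptions, not the patent's actual architecture:

```python
import torch
import torch.nn as nn

class ToySynthModel(nn.Module):
    """Skeleton of the described training step, with invented layer choices."""
    def __init__(self, n_phones=64, d=128, n_mels=80):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Embedding(n_phones, d),
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d, 4, batch_first=True), 2))
        self.ref_encoder = nn.GRU(n_mels, d, batch_first=True)  # acoustic tag -> style
        self.decoder = nn.Linear(d, n_mels)

    def forward(self, phonemes, mel_tag):
        enc = self.encoder(phonemes)               # phoneme encoding feature
        _, style = self.ref_encoder(mel_tag)       # speech feature tag vector
        fused = enc + style.transpose(0, 1)        # feature fusion (broadcast add)
        return self.decoder(fused)                 # predicted acoustic feature

model = ToySynthModel()
phon = torch.randint(0, 64, (2, 30))
mel = torch.randn(2, 30, 80)                       # acoustic feature tag
loss = nn.functional.l1_loss(model(phon, mel), mel)   # "first loss value"
loss.backward()
```
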
Publication date: 25-07-2023

Method for changing Bluetooth call background, medium and electronic equipment

Number: CN116486814A
Authors: WANG JUN, LI TIANBIAN

The invention relates to the technical field of communication, in particular to a method for changing a Bluetooth call background, a medium, and electronic equipment. Simple voice training is carried out through AI to obtain a voice feature model of the user, so that the AI model can produce speech essentially consistent with the user's own voice from text content. The AI algorithm and the model data are stored in terminals such as an earphone, a mobile phone, and cloud equipment. During a call, speech recognition converts the voice into text content, and emotion factors are extracted from the user's voice. The extracted text content and emotion factors are converted, through the AI algorithm and the model data, into call speech essentially consistent with the user's voice. The user can select a scene environment before the conversation, and the earphone or the device generates a simulated scene sound background through the AI; the environmental background ...

Publication date: 25-04-2023

Speech synthesis model training platform

Number: CN116010815A

The invention provides a speech synthesis model training platform. The platform comprises a front-end module used for creating a training task; a back-end module used for acquiring the training task created by the front-end module and, if an idle core of a graphics processing unit (GPU) is found, allocating the computing resources of the idle core to the training task; and an algorithm module used for running the training task to obtain its training result. With the speech synthesis model training platform provided by the invention, once the training task has been created the user does not need to carry out any further operation and simply waits for the final training result, which removes the complexity of the speech model training process and improves training accuracy.

Publication date: 12-01-1973

ELECTRONIC VOICE ANNUNCIATING SYSTEM HAVING BINARY DATA CONVERTED INTO AUDIO REPRESENTATIONS

Number: FR0002047057B1

Publication date: 06-03-2009

VOICE SYNTHESIS METHOD AND INTERPERSONAL COMMUNICATION METHOD, IN PARTICULAR FOR MULTIPLAYER ONLINE GAMES

Number: FR0002920583A1
Assignee: ALCATEL LUCENT

A voice synthesis method comprising a step of choosing a synthetic voice from a set of voices having predetermined spectral signatures and a step of recording the natural voice of a first person, the method comprising a step of transforming the recorded natural voice to bring it into conformity with the spectral signature of the chosen synthetic voice, the natural voice thus transformed being recorded; the method comprising a step of determining at least one situation parameter for a first character from a set of predefined parameters, each predefined parameter being associated with a spectral alteration of the emitted voice, the determined situation parameter characterizing in particular the environment or the physical or psychological state of the character; the method comprising a step of spectrally altering the transformed natural voice to bring it into conformity with the spectral alteration associated with the situation parameter ...

Publication date: 12-09-2014

Terminal apparatus and method for displaying a talking head

Number: KR0101439212B1

Publication date: 22-12-2011

SYSTEM AND METHOD FOR SYNTHESIZING VOICE OF MULTI-LANGUAGE

Number: KR0101097186B1

Publication date: 26-05-2017

SYSTEM FOR EXPRESSING TEXT MESSAGE OF MOBILE TERMINAL

Number: KR101740669B1
Author: KIM, DO HUI
Assignee: KIM, DO HUI

The present invention relates to a system for expressing a text message of a mobile terminal through a doll, a desk article, or the like placed indoors. The system selects an expression method suitable for conveying the emotion of a received text message and expresses the message accordingly.

Publication date: 19-04-2012

VOICE AND TEXT COMMUNICATION SYSTEM, METHOD AND APPARATUS

Number: KR0101136769B1

Publication date: 05-09-2007

TERMINAL DEVICE FOR EXECUTING SPEECH SYNTHESIS USING UTTERANCE DESCRIPTION LANGUAGE

Number: KR0100754571B1

Publication date: 07-10-2009

METHOD AND APPARATUS FOR PRESENTING A DOCUMENT IMAGE TO A VISUALLY IMPAIRED PERSON THROUGH A VISUAL CORRECTION DEVICE

Number: KR1020090105531A

PURPOSE: A method and an apparatus for presenting a document image to a visually impaired person through a visual correction device are provided, for example letting users listen to geographical information when pressing a location button. CONSTITUTION: The apparatus includes an HMD and an interface unit (20). The visually impaired user captures the desired image through a camera sensor unit (22). A central processing unit (10) forms the document data needed by the user through a program that recognizes the document and the image. The necessary information is delivered to the user through a speaker or an earphone.

Publication date: 15-05-2006

METHOD AND APPARATUS FOR MIXING SOUND, AND RECORDING MEDIUM STORING A SOUND MIXING PROGRAM, PARTICULARLY FOR PERFORMING A SOUND CONVERSION PROCESS

Number: KR1020060043023A

PURPOSE: A method and an apparatus for mixing sound, and a recording medium storing a sound mixing program, are provided to perform a sound conversion process on the basis of the sound quality data obtained from text information and to generate a mixed sound with various kinds of sound quality. CONSTITUTION: An input unit (210) supplies text information to a text analyzing unit (220). The text analyzing unit analyzes the text information supplied from the input unit and supplies the analysis result to a phoneme data acquisition unit (230), a tone quality change unit (250), and a sound signal generating unit (270). The phoneme data acquisition unit obtains and supplies phoneme data corresponding to the supplied phoneme information. The tone quality change unit changes the tone quality of the phoneme.

Publication date: 21-10-2009

USER RECOGNITION SYSTEM AND METHOD CAPABLE OF MAINTAINING A HIGH LEVEL OF SECURITY BY VARYING THE WORDS USED IN EACH VOICE RECOGNITION

Number: KR1020090110013A

PURPOSE: A user recognition system and method are provided to re-check the authentication result of fingerprint recognition through voice recognition. CONSTITUTION: The user recognition method comprises the following steps: registering a word group which includes basic fingerprint information and plural basic words corresponding to the basic fingerprint information; in case the fingerprint authentication of a user is successful, randomly selecting one or more basic words corresponding to the basic fingerprint information (S105); and re-checking the personal authentication by determining whether registered voice information coincides with the voice information of the user (S106, S107).

Publication date: 28-09-2007

APPARATUS AND METHOD FOR REGISTERING RECITATION INFORMATION, COMPUTER-READABLE RECORDING MEDIUM STORING A PROGRAM FOR EXECUTING THE METHOD, AND MOBILE TERMINAL FOR REGISTERING RECITATION INFORMATION IN A MEMORY DICTIONARY AND OUTPUTTING THE REGISTERED RECITATION INFORMATION BY VOICE

Number: KR1020070095162A

PURPOSE: An apparatus and a method for registering recitation information, a computer-readable recording medium storing a program executing the method, and a mobile terminal are provided to register description information and the recitation information corresponding to it at the same time. CONSTITUTION: A recitation information registration apparatus registers recitation information corresponding to description information and outputs the recitation information when the description information is retrieved. The apparatus includes a first registration region in which the description information is registered, a second registration region in which the recitation information is registered, and a processor (6) for processing registration of the recitation information in the second registration region in connection with the registration of the description information. The description information includes the name of a counterpart and words used for electronic mail.

Publication date: 27-04-2006

TONE COLOR CONVERSION METHOD BY PHONEME-CLASSIFIED CODEBOOK MAPPING, PARTICULARLY FOR FINELY CHANGING THE TONE COLOR OF EACH PHONEME

Number: KR1020060035998A
Author: KIM, DONG KWAN

PURPOSE: A tone color conversion method by phoneme-classified codebook mapping is provided to finely change the tone color of each phoneme by using phoneme-classified code vectors and to enable the tone color of a famous person to be used in broadcasting. CONSTITUTION: A codebook mapping table is generated according to the kind of phoneme. In the codebook mapping table, a block group including a plurality of field blocks, each having an index field and a mapping number field of a target speaker code vector, is accessed by the index of a code vector included in the codebook of a source speaker. A candidate fundamental frequency of a voice frame is determined from the peak value of a normalized autocorrelation function for the voice frame, and dynamic programming for the voice frame is executed according to the candidate fundamental frequency and the integrated Gaussian distribution generated from it, to determine a fundamental frequency for each voice frame (300, 305). The kind of phoneme is discriminated ...

Publication date: 13-08-2019

Number: KR1020190094296A

Publication date: 25-06-2001

METHOD AND DEVICE FOR OUTPUTTING INFORMATION AND/OR MESSAGES BY MEANS OF SPEECH

Number: KR20010052920A

The invention relates to a method and device for outputting information and/or messages from at least one device by means of speech, whereby the information and/or messages (KOM, Auff) required for said vocal output are archived in a voice memory, are read out by a processing device on demand, and are output via an acoustic output device. According to the invention, the information and/or messages (KOM, Auff) are output with various types of intonation (I) according to their relevance.

Publication date: 22-05-2020

ELECTRONIC APPARATUS AND METHOD FOR CONTROLLING THEREOF

Number: WO2020101263A1

An electronic apparatus, based on a text sentence being input, obtains prosody information of the text sentence, segments the text sentence into a plurality of sentence elements, obtains speech in which the prosody information is reflected for each of the plurality of sentence elements in parallel by inputting the plurality of sentence elements and the prosody information of the text sentence to a text-to-speech (TTS) module, and merges the speech for the plurality of sentence elements obtained in parallel to output speech for the text sentence.

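The parallelism claimed in WO2020101263A1, per-element synthesis under sentence-level prosody followed by an ordered merge, is easy to mock up. The `fake_tts` function below is a stand-in for the TTS module, and the segmentation and prosody values are invented for the example:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fake_tts(element: str, prosody: dict) -> np.ndarray:
    """Stand-in for the TTS module: renders one sentence element with the
    sentence-level prosody applied (length and pitch are toy parameters)."""
    dur = int(1000 * prosody["rate"] * len(element))
    t = np.arange(dur)
    return np.sin(2 * np.pi * prosody["f0"] * t / 16000)

def speak(sentence: str) -> np.ndarray:
    prosody = {"f0": 120.0, "rate": 0.02}     # prosody of the whole sentence
    elements = sentence.split(", ")           # naive segmentation into elements
    with ThreadPoolExecutor() as pool:        # synthesize elements in parallel
        parts = list(pool.map(lambda e: fake_tts(e, prosody), elements))
    return np.concatenate(parts)              # merge in the original order

audio = speak("hello there, how are you, goodbye")
print(audio.shape)
```
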
Publication date: 27-07-2006

SYSTEM AND METHOD FOR SYNTHESIZING MUSIC AND VOICE, AND SERVICE SYSTEM AND METHOD THEREOF

Number: WO2006078108A1
Author: SEO, Moon-Jong

The present invention relates to a system and a method for synthesizing music and voice, and a service system and a service method using the same. The system and method according to the present invention make it possible for a listener to perceive the maximum effect from mixing the voice and the music. The system and method are also capable of synthesizing the voice and music with various effects without a professional synthesizer's volume control.

Publication date: 03-01-2019

SINGING SOUND GENERATION DEVICE, METHOD AND PROGRAM

Number: WO2019003350A1

Provided is a singing sound generation device capable of setting the sound-production pitch of the generated singing sound at a timing that corresponds to the syllable to be produced. A CPU 10 obtains a sound-production or sound-production-cancellation instruction specifying a pitch, determines the determination duration T according to the obtained syllable information, sets a single sound-production pitch after the determination duration T elapses on the basis of the obtained instruction, and generates a singing sound on the basis of the obtained syllable information and the set sound-production pitch.

Publication date: 03-09-2015

AUDIO SYSTEM

Number: WO2015129372A1

An audio system is constructed in which audio output convenient for a user is produced by an electronic device. The audio system (100) is provided with a cloud server (20) for setting output conditions for each set of audio data and transmitting condition information indicating the output conditions to an electronic device, and a home appliance (10) for producing audio output corresponding to the audio data in accordance with the received condition information.

Publication date: 23-11-2006

AUDIO INFORMATION RECORDING DEVICE

Number: WO2006123837A1
Author: YOSHIDA, Kenji

A link table is created and each dot pattern is correlated with audio information, so that when the dot pattern is read by a scanner, the audio information correlated with it is reproduced from a speaker. Thus, by printing a dot pattern on the surface of a picture book or a card, it is possible to reproduce audio information corresponding to a picture or story of the book or to a character drawn on the card. Moreover, the link table makes it possible to correlate a dot pattern with new audio information, or to release or modify the correlation.

Publication date: 03-06-2004

METHOD FOR THE REPRODUCTION OF SENT TEXT MESSAGES

Number: WO2004047466A2
Author: KRAKOWSKI, Claudiu

The invention relates to a method for the reproduction of sent text messages, whereby the received text is converted into an acoustic signal by means of speech synthesis, characterised in that the text message for transmission is provided with at least one transmitter-specific parameter and that, on receipt, the transmitter-specific parameter(s) is (are) taken account of in the speech synthesis.

Publication date: 25-11-1999

SCALABLE MIXING FOR SPEECH STREAMING

Number: WO0009960815A3

The invention concerns a speech processing system that receives multiple streams of speech frames and selects, from among the competing frames, a subset of the most significant frames according to the priorities pre-assigned to the streams and to the energy content of the frames. The selected frames are then decoded and rendered, and the resulting signals are mixed. This architecture makes it possible to vary the bandwidth and/or the processing power.

Publication date: 28-02-2019

READ-OUT SYSTEM AND READ-OUT METHOD

Number: WO2019039591A4
Author: SHIMAKAGE Keisuke

This read-out system is equipped with: an imaging unit which is provided to a wearable device worn on the body of a user, and which captures images of the forward direction of the user; an extraction unit which extracts written characters from the images captured by the imaging unit; a conversion unit which converts the characters extracted by the extraction unit into voice audio; an output unit which is provided to the wearable device and outputs the voice audio; an input unit which is provided to the wearable device and receives input from the user; and a control unit which, on the basis of the input from the user received via the input unit, controls the playback speed of the voice audio output from the output unit.

Publication date: 10-12-2015

SYSTEMS AND METHODS FOR GENERATING SPEECH OF MULTIPLE STYLES FROM TEXT

Number: WO2015184615A1

A text-to-speech (TTS) system includes components capable of supporting the generation of speech output in any of multiple styles, and may switch seamlessly from producing speech output in one style to producing speech output in another style. For example, a concatenative TTS system may include a speech base storing speech units associated with multiple speech styles, and a linguistic analysis component to generate a phonetic transcription specifying speech output in any of multiple styles. Text input may include a style indication associated with a particular segment of the input text. The linguistic analysis component may invoke encoded rules and/or components based upon the style indication, and generate a phonetic transcription specifying a speech style, which may be processed to generate output speech.

Publication date: 25-06-2015

COMPUTER-IMPLEMENTED METHOD, COMPUTER SYSTEM AND COMPUTER PROGRAM PRODUCT FOR AUTOMATIC TRANSFORMATION OF MYOELECTRIC SIGNALS INTO AUDIBLE SPEECH

Number: WO2015090562A3

In one aspect, the present application is directed to a computer-implemented method, a computer program product, and a computer system for automatic transformation of myoelectric signals into speech output corresponding to audible speech. The computer-implemented method may comprise: capturing, from a human speaker, at least one myoelectric signal representing speech; converting at least part of the myoelectric signal to one or more speech features; and vocoding the speech features to generate and output the speech output corresponding to the myoelectric signal.

Publication date: 31-07-2003

PERSONALISATION OF THE ACOUSTIC PRESENTATION OF MESSAGES SYNTHESISED IN A TERMINAL

Number: WO2003063133A1

The invention relates to the personalisation of the acoustic presentation of messages synthesised in a terminal (1), whereby acoustic characteristics (CV) which describe a voice (V) are selected from a catalogue of acoustic characteristics pre-recorded on a server (2) for transmission to a voice synthesiser (3). A text message (MT) which can be selected on the terminal is synthesised in the synthesiser, based on the selected acoustic characteristics, as a voice message (MS) which is transmitted to the terminal for listening. At least one noise (B) can be selected on the server for mixing with the voice message.

Publication date: 08-10-2020

SPEECH SYNTHESIS METHOD AND APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM

Number: WO2020200178A1

A speech synthesis method, the method comprising: dividing text into a plurality of segments belonging to different language categories (S102); converting each segment into corresponding phonemes according to the language category to which it belongs, so as to generate a phoneme sequence of the text (S104); inputting the phoneme sequence into a pre-trained speech synthesis model to convert it into vocoder feature parameters (S106); and inputting the vocoder feature parameters into a vocoder to generate speech (S108).

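Steps S102 and S104 of WO2020200178A1 (split by language category, then per-language phoneme conversion) might look like this in miniature. The language detector and the grapheme-to-phoneme tables below are deliberately naive stand-ins for real components:

```python
import re

# Toy grapheme-to-phoneme tables; a real system would use per-language G2P.
G2P = {"en": lambda s: list(s.lower()), "zh": lambda s: [f"zh_{c}" for c in s]}

def split_by_language(text: str):
    """S102: divide the text into runs belonging to different language categories."""
    for run in re.findall(r"[\u4e00-\u9fff]+|[A-Za-z' ]+", text):
        yield ("zh" if re.match(r"[\u4e00-\u9fff]", run) else "en"), run.strip()

def text_to_phonemes(text: str):
    """S104: convert each run with the G2P of its language and concatenate."""
    phonemes = []
    for lang, run in split_by_language(text):
        phonemes += G2P[lang](run)
    return phonemes

seq = text_to_phonemes("hello 你好 world")
# S106/S108 would feed `seq` to the acoustic model and its output to a vocoder.
print(seq)
```
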
Publication date: 20-04-1995

A METHOD FOR TRAINING A SYSTEM, THE RESULTING APPARATUS, AND METHOD OF USE THEREOF

Number: WO1995010832A1
Author: HIRSCHBERG, Julia

A method of training a TTS (104) to assign intonational features, such as intonational phrase boundaries, to input text (110). The method of training involves taking a set of predetermined text (110) and having a human annotate it with intonational feature annotations. The text is passed through the preprocessor (120) and the phrasing module (122), wherein a set of decision nodes is generated by statistically analyzing information based upon the structure of the predetermined text. The statistical representation may then be stored and repeatedly used to generate synthesized speech, through the post processor (124), from new sets of input text without further training.

Publication date: 16-06-2020

Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment

Number: US0010685643B2
Assignee: Vocollect, Inc.

A method and apparatus that dynamically adjust operational parameters of a text-to-speech engine in a speech-based system are disclosed. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.

Publication date: 13-07-2021

Modular systems and methods for selectively enabling cloud-based assistive technologies

Number: US0011061532B2
Assignee: AudioEye, Inc.

Methods and systems for manual and programmatic remediation of websites. JavaScript code is accessed by a user device and optionally calls TTS, ASR, and RADAE modules from a remote server to thereby facilitate website navigation by people with diverse abilities.

Publication date: 03-08-2021

Modular systems and methods for selectively enabling cloud-based assistive technologies

Number: US0011080469B1
Assignee: AudioEye, Inc.

Methods and systems for manual and programmatic remediation of websites. JavaScript code is accessed by a user device and optionally calls TTS, ASR, and RADAE modules from a remote server to thereby facilitate website navigation by people with diverse abilities.

Publication date: 16-02-1999

Method and apparatus for decoding and changing the pitch of an encoded speech signal

Number: US5873059A

A method and apparatus for reproducing speech signals at a controlled speed and for synthesizing speech includes a dividing unit that divides the input speech into time segments and an encoding unit that discriminates whether each of the speech segments is voiced or unvoiced. Based on the results of the discrimination, the encoding unit performs sinusoidal synthesis and encoding for voiced segments, and vector quantization by a closed-loop search for the optimum vector using an analysis-by-synthesis method for unvoiced segments, in order to find the encoded parameters. A period modification unit modifies the length of time associated with each signal segment and calculates a set of modified encoded parameters. In the speech synthesizing unit, encoded speech signal data is output from the encoding unit, and pitch data and amplitude data specifying the spectral envelope are sent via a data conversion unit to a waveform synthesis unit, where the number of amplitude data points of the spectral envelope ...

Publication date: 23-09-1986

Speech synthesizer with function of developing melodies

Number: US0004613985A

A synthesizer is disclosed which has the function of generating synthesized human voices and melodies in the form of synthesized sounds. The speech synthesizer may be implemented with one or more LSI devices which include a central processor unit for receiving word codes or melody program codes and for controlling the synthesizer, a memory for storing the synthesis sequence for each word and melody, a synthesized-word generator for providing audible renditions of the respective words in the form of a synthesized sound, and a melody generator for providing melodies in the form of a synthesized sound. A decision circuit decides whether word codes or melody programs are provided. Selected words are audibly delivered by fetching the associated synthesis sequence from the memory and synthesizing those words through the synthesized-word generator, or selected melodies are audibly delivered by fetching the associated synthesis sequence from the memory and synthesizing the selected melodies ...

Publication date: 05-07-2005

Matching a synthetic disc jockey's voice characteristics to the sound characteristics of audio programs

Number: US0006915261B2
Assignee: Intel Corporation

A system and method for matching voice characteristics of a synthetic disc jockey are presented. A first segment of audio signal and a second segment of audio signal are received by a sound characteristic estimator. Corresponding first and second sets of sound characteristics are determined by the sound characteristic estimator. A voice characteristic transition for the disc jockey is interpolated from the first and second sets of sound characteristics between a starting and an ending time.

Publication date: 15-10-2002

Simple and fast way for generating a harmonic signal

Number: US0006466903B1
Assignee: AT&T Corp.

A fast and accurate method for generating a sampled version of the signal is achieved by retrieving from memory a pre-computed phase delay value corresponding to φk for a given fundamental frequency, expressed in numbers of samples, for a running value of the index k, subtracting it from a sample time index t multiplied by the value of k, and employing the subtraction result, expressed in a modulus related to the fundamental frequency, to retrieve a pre-computed sample value of the cosine cos(kω0t) for the given fundamental frequency. The retrieved sample is multiplied by a retrieved coefficient Ak corresponding to the value of k and to the given fundamental frequency, and placed in an accumulator. The value of k is incremented, and the process for the sample value corresponding to time sample t is repeated until it completes for k=K.

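The table-lookup loop of US6466903B1 is concrete enough to reproduce: one stored period of the fundamental cosine, per-harmonic amplitudes Ak, and phase delays expressed in samples, with the index (k·t − dk) taken modulo the period. The amplitudes and delays below are invented for the demonstration:

```python
import numpy as np

SR = 8000
F0 = 200.0                          # given fundamental frequency
P = int(round(SR / F0))             # samples per fundamental period
K = 10                              # number of harmonics

# Pre-computed tables for this fundamental: one period of the fundamental
# cosine, per-harmonic amplitudes, and phase delays in numbers of samples.
rng = np.random.default_rng(2)
COS = np.cos(2 * np.pi * np.arange(P) / P)
AMP = 1.0 / np.arange(1, K + 1)     # illustrative spectrum
DELAY = rng.integers(0, P, size=K)  # phase delays, in samples

def sample(t: int) -> float:
    """One output sample: for each k, index the cosine table at
    (k*t - delay_k) mod P, scale by A_k, and accumulate."""
    acc = 0.0
    for k in range(1, K + 1):
        idx = (k * t - DELAY[k - 1]) % P
        acc += AMP[k - 1] * COS[idx]
    return acc

y = np.array([sample(t) for t in range(4 * P)])   # four periods of the signal
```
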
Publication date: 20-10-2009

Audio signal processing device, signal recovering device, audio signal processing method and signal recovering method

Number: US0007606711B2
Author: Yasushi Sato

The pitch extracting part generates a pitch waveform signal by making the time intervals of the pitch of the input audio data uniform. After the number of samples in each region is equalized by the re-sampling part, the pitch waveform signal is converted by the subband analyzing part into subband data that express the time-varying strength of the fundamental-frequency component and of the higher harmonic components. The subband data are superimposed, by the data attaching part, with a modulation-wave component that carries the data to be attached, and are output as a bit stream after nonlinear quantization. The portion of the subband data expressing the higher harmonic components that correspond to the audio represented by this audio data is deleted by the encoding part.

Publication date: 09-06-2016

METHOD FOR SPEECH CODING, METHOD FOR SPEECH DECODING AND THEIR APPARATUSES

Number: US20160163325A1
Assignee: BlackBerry Limited

High-quality speech is reproduced with a small data amount in speech coding and decoding that perform compression coding and decoding of a speech signal to a digital signal. In the speech coding method, based on code-excited linear prediction (CELP) speech coding, the noise level of the speech in the coding period concerned is evaluated using a code or coding result of at least one of spectrum information, power information, and pitch information, and different excitation codebooks are used based on the evaluation result ...

Publication date: 09-02-2012

System and method for synthetic voice generation and modification

Number: US20120035933A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.

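The policy idea in US20120035933A1, a combined unit database from two voices with a per-phonetic-category rule deciding which voice supplies the units, reduces to a small lookup. The records, categories, and policy below are invented for the illustration:

```python
from dataclasses import dataclass

@dataclass
class Unit:
    phone: str          # phonetic label
    category: str       # e.g. "vowel", "fricative"
    voice: str          # which TTS voice the unit came from

voice_a = [Unit("aa", "vowel", "A"), Unit("s", "fricative", "A")]
voice_b = [Unit("aa", "vowel", "B"), Unit("s", "fricative", "B")]
combined = voice_a + voice_b                      # the combined database

POLICY = {"vowel": "A", "fricative": "B"}         # category -> source voice

def select(phone: str, category: str) -> Unit:
    """Pick a unit of the requested phone whose source voice the policy allows."""
    want = POLICY[category]
    return next(u for u in combined if u.phone == phone and u.voice == want)

print(select("aa", "vowel").voice, select("s", "fricative").voice)   # A B
```
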
Publication date: 06-12-2012

System and Method for Enhancing Locative Response Abilities of Autonomous and Semi-Autonomous Agents

Number: US20120306741A1
Author: Kalyan M. Gupta
Assignee: KNEXUS RESEARCH Corp

A computer system and method according to the present invention can receive multi-modal inputs such as natural language, gesture, text, sketch and other inputs in order to simplify and improve locative question answering in virtual worlds, among other tasks. The components of an agent as provided in accordance with one embodiment of the present invention can include one or more sensors, actuators, and cognition elements, such as interpreters, executive function elements, working memory, long term memory and reasoners for responses to locative queries, for example. Further, the present invention provides, in part, a locative question answering algorithm, along with the command structure, vocabulary, and the dialog that an agent is designed to support in accordance with various embodiments of the present invention.

11-07-2013 дата публикации

METHODS AND APPARATUS FOR FORMANT-BASED VOICE SYNTHESIS

Номер: US20130179167A1
Принадлежит: NUANCE COMMUNICATIONS, INC.

In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method. 1.-36. (canceled) 37. A method of processing a voice signal to extract information to facilitate training a speech synthesis model for use with a formant-based text-to-speech synthesizer, the method comprising acts of: detecting a plurality of candidate features in the voice signal; grouping different combinations of the plurality of candidate features into a plurality of candidate feature sets; forming a plurality of voice waveforms, each of the plurality of voice waveforms formed, at least in part, by processing a respective one of the plurality of candidate feature sets; performing at least one comparison between the voice signal and each of the plurality of voice waveforms; selecting at least one of the plurality of candidate feature sets based, at least in part, on the at least one comparison with the voice signal; and training the speech synthesis model based, at least in part, on the selected at least one of the plurality of candidate feature sets. 38. The method of claim 37, further comprising an act of converting the voice signal into a same format as the plurality of voice waveforms prior to performing the at least one comparison. 39. The method of claim 37, wherein forming the plurality of voice waveforms includes forming the plurality of voice waveforms in a same format as the ...
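Claim 37 describes a plain analysis-by-synthesis loop: synthesize a waveform from each candidate feature set, compare it with the recorded voice signal, and keep the closest set for training. A Python sketch; the stand-in synthesizer and the mean-squared-error comparison are assumptions, since the patent fixes neither:

import numpy as np

def best_feature_set(voice, candidate_sets, synthesize):
    # Score each candidate feature set by how closely its synthesized
    # waveform matches the reference voice signal (lower MSE is better).
    mse = lambda a, b: float(np.mean((a - b) ** 2))
    return min(candidate_sets, key=lambda s: mse(synthesize(s), voice))

t = np.linspace(0.0, 1.0, 8000)
voice = np.sin(2 * np.pi * 220 * t)                      # toy reference signal
synthesize = lambda s: np.sin(2 * np.pi * s["f0"] * t)   # stand-in synthesizer
chosen = best_feature_set(voice, [{"f0": 200}, {"f0": 220}], synthesize)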

Подробнее
09-01-2014 дата публикации

PROSODY GENERATOR, SPEECH SYNTHESIZER, PROSODY GENERATING METHOD AND PROSODY GENERATING PROGRAM

Номер: US20140012584A1
Принадлежит: NEC Corporation

There is provided a prosody generator that generates prosody information for implementing highly natural speech synthesis without unnecessarily collecting large quantities of learning data. A data dividing means divides into subspaces the data space of a learning database as an assembly of learning data indicative of the feature quantities of speech waveforms. A density information extracting means extracts density information indicative of the density state, in terms of information quantity, of the learning data in each of the subspaces divided by the data dividing means. A prosody information generating method selecting means selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics. 1. A prosody generator comprising: a data dividing unit which divides into subspaces the data space of a learning database as an assembly of learning data indicative of the feature quantities of speech waveforms; a density information extracting unit which extracts density information indicative of the density state, in terms of information quantity, of the learning data in each of the subspaces divided by the data dividing unit; and a prosody information generating method selecting unit which selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics. 2. The prosody generator according to claim 1, further comprising a prosody generation model preparing unit which prepares a prosody generation model representative of relations between speech and the prosody information by use of a learning ...
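The selector can be read as a per-subspace rule: where the learning data is dense enough, generate prosody with the statistical model; where it is sparse, fall back to heuristic rules. A minimal sketch in which the count-based density measure and the threshold are assumptions:

def choose_generator(subspace_counts, threshold=50):
    # Map each subspace of the learning database to the statistical
    # method when its data density suffices, otherwise to rule-based.
    return {name: ("statistical" if count >= threshold else "rules")
            for name, count in subspace_counts.items()}

plan = choose_generator({"questions": 420, "exclamations": 12})
# {'questions': 'statistical', 'exclamations': 'rules'}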

Подробнее
05-01-2017 дата публикации

METHOD FOR BUILDING A SPEECH FEATURE LIBRARY, AND METHOD, APPARATUS, DEVICE, AND COMPUTER READABLE STORAGE MEDIA FOR SPEECH SYNTHESIS

Номер: US20170004820A1
Принадлежит:

The present invention provides a method for building a speech feature library, as well as a method, an apparatus, a device and corresponding non-volatile, non-transitory computer readable storage media for speech synthesis. Because the speech feature library used in the present invention saves at least one context corresponding to each piece of personalized textual information and at least one piece of textual information semantically identical to the personalized textual information, when performing speech synthesis, even if the provided textual information is not personalized textual information corresponding to the desired personalized speech, personalized textual information semantically identical to the textual information to be subject to speech synthesis may be first found in the speech feature library to thereby achieve personalized speech synthesis, such that use of the personalized speech will not be restricted by aging, sickness, and death of a person. 1. A method for building a speech feature library , comprising:converting speech recording of an object into personalized textual information;analyzing and obtaining at least one context corresponding to each piece of personalized textual information and at least one semantically identical piece of textual information;saving, in a speech feature library of the object, each piece of personalized textual information and a corresponding linguistic feature, each linguistic feature indicating a context and a piece of textural information that correspond;performing audio sampling to the speech recording to obtain an audio sample value; andsaving an audio feature in the speech feature library of the object, the audio feature indicating an audio sample value.2. The method according to claim 1 , further comprising:saving a speech feature corresponding to each piece of personalized textual information in the speech feature library, each speech feature indicating a piece of linguistic feature and a piece of audio ...

Подробнее
05-01-2017 дата публикации

TESTING WORDS IN A PRONUNCIATION LEXICON

Номер: US20170004823A1
Принадлежит:

A method, for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system, is provided. The method includes: obtaining test sentences which can be accepted by a language model used in the ASR system. The test sentences cover words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data, or a plurality of texts generated by recognizing the variation of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to each word defined in the pronunciation lexicon; and determining whether or not all or parts of words in a test sentence are present in a path of the word graph derived from the test sentence. 1. A method performed in one or more of computers , for testing words defined in a pronunciation lexicon used in an automatic speech recognition system , wherein the method comprises the following steps:obtaining a plurality of test sentences which can be accepted by a language model used in the automatic speech recognition system, wherein the test sentences cover the words defined in the pronunciation lexicon;obtaining variations of speech data corresponding to each of the test sentences;obtaining a plurality of texts by recognizing the variations of speech data, or a plurality of texts generated by recognizing the variation of speech data;constructing a word graph, using the plurality of texts, for each of the test sentences, wherein each word in the word graph corresponds to each of the words defined in the pronunciation lexicon; anddetermining whether or not all or parts of words in a test sentence of the test sentences are present in a path of the word graph derived from the test sentence.2. The method according to claim 1 , wherein the generated test ...
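The final determination step is a containment test: build a word graph from the recognition results and ask whether the words of the original test sentence survive along some path. A sketch that approximates the word graph as a set of bigram edges, a simplification of a full lattice:

def build_graph(recognized_texts):
    # Collect bigram edges over all recognition results of one test
    # sentence; this stands in for a full word lattice.
    edges = set()
    for text in recognized_texts:
        words = text.split()
        edges.update(zip(words, words[1:]))
    return edges

def sentence_on_path(sentence, edges):
    # True when every consecutive word pair of the test sentence is an
    # edge of the graph, i.e. the whole sentence lies on a path.
    words = sentence.split()
    return all(pair in edges for pair in zip(words, words[1:]))

graph = build_graph(["please call home", "please fall home"])
present = sentence_on_path("please call home", graph)   # True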

Подробнее
05-01-2017 дата публикации

SYSTEM AND METHOD FOR DATA-DRIVEN SOCIALLY CUSTOMIZED MODELS FOR LANGUAGE GENERATION

Номер: US20170004825A1
Принадлежит:

Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user's social group. Systems configured according to this disclosure can then use the resulting, personalized, text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user. 1. A method comprising:identifying, via a processor configured to perform speech analysis, an identity of a user based on characteristics of received speech during a dialog between the user and a dialog system, to yield a user identification;generating a personalized natural language generation model based on a stylistic analysis on a literary narrative and the user identification; andapplying the personalized natural language generation model while performing, as part of the dialog, one of automatic speech recognition or natural language generation.2. The method of claim 1 , wherein the stylistic analysis identifies connections between two or more of a personality independent quotation lattice claim 1 , personality independent attributes claim 1 , personality dependent attributes claim 1 , and speakers within the literary narrative.3. The method of claim 2 , wherein stylistic analysis further comprises:identifying the speakers in the literary narrative, to yield identified speakers;attributing quoted utterances in the literary narrative to the identified speakers, to yield a quotation lattice;identifying the personality independent attributes and the personality dependent attributes of the quoted utterances within the quotation lattice; andorganizing the quotation lattice based on the personality ...

Подробнее
07-01-2016 дата публикации

Devices and Methods for a Universal Vocoder Synthesizer

Номер: US20160005392A1
Принадлежит:

A device may receive an input indicative of acoustic feature parameters associated with speech. The device may determine a modulated noise representation for noise pertaining to one or more of an aspirate or a fricative in the speech based on the acoustic feature parameters. The aspirate may be associated with a characteristic of an exhalation of at least a threshold amount of breath. The fricative may be associated with a characteristic of airflow between two or more vocal tract articulators. The device may also provide an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation. 1. A method comprising:receiving, by a device that includes one or more processors, an input indicative of acoustic feature parameters associated with speech;determining, based on the acoustic feature parameters, a modulated noise representation for noise pertaining to one or more of an aspirate or a fricative in the speech, wherein the aspirate is associated with a characteristic of an exhalation of at least a threshold amount of breath, and wherein the fricative is associated with a characteristic of airflow between two or more vocal tract articulators; andproviding, by the device, an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.2. The method of claim 1 , further comprising:determining a representation of the speech that includes the acoustic feature parameters mapped to harmonic frequencies of the speech, wherein the representation includes the modulated noise representation mapped also to the harmonic frequencies, and wherein the audio signal is based on the representation of the speech.3. The method of claim 1 , further comprising:determining, based on the input, the acoustic feature parameters including spectral parameters associated with the speech, aperiodicity parameters associated with the speech, and phase parameters associated with the speech.4. ...

Подробнее
07-01-2016 дата публикации

Voice Prompt Generation Combining Native and Remotely-Generated Speech Data

Номер: US20160005393A1
Принадлежит:

An electronic device includes a processor and a memory coupled to the processor. The memory stores instructions that, when executed by the processor, cause the processor to perform operations including determining whether a text prompt received from a wireless device corresponds to first synthesized speech data stored at the memory. The operations include, in response to a determination that the text prompt does not correspond to the first synthesized speech data, determining whether a network is accessible. The operations include, in response to a determination that the network is accessible, sending a text-to-speech (TTS) conversion request to a server via the network. The operations further include, in response to receiving second synthesized speech data from the server, storing the second synthesized speech data at the memory. 1. An electronic device comprising: a processor; and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the processor to perform operations comprising: determining whether a text prompt received from a wireless device corresponds to first synthesized speech data stored at the memory; in response to a determination that the text prompt does not correspond to the first synthesized speech data, determining whether a network is accessible; in response to a determination that the network is accessible, sending a text-to-speech (TTS) conversion request to a server via the network; and in response to receiving second synthesized speech data from the server, storing the second synthesized speech data at the memory. 2. The electronic device of claim 1, wherein the operations further comprise determining whether the second synthesized speech data is received prior to expiration of a threshold time period. 3. The electronic device of claim 2, wherein the operations further comprise, in response to a determination that the second synthesized speech data is received prior to ...
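The control flow of the claims is a cache-then-network fallback. A compact sketch with hypothetical stand-ins for the memory lookup and the server call:

def speech_for_prompt(text, cache, network_up, request_tts):
    # Serve the prompt from locally stored synthesized speech when
    # possible; otherwise request synthesis from the server and store it.
    if text in cache:                 # first synthesized speech data
        return cache[text]
    if not network_up:                # network inaccessible: nothing to play
        return None
    audio = request_tts(text)         # text-to-speech conversion request
    if audio is not None:             # second synthesized speech data
        cache[text] = audio           # store at the memory
    return audio

cache = {"Hello": b"\x00\x01"}
audio = speech_for_prompt("Goodbye", cache, True, lambda t: b"\x02\x03")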

Подробнее
02-01-2020 дата публикации

Customer voice order triggered mutual affinity merchant donation

Номер: US20200005276A1
Принадлежит: Edatanetworks Inc

A customer uses a mobile device to verbally request an offer that includes an incentive to transact at a merchant's brick and mortar store in the customer's local community in exchange for the merchant's agreement to make an auditable donation to a charity serving the local community. Business rules limit the merchant's charitable donations over calendar periods, which donations can be made directly by the merchant to the community charity, or indirectly to the charity by way of a blind donation made by the merchant to a donation disbursement agency acting on the merchant's behalf to satisfy the merchant's commitment to donate.

Подробнее
02-01-2020 дата публикации

VOICE SYNTHESIS METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM

Номер: US20200005761A1
Автор: Yang Jie
Принадлежит:

Provided are a voice synthesis method, an apparatus, a device, and a storage medium, involving: obtaining text information and determining characters in the text information and a text content of each of the characters; performing character recognition on the text content of each of the characters to determine character attribute information of each of the characters; obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, where the speakers are pre-stored pronunciation objects having the character attribute information; and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information. These measures improve the pronunciation diversity of different characters in the synthesized voices, improve an audience's discrimination between different characters in the synthesized voices, and thereby improve the user experience. 1. A voice synthesis method, comprising: obtaining text information and determining characters in the text information and a text content of each of the characters; performing a character recognition on the text content of each of the characters, to determine character attribute information of each of the characters; obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, wherein the speakers are pre-stored speakers having the character attribute information; and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information. 2. The method according to claim 1, wherein the character attribute information comprises a basic attribute, and the basic attribute comprises at least one of a gender attribute and an age attribute; before the obtaining speakers in one-to-one correspondence with the characters ...

Подробнее
02-01-2020 дата публикации

INTERACTIVE METHOD AND DEVICE OF ROBOT, AND DEVICE

Номер: US20200005772A1
Автор: DAI Jun, Liu Ying
Принадлежит:

Embodiments of the present disclosure provide an interactive method of a robot, an interactive device of a robot and a device. The method includes: obtaining voice information input by an interactive object, and performing semantic recognition on the voice information to obtain a conversation intention; obtaining feedback information corresponding to the conversation intention based on a conversation scenario knowledge base pre-configured by a simulated user; and converting the feedback information into a voice of the simulated user, and playing the voice to the interactive object. 1. An interactive method for a robot , comprising:obtaining voice information input by an interactive object, and performing semantic recognition on the voice information to obtain a conversation intention;obtaining feedback information corresponding to the conversation intention based on a conversation scenario knowledge base pre-configured by a simulated user; andconverting the feedback information into a voice of the simulated user, and playing the voice to the interactive object.2. The method according to claim 1 , wherein obtaining the feedback information corresponding to the conversation intention based on the conversation scenario knowledge base pre-configured by the simulated user comprises:querying the conversation scenario knowledge base based on the conversation intention, to obtain a query path;when the query path shows a preset path, querying rich media knowledge pre-configured by the simulated user and/or structured knowledge related to user characteristics and pre-configured by the simulated user, to obtain the feedback information corresponding to the conversation intention.3. The method according to claim 2 , wherein after querying the conversation scenario knowledge base based on the conversation intention to obtain the query path claim 2 , the method further comprises:when the query path shows an external path, querying a search engine pre-configured by the simulated ...

Подробнее
02-01-2020 дата публикации

Voice interaction method and device

Номер: US20200005780A1
Автор: Yongshuai LU

Embodiments of the present disclosure provide a voice interaction method and device. The method includes: determining whether a first query statement currently received is the first query statement received within a preset time period; if not, obtaining a second query statement, the second query statement being the query statement most recently received before the first query statement; obtaining a third sentence vector from a first sentence vector of the first query statement and a second sentence vector of the second query statement; and obtaining, from a bottom (fallback) corpus, a first question-and-answer result corresponding to a fourth sentence vector whose similarity to the third sentence vector satisfies a preset condition, and returning the first question-and-answer result. The method provided in this embodiment thus avoids returning a bottom reply that is irrelevant to the query statement, thereby improving the user experience.
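Numerically, the method combines the current and previous query vectors and retrieves the closest fallback entry by similarity. A sketch using mean combination and cosine similarity, both of which are assumptions the abstract leaves open:

import numpy as np

def bottom_reply(v1, v2, corpus):
    # Combine the first and second sentence vectors into a third vector,
    # then return the corpus answer whose vector is most similar to it.
    v3 = (v1 + v2) / 2.0
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    vec, answer = max(corpus, key=lambda item: cos(item[0], v3))
    return answer

corpus = [(np.array([1.0, 0.0]), "Ask me about the weather."),
          (np.array([0.0, 1.0]), "Ask me about music.")]
reply = bottom_reply(np.array([0.9, 0.1]), np.array([0.8, 0.3]), corpus)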

Подробнее
12-01-2017 дата публикации

METHODS EMPLOYING PHASE STATE ANALYSIS FOR USE IN SPEECH SYNTHESIS AND RECOGNITION

Номер: US20170011733A1
Принадлежит:

A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods. 1.-10. (canceled) 11. A method for categorically mapping the relationship of at least one text unit in a sequence of text to at least one corresponding prosodic phonetic unit, to at least one linguistic feature category in the sequence of text, and to at least one speech utterance represented in a synthesized speech signal, the method comprising: (a) identifying, and optionally modifying, acoustic data representing the at least one speech utterance, to provide the synthesized speech signal; (b) identifying, and optionally modifying, the acoustic data representing the at least one utterance to provide the at least one speech utterance with an expressive prosody determined according to prosodic rules; and (c) identifying acoustic unit feature vectors for each of the at least one prosodic phonetic units, each acoustic unit feature vector comprising a bundle of feature values selected according to proximity to a statistical mean of the values of acoustic unit candidates available ...

Подробнее
03-02-2022 дата публикации

SINGING VOICE CONVERSION

Номер: US20220036874A1
Принадлежит: Tencent America LLC

A method, computer program, and computer system is provided for converting a singing first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features. 1. A method of converting a first singing voice to a second singing voice , comprising:encoding, by a computer, a context associated with one or more phonemes corresponding to the first singing voice and outputting a sequence of one or more hidden states containing a sequential representation associated with the one or more phonemes;aligning, by the computer, the one or more phonemes to one or more target acoustic frames based on the encoded context;recursively generating, by the computer, one or more mel-spectrogram features from the aligned one or more phonemes and the one or more target acoustic frames; andconverting, by the computer, a sample corresponding to the first singing voice to a sample corresponding to the second singing voice using the generated one or more mel-spectrogram features,wherein the aligning the one or more phonemes to the one or more target acoustic frames comprises expanding the one or more hidden states of the output sequence based on a duration associated with each phoneme, and aligning the expanded one or more hidden states to the one or more target acoustic frames.2. The method of claim 1 , wherein the aligning the one or more phonemes to the one or more target acoustic frames further comprises claim 1 , prior to the expanding the one or more hidden states: ...
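The alignment step in the claims is length regulation: each phoneme's hidden state is repeated for its predicted number of acoustic frames so the expanded sequence lines up one-to-one with the target frames. A minimal sketch:

import numpy as np

def expand_states(hidden_states, durations):
    # Repeat each phoneme's hidden state vector for its duration in
    # frames, yielding exactly one state per target acoustic frame.
    return np.repeat(hidden_states, durations, axis=0)

states = np.array([[0.1, 0.2],     # hidden state of phoneme 1
                   [0.3, 0.4]])    # hidden state of phoneme 2
frames = expand_states(states, durations=[3, 2])   # shape (5, 2)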

Подробнее
18-01-2018 дата публикации

SPEECH SYNTHESIS APPARATUS, SPEECH SYNTHESIS METHOD, SPEECH SYNTHESIS PROGRAM, PORTABLE INFORMATION TERMINAL, AND SPEECH SYNTHESIS SYSTEM

Номер: US20180018956A1
Автор: TAKATSUKA Susumu
Принадлежит: SONY MOBILE COMMUNICATIONS INC.

A speech synthesis apparatus includes a content selection unit that selects a text content item to be converted into speech; a related information selection unit that selects related information which can be at least converted into text and which is related to the text content item selected by the content selection unit; a data addition unit that converts the related information selected by the related information selection unit into text and adds text data of the text to text data of the text content item selected by the content selection unit; a text-to-speech conversion unit that converts the text data supplied from the data addition unit into a speech signal; and a speech output unit that outputs the speech signal supplied from the text-to-speech conversion unit. 1. A speech synthesis apparatus comprising: a content selection unit that selects a text content item to be converted into speech; a related information selection unit that selects related information which can be at least converted into text and which is related to the text content item selected by the content selection unit; a data addition unit that converts the related information selected by the related information selection unit into text and adds text data of the text to text data of the text content item selected by the content selection unit; a text-to-speech conversion unit that converts the text data supplied from the data addition unit into a speech signal; and a speech output unit that outputs the speech signal supplied from the text-to-speech conversion unit. The present application is a continuation of and claims the benefit of priority under 35 U.S.C. §120 from U.S. application Ser. No. 12/411,031, filed Mar. 25, 2009, which contains subject matter related to that disclosed in Japanese Priority Patent Application JP 2008-113202 filed in the Japan Patent Office on Apr. 23, 2008, the entire content of both of which are hereby incorporated herein by reference. The present invention relates to a ...

Подробнее
18-01-2018 дата публикации

SOUND CONTROL DEVICE, SOUND CONTROL METHOD, AND SOUND CONTROL PROGRAM

Номер: US20180018957A1
Принадлежит:

A sound control device includes: a detection unit that detects a first operation on an operator and a second operation on the operator, the second operation being performed after the first operation; and a control unit that causes output of a second sound to be started, in response to the second operation being detected. The control unit causes output of a first sound to be started before causing the output of the second sound to be started, in response to the first operation being detected. 1. A sound control device comprising:a detection unit that detects a first operation on an operator and a second operation on the operator, the second operation being performed after the first operation; anda control unit that causes output of a second sound to be started, in response to the second operation being detected,wherein the control unit causes output of a first sound to be started before causing the output of the second sound to be started, in response to the first operation being detected.2. The sound control device according to claim 1 ,wherein the operator accepts push-in by a user,the detection unit detects, as the first operation, that the operator has been pushed in by a first distance from a reference position, andthe detection unit detects, as the second operation, that the operator has been pushed in by a second distance from the reference position, the second distance being longer than the first distance.3. The sound control device according to claim 1 ,wherein the detection unit comprises a first and second sensors provided in the operator,the first sensor detects the first operation, andthe second sensor detects the second operation.4. The sound control device according to claim 1 , wherein the operator comprises a keyboard that accepts the first and second operations.5. The sound control device according to claim 1 , wherein the operator comprises a touch panel that accepts the first and second operations.6. The sound control device according to claim 1 , ...

Подробнее
18-01-2018 дата публикации

System and Method for Handling Missing Speech Data

Номер: US20180018962A1
Принадлежит:

Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for handling missing speech data. The computer-implemented method includes receiving speech with a missing segment, generating a plurality of hypotheses for the missing segment, identifying a best hypothesis for the missing segment, and recognizing the received speech by inserting the identified best hypothesis for the missing segment. In another method embodiment, the final step is replaced with synthesizing the received speech by inserting the identified best hypothesis for the missing segment. In one aspect, the method further includes identifying a duration for the missing segment and generating the plurality of hypotheses of the identified duration for the missing segment. The step of identifying the best hypothesis for the missing segment can be based on speech context, a pronouncing lexicon, and/or a language model. Each hypothesis can have an identical acoustic score. 1. A method comprising:evaluating, by a system and based on a mean and a variance of duration for individual context-dependent phoneme acoustic models, a plurality of hypothetical segments, the plurality of hypothetical segments being generated to fill a missing segment in speech data, according to a context of speech determined from speech data and a duration of the missing segment to yield an evaluation;identifying, based on the evaluation, a possible segment that represents the missing segment of the speech data to yield an identified segment; andinserting the identified segment into the speech data to replace the missing segment.2. The method of claim 1 , wherein the evaluating of the plurality of hypothetical segments is further based on an acoustic feature of the speech data.3. The method of claim 1 , further comprising:generating the plurality of hypothetical segments by identifying hypothetical segments having a similar duration to the duration of the missing segment.4. The method of claim 1 ...
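Scoring the hypothetical segments against context and duration can be sketched as a language-model score plus a duration penalty built from the per-phoneme mean and variance. Everything below (the toy log-probability table, the Gaussian-style penalty) is an illustrative assumption:

import math

def score(hypothesis, lm_logprob, dur, dur_mean, dur_var):
    # Context score of the hypothesis minus a duration penalty derived
    # from the duration model; the best hypothesis maximizes this.
    return lm_logprob[hypothesis] - (dur - dur_mean) ** 2 / (2 * dur_var)

lm_logprob = {"cat": math.log(0.02), "hat": math.log(0.005)}   # toy context model
best = max(lm_logprob, key=lambda h: score(h, lm_logprob, dur=0.30,
                                           dur_mean=0.28, dur_var=0.01))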

Подробнее
26-01-2017 дата публикации

SYSTEM AND METHOD FOR THE TRANSLATION OF SIGN LANGUAGES INTO SYNTHETIC VOICES

Номер: US20170024380A1
Принадлежит: MAP CARDOSO

A system and method for the translation of sign languages into synthetic voices. The present invention refers to the field of assistive technologies, and comprises an instantaneous communication system between hearing- and speech-impaired individuals with hearing-able individuals. More specifically, the invention relates to a method for translating, in real time, the sign language of one individual into oral language by employing biometric sensors, wireless data communication, and a built-in software in a cellphone or another compatible mobile computing device. In certain exemplar embodiments, the invention facilitates associating the recognition of movements and gestures to letters, words, and sentences, and synthesizing the same into an electronic voice. 1. A method for the instantaneous communication between hearing-/speech-impaired individuals with hearing-able individuals employing real-time translation of sign languages into synthetic voices , comprising:a) providing a biometric sensor on the forearm, below the elbow, of a hearing- or speech-impaired individual, the biometric sensor configured to detect a biological signal representative of a movement and gesture performed by the arm, the hand, or the fingers of the hearing- or speech-impaired individual, the biometric sensor communicatively coupled to a mobile computing device running built-in software;b) receiving and capturing the biological signal, via a wireless data communication, at the mobile computing device;c) processing, via the built-in software, the biological signal to ascertain the movement and gesture performed by the arm, the hand, or the fingers of the hearing- or speech-impaired individual;d) associating, via the built-in software, the ascertained movement and gesture with a letter, word, clause, or sentence; ande) synthesizing, by the mobile computing device, the associated letter, word, clause, or sentence into an electronic voice;wherein the built-in software running on the mobile ...

Подробнее
28-01-2016 дата публикации

METHOD FOR FORMING THE EXCITATION SIGNAL FOR A GLOTTAL PULSE MODEL BASED PARAMETRIC SPEECH SYNTHESIS SYSTEM

Номер: US20160027430A1
Принадлежит:

A system and method are presented for forming the excitation signal for a glottal pulse model based parametric speech synthesis system. The excitation signal may be formed by using a plurality of sub-band templates instead of a single one. The plurality of sub-band templates may be combined to form the excitation signal wherein the proportion in which the templates are added is dynamically based on determined energy coefficients. These coefficients vary from frame to frame and are learned, along with the spectral parameters, during feature training. The coefficients are appended to the feature vector, which comprises spectral parameters and is modeled using HMMs, and the excitation signal is determined. 1. A method for creating parametric models for use in training a speech synthesis system , wherein the system comprises at least a training text corpus , a speech database , and a model training module , the method comprising:a. obtaining, by the model training module, speech data for the training text corpus, wherein the speech data comprises recorded speech signals and corresponding transcriptions;b. converting, by the model training module, the training text corpus into context dependent phone labels;c. extracting, by the model training module, for each frame of speech in the speech signal from the speech training database, at least one of: spectral features, a plurality of band excitation energy coefficients, and fundamental frequency values;d. forming, by the model training module, a feature vector stream for each frame of speech using the at least one of: spectral features, a plurality of band excitation energy coefficients, and fundamental frequency values;e. labeling speech with context dependent phones;f. extracting durations of each context dependent phone from the labelled speech;g. performing parameter estimation of the speech signal, wherein the parameter estimation is performed comprising the features, HMM, and decision trees; andh. identifying a ...
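The core of the excitation construction is a per-frame weighted sum of sub-band templates, with the weights taken from the band excitation energy coefficients carried in the frame's feature vector. A sketch with toy templates:

import numpy as np

def frame_excitation(templates, energy_coeffs):
    # Mix the sub-band glottal-pulse templates in the proportion given
    # by the frame's band excitation energy coefficients.
    templates = np.asarray(templates, dtype=float)
    weights = np.asarray(energy_coeffs, dtype=float)
    return (weights[:, None] * templates).sum(axis=0)

templates = [np.hanning(16),                            # low-band template (toy)
             np.hanning(16) * np.cos(np.arange(16.0))]  # high-band template (toy)
excitation = frame_excitation(templates, energy_coeffs=[0.7, 0.3])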

Подробнее
10-02-2022 дата публикации

TRANSLATING BETWEEN SPOKEN LANGUAGES WITH EMOTION IN AUDIO AND VIDEO MEDIA STREAMS

Номер: US20220044668A1
Принадлежит:

Systems and methods are described herein for generating alternate audio for a media stream. The media system receives media that is requested by the user. The media comprises a video and audio. The audio includes words spoken in a first language. The media system stores the received media in a buffer as it is received. The media system separates the audio from the buffered media and determines an emotional state expressed by spoken words of the first language. The media system translates the words spoken in the first language into words spoken in a second language. Using the translated words of the second language, the media system synthesizes speech having the emotional state previously determined. The media system then retrieves the video of the received media from the buffer and synchronizes the synthesized speech with the video to generate the media content in a second language. 1.-27. (canceled) 28. A method comprising: accessing a media asset, the media asset featuring a speaker; receiving a request to translate the media asset; searching for another media asset featuring the speaker; extracting a voice sample from the another media asset featuring the speaker; calculating vocal characteristics based on the voice sample from the another media asset featuring the speaker; and generating a translation of the media asset using the calculated vocal characteristics. 29. The method of claim 28, wherein the accessing the media asset comprises receiving the media asset comprising audio, wherein the audio comprises a first plurality of spoken words, the method further comprising: retrieving audio from the received media asset; determining an emotional state expressed by the first plurality of spoken words based on a first set of characteristics associated with the first plurality of spoken words; identifying an identifier of a speaker that produces the first plurality of spoken words based on metadata associated with the media asset; searching, based on the ...

Подробнее
29-01-2015 дата публикации

COMPUTERIZED INFORMATION AND DISPLAY APPARATUS

Номер: US20150032459A1
Автор: Gazdzinski Robert F.
Принадлежит:

Computerized apparatus useful for obtaining and presenting information to users. In one embodiment, the computerized apparatus includes a display device and speech recognition apparatus configured to receive user speech input and enable performance of various tasks, such as obtaining desired information relating to an entity, maps or directions, weather, news, or any number of other topics. The obtained data may also, in one variant, be displayed with contextually related content. In another variant, retrieved data can be downloaded to a portable user device. 1.-40. (canceled) 41. Computer readable apparatus configured to aid a user in locating an organization or business entity, the apparatus being part of a computerized information system, the apparatus comprising a storage medium having at least one computer program configured to run on at least one processor, the at least one program configured to, when executed on the at least one processor: obtain a representation of a first speech input from the user, the first speech input received via at least one microphone of the computerized information system and relating to at least part of a name of a desired organization or business entity; cause use of at least a speech recognition apparatus to process the representation to identify at least one word therein; cause use of at least the identified at least one word to identify a plurality of possible matches for the at least part of the name; receive a subsequent input from the user in order to identify one of the plurality of possible matches which best correlates to the desired organization or business entity; and cause presentation of a visual representation of a location associated with the identified one of the plurality of possible matches, as well as at least immediate surroundings of the location and one or more other organizations or business entities proximate to the location, on a display device in communication with the computerized information system and ...

Подробнее
04-02-2016 дата публикации

Estimation of target character train

Номер: US20160034446A1
Автор: Kazuhiko Yamamoto
Принадлежит: Yamaha Corp

A desired character train included in a predefined reference character train, such as lyrics, is set as a target character train, and a user designates a target phoneme train that indirectly represents the target character train by use of a limited plurality of kinds of particular phonemes, such as vowels and particular consonants. A reference phoneme train indirectly representing the reference character train by use of the particular phonemes is prepared in advance. Based on a comparison between the target phoneme train and the reference phoneme train, a sequence of the particular phonemes in the reference phoneme train that matches the target phoneme train is identified, and a character sequence in the reference character train that corresponds to the identified sequence of the particular phonemes is identified. The character sequence thus identified serves as an estimate of the target character train.
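The estimation reduces to substring search in phoneme space: reduce the reference text to its train of particular phonemes once, find where the target phoneme train occurs, and read the covered characters back out. A sketch assuming one character per phoneme for simplicity (real lyrics need an alignment table), with vowels plus 'n' standing in as the particular phonemes:

VOWELS = set("aeiou")

def reduce_with_positions(text):
    # Train of particular phonemes in the text, with the character
    # position of each one so matches can be mapped back to characters.
    pairs = [(c, i) for i, c in enumerate(text) if c in VOWELS or c == "n"]
    return "".join(c for c, _ in pairs), [i for _, i in pairs]

def find_target(reference, target_train):
    # Locate the target train inside the reference train and return the
    # corresponding character sequence of the reference text.
    train, pos = reduce_with_positions(reference)
    j = train.find(target_train)
    if j < 0:
        return None
    return reference[pos[j]:pos[j + len(target_train) - 1] + 1]

print(find_target("konnichiwa", "nii"))   # -> 'nichi'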

Подробнее
17-02-2022 дата публикации

TRANSIENT PERSONALIZATION MODE FOR GUEST USERS OF AN AUTOMATED ASSISTANT

Номер: US20220051663A1
Принадлежит:

Implementations set forth herein relate to an automated assistant that can operate in a transient personalization mode, and/or assist a separate automated assistant with providing output according to a transient personalization mode. The transient personalization mode can allow a guest user of an assistant-enabled device to receive personalized responses from the assistant-enabled device despite not being signed into it. A host automated assistant of the assistant-enabled device can securely communicate with a guest user's automated assistant through a backend process. In this way, input queries from the guest user to the host automated assistant can be personalized according to the guest automated assistant, without the guest user directly engaging with their own personal device. 1. A method implemented by one or more processors, the method comprising: receiving, at a first computing device, a request for the first computing device to process a spoken utterance that was submitted by a user to a second computing device, wherein each of the first computing device and the second computing device is located in a common environment and provides access to a respective automated assistant, and wherein the second computing device encrypts the request using signature data that is generated by the second computing device using a biometric signature that corresponds to the user; processing, by the first computing device, the request from the second computing device to identify one or more assistant requests embodied in the request; generating, by the first computing device, assistant response data characterizing one or more automated assistant responses that are responsive to the one or more assistant requests; and causing, by the first computing device, the second computing device to render the one or more automated assistant responses for the user using the assistant response data. 2. The method of claim 1, wherein processing the request from ...

Подробнее
09-02-2017 дата публикации

VISUAL LIVENESS DETECTION

Номер: US20170039440A1
Принадлежит:

In an approach for visual liveness detection, a video-audio signal related to a speaker speaking a text is obtained. The video-audio signal is split into a video signal which records images of the speaker and an audio signal which records a speech spoken by the speaker. Then a first sequence indicating visual mouth openness is obtained from the video signal, and a second sequence indicating acoustic mouth openness is obtained based on the text and the audio signal. Synchrony between the first and second sequences is measured, and the liveness of the speaker is determined based on the synchrony. 1. A method for visual liveness detection , the method comprising:obtaining, by one or more computer processors, a video-audio signal related to a speaker speaking a text;splitting, by one or more computer processors, the video-audio signal into a video signal which records images of the speaker and an audio signal which records a speech spoken by the speaker;obtaining, by one or more computer processors, a first sequence indicating visual mouth openness from the video signal;obtaining, by one or more computer processors, a second sequence indicating acoustic mouth openness based on the text and the audio signal;measuring, by one or more computer processors, synchrony between the first sequence and the second sequence; anddetermining, by one or more computer processors, liveness of the speaker based on the synchrony.2. The method of claim 1 , wherein obtaining a first sequence comprises:sampling, by one or more computer processors, the video signal to obtain video frames;determining, by one or more computer processors, a mouth-open-status of the speaker for a video frame of the video signal; andsequencing, by one or more computer processors, the mouth-open-statuses of the video frames of the video signal in time order to generate the first sequence.3. The method of claim 1 , wherein obtaining a second sequence comprises:segmenting, by one or more computer processors, the text ...
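Measuring synchrony between the two sequences can be sketched as the normalized correlation of the visual and acoustic mouth-openness traces, with a threshold deciding liveness; the correlation choice and the threshold value are assumptions:

import numpy as np

def synchrony(visual, acoustic):
    # Normalized correlation of the two mouth-openness sequences;
    # values near 1.0 mean the mouth and the speech move together.
    v = (visual - visual.mean()) / visual.std()
    a = (acoustic - acoustic.mean()) / acoustic.std()
    return float(np.mean(v * a))

def is_live(visual, acoustic, threshold=0.6):
    return synchrony(visual, acoustic) >= threshold

t = np.arange(50)
visual = np.sin(t / 5.0)                  # mouth opens and closes
acoustic = 0.8 * np.sin(t / 5.0) + 0.1    # speech envelope tracks it
print(is_live(visual, acoustic))          # True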

Подробнее
07-02-2019 дата публикации

AUTOMATIC SPEECH IMITATION

Номер: US20190043472A1
Автор: Garcia Jason
Принадлежит:

Embodiments of systems, apparatuses, and/or methods are disclosed for automatic speech imitation. An apparatus may include a machine learner to perform an analysis of tagged data that is to be generated based on a speech pattern and/or a speech context behavior in media content. The machine learner may further generate, based on the analysis, a trained speech model that is to be applied to the media content to transform speech data to mimic data. The apparatus may further include a data analyzer to perform an analysis of the speech pattern, the speech context behavior, and/or the tagged data. The data analyzer may further generate, based on the analysis, a programmed speech rule that is to be applied to transform the speech data to the mimic data. 1. A computer system comprising: a training data provider to provide training data including one or more of: a machine learner to perform a first analysis of tagged data that is to be generated based on one or more of a speech pattern or a speech context behavior in media content, and generate, based on the first analysis, a trained speech model that is to be applied to transform speech data to mimic data; or a data analyzer to perform a second analysis of one or more of the speech pattern, the speech context behavior, or the tagged data, and generate, based on the second analysis, a programmed speech rule that is to be applied to transform the speech data to the mimic data; and a speech device to output imitated speech based on the mimic data. 2. The system of claim 1, further including a speech pattern identifier to identify one or more of an ordered speech pattern, a literary point of view, or a disordered speech pattern in the media content; and a context behavior identifier to identify one or more of a trained behavior, a replacement behavior, or an additive behavior in the media content. 3. The system of claim 1, further including a media content tagger to: modify the media content with a speech ...

Подробнее
16-02-2017 дата публикации

TEXT-TO-SPEECH METHOD AND MULTI-LINGUAL SPEECH SYNTHESIZER USING THE METHOD

Номер: US20170047060A1
Принадлежит:

A text-to-speech method and a multi-lingual speech synthesizer using the method are disclosed. The multi-lingual speech synthesizer and the method executed by its processor are applied for processing a multi-lingual text message in a mixture of a first language and a second language into a multi-lingual voice message. The multi-lingual speech synthesizer comprises a storage device configured to store a first language model database and a second language model database, a broadcasting device configured to broadcast the multi-lingual voice message, and a processor, connected to the storage device and the broadcasting device, configured to execute the method disclosed herein. 2. The text-to-speech method of claim 1, wherein when the one of the at least one first language phoneme label sequence is in front of the one of the at least one second language phoneme label sequence, the step of producing the inter-lingual connection tone information comprises: replacing a first phoneme label of the at least one second language phoneme label sequence with a corresponding phoneme label of the first language phoneme labels which has a closest pronunciation to the first phoneme label of the at least one second language phoneme label sequence; and looking up the first language model database using the corresponding phoneme label of the first language phoneme labels, thereby obtaining a corresponding cognate connection tone information of the first language model database between a last phoneme label of the at least one first language phoneme label sequence and the corresponding phoneme label of the first language phoneme labels, wherein the corresponding cognate connection tone information of the first language model database serves as the inter-lingual connection tone information at the boundary between the one of the at least one first language phoneme label sequence and the one of the at least one second language phoneme label sequence. 3. The text-to-speech method of claim 1, ...

Подробнее
16-02-2017 дата публикации

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM

Номер: US20170047064A1
Автор: KIRIHARA REIKO
Принадлежит: SONY CORPORATION

There is provided an information processing device, an information processing method, and a program that can allow a user to intuitively recognize other information corresponding to a speech output, the information processing device including: a control unit configured to control an output of other information different from a speech output related to a predetermined function on the basis of timing information on timing at which the speech output of an expression related to the function among a set of expressions is made, the set of expressions including the expression related to the function. 1. An information processing device comprising:a control unit configured to control an output of other information different from a speech output related to a predetermined function on the basis of timing information on timing at which the speech output of an expression related to the function among a set of expressions is made, the set of expressions including the expression related to the function.2. The information processing device according to claim 1 , whereinthe other information is display information displayed on a display unit.3. The information processing device according to claim 2 , whereinthe control unit controls a speech output of the set of expressions.4. The information processing device according to claim 3 , whereinthe control unit controls the speech output of the set of expressions on the basis of speech synthesis processing performed by a speech synthesis unit.5. The information processing device according to claim 4 , whereinthe speech synthesis processing is processing executed on the basis of a speech input of a user.6. The information processing device according to claim 5 , whereinthe set of expressions is generated in accordance with semantic content indicated by the speech input of the user.7. The information processing device according to claim 4 , whereinthe control unit controls an output of display information related to the function in ...

Подробнее
19-02-2015 дата публикации

COMPUTER GENERATED EMULATION OF A SUBJECT

Номер: US20150052084A1
Принадлежит: KABUSHIKI KAISHA TOSHIBA

A system for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice; 1. A system for emulating a subject , to allow a user to interact with a computer generated talking head with the subject's face and voice;said system comprising a processor, a user interface and a personality storage section,the user interface being configured to emulate the subject, by displaying a talking head which comprises the subject's face and output speech from the mouth of the face with the subject's voice, the user interface further comprising a receiver for receiving a query from the user, the emulated subject being configured to respond to the query received from the user,the processor comprising a dialogue section and a talking head generation section,wherein said dialogue section is configured to generate a response to a query inputted by a user from the user interface and generate a response to be outputted by the talking head, the response being generated by retrieving information from said personality storage section, said personality storage section comprising content created by or about the subject,and said talking head generation section is configured to:convert said response into a sequence of acoustic units, the talking head generation section further comprising a statistical model, said statistical model comprising a plurality of model parameters, said model parameters being derived from said personality storage section, the model parameters describing probability distributions which relate an acoustic unit to an image vector and speech vector, said image vector comprising a plurality of parameters which define the subject's face and said speech vector comprising a plurality of parameters which define the subject's voice, the talking head generation section being further configured to output a sequence of speech vectors and image vectors which are synchronised such that the head appears to talk.2. A system ...

Подробнее
03-03-2022 дата публикации

Dialog apparatus, method and program for the same

Номер: US20220067300A1
Автор: Hiroaki Sugiyama
Принадлежит: Nippon Telegraph and Telephone Corp

A dialog apparatus and the like are provided which achieve a natural dialog whose topics are closely connected, starting from a user utterance, by having robots exchange between them, after a robot's response to the user utterance, additional questions and answers that reflect the contents of that response. A plurality of sets of four utterances are stored, each set being made up of four sentences as a unit: an assumed user utterance sentence, a response sentence to the user utterance, a subsequent utterance sentence following these sentences, and a subsequent response sentence. In response to input of text data corresponding to a user utterance, the dialog apparatus selects the set of four utterances that begins with an assumed user utterance sentence similar to that text data and performs control such that one of a plurality of agents utters each of the response sentence, the subsequent utterance sentence, and the subsequent response sentence, with different agents uttering the response sentence and the subsequent utterance sentence, and different agents uttering the subsequent utterance sentence and the subsequent response sentence.

Подробнее
25-02-2016 дата публикации

SYSTEM AND METHOD FOR AUTOMATICALLY CONVERTING TEXTUAL MESSAGES TO MUSICAL COMPOSITIONS

Номер: US20160055838A1
Принадлежит:

A method for converting textual messages to musical messages comprising receiving a text input and receiving a musical input selection. The method includes analyzing the text input to determine text characteristics and analyzing a musical input corresponding to the musical input selection to determine musical characteristics. Based on the text characteristic and the musical characteristic, the method includes correlating the text input with the musical input to generate a synthesizer input, and sending the synthesizer input to a voice synthesizer. The method includes receiving a vocal rendering of the text input from the voice synthesizer, generating a musical message from the vocal rendering and the musical input, and outputting the musical message. 1. A computer implemented method for automatically converting textual messages to musical messages , the computer implemented method comprising:receiving a text input;receiving a musical input selection;analyzing, via one or more processors, the text input to determine at least one text characteristic of the text input;analyzing, via the one or more processors, a musical input corresponding to the musical input selection to determine at least one musical characteristic of the musical input;based on the at least one text characteristic and the at least one musical characteristic, correlating, via the one or more processors, the text input with the musical input to generate a synthesizer input;sending the synthesizer input to a voice synthesizer;receiving, from the voice synthesizer, a vocal rendering of the text input;generating a musical message from the vocal rendering of the text input and the musical input; andoutputting the musical message.2. The method of claim 1 , wherein receiving the text input further comprises receiving the text input from a client device via a digital communications network.3. The method of claim 1 , wherein outputting the musical message further comprises sending the musical message to a ...

25-02-2016 дата публикации

System and Method for Enhancing Locative Response Abilities of Autonomous and Semi-Autonomous Agents

Номер: US20160055843A1
Автор: Gupta Kalyan M.
Принадлежит:

A computer system and method according to the present invention can receive multi-modal inputs such as natural language, gesture, text, sketch and other inputs in order to simplify and improve locative question answering in virtual worlds, among other tasks. The components of an agent as provided in accordance with one embodiment of the present invention can include one or more sensors, actuators, and cognition elements, such as interpreters, executive function elements, working memory, long term memory and reasoners for responses to locative queries, for example. Further, the present invention provides, in part, a locative question answering algorithm, along with the command structure, vocabulary, and the dialog that an agent is designed to support in accordance with various embodiments of the present invention.

1. A system for locating a virtually displayed object in a virtual environment, comprising: at least one input device adapted to receive at least one of speech, gesture, text and touchscreen inputs; and receiving user input via the at least one input device, wherein the user input comprises a query regarding a location of at least one target object in the virtual environment; interfacing with the virtual environment using a virtual agent; sensing, by the virtual agent, at least one candidate landmark object and the at least one object in the virtual environment; and deriving an optimum natural language response describing the location of the at least one target object in response to the query by determining a minimum response processing cost based on at least one of: the respective locations of the at least one target object and the at least one candidate landmark object, the respective sizes of the at least one target object and the at least one candidate landmark object, the visibility of the at least one target object to the agent from the at least one candidate landmark object, and a spatial relationship between the at least one target object ...
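An illustrative Python sketch of choosing a landmark that minimizes a response "processing cost" for a locative answer. The scene data and the cost weights are assumptions; the patent names the factors (location, size, visibility, spatial relationship) but not a formula.

import math

scene = {  # object -> (x, y, size, visible_from_agent)
    "red cube":  (2.0, 3.0, 1.0, True),
    "blue sofa": (2.5, 3.5, 4.0, True),
    "lamp":      (9.0, 9.0, 0.5, False),
}

def cost(target: str, landmark: str) -> float:
    tx, ty, _, _ = scene[target]
    lx, ly, lsize, lvis = scene[landmark]
    distance = math.hypot(tx - lx, ty - ly)
    # Nearer, larger, visible landmarks are cheaper to use in an answer.
    return distance + 1.0 / lsize + (5.0 if not lvis else 0.0)

def answer(target: str) -> str:
    landmarks = [o for o in scene if o != target]
    best = min(landmarks, key=lambda l: cost(target, l))
    return f"The {target} is next to the {best}."

print(answer("red cube"))  # -> "The red cube is next to the blue sofa."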

13-02-2020 дата публикации

TAILORING AN INTERACTIVE DIALOG APPLICATION BASED ON CREATOR PROVIDED CONTENT

Номер: US20200051568A1
Принадлежит:

Implementations relate to executing a tailored version of a dynamic interactive dialog application, where the tailored version is tailored based on structured content that is specified by a creator of the tailored version. Executing the tailored version of the interactive dialog application can be in response to receiving, via an assistant interface of an assistant application, an invocation phrase assigned to the tailored version and/or other user interface input that identifies the tailored version. In some implementations, a tailored version of a dynamic interactive dialog application is executed with persona value(s) that are specified by a creator of the tailored version and/or that are predicted based on structured content and/or other input provided by the creator in creating the tailored version. In some implementations, structured content and/or other input provided by a creator in creating a tailored version of an interactive dialog application is utilized in indexing the tailored version.

1. A method implemented by one or more processors, comprising: receiving, via one or more network interfaces: an indication of a dynamic interactive dialog application, structured content for executing a tailored version of the dynamic interactive dialog application, and at least one invocation phrase for the tailored version of the dynamic interactive dialog application, wherein the indication, the structured content, and the at least one invocation phrase are transmitted in one or more data packets generated by a client device of a user in response to interaction with the client device by the user; processing one, or both, of the indication and the structured content to automatically select a plurality of persona values for the tailored version of the interactive dialog application, wherein the persona values include particular terms or phrases to be provided during execution of the tailored version of the interactive dialog application, and include non ...

03-03-2016 дата публикации

SYSTEMS AND METHODS FOR NOISE REDUCTION USING SPEECH RECOGNITION AND SPEECH SYNTHESIS

Номер: US20160064008A1
Автор: Graham Derek
Принадлежит: CLEARONE INC.

The present disclosure describes a system for reducing background noise from a speech audio signal generated by a user. The system includes a user device receiving the speech audio signal and a noise reduction device in communication with a stored data repository, where the noise reduction device is configured to: convert the speech audio signal to text; generate synthetic speech based on the converted text; optionally determine whether the user is an actual subscriber based on a comparison between the speech audio signal and the synthetic speech; and selectively transmit the speech audio signal or the synthetic speech based on a comparison between the predicted subjective quality of the recorded speech and that of the synthetic speech.

1. A system, using a user device in communication with a stored data repository, that reduces the background noise from a speech audio signal generated by a user, comprising: a user device, with a processor and a memory, receiving a speech audio signal; and a noise reduction device, in communication with a stored data repository and in communication with said user device, configured to: convert said received speech audio signal to text; generate synthetic speech based on a speech data corpus or speech model data of the user stored in said stored data repository and said converted text; determine the predicted subjective quality of the received speech audio signal if that signal were to be transmitted to a far-end listener; determine the predicted subjective quality of said synthetic speech; and transmit, selectively, said speech audio signal or said synthetic speech, whichever has higher predicted quality based on a comparison between the values of objective quality metrics computed for the speech audio signal and the synthetic speech signal.
2. The claim according to claim 1, wherein said stored data repository is on said user device and/or a server via a network.
3. The claim according to claim 1, wherein said received ...
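A hedged Python sketch of the pipeline above: recognize the noisy speech, resynthesize it from a stored voice model, and transmit whichever signal has the higher predicted quality. The recognize() and resynthesize() stubs and the quality scores are placeholders, not the patent's components; a real system would use an ASR engine, a TTS engine, and an objective quality metric such as a PESQ-style score.

def recognize(audio: bytes) -> str:
    return "hello world"            # placeholder for a real ASR engine

def resynthesize(text: str, voice_model: str) -> bytes:
    return f"TTS({voice_model}:{text})".encode()   # placeholder for TTS

def predicted_quality(audio: bytes) -> float:
    # Placeholder scores standing in for an objective quality metric.
    return 3.8 if audio.startswith(b"TTS") else 2.1

def process(noisy_audio: bytes, voice_model: str) -> bytes:
    text = recognize(noisy_audio)
    synthetic = resynthesize(text, voice_model)
    # Selectively transmit the higher-quality candidate signal.
    return max([noisy_audio, synthetic], key=predicted_quality)

print(process(b"noisy-pcm", "alice-voice"))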

20-02-2020 дата публикации

System and method for analyzing partial utterances

Номер: US20200058295A1
Принадлежит: Accenture Global Solutions Ltd

The system and method generally include identifying whether an utterance spoken by a user (e.g., customer) is a complete or incomplete sentence. For example, the system may include a partial utterance detection module that determines whether an utterance spoken by a user is a partial utterance. The detection process may include providing a detection advice code that gives a recommendation for handling the utterance of interest. If it is determined that the utterance is an incomplete sentence, then the system and method can identify the type of utterance. For example, the system may include a partial utterance classification module that predicts the class of a partial utterance. The classification process may include providing a classification advice code that gives a recommendation for handling the utterance of interest. Once a partial utterance is detected and classified, the system and method can further determine what the user meant by the utterance and can recommend a response to the user's utterance that further advances a conversation between the user and a virtual agent.
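An illustrative detect-then-classify flow for partial utterances, in Python. The heuristics, class labels, and "advice codes" below are toy stand-ins; the patent describes trained detection and classification modules, not these rules.

TRAILERS = ("to", "for", "about", "the", "a", "my")

def detect_partial(utterance: str) -> bool:
    # Toy detector: very short utterances or trailing function words.
    words = utterance.strip().rstrip(".?!").lower().split()
    return len(words) < 3 or words[-1] in TRAILERS

def classify_partial(utterance: str) -> tuple:
    # Returns (predicted class, classification advice code).
    u = utterance.lower()
    if u.startswith(("yes", "no", "yeah", "nope")):
        return "CONFIRMATION", "RESOLVE_AGAINST_LAST_QUESTION"
    if u.split()[-1] in TRAILERS:
        return "TRAILING_OFF", "PROMPT_COMPLETION"
    return "FRAGMENT", "ASK_CLARIFICATION"

for utt in ["yes", "I want to", "cancel my subscription please"]:
    if detect_partial(utt):
        print(utt, "->", classify_partial(utt))
    else:
        print(utt, "-> complete sentence, PROCEED")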

05-03-2015 дата публикации

IMAGE PROCESSING METHOD AND ELECTRONIC DEVICE THEREOF

Номер: US20150066511A1
Автор: BAEK In-Ho
Принадлежит: SAMSUNG ELECTRONICS CO., LTD.

A method and an electronic device for processing an image are provided. The method includes determining whether the electronic device is mounted in a cradle comprising at least one guide region, scanning an image in the guide region using a camera, and outputting the scanned image or image information based on the scanned image. The method processes the image simply and presents the resulting information to the user, which makes the output useful to blind or illiterate users and enhances the usability and reliability of the electronic device.

1. A method in an electronic device, the method comprising: determining whether the electronic device is mounted in a cradle comprising at least one guide region; scanning an image in the guide region using a camera; and outputting the scanned image or image information based on the scanned image.
2. The method of claim 1, wherein the determining of whether the electronic device is mounted in the cradle comprises receiving a signal from a sensor of the electronic device that detects a target of the cradle.
3. The method of claim 2, wherein the target includes a magnet and the sensor includes one of a hall sensor or a reed switch for detecting a magnetic force of the magnet.
4. The method of claim 1, further comprising: after scanning the image in the guide region, converting text of the scanned image into text data using an Optical Character Reader (OCR) function.
5. The method of claim 4, further comprising: after converting the image to the text data, automatically converting the converted text data into voice data using a Text to Speech (TTS) function.
6. The method of claim 1, wherein the outputting of the scanned image or the image information based on the scanned image comprises: displaying a corresponding image or image information on a display of the electronic device.
7. The method of claim 6, wherein the displayed image or image information is resized and output ...

28-02-2019 дата публикации

COMPRESSION OF WORD EMBEDDINGS FOR NATURAL LANGUAGE PROCESSING SYSTEMS

Номер: US20190065486A1
Принадлежит: Microsoft Technology Licensing, LLC

Described herein are systems and methods that provide a natural language processing system (NLPS) that employs compressed word embeddings. An auto-encoder that includes encoder circuitry and decoder circuitry can be used to produce the compressed word embeddings. The decoder circuitry is trained to decompress the word embeddings with reduced or minimal differences between the original uncompressed word embeddings and the corresponding decompressed word embeddings. One or more parameters of the trained decoder circuitry are transferred to the NLPS, where the NLPS is then trained using the compressed word embeddings to improve the correctness of the responses or actions determined by the NLPS.

1. A system, comprising: an auto-encoder processing unit comprising: encoder circuitry; and decoder circuitry operably connected to the encoder circuitry; a first storage device storing computer-executable instructions that, when executed by the auto-encoder processing unit, perform a method comprising: compressing, by the encoder circuitry, one or more uncompressed word embeddings to produce one or more compressed word embeddings for use in a natural language processing system; and decompressing, by the decoder circuitry, the one or more compressed word embeddings to produce one or more decompressed word embeddings; and a second storage device storing one or more parameters of the decoder circuitry.
2. The system of claim 1, wherein the auto-encoder further comprises activation function circuitry operably connected to the encoder circuitry and the operation of compressing the one or more uncompressed word embeddings comprises compressing, by the encoder circuitry and the activation circuitry, the one or more uncompressed word embeddings to produce the one or more compressed word embeddings.
3. The system of claim 2, wherein the auto-encoder comprises a multi-layer neural network with the encoder circuitry comprising a first layer, the ...
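A minimal numpy sketch of compressing word embeddings and measuring reconstruction error. A PCA-style linear projection stands in for the trained encoder/decoder circuitry here; the dimensions and random data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 300))         # 1000 uncompressed 300-d embeddings

# Linear "auto-encoder": project to 64 dims (encode) and back (decode).
U, S, Vt = np.linalg.svd(E - E.mean(0), full_matrices=False)
W_enc = Vt[:64].T                        # 300 x 64 encoder weights
W_dec = Vt[:64]                          # 64 x 300 decoder parameters

compressed = (E - E.mean(0)) @ W_enc     # what the NLPS would store and use
decompressed = compressed @ W_dec + E.mean(0)

err = np.linalg.norm(E - decompressed) / np.linalg.norm(E)
print(f"relative reconstruction error: {err:.3f}")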

10-03-2016 дата публикации

METHOD AND SYSTEM TO AUTOMATICALLY GENERATE MEANINGFUL STATEMENTS IN PLAIN NATURAL LANGUAGE FROM QUANTITATIVE PERSONALIZED CONTENT FOR PATIENT CENTRIC TOOLS

Номер: US20160070867A1
Принадлежит:

A system and method translate quantitative personalized decision content into natural language. Quantitative personalized decision content for a patient is received. The quantitative personalized decision content includes quantitative outcomes of treatment options. Contributing factors for the quantitative outcomes are determined. The treatment options are ranked based on a quantitative measure of the quantitative outcomes. Natural language explanations are presented to the patient describing the most highly ranked treatment option in terms of the contributing factors.

1. A system for translating quantitative personalized decision content to natural language, said system comprising: at least one processor programmed to: receive quantitative personalized decision content for a patient, the quantitative personalized decision content including quantitative outcomes of treatment options; determine contributing factors for the quantitative outcomes; rank the treatment options based on a quantitative measure of the quantitative outcomes; and present natural language explanations to the patient describing the most highly ranked treatment option in terms of the contributing factors.
2. The system according to claim 1, wherein the at least one processor is further programmed to: select the most highly ranked treatment option; and, until the difference between the quantitative outcome of the selected treatment option and the quantitative outcome of the next most highly ranked treatment option in terms of the quantitative measure exceeds a threshold, repeatedly: select the next most highly ranked treatment option; and present natural language explanations to the patient describing the contributing factors of the selected treatment option.
3. The system according to claim 2, wherein the at least one processor is further programmed to: categorize the presented natural language explanations into positive and negative arguments towards the treatment options; ...

28-02-2019 дата публикации

AUDIO DATA LEARNING METHOD, AUDIO DATA INFERENCE METHOD AND RECORDING MEDIUM

Номер: US20190066657A1
Принадлежит:

Provided is an audio data processing system that performs processing at high speed and obtains high-quality audio data when using a raw audio generative model. An audio data learning apparatus of the system divides full-band waveform data into subband signals, and a subband learning model unit performs model learning (optimization) on the divided subband signals. In an audio data inference apparatus, a subband learned model unit receiving at least one of an auxiliary input and subband signals performs inference processing in parallel, and a subband synthesis unit synthesizes the processed subband signals. This allows the audio data processing system to perform audio data processing with the raw audio generative model at high speed.

1. An audio data learning method comprising: a subband dividing step of obtaining a subband signal by performing processing to limit frequency bands with respect to audio data; a down-sampling processing step of performing down-sampling processing on the subband signal by thinning out sample data obtained by sampling a signal value of the subband signal with a sampling frequency; and a subband learning step of performing learning of a raw audio generative model using auxiliary data and the subband data obtained by the down-sampling step.
2. The audio data learning method according to claim 1, wherein the subband dividing step obtains N subband signals (N is a natural number) as a first subband signal x_sub_1, . . . , a k-th subband signal x_sub_k (k is a natural number satisfying 1≤k≤N), . . . , an N-th subband signal x_sub_N, and the down-sampling processing step obtains signals obtained by performing down-sampling on the first subband signal x_sub_1, . . . , the k-th subband signal x_sub_k (k is a natural number satisfying 1≤k≤N) as a first down-sampling subband signal x_d_1, . . . , a k-th down-sampling subband signal x_d_k, . . . , an N-th down- ...
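A minimal sketch of the subband-divide-then-downsample front end, using numpy only. The two-tap filter pair is a crude stand-in for a proper analysis filter bank (for example a PQMF bank); it is illustrative, not the patent's design.

import numpy as np

def subband_divide(x: np.ndarray) -> list:
    lp = np.convolve(x, [0.5, 0.5], mode="same")    # low band
    hp = np.convolve(x, [0.5, -0.5], mode="same")   # high band
    return [lp, hp]

def downsample(sub: np.ndarray, factor: int = 2) -> np.ndarray:
    return sub[::factor]    # thin out samples, as in the claim

x = np.sin(np.linspace(0, 20 * np.pi, 1600))        # toy full-band waveform
subbands = [downsample(s) for s in subband_divide(x)]
print([s.shape for s in subbands])                  # [(800,), (800,)]
# Each (auxiliary input, subband) pair would then train its own copy of a
# raw audio generative model, and inference runs over the bands in parallel.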

12-03-2015 дата публикации

DEEP NETWORKS FOR UNIT SELECTION SPEECH SYNTHESIS

Номер: US20150073804A1
Принадлежит: GOOGLE INC.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for unit selection speech synthesis. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.

1. A method comprising: receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features; determining a distance between the target acoustic features and acoustic features of a stored acoustic sample; selecting the acoustic sample to be used in speech synthesis based at least on the determined distance; and synthesizing speech based on the selected acoustic sample.
2. The method of claim 1, further comprising: providing the synthesized speech for output.
3. The method of claim 1, wherein the target acoustic features comprise a plurality of values describing acoustic characteristics.
4. The method of claim 3, wherein determining a distance between the target acoustic features and acoustic features of a stored acoustic sample comprises: calculating a Euclidean distance between a point represented by the values of the target acoustic features and a point represented by values describing the acoustic features of the stored acoustic sample.
5. The method of claim 1, wherein selecting the acoustic sample to be used in speech synthesis based on at least the determined distance comprises: determining the acoustic sample corresponds to a cost based on the determined distance that is less than or equal to costs based on other ...
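A sketch of distance-based unit selection in Python: a neural network (stubbed here) predicts target acoustic features, and the nearest stored sample is chosen. The feature vectors and the unit database are made-up examples.

import math

def predict_target_features(linguistic_features: dict) -> list:
    return [0.2, 0.7, 0.1]    # stand-in for the trained network's output

database = {                   # acoustic sample id -> stored acoustic features
    "unit_017": [0.1, 0.8, 0.2],
    "unit_042": [0.9, 0.1, 0.5],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

target = predict_target_features({"phoneme": "ae", "stress": 1})
best = min(database, key=lambda k: euclidean(target, database[k]))
print(best)   # unit_017: closest to the target features, so it is synthesized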

17-03-2016 дата публикации

TEXT-TO-SPEECH WITH EMOTIONAL CONTENT

Номер: US20160078859A1
Автор: HE LEI, Leung Max, Luan Jian
Принадлежит:

Techniques for converting text to speech having emotional content. In an aspect, an emotionally neutral acoustic trajectory is predicted for a script using a neutral model, and an emotion-specific acoustic trajectory adjustment is independently predicted using an emotion-specific model. The neutral trajectory and emotion-specific adjustments are combined to generate a transformed speech output having emotional content. In another aspect, state parameters of a statistical parametric model for neutral voice are transformed by emotion-specific factors that vary across contexts and states. The emotion-dependent adjustment factors may be clustered and stored using an emotion-specific decision tree or other clustering scheme distinct from a decision tree used for the neutral voice model.

1. An apparatus for text-to-speech conversion comprising: a neutral generation block configured to generate an emotionally neutral representation of a script, the emotionally neutral representation comprising at least one parameter associated with each of a plurality of phonemes; and an adjustment block configured to adjust the at least one parameter distinctly for each of the plurality of phonemes based on an emotion type to generate a transformed representation.
2. The apparatus of claim 1, further comprising a vocoder configured to synthesize a speech waveform from the transformed representation.
3. The apparatus of claim 1, each phoneme comprising a plurality of frames, the at least one parameter comprising a fundamental frequency associated with each frame of each phoneme, the adjustment block configured to adjust the fundamental frequency distinctly for each of the plurality of frames by adding an adjustment factor based on the emotion type and the linguistic-contextual identity of each phoneme.
4. The apparatus of claim 1, each phoneme comprising a plurality of frames, the at least one parameter comprising a spectral coefficient associated with each frame ...
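A hedged sketch of "neutral trajectory plus emotion-specific adjustment" in Python. The per-frame F0 values and the additive adjustment factors are invented numbers; the real system predicts both from statistical parametric models.

neutral_f0 = [110.0, 112.0, 115.0, 117.0]    # Hz, one value per frame

emotion_adjust = {                            # per-frame additive factors
    "happy": [+18.0, +20.0, +22.0, +25.0],
    "sad":   [-12.0, -12.0, -15.0, -15.0],
}

def transform(frames, emotion):
    # Combine the neutral trajectory with the emotion-specific adjustment.
    return [f + d for f, d in zip(frames, emotion_adjust[emotion])]

print(transform(neutral_f0, "happy"))   # raised, livelier pitch contour
print(transform(neutral_f0, "sad"))     # lowered, flatter pitch contour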

24-03-2022 дата публикации

SYSTEMS AND METHODS FOR SHORT- AND LONG-TERM DIALOG MANAGEMENT BETWEEN A ROBOT COMPUTING DEVICE/DIGITAL COMPANION AND A USER

Номер: US20220092270A1
Принадлежит:

Systems and methods for managing conversations between a robot computing device and a user are disclosed. Exemplary implementations may: initiate a first-time user experience sequence with the user; teach the user the robot computing device's capabilities and/or characteristics; initiate, utilizing a dialog manager, a conversation with the user; receive one or more command files from the user via one or more microphones; and generate conversation response files and communicate them to the dialog manager in response to the one or more received user global command files to initiate an initial conversation exchange.

2. The method of claim 1, wherein executing the computer-readable instructions further comprises continuing to engage in conversation exchanges with the user by receiving communication files from the user, generating associated conversation response files and communicating the conversation response files to the user.
3. The method of claim 1, wherein executing the computer-readable instructions further comprises: receiving tangential conversation files from the user, the tangential conversation files not being responsive to the conversation response files generated by the robot computing device; engaging in one or more tangential conversation exchanges with the user by generating tangential responsive conversation files, the tangential responsive conversation files responsive to the tangential conversation files; generating, at the one or more speakers, audible tangential responses to the user based at least in part on the tangential responsive conversation files; and returning to the initial conversation exchange with the user upon completion of the one or more tangential conversation exchanges.
4. The method of claim 3, wherein executing the computer-readable instructions further comprises utilizing a hierarchical conversation stack module to allow the user to engage in the one or more tangential ...
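A sketch of a hierarchical conversation stack in Python: tangents are pushed on top of the initial exchange and the robot returns to it once they finish. The class and topic names are illustrative, not the patent's API.

class ConversationStack:
    def __init__(self):
        self.stack = []

    def start(self, topic: str):
        self.stack.append(topic)        # the initial conversation exchange

    def tangent(self, topic: str):
        self.stack.append(topic)        # user changed the subject

    def finish_current(self):
        self.stack.pop()                # tangent (or topic) completed
        return self.stack[-1] if self.stack else None

convo = ConversationStack()
convo.start("favorite animals")
convo.tangent("what's that noise?")
print(convo.finish_current())   # -> 'favorite animals': resume the exchange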

15-03-2018 дата публикации

Method and system for Using A Vocal Sample to Customize Text to Speech Applications

Номер: US20180075838A1
Автор: Mason Paul Wendell
Принадлежит:

Apparatus and methods consistent with the present invention measure one or more of the characteristics of a voice recording and use such measurements to create a synthetic voice that approximates the recorded voice and uses such created synthetic voice to verbalize the content of an electronically conveyed written message such as an SMS text message. The vocal characteristics measured may include frequency, timbre, intensity, rhythm, and rate of speech as well as others.

1. A method comprising: receiving, via a client application interface, a recorded sample of a sender's voice; measuring the vocal characteristics of the recorded sample of the sender's voice including its frequency, intensity, rhythm and rate of speech; receiving a text-based message originating from the sender; converting the text-based message to a speech format wherein the measured vocal characteristics are used to form a synthetic voice that approximates the voice of the sender; and sending an audio file of the sender's message as converted to an address that corresponds to the address of the text-based message.
2. The method of claim 1, wherein the recorded sample of the sender's voice is made by sampling at a rate of at least 40,000 Hertz.
3. The method of wherein the sample of the sender's voice consists of a sequence of predetermined words.
4. The method of wherein the recorded sample is at least 20 syllables long.
5. The method of wherein the sample of the sender's voice comprises the sender's voicemail greeting.
6. The method of wherein the sender's voicemail greeting is accessed telephonically.
7. The method of wherein one or more acronyms in the text-based message are audibly expressed as full words or phrases.
8. The method of wherein the measured vocal characteristics include timbre.
9. The method of wherein profane words are filtered out of the audio file of the sender's message.
10. A method, comprising: recording, with a sender device, a sample of a sender's voice; receiving, with a ...

19-03-2015 дата публикации

PROSODY EDITING DEVICE AND METHOD AND COMPUTER PROGRAM PRODUCT

Номер: US20150081306A1
Принадлежит:

According to an embodiment, a prosody editing device includes an approximate contour generator, a setter, a display controller, an operation receiver, and an updater. The approximate contour generator approximates a contour representing a time series of prosody information with a parametric curve including a control point to generate an approximate contour. The setter sets, on the approximate contour, an operation point corresponding to the control point. The display controller displays, on a display device, an operation screen including the approximate contour on which the operation point is shown. The operation receiver receives an operation to move the operation point optionally selected on the operation screen. The updater calculates a position of the control point from a moving amount of the operation point and updates the approximate contour.

1. A prosody editing device comprising: an approximate contour generator to approximate a contour representing a time series of prosody information with a parametric curve including a control point to generate an approximate contour; a setter to set, on the approximate contour, an operation point corresponding to the control point; a display controller to display, on a display device, an operation screen including the approximate contour on which the operation point is shown; an operation receiver to receive an operation to move the operation point optionally selected on the operation screen; and an updater to calculate a position of the control point from a moving amount of the operation point and update the approximate contour.
2. The device according to claim 1, further comprising a speech synthesizer to generate a synthetic speech by using the approximate contour.
3. The device according to claim 1, wherein the approximate contour generator generates the approximate contour by using a Bézier curve as the parametric curve.
4. The device according to claim 1, wherein when a position of the control point in a time-axis ...
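A pure-Python sketch of approximating an F0 contour with a cubic Bézier curve and re-evaluating it after an operation point moves. The control point values and the editing step are illustrative assumptions.

def bezier(p0, p1, p2, p3, t):
    # Cubic Bézier evaluated at parameter t in [0, 1].
    u = 1.0 - t
    return (u**3 * p0 + 3 * u**2 * t * p1
            + 3 * u * t**2 * p2 + t**3 * p3)

# Control points as F0 values in Hz; p1/p2 act as draggable operation points.
p0, p1, p2, p3 = 100.0, 130.0, 120.0, 105.0

contour = [bezier(p0, p1, p2, p3, i / 9) for i in range(10)]
print([round(f, 1) for f in contour])

p1 += 20.0    # user drags an operation point up; the updater recomputes
contour = [bezier(p0, p1, p2, p3, i / 9) for i in range(10)]
print([round(f, 1) for f in contour])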

05-03-2020 дата публикации

MULTIMEDIA PROCESSING METHOD AND ELECTRONIC SYSTEM

Номер: US20200074980A1
Принадлежит:

An electronic system is provided. The electronic system includes a host, an audio output device and a display. The host includes an audio processing module, a relay processing module, a smart interpreter engine and a driver. The audio processing module is utilized for acquiring audio data corresponding to a first language from audio streams processed by an application program executed on the host. The smart interpreter engine is utilized for converting the audio data corresponding to the first language into text data corresponding to a second language. The relay processing module is utilized for transmitting the text data corresponding to the second language to the display for displaying. The driver is utilized for converting the audio data corresponding to the first language into an analog audio signal corresponding to the first language and transmitting the analog audio signal corresponding to the first language to the audio output device for playback.

1. An electronic system, comprising: a host, comprising: an audio processing module for acquiring audio data corresponding to a first language from audio streams processed by an application program executed on the host; a relay processing module for receiving the audio data corresponding to the first language from the audio processing module; a smart interpreter engine for receiving the audio data corresponding to the first language from the relay processing module and converting the audio data corresponding to the first language into text data corresponding to a second language, wherein the smart interpreter engine transmits the text data corresponding to the second language to the relay processing module; and a driver for converting the audio data corresponding to the first language into an analog speech signal corresponding to the first language; an audio output device for playing the analog speech signal corresponding to the first language; and a display for receiving the text data corresponding to the ...

18-03-2021 дата публикации

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM

Номер: US20210082392A1
Принадлежит:

[Problem] Provided are an information processing device, an information processing method, and a program that are able to audibly present, to a user, word-of-mouth information in accordance with the user's latent demand. [Solution] Proposed is an information processing device including a controller that performs control to estimate a latent demand on the basis of a current user condition, search for word-of-mouth information corresponding to the demand, and present the found word-of-mouth information to the user.

1. An information processing device comprising a controller that performs control to estimate a latent demand on a basis of a current user condition, search for word-of-mouth information corresponding to the demand, and present the searched word-of-mouth information to a user.
2. The information processing device according to claim 1, wherein the controller searches for word-of-mouth information submitted by another user in a region around the user.
3. The information processing device according to claim 1, wherein the controller performs control to convert the searched word-of-mouth information into a predetermined form and present it.
4. The information processing device according to claim 3, wherein the controller makes a volume level of the word-of-mouth information lower than a predetermined value.
5. The information processing device according to claim 4, wherein the controller causes an ambient sound to overlap the word-of-mouth information.
6. The information processing device according to claim 3, wherein the controller converts the word-of-mouth information into vibration.
7. The information processing device according to claim 3, wherein the controller converts the word-of-mouth information into smell.
8. The information processing device according to claim 3, wherein the controller converts the word-of-mouth information into an associated melody.
9. The information processing device according to claim 1, further comprising a sound output unit ...

24-03-2016 дата публикации

Speech Recognition Model Construction Method, Speech Recognition Method, Computer System, Speech Recognition Apparatus, Program, and Recording Medium

Номер: US20160086599A1
Принадлежит:

A construction method for a speech recognition model, in which a computer system executes: a step of acquiring alignment between speech of each of a plurality of speakers and a transcript of the speaker; a step of joining transcripts of the respective ones of the plurality of speakers along a time axis, creating a transcript of speech of mixed speakers obtained from synthesized speech of the speakers, and replacing predetermined transcribed portions of the plurality of speakers overlapping on the time axis with a unit which represents a simultaneous speech segment; and a step of constructing at least one of an acoustic model and a language model which make up a speech recognition model, based on the transcript of the speech of the mixed speakers.

1. A construction method for a speech recognition model, comprising the steps, executed by a computer system, of: acquiring alignment between speech of each of a plurality of speakers and a transcript of the speaker; joining transcripts of the respective ones of the plurality of speakers along a time axis, creating a transcript of speech of mixed speakers obtained from synthesized speech of the speakers, and replacing predetermined transcribed portions of the plurality of speakers overlapping on the time axis with a unit which represents a simultaneous speech segment; and constructing at least one of an acoustic model and a language model which make up a speech recognition model, based on the transcript of the speech of the mixed speakers.
2. The construction method according to claim 1, wherein the constructing step comprises a step, executed by the computer system, of defining an acoustic unit associated with a unit which represents the simultaneous speech segment and configured to represent the simultaneous speech segment, and constructing the acoustic model which makes up the speech recognition model and contains the acoustic unit.
3. The construction method according to claim 2, wherein the ...
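A toy Python illustration of building the mixed-speaker transcript: word-level alignments from two speakers are merged along the time axis, and words that overlap in time are replaced by a dedicated unit. The timings and the <overlap> token name are illustrative assumptions.

a = [(0.0, 0.5, "hello"), (0.5, 1.0, "there")]                  # speaker A: (start, end, word)
b = [(0.8, 1.3, "sorry"), (1.3, 1.8, "go"), (1.8, 2.2, "on")]   # speaker B

def overlaps(word, others):
    s, e, _ = word
    return any(os_ < e and s < oe for os_, oe, _ in others)

def mixed_transcript(a, b):
    out = []
    for w in sorted(a + b, key=lambda t: t[0]):
        other = b if w in a else a
        out.append("<overlap>" if overlaps(w, other) else w[2])
    # Collapse consecutive simultaneous-speech units into a single unit.
    return " ".join(t for i, t in enumerate(out)
                    if i == 0 or not (t == "<overlap>" and out[i - 1] == "<overlap>"))

print(mixed_transcript(a, b))   # -> hello <overlap> go on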

12-06-2014 дата публикации

TENNIS UMPIRE

Номер: US20140163990A1
Автор: Street Christopher
Принадлежит:

A system for communicating game information in real time by audibly announcing game information in human speech and in a preferred embodiment also visually displaying game information, the system comprising an electronic umpire unit that receives game change information transmitted by an electronic point transmitter that may be worn by a game player or spectator.

1. An electronic umpire system for visually displaying and audibly announcing game information in real time comprising: a point transmitter; said point transmitter having a point signal actuator for generating a point signal representing a change in game information; said point transmitter having a transmitter for transmitting said point signal; an umpire unit; said umpire unit having a receiver for receiving said transmitted point signal; said umpire unit having a processor for producing a game information update based on said received point signal; said umpire unit having a speech synthesizer for generating a human speech representation of said game information update; said umpire unit having a first audio speaker for audibly transmitting said human speech representation of said game information update; and an umpire unit carrier.
2. The electronic umpire system of further comprising said umpire unit having a first visual display for visually displaying said game information update on a first display side of said umpire unit.
3. The electronic umpire system of further comprising a second visual display for visually displaying said game information update on a second display side of said umpire unit.
4. The electronic umpire system of claim 3, further comprising said first audio speaker being on said first display side, and a second audio speaker on said second display side.
5. The electronic umpire system of further comprising said point transmitter having a separate point signal actuator for each team in a game.
6. The electronic umpire system of further comprising said point transmitter having a user attachment means ...

12-06-2014 дата публикации

FACILITATING TEXT-TO-SPEECH CONVERSION OF A DOMAIN NAME OR A NETWORK ADDRESS CONTAINING A DOMAIN NAME

Номер: US20140163993A1
Принадлежит: BlackBerry Limited

To facilitate text-to-speech conversion of a username, a first or last name of a user associated with the username may be retrieved, and a pronunciation of the username may be determined based at least in part on whether the name forms at least part of the username. To facilitate text-to-speech conversion of a domain name having a top level domain and at least one other level domain, a pronunciation for the top level domain may be determined based at least in part upon whether the top level domain is one of a predetermined set of top level domains. Each other level domain may be searched for one or more recognized words therewithin, and a pronunciation of the other level domain may be determined based at least in part on an outcome of the search. The username and domain name may form part of a network address such as an email address, URL or URI.

1-21. (canceled)
22. A method for a computing device for text-to-speech conversion of a network address comprising: determining, at the computing device, whether a top level domain of the network address is one of a set of top level domains that are pronounced as a whole and, when the top level domain is not one of the set, then one or more of: generating a phonetic representation of each character in the top level domain pronounced individually; and generating a tokenized representation of each individual character of the top level domain suitable for interpretation by a text-to-speech engine; and, for each other level domain of the network address, determining, at the computing device, a pronunciation of the other level domain.
23. The method of claim 22, wherein when the top level domain is one of the set, the method further comprises one or more of: generating a phonetic representation of the top level domain pronounced as a whole; and generating a tokenized representation of the top level domain pronounced as a whole, suitable for interpretation by the text-to-speech engine.
24. The method of claim 22, ...
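An illustrative Python sketch of the top-level-domain rule described above: some TLDs are spoken as whole words ("com" -> "dot com"), others are spelled out character by character ("uk" -> "dot u k"). The TLD set and the output format are assumptions; the patent leaves the set predetermined but unspecified.

PRONOUNCED_AS_WHOLE = {"com", "net", "org", "edu", "gov"}

def speakable_domain(domain: str) -> str:
    *others, tld = domain.lower().split(".")
    if tld in PRONOUNCED_AS_WHOLE:
        tld_spoken = tld                 # pronounce the TLD as a whole word
    else:
        tld_spoken = " ".join(tld)       # spell out each character
    return " dot ".join(others + [tld_spoken])

print(speakable_domain("example.com"))    # example dot com
print(speakable_domain("example.co.uk"))  # example dot co dot u k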

12-03-2020 дата публикации

System and method for speech synthesis

Номер: US20200082805A1
Автор: HUI Zhang, Xiulin Li

The present disclosure relates to a method and system for generating speech from text. According to certain embodiments, the method includes: identifying a plurality of phonemes from the text; determining a first set of acoustic features for each identified phoneme; selecting a sample phoneme corresponding to each identified phoneme from a speech database based on at least one of the first set of acoustic features; determining a second set of acoustic features for each selected sample phoneme; and generating the speech using a generative model based on at least one of the second set of acoustic features.

25-03-2021 дата публикации

Method and Machine for Predictive Animal Behavior Analysis

Номер: US20210089945A1
Автор: De Suranjan, Gibbs Andy H.
Принадлежит:

A system for predictive animal behavior analysis for recommending deliverables for a pet based on the pet's behavior and helping a guardian of the pet understand the needs of the pet. The system generally includes a sensor that acquires pet behavior data of a pet, a database storing a plurality of pet behaviors and a plurality of corresponding deliverables, and a server computer configured to compare the pet behavior data to the plurality of pet behaviors in the database to identify a selected deliverable from the plurality of corresponding deliverables that corresponds to the pet behavior data. The selected deliverable is then communicated to the guardian, who may purchase it. In another embodiment, the server computer is configured to inform the guardian of the pet regarding the wants and needs of the pet.

1. A system for recommending a pet deliverable for a pet, comprising: a sensor associated with a pet, wherein the sensor acquires pet behavior data of the pet corresponding to at least one behavior of the pet; a database storing a plurality of pet behaviors and a plurality of corresponding deliverables; a server computer in communication with the sensor and the database, wherein the sensor is configured to communicate the pet behavior data for the pet to the server computer and wherein the server computer is configured to compare the pet behavior data to the plurality of pet behaviors in the database to identify at least one selected deliverable from the plurality of corresponding deliverables that corresponds to the pet behavior data; and a computer device accessible by a guardian of the pet, wherein the server computer communicates to the computer device the at least one selected deliverable and wherein the computer device communicates the at least one selected deliverable to the guardian of the pet.
2. The system of claim 1, wherein the deliverables are a pet product or a pet service.
3. The system of claim 1, wherein the sensor is ...

31-03-2016 дата публикации

METHOD AND APPARATUS TO SYNTHESIZE VOICE BASED ON FACIAL STRUCTURES

Номер: US20160093284A1
Принадлежит:

Disclosed are embodiments for use in an articulatory-based text-to-speech conversion system configured to establish an articulatory speech synthesis model of a person's voice based on facial characteristics defining exteriorly visible articulatory speech synthesis model parameters of the person's voice and on a predefined articulatory speech synthesis model selected from among stores of predefined models.

1. An apparatus for use in an articulatory-based text-to-speech conversion system to establish an articulatory speech synthesis model of a person's voice, the apparatus comprising: a facial structure input device to acquire image data representing a visage of a person, in which the visage includes facial characteristics defining exteriorly visible articulatory speech synthesis model parameters of the person's voice; a facial characteristics matching system to select a predefined articulatory speech synthesis model from among stores of predefined models, the selection based at least in part on one or both of the facial characteristics or the exteriorly visible articulatory speech synthesis model parameters; and an articulatory system to associate at least a portion of the selected predefined articulatory speech synthesis model with the articulatory speech synthesis model of the person's voice.
2. The apparatus of claim 1, in which the selection is based on a measure of a face-matching correlation between the facial characteristics of the visage of the person and facial characteristics defining visible articulatory speech synthesis model parameters of the predefined models.
3. The apparatus of claim 2, in which the measure of face-matching correlation is derived using a hidden Markov model.
4. The apparatus of claim 1, in which the facial structure input device is configured to acquire the image data by capturing an image with an imager in a user equipment device.
5. The apparatus of claim 1, in which the facial characteristics matching system is configured to ...

31-03-2016 дата публикации

SYNTHESIZING AN AGGREGATE VOICE

Номер: US20160093286A1
Принадлежит:

A system and computer-implemented method for synthesizing multi-person speech into an aggregate voice is disclosed. The method may include crowd-sourcing a data message configured to include a textual passage. The method may include collecting, from a plurality of speakers, a set of vocal data for the textual passage. Additionally, the method may also include mapping a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice.

1. A computer-implemented method for synthesizing multi-person speech into an aggregate voice, the method comprising: crowd-sourcing a data message configured to include a textual passage; collecting, from a plurality of speakers, a set of vocal data for the textual passage; and mapping a source voice profile to a subset of the set of vocal data to synthesize the aggregate voice.
2. The method of claim 1, wherein mapping the source voice profile to a subset of the set of vocal data to synthesize the aggregate voice includes: extracting phonological data from the set of vocal data, wherein the phonological data includes pronunciation tags, intonation tags, and syllable rates; converting, based on the phonological data including pronunciation tags, intonation tags and syllable rates, the set of vocal data into a set of phoneme strings; and applying, to the set of phoneme strings, the source voice profile.
3. The method of claim 1, wherein the set of vocal data includes a first set of enunciation data corresponding to a first portion of the textual passage, a second set of enunciation data corresponding to a second portion of the textual passage, and a third set of enunciation data corresponding to both the first and second portions of the textual passage.
4. The method of claim 1, wherein the source voice profile includes a predetermined set of phonological and prosodic characteristics corresponding to a voice of a first individual.
5. The method of claim 4, wherein the phonological and prosodic ...

31-03-2016 дата публикации

SYSTEMS AND METHODS FOR MULTI-STYLE SPEECH SYNTHESIS

Номер: US20160093289A1
Автор: Pollet Vincent
Принадлежит:

Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a first speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identified plurality of speech segments comprising a first speech segment having the first speaking style and a second speech segment having a second speaking style different from the first speaking style; and rendering the text as speech having the first speaking style, at least in part, by using the identified plurality of speech segments.

1. A speech synthesis method, comprising: using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a first speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identified plurality of speech segments comprising a first speech segment having the first speaking style and a second speech segment having a second speaking style different from the first speaking style; and rendering the text as speech having the first speaking style, at least in part, by using the identified plurality of speech segments.
2. The speech synthesis method of claim 1, wherein the identifying comprises: identifying the second speech segment based, at least in part, on how well acoustic characteristics of the second speech segment match acoustic characteristics associated with the first speaking style.
3. The speech synthesis method of claim 2, wherein the identifying the second speech segment is based, at least in part, on how well prosodic characteristics of the second speech segment match prosodic characteristics associated with the first speaking style.
4. The speech synthesis method of claim 2, wherein identifying the second speech segment ...
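A Python sketch of the cross-style selection idea: segments labeled with another style may still be used when their acoustics match the requested style closely enough. The feature vectors, style centroids, and threshold are made-up assumptions.

import math

segments = [  # (segment id, labeled style, acoustic feature vector)
    ("seg1", "newscast",       [0.9, 0.2]),
    ("seg2", "conversational", [0.85, 0.25]),   # acoustically newscast-like
    ("seg3", "conversational", [0.1, 0.9]),
]
style_centroids = {"newscast": [0.9, 0.2], "conversational": [0.2, 0.8]}

def match(vec, style):
    # Higher is better: negative distance to the style's acoustic centroid.
    return -math.dist(vec, style_centroids[style])

requested = "newscast"
usable = [sid for sid, _, vec in segments if match(vec, requested) > -0.2]
print(usable)   # ['seg1', 'seg2']: seg2 crosses styles but matches acoustically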

05-05-2022 дата публикации

PERFORMING ARTIFICIAL INTELLIGENCE SIGN LANGUAGE TRANSLATION SERVICES IN A VIDEO RELAY SERVICE ENVIRONMENT

Номер: US20220139417A1
Автор: Maxwell Conrad A.
Принадлежит:

Video relay services, communication systems, non-transitory machine-readable storage media, and methods are disclosed herein. A video relay service may include at least one server configured to receive a video stream including sign language content from a video communication device during a real-time communication session. The server may also be configured to automatically translate the sign language content into a verbal language translation during the real-time communication session without assistance of a human sign language interpreter. Further, the server may be configured to transmit the verbal language translation during the real-time communication session.

09-04-2015 дата публикации

Intelligent state aware system control utilizing two-way voice / audio communication

Номер: US20150100321A1
Принадлежит: Naviscent LLC

The embodiments provide a method and system for enabling intelligent, state-aware system control utilizing two-way voice/audio communication on an electronic device. The method includes receiving voice commands from a user and identifying one or more actions associated with each voice command. Further, the method includes maintaining internal states of the actions based on one or more rules, where the internal states are dynamically defined based on a response to the voice command and the action. Further, the method includes computing application commands by performing the actions in accordance with the internal state, and providing a voice response to the user from the electronic device in response to execution of the application commands on the corresponding applications.

01-04-2021 дата публикации

INFORMATION PROCESSING METHOD, INFORMATION PROCESSING DEVICE, AND PROGRAM

Номер: US20210097973A1
Принадлежит:

An information processing method is realized by a computer, and includes setting a pronunciation style with regard to a specific range on a time axis, arranging one or more notes in accordance with an instruction from a user within the specific range for which the pronunciation style has been set, and generating a characteristic transition, which is a transition of acoustic characteristics of voice that pronounces the one or more notes within the specific range in the pronunciation style set for the specific range.

1. An information processing method realized by a computer, comprising: setting a pronunciation style with regard to a specific range on a time axis; arranging one or more notes in accordance with an instruction from a user within the specific range for which the pronunciation style has been set; and generating a characteristic transition, which is a transition of acoustic characteristics of voice that pronounces the one or more notes within the specific range in the pronunciation style set for the specific range.
2. The information processing method according to claim 1, further comprising displaying the one or more notes within the specific range and the characteristic transition within the specific range within a musical score area in which the time axis is set.
3. The information processing method according to claim 1, wherein, in the generating of the characteristic transition, the characteristic transition of the specific range is changed each time the one or more notes within the specific range are edited.
4. The information processing method according to claim 1, wherein the one or more notes include a first note and a second note, and the generating of the characteristic transition is performed such that a portion of the characteristic transition corresponding to the first note is different between the characteristic transition in a first state, in which the first note is set within the specific range, and the characteristic transition in a second ...

12-05-2022 дата публикации

MULTI-MODAL MODEL FOR DYNAMICALLY RESPONSIVE VIRTUAL CHARACTERS

Номер: US20220148248A1
Принадлежит: ARTIE, INC.

The disclosed embodiments relate to a method for controlling a virtual character (or "avatar") using a multi-modal model. The multi-modal model may process various input information relating to a user and process the input information using multiple internal models. The multi-modal model may combine the internal models to make believable and emotionally engaging responses by the virtual character. The link to a virtual character may be embedded on a web browser and the avatar may be dynamically generated based on a selection to interact with the virtual character by a user. A report may be generated for a client, the report providing insights as to characteristics of users interacting with a virtual character associated with the client.

1. A method for controlling a virtual character, the method comprising: receiving multi-modal input information from a device, the multi-modal input information including any of speech information, facial expression information, and environmental information representing an environment surrounding the device; displaying the virtual character in a position in a display environment presented on the device; implementing at least two internal models to identify characteristics of the multi-modal input information; inspecting the identified characteristics of the at least two internal models to determine whether a first identified characteristic of the identified characteristics includes a threshold number of similar features of a second identified characteristic of the identified characteristics; comparing the first identified characteristic and the second identified characteristic against information specific to the virtual character included in a virtual character knowledge model to select a selected characteristic based on determining that the first identified characteristic includes the threshold number of similar features of the second identified characteristic of the identified characteristics; accessing a library of potential actions ...

28-03-2019 дата публикации

METHOD AND APPARATUS FOR GENERATING SPEECH SYNTHESIS MODEL

Номер: US20190096385A1
Автор: KANG Yongguo
Принадлежит:

The present disclosure discloses a method and apparatus for generating a speech synthesis model. A specific embodiment of the method comprises: acquiring a plurality of types of training samples, each of the plurality of types of training samples including a text of the type, and a speech of the text having a style of speech corresponding to the type read by an announcer corresponding to the type; and training a neural network corresponding to a speech synthesis model using the plurality of types of training samples and an annotation of the style of speech in each of the plurality of types of training samples to obtain the speech synthesis model, the speech synthesis model being used to synthesize speech of the announcer corresponding to each of the plurality of types having a plurality of styles.

1. A method for generating a speech synthesis model, comprising: acquiring a plurality of types of training samples, each of the plurality of types of training samples including a text of the type, and a speech of the text having a style of speech corresponding to the type read by an announcer corresponding to the type in the style; and training a neural network corresponding to a speech synthesis model using the plurality of types of training samples and an annotation of the style of speech in each of the plurality of types of training samples to obtain the speech synthesis model, the speech synthesis model being used to synthesize the speech of the announcer corresponding to each of the plurality of types having a plurality of styles.
2. The method according to claim 1, wherein the training a neural network corresponding to a speech synthesis model using the plurality of types of training samples and an annotation of the style of speech in each of the plurality of types of training samples to obtain the speech synthesis model comprises: combining the annotation of the style of speech in each of the plurality of types of training samples and an output of ...

28-03-2019 дата публикации

METHOD AND APPARATUS FOR GENERATING SPEECH SYNTHESIS MODEL

Номер: US20190096386A1
Автор: Li Hao
Принадлежит:

The present disclosure discloses a method and apparatus for generating a speech synthesis model. A specific embodiment of the method comprises: acquiring a text characteristic of a text and an acoustic characteristic of a speech corresponding to the text, used for training a neural network corresponding to a speech synthesis model, where fundamental frequency data in the acoustic characteristic of that speech is extracted through a fundamental frequency data extraction model, and the fundamental frequency data extraction model is generated by pre-training a neural network corresponding to the fundamental frequency data extraction model using speech in which each frame has corresponding fundamental frequency data; and training the neural network corresponding to the speech synthesis model using the text characteristic of the text and the acoustic characteristic of the speech corresponding to the text.

1. A method for generating a speech synthesis model, comprising: acquiring a text characteristic of a text and an acoustic characteristic of a speech corresponding to the text used for training a neural network corresponding to a speech synthesis model, fundamental frequency data in the acoustic characteristic of the speech corresponding to the text used for the training being extracted through a fundamental frequency data extraction model, and the fundamental frequency data extraction model being generated based on pre-training a neural network corresponding to the fundamental frequency data extraction model using the speech comprising each frame of speech having corresponding fundamental frequency data; and training the neural network corresponding to the speech synthesis model using the text characteristic of the text and the acoustic characteristic of the speech corresponding to the text.
2. The method according to claim 1, further comprising: acquiring a speech used for training the neural network ...

Подробнее
08-04-2021 дата публикации

VOICE ASSISTANT WITH CONTEXTUALLY-ADJUSTED AUDIO OUTPUT

Номер: US20210104220A1
Принадлежит:

A voice assistant has a contextually-adjusted audio output. The audio output can be adjusted, for example, based on media content characteristics. 1. A method for generating synthesized speech of a voice assistant having a contextually-adjusted audio output using a voice-enabled device, the method comprising: identifying media content characteristics associated with media content; identifying base characteristics of audio output; generating contextually-adjusted characteristics of audio output based at least in part on the base characteristics and the media content characteristics; and using the contextually-adjusted audio output characteristics to generate the synthesized speech. 2. The method of claim 1, wherein the contextually-adjusted characteristics of audio output are further based on user-specific adjustments to the base characteristics of audio output. 3. The method of claim 1, wherein using the contextually-adjusted audio output comprises receiving voice content and generating the synthesized speech to convey the voice content to the user according to the contextually-adjusted audio output. 4. The method of claim 1, wherein identifying the media content characteristics comprises: analyzing audio of the media content to determine musical characteristics of the media content; and analyzing media content metadata to determine metadata-based characteristics. 5. The method of claim 4, wherein generating a contextually-adjusted audio output is based at least in part upon the musical characteristics of the media content. 6. The method of claim 5, wherein generating the contextually-adjusted audio output comprises generating mood-related attributes that are compatible with the musical characteristics of the media content. 7. The method of claim 5, wherein generating the contextually-adjusted audio output comprises generating mood-related attributes that are compatible with metadata-based characteristics of the media content. 8. The method of claim 1, wherein the user- ...
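A minimal sketch of the claimed blending, assuming invented parameter names and mood offsets (the patent does not publish a schema):

```python
# Illustrative only: blend base voice settings with media "mood" metadata
# to get contextually-adjusted TTS parameters.
BASE = {"rate": 1.0, "pitch": 0.0, "volume": 0.8}

MOOD_OFFSETS = {          # hypothetical mapping from media mood to offsets
    "energetic": {"rate": +0.15, "pitch": +1.0, "volume": +0.1},
    "calm":      {"rate": -0.10, "pitch": -1.0, "volume": -0.1},
}

def adjust(base, mood, user_tweaks=None):
    out = dict(base)
    for key, delta in MOOD_OFFSETS.get(mood, {}).items():
        out[key] += delta
    for key, value in (user_tweaks or {}).items():  # claim 2: user overrides
        out[key] = value
    out["volume"] = min(max(out["volume"], 0.0), 1.0)
    return out

print(adjust(BASE, "calm", {"rate": 0.95}))
```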

Подробнее
08-04-2021 дата публикации

Hotword-Aware Speech Synthesis

Номер: US20210104221A1
Принадлежит: Google LLC

A method includes receiving text input data for conversion into synthesized speech and determining, using a hotword-aware model trained to detect a presence of a hotword assigned to a user device, whether a pronunciation of the text input data includes the hotword. The hotword is configured to initiate a wake-up process on the user device for processing the hotword and/or one or more other terms following the hotword in the audio input data. When the pronunciation of the text input data includes the hotword, the method also includes generating an audio output signal from the text input data and providing the audio output signal to an audio output device to output the audio output signal. The audio output signal, when captured by an audio capture device of the user device, is configured to prevent initiation of the wake-up process on the user device. 1. A method comprising: receiving, at data processing hardware of a speech synthesis device, text input data for conversion into synthesized speech; determining, by the data processing hardware and using a hotword-aware model trained to detect a presence of at least one hotword assigned to a user device, whether a pronunciation of the text input data includes the hotword, the hotword, when included in audio input data received by the user device, configured to initiate a wake-up process on the user device for processing the hotword and/or one or more other terms following the hotword in the audio input data; and when the pronunciation of the text input data includes the hotword: generating an audio output signal from the text input data; and providing, by the data processing hardware, the audio output signal to an audio output device to output the audio output signal, the audio output signal, when captured by an audio capture device of the user device, configured to prevent initiation of the wake-up process on the user device. 2. The method of claim 1, wherein determining whether the pronunciation of the text input data ...
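Sketch only, not Google's hotword-aware model: a trivial text-side check for whether the synthesized pronunciation would contain a device hotword (the hotword set is invented):

```python
# Stand-in for the "hotword-aware model": flag text whose normalized
# pronunciation contains a known hotword before synthesis.
HOTWORDS = {"hey device", "ok device"}        # assumed hotword set

def normalize(text: str) -> str:
    kept = "".join(c for c in text.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def contains_hotword(text: str) -> bool:
    return any(h in normalize(text) for h in HOTWORDS)

text = "She said: Hey, device! and the lights came on."
if contains_hotword(text):
    # A real system would watermark or filter the audio so nearby devices
    # do not start their wake-up process; here we only flag it.
    print("synthesize with wake-up suppression")
```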

Подробнее
08-04-2021 дата публикации

Wearable electronic device for emitting a masking signal

Номер: US20210104222A1
Принадлежит:

A signal processing method and a wearable electronic device such as a headphone or an earphone comprising a microphone arranged to pick up an acoustic signal and convert the acoustic signal to a microphone signal (x); a loudspeaker arranged in an earpiece; and a processor configured to control the volume of a masking signal (m); and supply the masking signal (m) to the loudspeaker. Further, the processor is further configured to detect voice activity and generate a voice activity signal (y) which is, concurrently with the microphone signal, sequentially indicative of one or more of: voice activity and voice in-activity; and control the volume of the masking signal (m) in response to the voice activity signal (y) in accordance with supplying the masking signal (m) to the loudspeaker at a first volume at times when the voice activity signal (y) is indicative of voice activity and at a second volume at times when the voice activity signal (y) is indicative of voice in-activity. 1. A wearable electronic device comprising: an electro-acoustic input transducer arranged to pick up an acoustic signal and convert the acoustic signal to a microphone signal (x); a loudspeaker; and a processor configured to: control the volume of a masking signal (m); and supply the masking signal (m) to the loudspeaker; wherein the processor is further configured to: based on processing at least the microphone signal (x), detect voice activity and generate a voice activity signal (y) which is, concurrently with the microphone signal, sequentially indicative of one or more of: voice activity and voice in-activity; and control the volume of the masking signal (m) in response to the voice activity signal (y) in accordance with supplying the masking signal (m) to the loudspeaker at a first volume at times when the voice activity signal (y) is indicative of voice activity and at a second volume at times when the voice activity signal (y) is indicative of voice in-activity. 2. A wearable device ...
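A minimal energy-based stand-in for the claimed behaviour; a real device would use a proper VAD and smooth volume ramps, and the threshold and volumes below are invented:

```python
# Masking volume m follows a voice-activity signal y derived from the
# microphone signal x (frame-RMS thresholding as a toy VAD).
import numpy as np

def vad(x, frame=256, threshold=0.01):
    """Return 1.0 per frame whose RMS exceeds threshold, else 0.0."""
    n = len(x) // frame
    rms = np.sqrt((x[: n * frame].reshape(n, frame) ** 2).mean(axis=1))
    return (rms > threshold).astype(float)

def masking_volume(y, first_volume=0.8, second_volume=0.2):
    return np.where(y > 0, first_volume, second_volume)

x = np.concatenate([np.zeros(2048), 0.1 * np.random.randn(2048)])
y = vad(x)
print(masking_volume(y))   # low volume in silence, higher during speech
```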

Подробнее
21-04-2016 дата публикации

Device for Extracting Information From a Dialog

Номер: US20160110350A1
Автор: Waibel Alexander
Принадлежит:

Computer-implemented systems and methods for extracting information during a human-to-human mono-lingual or multi-lingual dialog between two speakers are disclosed. Information from either the recognized speech (or the translation thereof) by the second speaker and/or the recognized speech by the first speaker (or the translation thereof) is extracted. The extracted information is then entered into an electronic form stored in a data store. 1-40. (canceled) 41. A method comprising: receiving a first speech input from a first speaker; determining, by a speech translation system, a first recognized speech result based on the speech input; determining, by the speech translation system, whether there exists a recognition ambiguity in the first recognized speech result, wherein the recognition ambiguity indicates more than one possible match for the first recognized speech result; upon a determination that there is recognition ambiguity in the first recognized speech result of the first speaker, determining a confidence score based on the recognition ambiguity; and responsive to the confidence score being below a threshold, issuing a first disambiguation query to the first speaker via the speech translation system, wherein a response to the first disambiguation query resolves the recognition ambiguity. 42. The method of claim 41, further comprising: receiving a second speech input from a second speaker; determining, by the speech translation system, a second recognized speech result based on the second speech input; extracting information from the second recognized speech input from the second speaker; entering the extracted information into an electronic form; and displaying the electronic form. 43. The method of claim 42, wherein the determination of whether there exists an ambiguity in the recognized speech result of the first speaker is based on one or more of: an acoustic confidence score in the recognized speech result of the first speaker; a context of the electronic form; and a ...
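The confidence-gated disambiguation step might look like this in outline (the threshold, margin, and scores are invented):

```python
# Sketch of issuing a disambiguation query when recognition is ambiguous
# and the confidence score falls below a threshold.
THRESHOLD = 0.7

def handle_recognition(hypotheses):
    """hypotheses: list of (text, score) pairs, best first."""
    best_text, best_score = hypotheses[0]
    ambiguous = len(hypotheses) > 1 and hypotheses[1][1] >= best_score - 0.1
    if ambiguous and best_score < THRESHOLD:
        return f'Did you mean "{best_text}" or "{hypotheses[1][0]}"?'
    return best_text

print(handle_recognition([("two tablets daily", 0.62),
                          ("true tablets daily", 0.58)]))
```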

Подробнее
02-06-2022 дата публикации

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND FOOD PRESERVATION APPARATUS

Номер: US20220171837A1
Автор: IKENAGA Mari
Принадлежит:

The present technology relates to an information processing apparatus, an information processing method, and a food preservation apparatus each capable of more appropriately outputting a response concerning a food. 1. An information processing apparatus comprising a processing unit configured to: manage, as information regarding a food preserved in a preservation cabinet, food information containing ownership right information regarding an ownership right for each food; generate, in a case where information from an identified user contains the information regarding the ownership right, a response according to the ownership right information regarding a desired food of the user, on a basis of the managed food information; and output the generated response. 2. The information processing apparatus according to claim 1, wherein when the user puts a desired food in the preservation cabinet, the processing unit registers the food information regarding the food put in the preservation cabinet, and when the user takes a desired food out of the preservation cabinet, the processing unit generates a response according to the ownership right information regarding the food taken out of the preservation cabinet. 3. The information processing apparatus according to claim 2, wherein the food information contains ingredient information regarding an ingredient of the food, and in a case where the food taken out of the preservation cabinet contains a foodstuff which the user cannot eat, the processing unit generates a response according to information regarding the foodstuff, on a basis of the ingredient information. 4. The information processing apparatus according to claim 3, wherein the processing unit manages, as information regarding the user, user information containing information regarding a foodstuff which the user cannot eat for each user, and in a case where the food taken out of the preservation cabinet contains a foodstuff which the user cannot eat, the processing unit generates a ...
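A toy sketch of the managed food information and ownership-aware responses; the field names are assumptions, not the patent's schema:

```python
# Minimal registry: food_id -> ownership and ingredient (allergen) info.
FOODS = {}

def put_food(food_id, owner, allergens=()):
    FOODS[food_id] = {"owner": owner, "allergens": list(allergens)}

def take_food(food_id, user, user_allergies=()):
    info = FOODS.get(food_id)
    if info is None:
        return "No record of that food."
    if info["owner"] != user:
        return f"Note: this belongs to {info['owner']}."
    clash = set(info["allergens"]) & set(user_allergies)
    if clash:
        return f"Warning: contains {', '.join(sorted(clash))}."
    return "Enjoy!"

put_food("pudding-1", owner="Mari", allergens=["milk"])
print(take_food("pudding-1", user="Ken", user_allergies=["milk"]))
```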

Подробнее
21-04-2016 дата публикации

VOICE AND TEXT COMMUNICATION SYSTEM, METHOD AND APPARATUS

Номер: US20160111082A1
Принадлежит:

The disclosure relates to systems, methods and apparatus to convert speech to text and vice versa. One apparatus comprises a vocoder, a speech to text conversion engine, a text to speech conversion engine, and a user interface. The vocoder is operable to convert speech signals into packets and convert packets into speech signals. The speech to text conversion engine is operable to convert speech to text. The text to speech conversion engine is operable to convert text to speech. The user interface is operable to receive a user selection of a mode from among a plurality of modes, wherein a first mode enables the speech to text conversion engine, a second mode enables the text to speech conversion engine, and a third mode enables the speech to text conversion engine and the text to speech conversion engine. 1-21. (canceled) 22. An apparatus for wireless communications, said apparatus comprising: a module configured to receive text that has been entered by a user of a mobile communications device and to convert the received text to a synthesized speech signal; a vocoder configured to encode the synthesized speech signal to produce a plurality of corresponding speech packets; and a transceiver configured to transmit the plurality of corresponding speech packets over a wireless communications link, wherein said module includes a voice synthesizer configured to store characteristics of a voice of the user and to use said stored characteristics to produce the synthesized speech signal. 23. The apparatus according to claim 22, wherein said module includes a text-to-speech conversion engine configured to convert said received text to a speech signal, and wherein said voice synthesizer is arranged to produce the synthesized speech signal from said speech signal. 24. The apparatus according to claim 22, wherein the mobile communications device includes said apparatus. 25. The apparatus according to claim 24, wherein said stored characteristics include pitch. 26. The ...

Подробнее
19-04-2018 дата публикации

LOW-DIMENSIONAL REAL-TIME CONCATENATIVE SPEECH SYNTHESIZER

Номер: US20180108342A1
Принадлежит:

A method of providing real-time speech synthesis based on user input includes presenting a graphical user interface having a low-dimensional representation of a multidimensional phoneme space, a first dimension representing degree of vocal tract constriction and voicing, a second dimension representing location in a vocal tract. One example employs a disk-shaped layout. User input is received via the interface and translated into a sequence of phonemes that are rendered on an audio output device. Additionally, a synthesis method includes maintaining a library of prerecorded samples of diphones organized into diphone groups, continually receiving a time-stamped sequence of phonemes to be synthesized, and selecting a sequence of diphone groups with their time stamps. A best diphone within each group is identified and placed into a production buffer from which diphones are rendered according to their time stamps. 1. A method of operating a computerized device to provide real-time synthesis of speech based on user input, comprising: presenting a graphical user interface having a low-dimensional representation of a multi-dimensional phoneme space, a first dimension representing degree of vocal tract constriction and voicing, a second dimension representing location in a vocal tract; receiving user input via the interface and translating received user input into a sequence of phonemes; and rendering the sequence of phonemes on an audio output device. 2. The method of claim 1, wherein the first dimension further represents single versus diphthong vowel sounds. 3. The method of claim 1, wherein the low-dimensional representation has a disk-shaped layout, and the first and second dimensions are selected from a radial dimension and an angular dimension of the disk-shaped layout. 4. The method of claim 3, wherein the first dimension is the radial dimension, and the second dimension is the angular dimension. 5. The method of claim 1, wherein the computerized ...
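A toy version of the disk-shaped mapping, with invented ring and sector labels: radius stands for degree of constriction/voicing, angle for place in the vocal tract:

```python
# Map a point on the unit disk to a coarse phoneme class (labels invented).
import math

RINGS = ["vowel", "glide", "voiced_consonant", "unvoiced_consonant"]
SECTORS = ["labial", "dental", "alveolar", "palatal", "velar", "glottal"]

def pick_phoneme_class(x, y):
    r = min(math.hypot(x, y), 1.0)                 # 0..1 from disk center
    ring = RINGS[min(int(r * len(RINGS)), len(RINGS) - 1)]
    angle = math.atan2(y, x) % (2 * math.pi)
    sector = SECTORS[int(angle / (2 * math.pi) * len(SECTORS)) % len(SECTORS)]
    return ring, sector

print(pick_phoneme_class(0.2, 0.1))    # near center -> vowel-like
print(pick_phoneme_class(-0.9, 0.3))   # near rim -> constricted consonant
```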

Подробнее
20-04-2017 дата публикации

SYSTEMS AND METHODS FOR MULTI-STYLE SPEECH SYNTHESIS

Номер: US20170110110A1
Автор: Pollet Vincent
Принадлежит: NUANCE COMMUNICATIONS, INC.

Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a first speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identified plurality of speech segments comprising a first speech segment having the first speaking style and a second speech segment having a second speaking style different from the first speaking style; and rendering the text as speech having the first speaking style, at least in part, by using the identified plurality of speech segments. 1-20. (canceled) 21. A speech synthesis method, comprising: using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a desired speaking style to use in synthesizing the text as speech; identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and outputting the synthesized speech. 22. The speech synthesis method of claim 21, wherein the identifying the first speech segment is based at least in part on how well acoustic characteristics of the first speech segment match acoustic characteristics associated with the desired speaking style. 23. The speech synthesis method of claim 22, wherein the identifying the first speech segment is based at least in part on how well prosodic characteristics of the first speech segment match prosodic characteristics associated with the desired speaking style. 24. The speech ...
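One plausible reading of the claimed "measure of similarity" is a distance between style representations; a cosine-similarity sketch with invented embeddings:

```python
# Rank candidate segments by cosine similarity between style embeddings, so
# a segment in a different but similar style can still be selected.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

STYLES = {"neutral": np.array([1.0, 0.0, 0.2]),    # invented embeddings
          "newscast": np.array([0.9, 0.1, 0.3]),
          "excited": np.array([0.1, 1.0, 0.8])}

def rank_segments(segments, desired_style):
    target = STYLES[desired_style]
    return sorted(segments, key=lambda s: cosine(STYLES[s["style"]], target),
                  reverse=True)

segs = [{"id": 1, "style": "excited"}, {"id": 2, "style": "newscast"}]
print(rank_segments(segs, "neutral"))  # the newscast segment ranks first
```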

Подробнее
20-04-2017 дата публикации

TECHNOLOGY FOR RESPONDING TO REMARKS USING SPEECH SYNTHESIS

Номер: US20170110111A1
Принадлежит:

The present invention is provided with: a voice input section that receives a remark (a question) via a voice signal; a reply creation section that creates a voice sequence of a reply (response) to the remark; a pitch analysis section that analyzes the pitch of a first segment (e.g., word ending) of the remark; and a voice generation section (a voice synthesis section, etc.) that generates a reply, in the form of voice, represented by the voice sequence. The voice generation section controls the pitch of the entire reply in such a manner that the pitch of a second segment (e.g., word ending) of the reply assumes a predetermined pitch (e.g., five degrees down) with respect to the pitch of the first segment of the remark. Such arrangements can realize synthesis of replying voice capable of giving a natural feel to the user. 1. A voice synthesis apparatus comprising:a voice input section configured to receive a voice signal of a remark;a pitch analysis section configured to analyze a pitch of a first segment of the remark;an acquisition section configured to acquire a reply to the remark; anda voice generation section configured to generate voice of the reply acquired by said acquisition section, said voice generation section controlling a pitch of the voice of the reply in such a manner that a second segment of the reply has a pitch associated with the pitch of the first segment analyzed by said pitch analysis section,wherein said voice generation section controls the pitch of the voice of the reply in such a manner that an interval of the pitch of said second segment relative to the pitch of said first segment becomes a consonant interval.2. The voice synthesis apparatus as claimed in claim 1 , wherein the first segment is a word ending of the remark being a question claim 1 , and said second segment is a word beginning or word ending of the reply.3. The voice synthesis apparatus as claimed in claim 1 , wherein said voice generation section controls the pitch of the ...
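Worked example of the "five degrees down" rule: a perfect fifth below a frequency corresponds to multiplying it by 2/3, so the reply's ending pitch can be derived directly from the analyzed remark pitch:

```python
# Set the reply's word-ending pitch a consonant interval below the remark's
# word-ending pitch (a fifth down = 2:3 frequency ratio).
def reply_target_pitch(remark_end_hz: float, ratio: float = 2 / 3) -> float:
    return remark_end_hz * ratio

remark_end = 220.0                      # analyzed word-ending pitch (Hz)
print(reply_target_pitch(remark_end))   # ~146.7 Hz, a fifth below
```

The whole reply contour would then be shifted so that its second segment lands on this target, which is what gives the synthesized reply a natural feel.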

Подробнее
20-04-2017 дата публикации

Normalized, User Adjustable, Stochastic, Lightweight, Media Environment

Номер: US20170110112A1

Software which uses text to speech technology to perform electronic screenplays, including plays depicting debates, multi-lingual conversations, and scientific conference presentations, using one or more distinctly identifiable voices. Screenplays are stored by the software as collections of text fragments whose relationships to each other have been documented in machine readable form, so as to make it possible for the software to comply with user requests that various aspects of the presentation be altered. This includes complying with user requests to increase or decrease the level of detail of a presentation, requests to increase or decrease the information density of a presentation, and requests to present again material that has already been presented using alternative wordings. User adjustable cognitive burden allows transformation of professional development into entertainment, and highly compact normalized information representation allows storage of thousands of hours of user adjustable material on hand held electronic devices. 1. Software which uses speech synthesizers to perform electronic screenplays , including plays depicting debates , conversations , and the kinds of presentations one typically sees at scientific conferences , using one or more distinctly identifiable voices , which may very well speak in several different languages during the course of any one presentation. These screenplays are stored by the software as collections of text fragments whose relationships to each other have been documented in machine readable form , so as to make it possible for the software to comply with user requests that various aspects of the presentation be altered. This includes complying with user requests to increase or decrease the level of detail of a presentation , requests to increase or decrease the information density of a presentation , and requests to present again material that has already been presented using alternative wordings.2. Software that can ...

Подробнее
11-04-2019 дата публикации

SYSTEMS AND METHODS FOR MULTI-STYLE SPEECH SYNTHESIS

Номер: US20190108830A1
Автор: Pollet Vincent
Принадлежит: NUANCE COMMUNICATIONS, INC.

Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a desired speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and outputting the synthesized speech. 1. A speech synthesis method, comprising: using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a desired speaking style to use in synthesizing the text as speech; identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and outputting the synthesized speech. 2. The speech synthesis method of claim 1, wherein the identifying the first speech segment is based at least in part on how well acoustic characteristics of the first speech segment match acoustic characteristics associated with the desired speaking style. 3. The speech synthesis method of claim 2, wherein the identifying the first speech segment is based at least in part on how well prosodic ...

Подробнее
27-04-2017 дата публикации

METHODS AND SYSTEMS FOR MANAGING DIALOGS OF A ROBOT

Номер: US20170113353A1
Принадлежит:

A computer-implemented method of handling an audio dialog between a robot and a human user comprises: during the audio dialog, receiving audio data and converting audio data into text data; in response to text data, determining a dialog topic, the dialog topic comprising a dialog content and a dialog voice skin; wherein a dialog content comprises a plurality of sentences; determining a sentence to be rendered in audio by the robot; receiving a modification request of the determined dialog sentence. Described developments for example comprise different regulation schemes (e.g. open-loop or closed-loop), the use of moderation rules (centralized or distributed) and the use of priority levels and/or parameters depending on the environment perceived by the robot. 1. A computer-implemented method of handling an audio dialog between a robot and a human user, the method comprising: during said audio dialog, receiving audio data and converting said audio data into text data; in response to said text data, determining a dialog topic, said dialog topic comprising a dialog content and a dialog voice skin; wherein a dialog content comprises a plurality of sentences; determining a sentence to be rendered in audio by the robot; receiving a modification request of said determined dialog sentence; applying one or more moderation rules to the modified determined dialog sentence according to said modification request. 2. The method of claim 1, further comprising accepting said modification request and restituting in audio the modified determined dialog sentence. 3. The method of claim 2, further comprising receiving the feedback of a user after restituting in audio the modified determined dialog sentence. 4. The method of claim 1, wherein the one or more moderation rules are predefined. 5. The method of claim 1, wherein the one or more moderation rules are retrieved from a network. 6. The method of claim 1, wherein the one or more moderation rules comprise one or more filters, ...

Подробнее
09-06-2022 дата публикации

ELECTRONIC APPARATUS AND METHOD FOR CONTROLLING THEREOF

Номер: US20220180872A1
Принадлежит: SAMSUNG ELECTRONICS CO., LTD.

An electronic apparatus, based on a text sentence being input, obtains prosody information of the text sentence, segments the text sentence into a plurality of sentence elements, obtains a speech in which prosody information is reflected to each of the plurality of sentence elements in parallel by inputting the plurality of sentence elements and the prosody information of the text sentence to a text to speech (TTS) module, and merges the speech for the plurality of sentence elements that are obtained in parallel to output speech for the text sentence. 1. An electronic apparatus comprising: a memory configured to store at least one instruction; and a processor configured to execute the at least one instruction stored in the memory, wherein when the at least one instruction is executed, the at least one instruction causes the processor to control to: based on obtaining a text input, obtain prosody information of the text input, segment the text input into a plurality of segments based on a processing time for converting the plurality of segments into speech segments, obtain speech segments in which the prosody information is reflected to each segment of the plurality of segments in parallel by inputting the plurality of segments and the prosody information to a text-to-speech (TTS) module, and obtain a speech for the text input by merging the speech segments. 2. The electronic apparatus of claim 1, wherein when executing the at least one instruction, the processor is further configured to: obtain a plurality of first segments by segmenting the text input based on a first criterion, and based on a first processing time for converting the plurality of first segments to the speech segments being less than a predetermined time, input the plurality of first segments to the TTS module, based on the first processing time for converting at least one first segment of the plurality of first segments to the speech segments being greater than or equal to the ...
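The parallel path can be mimicked with a thread pool whose map() call preserves segment order for the final merge; synthesize_segment below is a placeholder, not the claimed TTS module:

```python
# Segments are synthesized concurrently, then merged in original order.
from concurrent.futures import ThreadPoolExecutor

def synthesize_segment(segment: str, prosody: dict) -> bytes:
    return f"<audio:{segment}|{prosody['pitch']}>".encode()  # placeholder

def synthesize(text: str, prosody: dict) -> bytes:
    segments = [s.strip() for s in text.split(",") if s.strip()]  # naive split
    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, so merging is a simple join
        chunks = pool.map(lambda s: synthesize_segment(s, prosody), segments)
    return b"".join(chunks)

print(synthesize("hello there, how are you", {"pitch": 1.0}))
```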

Подробнее
27-04-2017 дата публикации

Voice Synthesizing Apparatus, Voice Synthesizing Method, and Storage Medium Therefor

Номер: US20170116978A1
Автор: Hiroaki Matsubara
Принадлежит: Yamaha Corp

A voice synthesizing apparatus includes: a voice inputter (102) configured to input a voice; an obtainer (22) configured to obtain a primary response to the voice inputted by the voice inputter (102); an analyzer (112) configured to analyze whether the primary response includes a repetition target; and a voice synthesizer (24) configured to, in a case where the analyzed primary response is determined to include the repetition target, synthesize a voice from a secondary response that includes the repetition target repeated at least twice to output the voice.

Подробнее
18-04-2019 дата публикации

SYSTEMS AND METHODS FOR PROVIDING NON-LEXICAL CUES IN SYNTHESIZED SPEECH

Номер: US20190115007A1
Принадлежит:

Systems and methods are disclosed for providing non-lexical cues in synthesized speech. Original text is analyzed to determine characteristics of the text and/or to derive or augment an intent (e.g., an intent code). Non-lexical cue insertion points are determined based on the characteristics of the text and/or the intent. One or more non-lexical cues are inserted at insertion points to generate augmented text. The augmented text is synthesized into speech, including converting the non-lexical cues to speech output. 1. A system that converts text to speech, the system comprising: an intent analyzer to analyze original text received by the system to derive an intent representative of intended meaning to be conveyed by non-lexical cues; a non-lexical cue insertion engine to determine insertion points of non-lexical cues based on the derived intent and to insert a non-lexical cue at the insertion point within the original text to generate augmented text; and a speech synthesizer to synthesize speech from the augmented text. 2-25. (canceled) Embodiments herein relate generally to speech synthesis, and more particularly relate to providing non-lexical cues in text-to-speech output. Natural language interfaces are becoming commonplace in computing devices generally, and particularly in mobile computing devices, such as smartphones, tablets, and laptop computers. Current natural language interfaces often synthesize speech that sounds artificial because the synthesized speech does not include non-lexical expressive features of natural language. Natural language interfaces are presently available on a variety of computing devices generally, and particularly in mobile computing devices, such as smartphones, tablets, and laptop computers. These natural language interfaces presently provide output speech that is primarily, or even purely, lexical (i.e., of or relating to words or vocabulary of a language) and that often sounds mechanical and/or artificial. One reason for the ...
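Illustrative cue insertion under an invented intent-to-cue table; a real engine would also pick insertion points inside the sentence rather than only at its start:

```python
# Splice a non-lexical cue into the text according to a derived intent code,
# producing "augmented text" for the speech synthesizer.
CUES = {"hesitation": "uh,", "agreement": "mm-hmm,", "surprise": "oh!"}

def augment(text: str, intent: str) -> str:
    cue = CUES.get(intent)
    if cue is None:
        return text
    return f"{cue} {text}"

print(augment("I think that could work", "hesitation"))
# -> "uh, I think that could work", then handed to the speech synthesizer
```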

Подробнее
09-04-2020 дата публикации

SYSTEMS AND METHODS FOR GENERATING ALTERNATE AUDIO FOR A MEDIA STREAM

Номер: US20200111474A1
Принадлежит:

Systems and methods are described herein for generating alternate audio for a media stream. The media system receives media that is requested by the user. The media comprises a video and audio. The audio includes words spoken in a first language. The media system stores the received media in a buffer as it is received. The media system separates the audio from the buffered media and determines an emotional state expressed by spoken words of the first language. The media system translates the words spoken in the first language into words spoken in a second language. Using the translated words of the second language, the media system synthesizes speech having the emotional state previously determined. The media system then retrieves the video of the received media from the buffer and synchronizes the synthesized speech with the video to generate the media content in a second language. 1. A method for generating alternate audio for a media stream, the method comprising: receiving media comprising video and audio, wherein the audio comprises a first plurality of spoken words in a first language; buffering the media in a buffer as it is received; retrieving audio from the buffer and: determining an emotional state expressed by the first plurality of spoken words; translating the first plurality of spoken words of the first language into a second plurality of spoken words of a second language; and synthesizing, from the second plurality of spoken words of the second language, speech having the determined emotional state; retrieving video from the buffer; and generating for output the retrieved video and the synthesized speech, wherein the synthesized speech is output with the video instead of the first plurality of spoken words. 2. The method of claim 1, wherein translating the first plurality of spoken words into the second plurality of spoken words of the second language comprises: transcribing the first plurality of spoken words in the first language; translating the ...
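The pipeline reduces to a few stages; every helper below is a stub standing in for a component named in the abstract (emotion detection, transcription, translation, emotional TTS, A/V muxing):

```python
# Skeleton of the alternate-audio pipeline with placeholder components.
def detect_emotion(audio):      return "joyful"              # stub
def transcribe(audio):          return ["hello", "friend"]   # stub
def translate(words, target):   return ["hola", "amigo"]     # stub
def synthesize_speech(words, emotion):
    return f"<{emotion}:{' '.join(words)}>"                  # stub
def mux(video, speech):         return {"video": video, "audio": speech}

def generate_alternate_audio(buffered):
    emotion = detect_emotion(buffered["audio"])
    words = transcribe(buffered["audio"])
    translated = translate(words, target="es")
    speech = synthesize_speech(translated, emotion=emotion)
    return mux(buffered["video"], speech)  # speech replaces original audio

print(generate_alternate_audio({"audio": b"...", "video": b"..."}))
```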

Подробнее
07-05-2015 дата публикации

Method and System for Cross-Lingual Voice Conversion

Номер: US20150127349A1
Принадлежит: GOOGLE INC.

A method and system is disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to a HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis. 1. A method comprising: training an output hidden Markov model (HMM) based speech features generator implemented by one or more processors of a system using speech signals of an output speaker speaking an output language, wherein the output HMM based speech features generator comprises a first configuration of output HMM state models, each of the output HMM state models having a set of generator-model functions; training an auxiliary HMM based speech features generator implemented by one or more processors of the system using speech signals of an auxiliary speaker speaking an input language, wherein the auxiliary HMM based speech features generator comprises a second configuration of auxiliary HMM state models, each of the auxiliary HMM state models having a set of generator-model functions; for each given output HMM state model of the first configuration, determining a particular set of generator-model functions from among the auxiliary HMM state models of the second configuration that most closely matches the set of generator-model functions of the given output HMM; determining a fundamental ...
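If each HMM state is summarized by its mean vector, the matching procedure can be sketched as a nearest-neighbour search after a speaker-compensating transform A; all values below are invented, and real systems compare full generator-model functions rather than means:

```python
# Match each output-language state to the closest auxiliary-language state
# under a linear speaker-compensation transform.
import numpy as np

rng = np.random.default_rng(1)
output_states = rng.normal(size=(5, 3))   # output HMM state means
aux_states = rng.normal(size=(7, 3))      # auxiliary HMM state means
A = np.eye(3)                             # speaker-compensation transform

def match_states(out_means, aux_means, A):
    compensated = aux_means @ A.T
    d = np.linalg.norm(out_means[:, None, :] - compensated[None, :, :], axis=2)
    return d.argmin(axis=1)  # best auxiliary state index per output state

print(match_states(output_states, aux_states, A))
```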

Подробнее
05-05-2016 дата публикации

SYSTEM AND METHOD FOR TEXT NORMALIZATION USING ATOMIC TOKENS

Номер: US20160125872A1
Принадлежит: AT&T Intellectual Property I, L.P.

A system, method and computer-readable storage devices are disclosed for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone. 1. A method comprising: receiving a text corpus; tokenizing, via a processor configured to perform speech generation, the text corpus into tokens, each token comprising one of a sequence of letters, a sequence of digits, and punctuation; comparing the tokens to a language-independent pattern list, to yield a token comparison; identifying pronunciation guidelines associated with each token in the tokens; and generating, via the processor, speech from the tokens in the text corpus using the token comparison and the pronunciation guidelines. 2. The method of claim 1, wherein the pronunciation guidelines comprise at least one of spell, expand, reorder, asword, digits, cardinal, split, none, and foreign. 3. The method of claim 1, wherein speech is further generated for a given token based on one of N tokens to a left context and N tokens to a right context of the given token. 4. The method of claim 1, wherein the generating of the speech further comprises generating pronunciation guidelines for at least one of the tokens. 5. The method of claim 1, wherein the generating of the speech further comprises instructing a text-to-speech module how to ...
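A regex approximation of such pronunciation-oriented atomic tokens: maximal letter runs, maximal digit runs, and single punctuation marks, with no language-specific knowledge; each token would then receive a class label such as digits or asword:

```python
# Split text into atomic tokens: letter runs, digit runs, punctuation.
import re

ATOM = re.compile(r"[^\W\d_]+|\d+|[^\w\s]", re.UNICODE)

def atomic_tokens(text: str):
    return ATOM.findall(text)

print(atomic_tokens("Call me at 555-0100, Dr.Smith!"))
# ['Call', 'me', 'at', '555', '-', '0100', ',', 'Dr', '.', 'Smith', '!']
```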

Подробнее
04-05-2017 дата публикации

System And Method For Synthesizing Human Speech

Номер: US20170125009A1
Принадлежит:

A method and apparatus are described for detecting voice related vibration in the upper region of the chest and synthesizing human speech. The innovation finds its use in speech rehabilitation applications among others, specifically in speech impairments and speech disability arising due to accident, congenital defects or other reasons. A set of piezoelectric based sensors are placed on an upper region of the chest atop or near sound tendons. The sensors pick up the vibrations in the sound tendons and convert the vibrations into electrical output signals. These signals are filtered, amplified and processed using the signal recognition unit. Subsequently, a set of parameters are extracted and used to generate speech or a written text. The sensors incorporate piezoelectric or other transducing materials. These sensors are externally affixed to a human body surface corresponding to the position of the sounds tendons in the upper chest/neck region. 1. A method for synthesizing speech using piezoelectric material in contact with a chest of a subject , the method comprising:obtaining a piezoelectric electrical signal from the piezoelectric material, the piezoelectric electrical signal based on mechanical movement of the chest, the mechanical movement related to movement of sound tendons of the subject during an act of speaking;communicating the piezoelectric electrical signal to a signal processor;manipulating the piezoelectric electrical signal in the signal processor using signal processing techniques;extracting from the manipulated piezoelectric electrical signal a set of speech identifying parameters; andgenerating a speech signal using the set of speech identifying parameters, the speech signal corresponding to the piezoelectric electrical signal.2. The method of claim 1 , the method further comprising conveying the generated speech signal using a speaker.3. The method according to claim 1 , wherein manipulating the piezoelectric electrical signal comprises: ...

Подробнее
25-08-2022 дата публикации

MODULAR SYSTEMS AND METHODS FOR SELECTIVELY ENABLING CLOUD-BASED ASSISTIVE TECHNOLOGIES

Номер: US20220269849A1
Принадлежит:

Methods and systems for manual and programmatic remediation of websites. JavaScript code is accessed by a user device and optionally calls TTS, ASR, and RADAE modules from a remote server to thereby facilitate website navigation by people with diverse abilities. 1-41. (canceled) 42. A computer-implemented method of programmatically assigning a descriptive attribute to an untagged element on a web page to enable an audible description of the untagged element, the web page having an associated document object model (DOM), the method comprising: dynamically analyzing, by a computer system, code associated with the web page, the code comprising at least the DOM or HTML code; detecting, by the computer system, one or more compliance issues in the code, wherein at least one of the one or more compliance issues comprises the untagged element lacking an adequate attribute; applying, by the computer system, one or more pre-existing remediations to the one or more compliance issues, wherein the one or more pre-existing remediations is generated by a remote server system performing at least: accessing the code associated with the web page; receiving, by the remote server system, via a remediation interface, input from a user, the input comprising a remediation action to manually remediate a non-programmatically-fixable compliance issue associated with the web page; generating, by the remote server system, the one or more pre-existing remediations based on the input comprising the remediation action to manually remediate the non-programmatically-fixable compliance issue; and storing, by the remote server system, the one or more pre-existing remediations in an electronic data storage medium; and assigning, by the computer system, an attribute to the untagged element in the code of the web page based on the one or more pre-existing remediations, the attribute assigned to the untagged element. 43. The computer-implemented method of claim 42, wherein the one or more pre- ...

Подробнее
27-05-2021 дата публикации

MODULAR SYSTEMS AND METHODS FOR SELECTIVELY ENABLING CLOUD-BASED ASSISTIVE TECHNOLOGIES

Номер: US20210157473A1
Принадлежит:

Methods and systems for manual and programmatic remediation of websites. JavaScript code is accessed by a user device and optionally calls TTS, ASR, and RADAE modules from a remote server to thereby facilitate website navigation by people with diverse abilities. 1-41. (canceled) 42. A computer-implemented method of programmatically assigning a descriptive attribute to an untagged element on a web page to enable an audible description of the untagged element, the web page having an associated document object model (DOM), the method comprising: dynamically analyzing, by a computer system, code associated with the web page, the code comprising at least the DOM or HTML code; detecting, by the computer system, one or more compliance issues relating to web accessibility standards in the code, wherein at least one of the one or more compliance issues comprises the untagged element lacking an adequate descriptive attribute; applying, by the computer system, one or more pre-existing remediations to the one or more compliance issues, wherein the one or more pre-existing remediations is generated by a remote server system performing at least: accessing the code associated with the web page; receiving, by the remote server system, via a remediation interface, input from a user, the input comprising a remediation action to manually remediate a non-programmatically-fixable compliance issue associated with the web page; generating, by the remote server system, the one or more pre-existing remediations based on the input comprising the remediation action to manually remediate a non-programmatically-fixable compliance issue; and storing, by the remote server system, the one or more pre-existing remediations in an electronic data storage medium; and assigning, by the computer system, a descriptive attribute to the untagged element in the code of the web page based on the one or more pre-existing remediations, the descriptive attribute assigned to the untagged element adapted to ...

Подробнее
25-04-2019 дата публикации

Systems and methods for neural text-to-speech using convolutional sequence learning

Номер: US20190122651A1
Принадлежит: Baidu USA LLC

Described herein are embodiments of a fully-convolutional attention-based neural text-to-speech (TTS) system, which various embodiments may generally be referred to as Deep Voice 3. Embodiments of Deep Voice 3 match state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. Deep Voice 3 embodiments were scaled to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, common error modes of attention-based speech synthesis networks were identified and mitigated, and several different waveform synthesis methods were compared. Also presented are embodiments that describe how to scale inference to ten million queries per day on one single-GPU server.
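Not the Deep Voice 3 code: just the scaled dot-product attention that attention-based TTS decoders of this kind use to align decoder (audio) steps with encoded text positions:

```python
# Minimal attention over encoded text positions (shapes are invented).
import numpy as np

def attention(queries, keys, values):
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over text
    return weights @ values, weights

rng = np.random.default_rng(0)
text_keys = rng.normal(size=(30, 8))    # 30 encoded text positions
audio_q = rng.normal(size=(100, 8))     # 100 decoder (mel frame) steps
context, w = attention(audio_q, text_keys, text_keys)
print(context.shape, w.shape)           # (100, 8) (100, 30)
```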

Подробнее
25-04-2019 дата публикации

HIERARCHICAL INTIMACY FOR COGNITIVE ASSISTANTS

Номер: US20190122668A1
Автор: ANDERSON Ryan R.
Принадлежит:

A computer-implemented method executed by a cognitive system for incorporating hierarchy knowledge. In one embodiment, the computer-implemented method includes the steps of identifying one or more participants during an interaction; obtaining profile information for each of the participants; determining a hierarchy score for each of the participants based on a plurality of factors using the profile information for each of the participants; monitoring and analyzing communications between the participants during the interaction to identify boundary conditions based on the hierarchy score; and interacting with one or more of the participants in a manner consistent with the hierarchy score of the participants. 1. A computer-implemented method executed by a cognitive system for incorporating hierarchy knowledge, the computer-implemented method comprising: identifying, using a processor, participants during an interaction; obtaining, using the processor, profile information for each of the participants; determining, using the processor, a hierarchy score for each of the participants based on a comparison of a plurality of factors using the profile information for each of the participants; and interacting, using the processor, with the participants in a manner consistent with the hierarchy score of the participants. 2. The computer-implemented method of claim 1, further comprising updating the hierarchy score for each of the participants in real time based on communications between the participants during the interaction. 3. The computer-implemented method of claim 1, further comprising storing data associated with communications between the participants as part of the profile information for each of the participants. 4. The computer-implemented method of claim 1, wherein identifying the participants during the interaction comprises performing voice recognition on the participants. 5. The computer-implemented method of claim 1, wherein identifying the participants during the ...

Подробнее
16-04-2020 дата публикации

INFORMATION PROCESSING METHOD, INFORMATION PROCESSING DEVICE, AND COMPUTER-READABLE RECORDING MEDIUM RECORDING INFORMATION PROCESSING PROGRAM

Номер: US20200118540A1
Принадлежит:

An action notification device acquires uttered voice data indicating an action of a user and being uttered by the user; detects a moving motion of the user; detects a stationary motion of the user; determines whether a predetermined time has elapsed in a state where the user is stationary; and notifies the user of contents of the action of the user based on the uttered voice data when it is determined that the predetermined time has elapsed in the state where the user is stationary. 1. An information processing method comprising , by a computer:acquiring uttered voice data indicating an action of a user and being uttered by the user;detecting a moving motion of the user;detecting a stationary motion of the user;determining whether a predetermined time has elapsed in a state where the user is stationary; andnotifying the user of contents of the action of the user based on the uttered voice data when it is determined that the predetermined time has elapsed in the state where the user is stationary.2. The information processing method according to claim 1 , further comprising:acquiring ambient voice data;storing the voice data in a memory; anddetecting a predetermined motion of the user,wherein the acquiring the uttered voice data includes, when the predetermined motion of the user is detected, extracting, from the voice data stored in the memory, voice data uttered by the user within a predetermined period including a time point at which the predetermined motion of the user is detected as the uttered voice data.3. The information processing method according to claim 2 , wherein the predetermined motion is a standing motion of the user.4. The information processing method according to claim 2 , wherein the predetermined motion is the moving motion of the user.5. The information processing method according to claim 1 , further comprising:acquiring ambient voice data;storing the voice data in a memory; andsubjecting the voice data to voice recognition,wherein the ...

Подробнее
27-05-2021 дата публикации

INTERMEDIARY VIRTUAL ASSISTANT FOR IMPROVED TASK FULFILLMENT

Номер: US20210158798A1
Принадлежит:

A system including: A main virtual assistant (VA) that is configured to operate a back-end system according to instructions. An intermediary VA that is configured to: learn, by conversing with the human user and by analyzing responses from the main VA to the human user, to perform a task that is associated with the back-end system; hold a conversation with the main VA, wherein, in the conversation, the instructions are formulated and relayed from the intermediary VA to the main VA based on the learning and on further conversing with the human user, such that the main VA operates the back-end system according to the instructions; and formulate and relay responses to the instructions from the main VA to the human user. 1. A system comprising: a main virtual assistant (VA) that is configured to operate a back-end system according to instructions; and an intermediary VA that is configured to: learn, by conversing with a human user and by analyzing responses from said main VA to the human user, to perform a task that is associated with the back-end system; hold a conversation with said main VA, wherein, in the conversation, the instructions are formulated and relayed from said intermediary VA to said main VA based on the learning and on further conversing with the human user, such that said main VA operates the back-end system according to the instructions; and formulate and relay responses to the instructions from said main VA to the human user. 2. The system according to claim 1, wherein the learning by said intermediary VA comprises learning a model which comprises: entities, and fields that can store values for each of the entities. 3. The system according to claim 2, wherein the instructions are formulated and relayed to said main VA based on the learned model. 4. The system according to claim 3, wherein the instructions are with respect to implicit fields and values that are not explicitly mentioned in an utterance of the human user, and the ...

Подробнее
11-05-2017 дата публикации

Method and apparatus for using a vocal sample to customize text to speech applications

Номер: US20170133005A1
Автор: Mason Paul Wendell
Принадлежит:

Apparatus and methods consistent with the present invention measure one or more of the characteristics of a voice recording and use such measurements to create a synthetic voice that approximates the recorded voice and uses such created synthetic voice to verbalize the content of an electronically conveyed written message such as an SMS text message. The vocal characteristics measured may include frequency, timbre, intensity, rhythm, and rate of speech as well as others. 1. A method comprising: receiving, via a client application interface, a recorded sample of a sender's voice; measuring the vocal characteristics of the recorded sample of the sender's voice including its frequency, intensity, rhythm and rate of speech; receiving a text-based message originating from the sender; converting the text-based message to a speech format wherein the measured vocal characteristics are used to form a synthetic voice that approximates the voice of the sender; sending an audio file of the sender's message as converted to an address that corresponds to the address of the text-based message. 2. The method of claim 1, wherein the recorded sample of the sender's voice is made by sampling at a rate of at least 40,000 Hertz. 3. The method of wherein the sample of the sender's voice consists of a sequence of predetermined words. 4. The method of wherein the recorded sample is at least 20 syllables long. 5. The method of wherein the sample of the sender's voice comprises the sender's voicemail greeting. 6. The method of wherein the sender's voicemail greeting is accessed telephonically. 7. The method of wherein the sample of the sender's voice is searched for words or phrases commonly used in the context of a voicemail greeting and the sample of the sender's voice subjected to measurement of frequency and intensity characteristics is limited to such commonly used words or phrases. 8. The method of wherein one or more acronyms in the text-based message are audibly expressed as full words or ...
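Two of the listed characteristics are easy to approximate directly: intensity from RMS level and rate of speech from a syllable count over time (frequency and timbre would need pitch and spectral analysis):

```python
# Rough measurement of intensity and rate of speech from a voice sample.
import numpy as np

def intensity_db(x):
    rms = np.sqrt(np.mean(x ** 2))
    return 20 * np.log10(max(rms, 1e-12))

def speech_rate(num_syllables, duration_s):
    return num_syllables / duration_s       # syllables per second

x = 0.05 * np.random.randn(16000)           # 1 s of fake audio at 16 kHz
print(round(intensity_db(x), 1), "dBFS;", speech_rate(20, 5.0), "syl/s")
```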

Подробнее
02-05-2019 дата публикации

ANIMATED PRESENTATION CREATOR

Номер: US20190129920A1
Принадлежит:

Aspects create a multimedia presentation wherein processors are configured to calculate a time it would take to narrate a plurality of words in a document at a specified speech speed in response to determining that the time it would take to narrate the plurality of words in the document at the specified speech speed exceeds a specified maximum time, generate a long summary of the document as a subset of the plurality of words, generate audio content for a first portion of the plurality of words of the long summary by applying a text-to-speech processing mechanism to the portion of the long summary at the desired speech speed, and create a multimedia slide of a multimedia presentation by adding the generated audio content to a presentation of text from a remainder portion of the plurality of words of the long summary. 1. A computer-implemented method for creating a presentation based on a text document , comprising executing on a computer processor:calculating a time it would take to narrate a plurality of words in a document at a specified speech speed;in response to determining that the time it would take to narrate the plurality of words in the document at the specified speech speed exceeds a specified maximum time, generating a long summary of the document as a subset of the plurality of words that comprises the highest relevant information identified by applying natural language processing to the document, and that requires a time to narrate at the specified speech speed that is less than the specified maximum time;generating audio content for a first portion of the plurality of words of the long summary by applying a text-to-speech processing mechanism to the portion of the long summary at the desired speech speed; andcreating a multimedia slide of a multimedia presentation by adding the generated audio content to a presentation of text from a remainder portion of the plurality of words of the long summary.2. The method of claim 1 , further comprising: ...
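The time check in the abstract is plain arithmetic; for instance, at an assumed narration rate of 150 words per minute, 2,400 words need 16 minutes, which would trigger summarization against a 10-minute limit:

```python
# Narration-time check: summarize when the full text exceeds the limit.
def narration_minutes(word_count: int, wpm: int = 150) -> float:
    return word_count / wpm

MAX_MINUTES = 10
words = 2400
if narration_minutes(words) > MAX_MINUTES:
    # Summarize down to a word budget that fits the limit (10% slack).
    budget = int(MAX_MINUTES * 150 * 0.9)   # here, 1350 words
    print(f"summarize {words} words to <= {budget}")
```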

Подробнее
03-06-2021 дата публикации

MODULAR SYSTEMS AND METHODS FOR SELECTIVELY ENABLING CLOUD-BASED ASSISTIVE TECHNOLOGIES

Номер: US20210165951A1
Принадлежит:

Methods and systems for manual and programmatic remediation of websites. JavaScript code is accessed by a user device and optionally calls TTS, ASR, and RADAE modules from a remote server to thereby facilitate website navigation by people with diverse abilities. 1-41. (canceled) 42. A computer-implemented method of programmatically assigning a descriptive attribute to an untagged element in a web page to enable an audible description of the untagged element, the web page having an associated document object model (DOM), the computer-implemented method comprising: accessing, by a computer system, code associated with the web page, the code comprising at least HTML or the DOM; retrieving, by the computer system, at least one remediation code from a remote server; and applying, by the computer system, the at least one remediation code to the web page, the applying comprises: identifying, by the computer system, the untagged element in the code associated with the web page, the untagged element comprises a file path unassociated with a descriptive attribute; assigning, by the computer system, a descriptive attribute to the file path based on data associated with the at least one remediation code retrieved from the remote server, the descriptive attribute assigned to the file path adapted to enable an assistive technology to speak the descriptive attribute to a user; and rendering, by the computer system, to the user the web page based on the code and the assigned descriptive attribute. 43. The computer-implemented method of claim 42, wherein the assigning comprises changing the DOM or HTML code associated with the web page. 44. The computer-implemented method of claim 42, wherein the file path is a URL. 45. The computer-implemented method of claim 42, wherein the descriptive attribute is determined based on using an artificial intelligence algorithm to generate the descriptive attribute. 46. The computer-implemented method of claim 42, wherein the descriptive attribute ...
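These patents describe JavaScript applied in the browser; as an offline analogy only, the same alt-text remediation can be shown with BeautifulSoup in Python (the remediation table is invented):

```python
# Assign a descriptive attribute (alt text) to untagged <img> elements,
# keyed by file path, so assistive technologies can speak a description.
from bs4 import BeautifulSoup

REMEDIATIONS = {"/img/logo.png": "Company logo"}   # path -> description

html = '<img src="/img/logo.png"><img src="/img/x.png" alt="chart">'
soup = BeautifulSoup(html, "html.parser")

for img in soup.find_all("img"):
    if not img.get("alt"):                 # untagged element: no alt text
        desc = REMEDIATIONS.get(img.get("src", ""))
        if desc:
            img["alt"] = desc              # screen readers can now speak it

print(soup)
```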

Publication date: 30-04-2020

DIALOGUE APPARATUS AND CONTROL PROGRAM FOR DIALOGUE APPARATUS

Number: US20200130195A1
Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA

A dialogue apparatus includes a display unit, a first dialogue control unit configured to display a first character on the display unit and simulate a speech function of an external communication robot capable of having a dialogue to conduct the dialogue with a user, a second dialogue control unit configured to display a second character on the display unit and conduct the dialogue so as to mediate the dialogue between the user and the first dialogue control unit, and a transmission unit configured to transmit, to the external communication robot, dialogue information about the dialogue conducted by the first dialogue control unit and the second dialogue control unit.

1. A dialogue apparatus comprising: a display unit; a first dialogue control unit configured to display a first character on the display unit and simulate a speech function of an external communication robot capable of having a dialogue to conduct the dialogue with a user; a second dialogue control unit configured to display a second character on the display unit and conduct the dialogue so as to mediate the dialogue between the user and the first dialogue control unit; and a transmission unit configured to transmit, to the external communication robot, dialogue information about the dialogue conducted by the first dialogue control unit and the second dialogue control unit.

2. The dialogue apparatus according to claim 1, further comprising a reception unit configured to receive the dialogue information about the dialogue conducted by the external communication robot, wherein the first dialogue control unit conducts the dialogue based on the dialogue information received by the reception unit.

3. The dialogue apparatus according to claim 2, wherein the first dialogue control unit does not conduct the dialogue in an environment where the user can have the dialogue with the external communication robot.

4. The dialogue apparatus according to claim 3, wherein the second dialogue control unit conducts ...
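
A minimal sketch of how the two dialogue control units might divide the work, assuming a simple robot-presence flag for the condition in claim 3; the canned replies are illustrative only:

    class FirstDialogueControl:
        # Shows the first character and simulates the robot's speech function.
        def __init__(self, robot_reachable):
            self.robot_reachable = robot_reachable

        def respond(self, utterance):
            if self.robot_reachable:
                return None  # claim 3: stay silent when the real robot is available
            return f"[first character] (robot voice) I heard: '{utterance}'"

    class SecondDialogueControl:
        # Shows the second character and mediates between user and first unit.
        def mediate(self, utterance, first_reply):
            if first_reply is None:
                return "[second character] Your robot is right here - ask it directly!"
            return f"[second character] Passing that along. {first_reply}"

    first = FirstDialogueControl(robot_reachable=False)
    reply = first.respond("Good morning")
    print(SecondDialogueControl().mediate("Good morning", reply))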

Publication date: 09-05-2019

LANGUAGE-ADAPTED USER INTERFACES

Number: US20190139427A1
Assignee: Shanghai Index Ltd.

A computer-implemented training system includes data storage storing: (i) a first set of visual instruction materials, (ii) a second set of audio instruction materials, and (iii) a set of user data. The user data includes, for each user, (a) a first indication of a visual instruction status and (b) a second indication of an audio instruction status. Also stored is a presentation control processing arrangement configured for selecting an instruction material of the first set or the second set and causing it to be presented to a user. The presentation control processing arrangement is configured to, for a specific user, select one of (a) a visual instruction material from the first set based on the first indication for that user and (b) an audio instruction material from the second set based on the second indication for that user, and to present the selected instruction material to the user.

1. A computer system configured to implement a user interface for the teaching of a logographic language, the system being configured to: select a word in the logographic language that is represented in the language by multiple written characters; cause multiple placeholder icons to be displayed to a user of the system, there being one placeholder icon for each character of the selected word; enable a user to select one of the placeholder icons; and in response to the selection of one of the placeholder icons cause information to be presented to the user relating to a corresponding character of the multicharacter word.

2. A computer system as claimed in claim 1, wherein each placeholder icon has an appearance different from the appearance of any character in the logographic language.

3. A computer system as claimed in claim 1, wherein all the placeholder icons have the same appearance.

4. A computer system as claimed in claim 3, wherein the placeholder icons are geometric shapes of a common colour.

5. A computer system as claimed in claim 1, wherein the placeholder icons are ...
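
A console-only sketch of the placeholder interaction from claims 1-4; the two-character word and the per-character notes are hypothetical stand-ins for the stored instruction materials:

    # Hypothetical word and per-character notes; real data would come from the
    # system's instruction materials.
    word = "漢字"
    notes = {"漢": "han - relating to China", "字": "zi - character, letter"}

    # One placeholder per character; all identical geometric shapes (claims 3-4),
    # so nothing about the character itself is given away.
    placeholders = ["◆"] * len(word)
    print(" ".join(placeholders))  # ◆ ◆

    selected = 1  # the user taps the second placeholder
    print(notes[word[selected]])  # information relating to that character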

Publication date: 10-06-2021

SECURE TEXT-TO-VOICE MESSAGING

Number: US20210174807A1
Author: Mohapatra Bibhudendu
Assignee:

A personal voice model is created using a person's voice: essentially, voice cloning. When a user wants to send a message to another person from a mobile phone or similar device, the user types the message and it is converted to a speech message using the voice model created. The speech message is delivered either to a voicemail or to another medium which accepts a voice message. The text can also be converted to a different language, so that the voice message is sent in a different language.

1. An assembly, comprising: at least one processor; at least one display configured to communicate with the processor; the processor being programmed to access instructions executable by the processor to: execute a messaging app for sending messages from a sender to recipients, the instructions being executable to employ the messaging app to: receive text input; convert the text input to a voice signal synthesized to be in the sender's voice; and send the voice signal to at least one recipient device for audible playback on the recipient device.

2. The assembly of claim 1, wherein the instructions are executable to: compare the text message to a list of terms; and based at least in part on the compare, determine whether to convert the text message to the voice signal.

3. The assembly of claim 2, wherein the list of terms comprises a blacklist, and the instructions are executable to not convert the text message to the voice signal responsive to the text message containing at least one term on the blacklist.

4. The assembly of claim 3, wherein the instructions are executable to, responsive to the text message containing at least one term on the blacklist, present on the assembly a warning.

5. The assembly of claim 3, wherein the instructions are executable to, responsive to the text message containing at least one term on the blacklist, disable the assembly at least from sending voice signals.

6. The assembly of claim 2, wherein the list of terms ...
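
The blacklist gate of claims 2-5 is the simplest part to sketch; the term list and the synthesis stub below are hypothetical, with a byte string standing in for the cloned-voice audio:

    BLACKLIST = {"secret", "password"}  # hypothetical list of blocked terms

    def synthesize_in_senders_voice(text):
        # Stand-in for synthesis with the personal voice model (voice cloning).
        return text.encode()

    def gate_message(text):
        words = {w.lower().strip(".,!?") for w in text.split()}
        hits = BLACKLIST & words
        if hits:
            # Claims 4-5: present a warning, and optionally disable voice sending.
            print(f"Warning: message contains blocked terms: {sorted(hits)}")
            return None
        return synthesize_in_senders_voice(text)

    assert gate_message("My password is hunter2") is None  # blocked, warning shown
    voice_signal = gate_message("Running late, see you at eight")  # converted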

Publication date: 14-08-2014

CUSTOMIZED SPEECH GENERATION

Number: US20140229182A1
Assignee: Amazon Technologies, Inc.

Various approaches enable automatic communication generation based on patterned behavior in a particular context. For example, a computing device can monitor behavior of a user to determine patterns of communication behavior in certain situations. In response to detecting multiple occurrences of a certain situation, a computing device can prompt a user to perform an action corresponding to the pattern of behavior. In some embodiments, a set of speech models corresponding to a type of contact is generated. The speech models include language consistent with patterns of speech between a user and the type of contact. Based on context and on the contact, a message using language consistent with past communications between the user and contact is generated from a speech model associated with the type of contact.

1. (canceled)

2. A system, comprising: at least one device processor; and memory including instructions that, when executed by the at least one device processor, cause the system to: receive a first communication between a first user and a second user; analyze text contained in the first communication to identify at least one communication pattern; generate a communication model, to be associated with at least one of the first user and the second user, based at least in part on the at least one communication pattern; receive a second communication from the second user, the second communication requesting a response from the first user; obtain activity information associated with the first user; and in response to the first user not responding to the second communication within a predetermined period of time, generate a message to be sent to the second user, the message including textual content based at least in part on the activity information and corresponding to the communication model.

3. The system of claim 2, further comprising: enable the first user to approve the message, and send the message to the second user.

4. The system of claim 2, ...
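
A minimal sketch of the auto-reply path in claim 2, assuming the communication model has already been distilled into a per-contact greeting and sign-off; the model, activity string, and timeout are illustrative:

    # Hypothetical communication model learned from past messages per contact.
    MODEL = {"mom": {"greeting": "Hi Mom,", "signoff": "Love you!"}}

    def auto_reply(contact, activity, received_at, now, timeout=600.0):
        # Only generate a message if the first user has not answered in time.
        if now - received_at < timeout:
            return None
        style = MODEL[contact]
        return (f"{style['greeting']} I'm {activity} right now and will reply "
                f"properly soon. {style['signoff']}")

    # Second communication arrived at t=0; fifteen minutes later, no response.
    print(auto_reply("mom", activity="driving", received_at=0.0, now=900.0))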

Publication date: 15-09-2022

READING DEVICE

Number: US20220292933A1
Author: Tsuchiya Osamu
Assignee: TOSHIBA TEC KABUSHIKI KAISHA

According to at least one embodiment, a reading device includes a first device and a second device. The first device includes an imaging unit, a first recognition unit configured to recognize a commodity from a captured image of the imaging unit, and a voice output unit configured to emit a voice if the first recognition unit recognizes the commodity. The second device is separate from the first device, and includes a second recognition unit configured to recognize a commodity from the captured image by a method different from that of the first recognition unit and a second voice generation unit configured to output voice data for sounding the voice output unit if the second recognition unit recognizes the commodity. The first device includes a first voice generation unit configured to output voice data for sounding the voice output unit if the first recognition unit recognizes the commodity, an input unit configured to take in voice data output by the second voice generation unit into the first device, and a voice mixer configured to input the voice data taken in by the input unit and the voice data output by the first voice generation unit to the voice output unit.

1. A reading device comprising: a first device, including: an imager; a first recognizer configured to recognize a commodity from a captured image of the imager; and a voice output configured to, when the first recognizer recognizes the commodity, emit a voice; and a second device separate from the first device, the second device including a second recognizer configured to recognize a commodity from the captured image by a method different from that of the first recognizer, wherein the second device further includes a second voice generator configured to output voice data for sounding the voice output when the second recognizer recognizes the commodity, and the first device further includes a first voice generator configured to output voice data for sounding the voice output when the first recognizer recognizes the commodity, an ...
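
A rough sketch of the mixer arrangement, with lists of numbers standing in for PCM voice data and trivial string checks standing in for the two different recognition methods; everything here is an illustrative assumption:

    def first_recognizer(image):
        # Stand-in for the first device's method, e.g. barcode reading.
        return "apple" if "barcode" in image else None

    def second_recognizer(image):
        # Stand-in for the second device's different method, e.g. object recognition.
        return "apple" if "red round" in image else None

    def voice_data(name):
        # Stand-in for PCM samples announcing the recognized commodity.
        return [float(ord(c)) for c in f"Recognized {name}"]

    def mix(a, b):
        # The voice mixer sums both inputs so either device can drive the speaker.
        n = max(len(a), len(b))
        a = a + [0.0] * (n - len(a))
        b = b + [0.0] * (n - len(b))
        return [x + y for x, y in zip(a, b)]

    image = "red round fruit"
    first = first_recognizer(image)
    second = second_recognizer(image)
    samples = mix(voice_data(first) if first else [],
                  voice_data(second) if second else [])
    # `samples` would be fed to the first device's voice output.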

Publication date: 01-06-2017

SYSTEM AND METHOD FOR AUTOMATICALLY CONVERTING TEXTUAL MESSAGES TO MUSICAL COMPOSITIONS

Number: US20170154615A1
Assignee:

A method for converting textual messages to musical messages comprising receiving a text input and receiving a musical input selection. The method includes analyzing the text input to determine text characteristics and analyzing a musical input corresponding to the musical input selection to determine musical characteristics. Based on the text characteristic and the musical characteristic, the method includes correlating the text input with the musical input to generate a synthesizer input, and sending the synthesizer input to a voice synthesizer. The method includes receiving a vocal rendering of the text input from the voice synthesizer and generating a musical message from the vocal rendering and the musical input. The method includes generating a video element based on a video input, incorporating the video element into the musical message, and outputting the musical message including the video element.

1. A computer implemented method for automatically converting textual messages to musical messages, the computer implemented method comprising: receiving a text input; receiving a musical input selection; analyzing, via one or more processors, the text input to determine at least one text characteristic of the text input; analyzing, via the one or more processors, a musical input corresponding to the musical input selection to determine at least one musical characteristic of the musical input; based on the at least one text characteristic and the at least one musical characteristic, correlating, via the one or more processors, the text input with the musical input to generate a synthesizer input; sending the synthesizer input to a voice synthesizer; receiving, from the voice synthesizer, a vocal rendering of the text input; generating, via the one or more processors, a musical message from the vocal rendering of the text input and the musical input; receiving a video input; generating, via the one or more processors, a video element based on the video input; incorporating, ...
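
The correlation step, pairing text characteristics with musical characteristics to build a synthesizer input, can be sketched as a simple word-to-note assignment; the melody, text, and output format are illustrative assumptions:

    melody = ["C4", "E4", "G4", "E4", "C4"]  # musical characteristics of the selection
    text = "happy birthday to you"           # the text input

    words = text.split()  # a crude text characteristic: word count and order
    synthesizer_input = [
        {"word": word, "pitch": melody[i % len(melody)]}
        for i, word in enumerate(words)
    ]
    # The synthesizer input would be handed to a singing-voice synthesizer,
    # whose vocal rendering is then combined with the musical input.
    print(synthesizer_input)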

Publication date: 17-06-2021

SPEECH PROCESSING

Number: US20210183358A1
Assignee:

A first neural network model of a user device processes audio data to extract audio embeddings that represent vocal characteristics of a user who spoke an utterance represented in the audio data. The audio embeddings may then be hashed to remove characteristics specific to the user while still maintaining a unique set of characteristics. The hashed embeddings may be sent to a remote system, which may use them to identify the user.

1. A computer-implemented method comprising: determining, using a user device, first audio data corresponding to an utterance; processing, using a feature extraction component, the first audio data to determine first embedding data representing first vocal characteristics of a user who spoke the utterance; processing, using a feature conversion component, the first embedding data to determine second embedding data representing second vocal characteristics representing a synthesized voice; sending, to a remote system, the second embedding data; receiving, from the remote system, user identification data corresponding to the user; and processing the first audio data and the user identification data to determine a response to the utterance.

2. The computer-implemented method of claim 1, further comprising: outputting, using the user device, a first prompt requesting that the user say something; determining, using the user device, third audio data corresponding to a first representation of a word the user said; outputting, using the user device, a second prompt requesting that the user say the same thing again; determining, using the user device, fourth audio data corresponding to a second representation of the word; and using the third audio data and the fourth audio data by a text-to-speech (TTS) component.

3. The computer-implemented method of claim 1, further comprising: receiving, by the remote system, the second embedding data; determining, by the remote system and using the second embedding data, speaker identification data; processing, using a text-to- ...
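
The hashing idea (strip user-specific detail while keeping a stable, unique code) can be sketched as quantize-then-hash; the embedding values and bucket count are illustrative assumptions, and the patent's feature conversion component is not specified at this level of detail:

    import hashlib

    def hash_embedding(embedding, buckets=64):
        # Quantize so small run-to-run noise still maps to the same hash,
        # then hash so the raw vocal characteristics cannot be recovered.
        quantized = bytes(int((v + 1.0) / 2.0 * (buckets - 1)) for v in embedding)
        return hashlib.sha256(quantized).hexdigest()

    first_embedding = [0.12, -0.53, 0.88, 0.05]  # vocal characteristics in [-1, 1]
    second_embedding = hash_embedding(first_embedding)
    print(second_embedding)  # sent to the remote system instead of raw features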

Publication date: 09-06-2016

DIALOG MANAGEMENT SYSTEM AND DIALOG MANAGEMENT METHOD

Number: US20160163314A1
Author: FUJII Yoichi, ISHII Jun
Assignee: Mitsubishi Electric Corporation

An intention estimated-weight determination processor determines an intention estimated weight on the basis of intention hierarchical graph data and an activated intention. A transition node determination processor determines an intention to be newly activated through transition, after correcting an intention estimation result according to the intention estimated weight. A dialog turn generator generates a turn of dialog from the activated intention. A dialog management unit controls, when a new input is provided due to the turn of dialog, at least one process among the processes performed by an intention estimation processor, the intention estimated-weight determination processor, the transition node determination processor, and the dialog turn generator, and then repeats that control, to thereby finally execute a setup command.

1.-6. (canceled)

7. A dialog control system comprising: an intention estimation processor that, based on data provided by converting an input in a natural language into a morpheme string, estimates an intention of the input; an intention estimated-weight determination processor that, based on data in which intentions are arranged in a hierarchical structure and based on the intention thereamong being activated at a given object time, determines an intention estimated weight of the intention estimated by the intention estimation processor; a transition node determination processor that determines an intention to be newly activated through transition, after correcting an estimation result by the intention estimation processor according to the intention estimated weight determined by the intention estimated-weight determination processor; a history-considered dialog turn generator that generates a turn of dialog from one or plural intentions activated by the transition node determination processor, and that records each command having been executed as a result by the dialog, to thereby generate a turn of dialog using a list in which ...
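
The weight-correction idea can be sketched as follows, with a toy hierarchy and scores; the weighting rule (favor children of the activated intention) is an illustrative assumption about how the estimated weights might be applied:

    # Hypothetical intention hierarchy and the node activated at this turn.
    hierarchy = {"navigate": ["navigate.home", "navigate.work"], "music": ["music.play"]}
    activated = "navigate"

    # Raw scores from the intention estimation processor for a new input.
    raw_scores = {"navigate.home": 0.40, "music.play": 0.45, "navigate.work": 0.15}

    def weight(intention):
        # Favor children of the activated intention over unrelated branches.
        return 1.5 if intention in hierarchy.get(activated, []) else 1.0

    corrected = {i: s * weight(i) for i, s in raw_scores.items()}
    next_node = max(corrected, key=corrected.get)
    print(next_node)  # navigate.home wins despite a lower raw score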

Publication date: 14-05-2020

ELECTRONIC APPARATUS AND METHOD FOR CONTROLLING THEREOF

Number: US20200152194A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

An electronic apparatus, based on a text sentence being input, obtains prosody information of the text sentence, segments the text sentence into a plurality of sentence elements, obtains speech in which the prosody information is reflected to each of the plurality of sentence elements in parallel by inputting the plurality of sentence elements and the prosody information of the text sentence to a text-to-speech (TTS) module, and merges the speech for the plurality of sentence elements obtained in parallel to output speech for the text sentence.

1. An electronic apparatus comprising: a memory configured to store at least one instruction; and a processor configured to execute the at least one instruction stored in the memory, which when executed causes the processor to control to: based on obtaining a text input, obtain prosody information of the text input; segment the text input into a plurality of segments; obtain speech segments in which the prosody information is reflected to each segment of the plurality of segments in parallel by inputting the plurality of segments and the prosody information to a text-to-speech (TTS) module; and obtain a speech for the text input by merging the speech segments.

2. The electronic apparatus of claim 1, wherein the processor when executing the at least one instruction is further configured to: obtain a plurality of first segments by segmenting the text input based on a first criterion, and based on a first processing time for converting the plurality of first segments to the speech segments being less than a predetermined time, input the plurality of first segments to the TTS module; based on the first processing time for converting at least one first segment of the plurality of first segments to the speech segments being greater than or equal to the predetermined time, obtain a plurality of second segments by segmenting the at least one first segment based on a second criterion; and based on a second processing time ...
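
A minimal sketch of the segment-synthesize-merge pipeline, with a comma split as a hypothetical first criterion and a trivial stub in place of the TTS module:

    import re
    from concurrent.futures import ThreadPoolExecutor

    def tts(segment, prosody):
        # Stand-in for the TTS module; real output would be audio samples.
        return f"<{prosody['contour']}|{segment}>".encode()

    def speak(text):
        # Prosody is computed once for the whole input, then shared by all segments.
        prosody = {"contour": "rising" if text.endswith("?") else "falling"}
        segments = re.split(r",\s*", text)  # first criterion: split at commas
        with ThreadPoolExecutor() as pool:  # per-segment synthesis in parallel
            pieces = list(pool.map(lambda s: tts(s, prosody), segments))
        return b"".join(pieces)  # merge the speech segments in order

    print(speak("If it rains tomorrow, should we stay home?"))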

Publication date: 24-06-2021

Multimedia processing method and electronic system

Number: US20210193107A1
Assignee: Acer Inc

An electronic system is provided. The electronic system includes a host and a display. The host includes an audio processing module and a smart interpreter engine. The audio processing module acquires audio data corresponding to a first language from audio streams processed by an application program executed on the host; the application program includes specific game software. The smart interpreter engine receives the audio data corresponding to the first language from the audio processing module and converts it into text data corresponding to a second language according to the game software executed on the host. The display receives the text data corresponding to the second language from the smart interpreter engine and displays it.
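
The audio-to-translated-text path can be sketched as recognize-then-translate; the recognizer, the dictionary, and the language codes are hypothetical stand-ins for the smart interpreter engine:

    def recognize(audio, language):
        # Stand-in ASR: returns text in the first language.
        return "enemy behind you"

    def translate(text, source, target):
        # Stand-in translation; a real engine would use a trained model.
        table = {"enemy behind you": "敵は後ろにいる"}
        return table.get(text, text)

    audio_stream = b"..."  # audio captured from the running game
    subtitle = translate(recognize(audio_stream, "en"), "en", "ja")
    print(subtitle)  # shown on the display as text in the second language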

Publication date: 24-06-2021

MULTI-MODAL INTERACTION BETWEEN USERS, AUTOMATED ASSISTANTS, AND OTHER COMPUTING SERVICES

Number: US20210193146A1
Assignee:

Techniques are described herein for multi-modal interaction between users, automated assistants, and other computing services. In various implementations, a user may engage with the automated assistant in order to further engage with a third party computing service. In some implementations, the user may advance through dialog state machines associated with the third party computing service using both verbal input modalities and input modalities other than verbal modalities, such as visual/tactile modalities.

1. A method implemented using one or more processors, comprising: receiving, by a third party computing service implemented at least in part by the one or more processors, first data transmitted over one or more computer networks from an automated assistant, wherein the first data is indicative of an intent of a user of a computing device in communication with the automated assistant as part of a human-to-computer dialog session between the user and the automated assistant; generating resolution information based on the intent of the user; updating a display context maintained for the third party computing service in association with the human-to-computer dialog session, wherein the updating is based at least in part on one or both of the intent and the resolution information; and transmitting second data to the automated assistant over one or more of the computer networks, wherein the second data is indicative of the display context and causes an assistant application executing on the computing device to trigger a touchless interaction between the user and a graphical user interface of the assistant application.

2. The method of claim 1, wherein the graphical user interface comprises a web browser embedded into the assistant application.

3. The method of claim 1, wherein the touchless interaction comprises operation of a selectable element of the graphical user interface.

4. The method of claim 1, wherein the touchless interaction comprises scrolling to a particular ...
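
The data exchange in claim 1 can be sketched as a single request handler on the third party service; every field name here is an illustrative assumption:

    def handle_assistant_request(first_data):
        # First data: the user's intent, forwarded by the automated assistant.
        intent = first_data["intent"]                      # e.g. "order_pizza"
        resolution = {"slots_filled": intent == "order_pizza"}
        # Update the display context maintained for this dialog session.
        display_context = {
            "screen": "toppings_picker" if resolution["slots_filled"] else "menu",
            "scroll_to": "vegetarian",                     # touchless scroll target
        }
        # Second data: sent back so the assistant app can update its GUI.
        return {"display_context": display_context}

    second_data = handle_assistant_request({"intent": "order_pizza"})
    print(second_data)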

Publication date: 16-06-2016

SYSTEM AND METHOD FOR AUTOMATIC DETECTION OF ABNORMAL STRESS PATTERNS IN UNIT SELECTION SYNTHESIS

Number: US20160171970A1
Assignee:

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, a support vector machine, or maximum entropy. In this way a text-to-speech unit-selection speech synthesizer can produce more natural-sounding speech with suitable stress patterns regardless of the stress of units in a unit-selection database.

1. A method comprising: detecting incorrect stress patterns in selected acoustic units representing speech to be synthesized; performing a word-level analysis of the incorrect stress patterns, a phrase-level analysis of the incorrect stress patterns and a sentence-level analysis of the incorrect stress patterns to yield analyses, wherein the analyses are performed in parallel; and modifying, via a processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units according to the analyses, to yield corrected stress patterns.

2. The method of claim 1, wherein detecting incorrect stress patterns is performed according to a stress pattern for a language.

3. The method of claim 2, wherein the stress pattern comprises one of lexical stress, sentential stress, primary stress, and secondary stress.

4. The method of claim 1, further comprising receiving a stress pattern for both a language and an accent in the language, wherein the detecting of the incorrect stress patterns is performed based on the stress ...
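
The detect-and-correct step can be sketched with stress patterns encoded as strings of 0s and 1s per syllable; the lexicon and unit sequence are illustrative assumptions, and a real system would use the trained classifier named above rather than a table lookup:

    # 1 marks a stressed syllable; the expected patterns form a tiny lexicon.
    EXPECTED = {"record(noun)": "10", "record(verb)": "01"}

    def detect_incorrect_stress(word, unit_stresses):
        # Compare the stress carried by the selected units to the expectation.
        return unit_stresses != EXPECTED[word]

    unit_stresses = "01"  # stress pattern of the selected acoustic units
    if detect_incorrect_stress("record(noun)", unit_stresses):
        # Correct prior to waveform synthesis.
        unit_stresses = EXPECTED["record(noun)"]
    print(unit_stresses)  # "10"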
