Total found: 4429. Displayed: 100.

Publication date: 22-03-2012

Mobile business client

Number: US20120072489A1
Assignee: Individual

The subject matter herein relates to computer software and client-server based applications and, more particularly, to a mobile business client. Some embodiments include one or more device-agnostic application interaction models and one or more device-specific transformation services. Some such embodiments provide one or more of systems, methods, and software embodied at least in part in a device-specific transformation service to transform channel-agnostic application interaction models to and from device or device-surrogate specific formats.

Publication date: 12-04-2012

Speech synthesizer, speech synthesizing method and program product

Number: US20120089402A1
Assignee: Toshiba Corp

According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.

Publication date: 05-07-2012

Multi-lingual text-to-speech system and method

Number: US20120173241A1

A multi-lingual text-to-speech system and method processes a text to be synthesized via an acoustic-prosodic model selection module and an acoustic-prosodic model mergence module, and obtains a phonetic unit transformation table. In an online phase, the acoustic-prosodic model selection module, according to the text and a phonetic unit transcription corresponding to the text, uses at least one controllable accent weighting parameter to select a transformation combination and find a first and a second acoustic-prosodic model. The acoustic-prosodic model mergence module merges the two acoustic-prosodic models into a merged acoustic-prosodic model according to the at least one controllable accent weighting parameter, processes all transformations in the transformation combination, and generates a merged acoustic-prosodic model sequence. A speech synthesizer and the merged acoustic-prosodic model sequence are further applied to synthesize the text into L1-accented L2 speech.
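
To make the mergence idea concrete, here is a minimal Python sketch, assuming Gaussian model parameters stored as numpy arrays and a single accent weight w. It illustrates weighted interpolation of two models, not the patented implementation.

    import numpy as np

    def merge_models(l1_mean, l1_var, l2_mean, l2_var, w):
        # w = 1.0 yields a pure L1-accent model; w = 0.0 a pure L2 model.
        merged_mean = w * l1_mean + (1.0 - w) * l2_mean
        merged_var = w * l1_var + (1.0 - w) * l2_var
        return merged_mean, merged_var

    # Example: merge pitch-contour models for one phonetic unit (made-up numbers).
    mean, var = merge_models(np.array([5.1, 5.3]), np.array([0.20, 0.30]),
                             np.array([4.8, 5.0]), np.array([0.25, 0.20]), w=0.7)
    print(mean, var)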

Publication date: 19-07-2012

Extracting text for conversion to audio

Number: US20120185253A1
Assignee: Microsoft Corp

Embodiments are disclosed that relate to converting markup content to an audio output. For example, one disclosed embodiment provides, in a computing device, a method including partitioning a markup document into a plurality of content panels, and forming a subset of content panels by filtering the plurality of content panels based upon geometric and/or location-based criteria of each panel relative to an overall organization of the markup document. The method further includes determining a document object model (DOM) analysis value for each content panel of the subset of content panels, identifying a set of content panels determined to contain text body content by filtering the subset of content panels based upon the DOM analysis value of each of the content panels of the subset of content panels, and converting text in a selected content panel determined to contain text body content to an audio output.
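
The panel-filtering pipeline can be pictured with a small sketch: partition, geometric/location filter, then a DOM-analysis score. The Panel fields, thresholds, and score below are illustrative assumptions, not values from the patent.

    from dataclasses import dataclass

    @dataclass
    class Panel:
        width: int        # pixels
        height: int       # pixels
        y: int            # vertical offset within the page
        text_nodes: int   # DOM text nodes inside the panel
        link_nodes: int   # DOM link nodes inside the panel
        text: str

    def text_body_panels(panels, page_height):
        # Geometric/location criteria (thresholds are illustrative assumptions).
        subset = [p for p in panels
                  if p.width * p.height > 10_000 and p.y < 0.8 * page_height]
        # DOM analysis value: body content is text-heavy relative to links.
        def dom_value(p):
            return p.text_nodes / (1 + p.link_nodes)
        return [p for p in subset if dom_value(p) > 2.0]

    panels = [Panel(800, 600, 100, 40, 3, "article body ..."),
              Panel(200, 500, 0, 5, 30, "nav links ...")]
    print([p.text for p in text_body_panels(panels, page_height=2000)])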

Publication date: 09-08-2012

Recognition dictionary creating device, voice recognition device, and voice synthesizer

Number: US20120203553A1
Author: Yuzo Maruta
Assignee: Mitsubishi Electric Corp

A recognition dictionary creating device includes a user dictionary in which a phoneme label string of an inputted voice is registered and an interlanguage acoustic data mapping table in which a correspondence between phoneme labels in different languages is defined, and refers to the interlanguage acoustic data mapping table to convert the phoneme label string registered in the user dictionary and expressed in a language set at the time of creating the user dictionary into a phoneme label string expressed in another language to which the recognition dictionary creating device has switched.
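
A toy illustration of the mapping-table conversion; the phoneme labels and the EN_TO_JA table are made-up examples, not the patent's actual acoustic data mapping.

    # Each source-language phoneme label maps to its closest label in the
    # target language; unmapped labels pass through unchanged.
    EN_TO_JA = {"AE": "a", "IY": "i", "UW": "u", "K": "k", "T": "t"}

    def convert_phoneme_labels(labels, mapping):
        """Convert a registered phoneme label string to the switched language."""
        return [mapping.get(label, label) for label in labels]

    print(convert_phoneme_labels(["K", "AE", "T"], EN_TO_JA))  # ['k', 'a', 't']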

Publication date: 20-09-2012

Apparatus and method for supporting reading of document, and computer readable medium

Number: US20120239390A1
Assignee: Toshiba Corp

According to one embodiment, an apparatus for supporting reading of a document includes a model storage unit, a document acquisition unit, a feature information extraction unit, and an utterance style estimation unit. The model storage unit is configured to store a model which has been trained on a correspondence relationship between first feature information and an utterance style. The first feature information is extracted from a plurality of sentences in a training document. The document acquisition unit is configured to acquire a document to be read. The feature information extraction unit is configured to extract second feature information from each sentence in the document to be read. The utterance style estimation unit is configured to compare the second feature information of a plurality of sentences in the document to be read with the model, and to estimate an utterance style of each sentence of the document to be read.

Publication date: 27-12-2012

Method for producing ammonium tungstate aqueous solution

Number: US20120328506A1

A method for producing an ammonium tungstate aqueous solution includes the steps of: adding sulfuric acid to a solution containing tungstate ions; bringing the solution having the sulfuric acid added therein, into contact with an anion exchange resin; and bringing the anion exchange resin into contact with an aqueous solution containing ammonium ions.

Publication date: 27-12-2012

Method, system and processor-readable media for automatically vocalizing user pre-selected sporting event scores

Number: US20120330666A1
Assignee: Verna IP Holdings LLC

A method and system for vocalizing user-selected sporting event scores. A customized spoken score application module can be configured in association with a device. A real-time score can be preselected by a user from an existing sporting event website for automatically vocalizing the score in a multitude of languages utilizing a speech synthesizer and a translation engine. An existing text-to-speech engine can be integrated with the spoken score application module and controlled by the application module to automatically vocalize the preselected scores listed on the sporting event site. The synthetically-voiced, real-time score can be transmitted to the device at a predetermined time interval. Such an approach automatically and instantly pushes the real-time vocal alerts, thereby permitting the user to continue multitasking without activating the pre-selected vocal alerts.

Publication date: 27-12-2012

Speech synthesizer, navigation apparatus and speech synthesizing method

Number: US20120330667A1
Assignee: HITACHI LTD

Included in a speech synthesizer, a natural language processing unit divides text data, input from a text input unit, into a plurality of components (particularly, words). An importance prediction unit estimates an importance level of each component according to how much the component contributes to understanding when a listener hears the synthesized speech. The speech synthesizer then determines a processing load based on the device state during synthesis processing and on the importance level. Included in the speech synthesizer, a synthesizing control unit and a wave generation unit reduce the processing time for a phoneme with a low importance level by curtailing its processing load (relatively degrading its sound quality), allocate a part of the processing time made available by this reduction to the processing time of a phoneme with a high importance level, and generate synthesized speech in which important words are easily audible.
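
A minimal sketch of the budgeting idea, assuming per-phoneme importance scores and an abstract processing budget; the proportional split is an illustrative choice, not the patented allocation rule.

    def allocate_budget(importances, total_budget):
        # Shift processing time from low-importance phonemes to high-importance
        # ones while keeping the total budget fixed.
        total = sum(importances) or 1.0
        return [total_budget * imp / total for imp in importances]

    # Important words get more synthesis time (e.g., finer waveform generation).
    print(allocate_budget([0.9, 0.1, 0.7], total_budget=300.0))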

Publication date: 14-03-2013

Apparatus and method for generating vocal organ animation

Number: US20130065205A1
Author: Bong-Rae Park
Assignee: CLUSOFT CO Ltd

The present disclosure relates to an apparatus and method for generating a vocal organ animation very similar to a pronunciation pattern of a native speaker in order to support foreign language pronunciation education. The present disclosure checks an adjacent phonetic value in phonetic value constitution information, extracts a detail phonetic value based on the adjacent phonetic value, extracts pronunciation pattern information corresponding to the detail phonetic value and pronunciation pattern information corresponding to a transition section allocated between detail phonetic values, and performs interpolation on the extracted pronunciation pattern information, thereby generating a vocal organ animation.

Publication date: 14-03-2013

Parametric speech synthesis method and system

Number: US20130066631A1
Authors: Fengliang Wu, Zhenhua Wu
Assignee: Goertek Inc

The present invention provides a parametric speech synthesis method and a parametric speech synthesis system. The method comprises sequentially processing each frame of speech of each phone in a phone sequence of an input text as follows: for a current phone, extracting a corresponding statistic model from a statistic model library and using model parameters of the statistic model that correspond to the current frame of the current phone as rough values of the currently predicted speech parameters; according to the rough values and information about a predetermined number of speech frames occurring before the current time point, obtaining smoothed values of the currently predicted speech parameters; according to global mean values and global standard deviation ratios of the speech parameters obtained through statistics, performing global optimization on the smoothed values of the speech parameters to generate the necessary speech parameters; and synthesizing the generated speech parameters to obtain a frame of speech synthesized for the current frame of the current phone. With this solution, the RAM capacity needed by speech synthesis does not increase with the length of the synthesized speech, and the time length of the synthesized speech is no longer limited by the RAM.
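
The global-optimization step can be sketched as restoring the spread of an over-smoothed parameter trajectory around a global mean. The formula below is one plausible reading under stated assumptions (a per-utterance mean and a global standard-deviation ratio), not the exact patented procedure.

    import numpy as np

    def global_optimize(smoothed, global_mean, std_ratio):
        """std_ratio ~ (global std of natural speech) / (std of smoothed output);
        scaling deviations by it counteracts over-smoothing."""
        return global_mean + std_ratio * (smoothed - np.mean(smoothed))

    track = np.array([4.9, 5.0, 5.05, 5.0])  # e.g., log-F0 per frame (made up)
    print(global_optimize(track, global_mean=5.0, std_ratio=1.8))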

Publication date: 21-03-2013

Alarm method and apparatus in portable terminal

Number: US20130070575A1
Assignee: SAMSUNG ELECTRONICS CO LTD

An alarm method in a portable terminal is provided, including setting an alarm by setting an alarm time, setting output information to be output at the alarm time, and setting an output device for receiving and displaying the output information; outputting a preset alarm sound when the set alarm time arrives, and obtaining the output information to be output at the alarm time; and transmitting the obtained output information to the set output device.

Publication date: 28-03-2013

Methods and Apparatus for Rapid Acoustic Unit Selection From a Large Speech Corpus

Number: US20130080176A1
Assignee: AT&T INTELLECTUAL PROPERTY II, L.P.

A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. The number of possible sequential pairs of acoustic units makes such caching prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. The system synthesizes a large body of speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.

1. A method comprising: when, while synthesizing speech, an acoustic unit sequential pair does not have an associated concatenation cost in a concatenation cost database: assigning a default value as the associated concatenation cost; and updating the concatenation cost database by synthesizing a body of speech, identifying the acoustic unit sequential pair in the body of speech, and recording a respective concatenation cost in the concatenation cost database.
2. The method of claim 1, further comprising synthesizing the speech using the respective concatenation cost.
3. The method of claim 1, wherein recording the respective concatenation cost comprises: assigning a value to each acoustic unit in the acoustic unit sequential pair; and determining a difference associated with the value assigned to each acoustic unit, to yield the respective concatenation cost.
4. The method of claim 1, wherein the concatenation cost database contains a portion of all possible concatenation costs associated with ...
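
A compact sketch of the caching strategy described above: store only the concatenation costs that actually occur when synthesizing a large body of speech, and fall back to a default for unseen unit pairs. The default value and cost_fn are placeholders.

    DEFAULT_COST = 1.0  # illustrative default for uncached pairs

    class ConcatCostCache:
        def __init__(self):
            self.costs = {}

        def train(self, unit_pairs, cost_fn):
            # unit_pairs: sequential acoustic-unit pairs seen in synthesized speech
            for a, b in unit_pairs:
                self.costs[(a, b)] = cost_fn(a, b)

        def cost(self, a, b):
            return self.costs.get((a, b), DEFAULT_COST)

    cache = ConcatCostCache()
    cache.train([("u1", "u7"), ("u7", "u3")], cost_fn=lambda a, b: 0.2)
    print(cache.cost("u1", "u7"), cache.cost("u1", "u2"))  # 0.2 1.0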

Publication date: 04-04-2013

SPEECH SAMPLES LIBRARY FOR TEXT-TO-SPEECH AND METHODS AND APPARATUS FOR GENERATING AND USING SAME

Number: US20130085759A1
Assignee: VIVOTEXT LTD.

A method for converting text into speech with a speech sample library is provided. The method comprises converting an input text to a sequence of triphones; determining musical parameters of each phoneme in the sequence of triphones; detecting, in the speech sample library, speech segments having at least the determined musical parameters; and concatenating the detected speech segments.

1. A method for converting text into speech with a speech sample library, comprising: converting an input text to a sequence of triphones; determining musical parameters of each phoneme in the sequence of triphones; detecting, in the speech sample library, speech segments having at least the determined musical parameters; and concatenating the detected speech segments.
2. The method of claim 1, further comprising: adjusting the musical parameters of speech segments prior to concatenating the speech segments.
3. The method of claim 1, wherein the at least one musical parameter is any one of: a pitch curve, a pitch perception, duration, and a volume.
4. The method of claim 3, wherein a value of a musical vector is an index indicative of a sub-range in which its respective at least one musical parameter lies.
5. The method of claim 1, wherein the sequence of triphones includes overlapping triphones.
6. The method of claim 2, wherein determining the musical parameters of each phoneme in the sequence of triphones further includes: providing a set of numerical targets for each of the musical parameters.
7. The method of claim 6, wherein detecting the speech segments having at least the determined musical parameters further includes: searching the speech sample library for at least one of a central phoneme, phonemic context, and a musical index indicating at least one range of at least one of the musical parameters within which at least one of the numerical targets lies.
8. The method of claim 1, wherein each of the speech segments comprises at ...

Publication date: 04-04-2013

TRAINING AND APPLYING PROSODY MODELS

Number: US20130085760A1
Author: James H. Stephens, Jr.
Assignee: MORPHISM LLC

Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.

1–26. (canceled)
27. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising: maintaining an inventory of prosody models with lexicons; selecting a subset of multiple prosody models from the inventory of prosody models; associating prosody models in the subset of multiple prosody models with different segments of a text based on phrases in the text statistically associated with the lexicons of the prosody models; applying the associated prosody models to the different segments of the text to produce prosody annotations for the text; considering annotations of the prosody annotations to reconcile conflicting prosody annotations due to multiple prosody models associated with a segment of the text; and synthesizing audible speech from the text and the reconciled prosody annotations.
28. The method of claim 27, wherein the reconciling is based on a reconciliation policy.
29. The method of claim 28, wherein the reconciliation policy considers the annotations of the prosody annotations that comprise a prosody model identifier and a prosody model confidence for the prosody annotation.
30. The method of claim 29, wherein annotations of the prosody annotations are represented by markup elements that indicate the scope of the tagged text.
31. The method of claim 30, wherein the reconciliation eliminates conflicting annotations that result from applications of multiple models.
32. The method of claim 31, wherein the selecting is based on input parameters.
33. The ...

Publication date: 18-04-2013

FACILITATING TEXT-TO-SPEECH CONVERSION OF A USERNAME OR A NETWORK ADDRESS CONTAINING A USERNAME

Number: US20130096920A1
Assignee: RESEARCH IN MOTION LIMITED

To facilitate text-to-speech conversion of a username, a first or last name of a user associated with the username may be retrieved, and a pronunciation of the username may be determined based at least in part on whether the name forms at least part of the username. To facilitate text-to-speech conversion of a domain name having a top level domain and at least one other level domain, a pronunciation for the top level domain may be determined based at least in part upon whether the top level domain is one of a predetermined set of top level domains. Each other level domain may be searched for one or more recognized words therewithin, and a pronunciation of the other level domain may be determined based at least in part on an outcome of the search. The username and domain name may form part of a network address such as an email address, URL or URI.

1. A method, for a wireless communication device, for text-to-speech conversion of a network address, the method comprising: determining that a part of a username in the network address comprises one of a recognized word from a spoken language, a first name, and a last name; and generating a representation of a pronunciation of the part, pronounced as a whole.
2. The method of claim 1, wherein the part comprises the entire username.
3. The method of claim 1, wherein determining comprises searching the username for the part.
4. The method of claim 1, wherein the network address is an email address.
5. The method of claim 4, wherein the email address contains an ‘@’ symbol, and wherein the username corresponds to the portion of the email address preceding the ‘@’ symbol.
6. The method of claim 5, further comprising retrieving the username as the portion of the email address preceding the ‘@’ symbol.
7. The method of claim 4, wherein determining comprises identifying the part as one of a first name and a last name included in a display name received in conjunction with the email address.
8. The method of claim 1, ...
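
An illustrative reading of the username rule: pronounce a recognized name or dictionary word as a whole, and spell out the rest. The word lists and chunking below are assumptions for the demo, not the patent's lexicons.

    KNOWN_NAMES = {"anna", "smith"}
    DICTIONARY = {"blue", "sky"}

    def username_pronunciation(username):
        parts = []
        for chunk in username.lower().replace(".", " ").split():
            if chunk in KNOWN_NAMES or chunk in DICTIONARY:
                parts.append(chunk)            # pronounce as a whole word
            else:
                parts.append(" ".join(chunk))  # spell out letter by letter
        return " ".join(parts)

    print(username_pronunciation("anna.k92"))  # "anna k 9 2"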

Publication date: 02-05-2013

FACILITATING TEXT-TO-SPEECH CONVERSION OF A DOMAIN NAME OR A NETWORK ADDRESS CONTAINING A DOMAIN NAME

Number: US20130110512A1
Assignee: RESEARCH IN MOTION LIMITED

A method and apparatus of facilitating text-to-speech conversion of a domain name are provided. At a processor of a computing device, a pronunciation of a top level domain of a network address is determined by one or more of: generating a phonetic representation of each character in the top level domain pronounced individually; and generating a tokenized representation of each individual character of the top level domain suitable for interpretation by a text-to-speech engine. For each other level domain of the network address, at the processor, a pronunciation of the other level domain is determined based on one or more recognized words within the other level domain.

1. A method comprising: determining, at a processor of a computing device, a pronunciation of a top level domain of a network address by one or more of: generating a phonetic representation of each character in the top level domain pronounced individually; and generating a tokenized representation of each individual character of the top level domain suitable for interpretation by a text-to-speech engine; and, for each other level domain of the network address, determining, at the processor, a pronunciation of the other level domain based on one or more recognized words within the other level domain.
2. The method of claim 1, wherein the determining the pronunciation of a top level domain of a network address further comprises determining whether said top level domain is one of a set of top level domains.
3. The method of claim 2, wherein the set represents top level domains that are pronounced as a whole.
4. The method of claim 1, wherein the determining the pronunciation of a top level domain of a network address further comprises one or more of: generating a phonetic representation of the top level domain pronounced as a whole; and generating a tokenized representation of the top level domain as a whole suitable for interpretation by a text-to-speech engine.
5. The method of claim 1, wherein the ...

Publication date: 09-05-2013

SPEECH SYNTHESIZER, SPEECH SYNTHESIS METHOD, AND SPEECH SYNTHESIS PROGRAM

Number: US20130117026A1
Author: Masanori Kato
Assignee: NEC Corporation

State duration creation means creates a state duration indicating a duration of each state in a hidden Markov model, based on linguistic information and a model parameter of prosody information. Duration correction degree computing means derives a speech feature from the linguistic information, and computes a duration correction degree which is an index indicating a degree of correcting the state duration, based on the derived speech feature. State duration correction means corrects the state duration based on a phonological duration correction parameter and the duration correction degree, the phonological duration correction parameter indicating a correction ratio of correcting a phonological duration.

1–10. (canceled)
11. A speech synthesizer comprising: a state duration creation unit for creating a state duration indicating a duration of each state in a hidden Markov model, based on linguistic information and a model parameter of prosody information; a duration correction degree computing unit for deriving a speech feature from the linguistic information, and computing a duration correction degree based on the derived speech feature, the duration correction degree being an index indicating a degree of correcting the state duration; and a state duration correction unit for correcting the state duration based on a phonological duration correction parameter and the duration correction degree, the phonological duration correction parameter indicating a correction ratio of correcting a phonological duration.
12. The speech synthesizer according to claim 11, wherein the duration correction degree computing unit estimates a temporal change degree of the speech feature derived from the linguistic information, and computes the duration correction degree based on the estimated temporal change degree.
13. The speech synthesizer according to claim 12, wherein the duration correction degree computing unit estimates a temporal change degree of a spectrum or a pitch from ...
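
A hedged sketch of the correction step. The multiplicative form below (the correction ratio raised to the correction degree) is an assumed shape chosen so that a degree of 0 leaves the duration unchanged; it is not the patent's exact formula.

    def correct_state_duration(duration, correction_ratio, correction_degree):
        """duration: frames; correction_ratio: e.g. 1.2 = lengthen by 20%;
        correction_degree: 0..1, derived from the speech feature."""
        return duration * (correction_ratio ** correction_degree)

    print(correct_state_duration(10.0, correction_ratio=1.2, correction_degree=0.5))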

Publication date: 16-05-2013

VIDEO GENERATION BASED ON TEXT

Number: US20130124206A1
Authors: Behrooz Rezvani, Ali Rouhi
Assignee: Seyyer, Inc.

Techniques for generating a video sequence of a person based on a text sequence are disclosed herein. Based on the received text sequence, a processing device generates the video sequence of a person to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence. The emotional expressions in the visual portion of the video sequence are simulated based on a priori knowledge about the person. For instance, the a priori knowledge can include photos or videos of the person captured in real life.

1. A method comprising: inputting a text sequence at a processing device; and generating, by the processing device, a video sequence of a person based on the text sequence to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence.
2. The method of claim 1, wherein the processing device is a mobile device, the text sequence is inputted from a second mobile device via a Short Message Service (SMS) channel, and said generating a video sequence of a person comprises generating, by the mobile device, a video sequence of a person based on shared information stored on the mobile device and the second mobile device.
3. The method of claim 1, wherein the text sequence includes a set of words including at least one word, and wherein the video sequence is generated such that the person appears to utter the words in the video sequence.
4. The method of claim 1, wherein the text sequence includes a text representing an utterance, and wherein the video sequence is generated such that the person appears to utter the utterance in the video sequence.
5. The method of claim 1, wherein the text sequence includes a word and an indicator for the word, the indicator indicates an emotional expression of the person at a time ...

Publication date: 23-05-2013

System and Method for Generating Challenge Items for CAPTCHAs

Number: US20130132093A1
Author: John Nicholas Gross

Challenge items for an audible-based electronic challenge system are generated using a variety of techniques to identify optimal candidates. The challenge items are intended for use in a computing system that discriminates between humans and text to speech (TTS) systems.

1–19. (canceled)
20. A method embodied in a computer readable medium of selecting challenge data to be used for accessing data and/or resources of a computing system comprising: (a) providing data identifying a first set of diphones to be assessed by a computing system, wherein each of said first set of diphones represents a sound associated with an articulation of a pair of phonemes in a natural language; (b) generating a plurality of articulation scores using the computing system based on measuring acoustical characteristics of a machine text to speech (TTS) system articulation of each of said first set of diphones; and (c) selecting challenge text including words and phrases from the natural language using the computing system based on said plurality of articulation scores; wherein said challenge text is useable by an utterance-based challenge system for discriminating between humans and machines.
21. The method of claim 20, further including a step: processing input speech by an entity using said challenge item database to distinguish between a human and a machine synthesized voice.
22. A method embodied in a computer readable medium of selecting challenge data to be used for accessing data and/or resources of a computing system comprising: a) selecting a candidate challenge item which includes text words and/or visual images; b) measuring first acoustical characteristics of a computer synthesized utterance when articulating challenge content associated with said candidate challenge item; c) measuring second acoustical characteristics of a human utterance when articulating said challenge content; d) generating a challenge item score based on measuring a difference in said first and second acoustical ...

Publication date: 06-06-2013

SYSTEMS AND METHODS DOCUMENT NARRATION

Number: US20130144625A1
Assignee: K-NFB READING TECHNOLOGY, INC.

Disclosed are techniques and systems to provide a narration of a text in multiple different voices. In some aspects, systems and methods described herein can include receiving a user-based selection of a first portion of words in a document where the document has a pre-associated first voice model, and overwriting the association of the first voice model, by the one or more computers, with a second voice model for the first portion of words.

1. A computer implemented method, comprising: receiving a user-based selection of a first portion of words in a document, at least a portion of the document being displayed on a user interface on a display device, the document being pre-associated with a first voice model; applying, by the one or more computers, in response to the user-based selection of the first portion of words, a first set of indicia to the user-selected first portion of words in the document; and overwriting the association of the first voice model, by the one or more computers, with a second voice model for the first portion of words.
2. The method of claim 1, wherein the words in the first portion of words are narrated using the second voice model and at least some of the other words in the document are narrated using the first voice model.
3. The method of claim 1, wherein the method further comprises: associating, by the one or more computers, the first voice model with the document, prior to receiving the user-based selection of the first portion of words.
4. The method of claim 1, wherein the words in the first portion of words are narrated using the second voice model and remaining words in the document are narrated using the first voice model.
5. The method of claim 1, wherein the first voice model comprises a default voice model.
6. The method of claim 1, further comprising: applying, in response to a user-based selection of a second portion of words in the document, a second highlighting indicium to the user-selected second portion of words in the document; ...

Publication date: 25-07-2013

Speech synthesis method and apparatus for electronic system

Number: US20130191130A1
Assignee: ASUSTeK Computer Inc

A speech synthesis method for an electronic system and a speech synthesis apparatus are provided. In the speech synthesis method, a speech signal file including text content is received. The speech signal file is analyzed to obtain prosodic information of the speech signal file. The text content and the corresponding prosodic information are automatically tagged to obtain a text tag file. A speech synthesis file is obtained by synthesizing a human voice profile and the text tag file.

Publication date: 08-08-2013

ELECTRONIC APPARATUS AND FUNCTION GUIDE METHOD THEREOF

Number: US20130204623A1
Assignee: YAMAHA CORPORATION

In an electronic apparatus having a plurality of functions, a connecting unit connects the electronic apparatus to an external device which presents text information in a form recognizable by a visually impaired user. A function selection unit selects a function to be executed. A storage unit stores a table defining correspondence between the plurality of functions and a plurality of text files each containing text information. A text file selection unit selects a text file corresponding to the selected function with reference to the table. An acquisition unit acquires file information from the selected text file. A transmission unit transmits the acquired file information to the external device.

1. An electronic apparatus having a plurality of functions, comprising: a connecting unit that connects the electronic apparatus to an external device that has a presenting unit for presenting text information in a form desired by a user; a function selection unit that selects a function to be executed; a storage unit that stores matching information defining correspondence between the plurality of functions and a plurality of text files each containing text information; a text file selection unit that selects a text file corresponding to the function selected by the function selection unit with reference to the matching information; an acquisition unit that acquires file information from the selected text file; and a transmission unit that transmits the acquired file information to the external device connected by the connecting unit.
2. The electronic apparatus according to claim 1, wherein the function selection unit selects a function in either of a first manipulation mode or a second manipulation mode, the electronic apparatus further comprising a control unit that executes the selected function when the function is selected by the function selection unit in the first manipulation mode and that does not execute the selected function when the function is selected by ...

Publication date: 08-08-2013

CONTEXTUAL CONVERSION PLATFORM FOR GENERATING PRIORITIZED REPLACEMENT TEXT FOR SPOKEN CONTENT OUTPUT

Number: US20130204624A1
Author: Daniel Ben-Ezri

A contextual conversion platform, and method for converting text-to-speech, are described that can convert content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which can be generated a spoken content input. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform.

1. A method, comprising: at a computer comprising a computer program to implement processing operations: receiving data related to content of a target; filtering the data to locate a target term; accessing one or more tables in a repository, the one or more tables comprising entries with a substitution unit corresponding to the target term, the entries arranged according to a prioritized scheme that defines a position for the substitution unit in the tables; and generating an output comprising data that represents the substitution unit to be utilized by a text-to-speech generator to generate spoken content, wherein the position of the substitution unit in the one or more tables is assigned based on a specificity characteristic that describes the relative inclusivity of the substitution unit as compared to other substitution units in the one or more tables.
2. The method of claim 1, further comprising: breaking the content into at least one contextual unit that includes the target term; and inserting the substitution unit in the contextual unit in place of the target term.
3. The method of claim 1, further comprising: identifying a context cue in the data, the context cue identifying characteristics of the target; and selecting a table from the one or more tables in which the substitution unit is compatible with the characteristics of the target.
4. The method of claim 1, further comprising: ...
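
A small sketch of the prioritized table lookup; the context cues and the "Dr." entries are invented examples of the specificity ordering, not the patent's repositories.

    # Tables hold substitution units ordered from most specific to most
    # inclusive; the first compatible entry wins.
    TABLES = {
        "address": [("Dr.", "Drive")],
        "default": [("Dr.", "Doctor")],
    }

    def substitute(term, context_cue):
        for table in (TABLES.get(context_cue, []), TABLES["default"]):
            for target, substitution in table:  # entries already priority-ordered
                if target == term:
                    return substitution
        return term

    print(substitute("Dr.", "address"), substitute("Dr.", "email"))  # Drive Doctor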

Publication date: 15-08-2013

Feature sequence generating device, feature sequence generating method, and feature sequence generating program

Number: US20130211839A1
Author: Masanori Kato
Assignee: NEC Corp

Spread level parameter correcting means 501 receives a contour parameter as information representing the contour of a feature sequence (a sequence of features of a signal considered as the object of generation) and a spread level parameter as information representing the level of a spread of the distribution of the features in the feature sequence. The spread level parameter correcting means 501 corrects the spread level parameter based on a variation of the contour parameter represented by a sequence of the contour parameters. Feature sequence generating means 502 generates the feature sequence based on the contour parameters and the corrected spread level parameters.

Publication date: 22-08-2013

SPEECH SYNTHESIS DEVICE, SPEECH SYNTHESIS METHOD, AND COMPUTER PROGRAM PRODUCT

Number: US20130218568A1
Assignee: KABUSHIKI KAISHA TOSHIBA

According to an embodiment, a speech synthesis device includes a first storage, a second storage, a first generator, a second generator, a third generator, and a fourth generator. The first storage is configured to store therein first information obtained from a target uttered voice. The second storage is configured to store therein second information obtained from an arbitrary uttered voice. The first generator is configured to generate third information by converting the second information so as to be close to a target voice quality or prosody. The second generator is configured to generate an information set including the first information and the third information. The third generator is configured to generate fourth information used to generate a synthesized speech, based on the information set. The fourth generator is configured to generate the synthesized speech corresponding to input text using the fourth information.

1. A speech synthesis device comprising: a first storage configured to store therein first information obtained from a target uttered voice; a second storage configured to store therein second information obtained from an arbitrary uttered voice; a first generator configured to generate third information by converting the second information so as to be close to a target voice quality or prosody; a second generator configured to generate an information set including the first information and the third information; a third generator configured to generate fourth information used to generate a synthesized speech, based on the information set; and a fourth generator configured to generate the synthesized speech corresponding to input text using the fourth information.
2. The device according to claim 1, wherein the first information and the second information are stored together with attribute information thereof, and the second generator generates the information set by adding the first information and the entire or a portion of the third information, the ...

Publication date: 05-09-2013

Automatic Sound Level Control

Number: US20130231921A1
Assignee: AT&T Intellectual Property I, L.P.

A method includes identifying, at a computing device, a plurality of words in data. Each of the plurality of words corresponds to a particular word of a written language. The method includes determining a sound output level based on a location of the computing device. The method includes generating sound data based on the sound output level and the plurality of words identified in the data.

1. A method comprising: identifying, at a computing device, a plurality of words in data, wherein each of the plurality of words corresponds to a particular word of a written language; determining a sound output level based at least in part on a location of the computing device; and generating sound data based on the sound output level and the plurality of words identified in the data.
2. The method of claim 1, further comprising determining a noise level external to the computing device, wherein the sound output level is based on the noise level external to the computing device.
3. The method of claim 2, wherein determining the noise level external to the computing device includes receiving sound data from one or more sound input devices of the computing device.
4. The method of claim 1, wherein the data includes image data, and wherein at least one of the plurality of words is identified in the image data.
5. The method of claim 4, wherein the at least one of the plurality of words is identified in the image data using optical character recognition.
6. The method of claim 1, wherein the data is accessed from a data file.
7. The method of claim 6, wherein the data file is in a portable document format.
8. The method of claim 1, further comprising outputting one or more sounds from the computing device based on the sound data.
9. The method of claim 1, further comprising accessing sound configuration data from a memory of the computing device, the sound configuration data including sound data corresponding to one or more locations.
10. The method of ...

Publication date: 05-09-2013

METHOD AND APPARATUS FOR GENERATING SYNTHETIC SPEECH WITH CONTRASTIVE STRESS

Number: US20130231935A1
Assignee: NUANCE COMMUNICATIONS, INC.

Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.

1–6. (canceled)
7. A method for use with a speech-enabled application, the method comprising: receiving, from the speech-enabled application, input comprising a plurality of text strings; identifying a first portion of a first text string of the plurality of text strings as differing from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string as not differing from a corresponding second portion of the second text string; assigning contrastive stress to the identified first portion of the first text string, but not to the identified second portion of the first text string; generating, using at least one computer system, speech synthesis output to render the plurality of text strings as speech having the assigned contrastive stress; and providing the speech synthesis output for the speech-enabled application.
8. The method of claim 7, wherein the identifying comprises identifying the first portion of the first text string ...
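
The identification step reduces to a token-wise diff in this toy version; the <emphasis> tagging is an illustrative stand-in for however the synthesis output actually marks stress.

    def assign_contrastive_stress(first, second):
        # Compare two parallel text strings token by token and mark the
        # differing portion of the first string for contrastive stress.
        out = []
        for a, b in zip(first.split(), second.split()):
            out.append(f"<emphasis>{a}</emphasis>" if a != b else a)
        return " ".join(out)

    print(assign_contrastive_stress("flight 123 departs at 9",
                                    "flight 456 departs at 9"))
    # flight <emphasis>123</emphasis> departs at 9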

Publication date: 26-09-2013

SPEECH DIALOGUE SYSTEM, TERMINAL APPARATUS, AND DATA CENTER APPARATUS

Number: US20130253926A1
Author: Jun Takahashi
Assignee: FUJITSU LIMITED

A speech dialogue system includes a data center apparatus and a terminal apparatus. The data center apparatus acquires answer information for request information obtained in a speech recognition process for speech data from a terminal apparatus, creates a scenario including the answer information, creates first synthesized speech data concerning the answer information, transmits the first synthesized speech data to the terminal apparatus, and transmits the scenario to the terminal apparatus while the first synthesized speech data is being created. The terminal apparatus creates second synthesized speech data concerning the answer information in the received scenario, receives the first synthesized speech data, selects one of the first synthesized speech data and the second synthesized speech data based on a determination result regarding whether the reception of the first synthesized speech data is completed, and reproduces speech.

1. A speech dialogue system comprising: a data center apparatus that receives speech data of speech sound transmitted from a terminal apparatus, applies a speech recognition process to the speech data to acquire request information expressed by the speech data, acquires answer information for the request information from an information source, creates a scenario including the answer information, creates first synthesized speech data expressing synthesized speech that generates sound of the answer information, transmits the first synthesized speech data created in the creating the first synthesized speech data to the terminal apparatus, and transmits the scenario to the terminal apparatus while the first synthesized speech data is being created in the creating the first synthesized speech data; and a terminal apparatus that acquires input of the speech sound to convert the speech sound to speech data expressing the speech sound, transmits the speech data of the speech sound to the data ...

Publication date: 26-09-2013

SOCIAL BROADCASTING USER EXPERIENCE

Number: US20130253934A1
Assignee: JELLI, INC.

A method of providing user participation in a social broadcast environment is disclosed. A network communication is received from a user of a broadcast that includes a preference data indicating a preference of the user that a promoted content be included in the broadcast. Via a responsive network communication, a feedback data is provided to the user that includes a predicted future time at which the promoted content may be included in the broadcast.

1–21. (canceled)
22. A method of promoting user participation in a radio broadcast, comprising: determining based on an attribution criterion that a user of the radio broadcast is to receive credit at least in part for 1) designating a track currently being played on air during a radio broadcast; and 2) causing the track currently being played to be taken off the radio broadcast via a power-up used by the user and applied to the track currently being played; using a processor to process a user profile data associated with the user to generate an audio signature including an audio identifier of the user, wherein the audio identifier is speech synthesized from the user profile data; and including the audio signature in the radio broadcast after the track is taken off the radio broadcast.
23. A method as recited in claim 22, wherein the power-up is a bomb.
24. A method as recited in claim 22, wherein the power-up is a virus.
25. A method as recited in claim 22, wherein the user designates the track via one or more of: a form on a social broadcasting portal, a widget on a third party site, a form on a third party site, and a mobile gateway.
26. A method as recited in claim 22, wherein the user is a host of a group of a plurality of users of the radio broadcast.
27. A method as recited in claim 26, wherein the power-up includes a proxy from another member of the group.
28. A method as recited in claim 27, wherein the user profile data includes an affiliation of the group.
29. A method as ...

Publication date: 03-10-2013

Text to speech method and system

Number: US20130262109A1
Assignee: Toshiba Corp

A text-to-speech method for simulating a plurality of different voice characteristics includes dividing inputted text into a sequence of acoustic units; selecting voice characteristics for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model having a plurality of model parameters provided in clusters each having at least one sub-cluster and describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio with the selected voice characteristics. A parameter of a predetermined type of each probability distribution is expressed as a weighted sum of parameters of the same type using voice characteristic dependent weighting. In converting the sequence of acoustic units to a sequence of speech vectors, the voice characteristic dependent weights for the selected voice characteristics are retrieved for each cluster such that there is one weight per sub-cluster.
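
The weighted-sum formulation is easy to show in a few lines; the sub-cluster means and voice weights below are illustrative numbers, not trained model parameters.

    import numpy as np

    # A mean parameter of the output distribution is built as a
    # voice-characteristic-dependent weighted sum of per-(sub-)cluster means,
    # one weight per sub-cluster.
    subcluster_means = np.array([[1.0, 0.5],   # sub-cluster 1
                                 [0.2, 0.1],   # sub-cluster 2
                                 [0.4, 0.9]])  # sub-cluster 3
    voice_weights = np.array([0.6, 0.3, 0.1])  # selected voice characteristic

    mean = voice_weights @ subcluster_means    # weighted sum per dimension
    print(mean)  # [0.7  0.42]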

Publication date: 03-10-2013

PLAYBACK CONTROL APPARATUS, PLAYBACK CONTROL METHOD, AND PROGRAM

Number: US20130262118A1
Assignee: SONY CORPORATION

A playback control apparatus includes a playback controller configured to control playback of first content and second content. The first content is to output first sound which is generated based on text information using speech synthesis processing. The second content is to output second sound which is generated not using the speech synthesis processing. The playback controller causes an attribute of content to be played back to be displayed on the screen, the attribute indicating whether or not the content is to output sound which is generated based on text information using speech synthesis processing.

1. A playback control apparatus comprising: a playback controller configured to control playback of first content and second content, the first content is to output first sound which is generated based on text information using speech synthesis processing, the second content is to output second sound which is generated not using the speech synthesis processing, wherein the playback controller causes an attribute of content to be played back to be displayed on the screen, the attribute indicating whether or not the content is to output sound which is generated based on text information using speech synthesis processing.
2. The playback control apparatus according to claim 1, wherein the playback controller further causes a display portion, associated with sound output at that time, to be displayed in a highlighted state.
3. The playback control apparatus according to claim 1, wherein the playback controller further changes a speaker or background music, which is in part of the sound, in accordance with content of the text information used in generating sound.
4. The playback control apparatus according to claim 1, wherein a text-to-speech function for generating sound based on the text information using the speech synthesis processing is configured to be turned on or off, and the playback controller causes the first content ...

Publication date: 03-10-2013

TEXT TO SPEECH SYSTEM

Number: US20130262119A1
Assignee: KABUSHIKI KAISHA TOSHIBA

A text-to-speech method configured to output speech having a selected speaker voice and a selected speaker attribute, including: inputting text; dividing the inputted text into a sequence of acoustic units; selecting a speaker for the inputted text; selecting a speaker attribute for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model; and outputting the sequence of speech vectors as audio with the selected speaker voice and a selected speaker attribute. The acoustic model includes a first set of parameters relating to speaker voice and a second set of parameters relating to speaker attributes, which parameters do not overlap. The selecting a speaker voice includes selecting parameters from the first set of parameters and the selecting the speaker attribute includes selecting the parameters from the second set of parameters.

1. A text-to-speech method configured to output speech having a selected speaker voice and a selected speaker attribute, said method comprising: inputting text; dividing said inputted text into a sequence of acoustic units; selecting a speaker for the inputted text; selecting a speaker attribute for the inputted text; converting said sequence of acoustic units to a sequence of speech vectors using an acoustic model; and outputting said sequence of speech vectors as audio with said selected speaker voice and a selected speaker attribute, wherein said acoustic model comprises a first set of parameters relating to speaker voice and a second set of parameters relating to speaker attributes, wherein the first and second set of parameters do not overlap, and wherein selecting a speaker voice comprises selecting parameters from the first set of parameters which give the speaker voice and selecting the speaker attribute comprises selecting the parameters from the second set which give the selected speaker attribute.
2. A method according to claim 1, wherein there are a plurality of sets of ...

Publication date: 10-10-2013

SPEECH SYNTHESIS SYSTEM, SPEECH SYNTHESIS PROGRAM PRODUCT, AND SPEECH SYNTHESIS METHOD

Number: US20130268275A1

Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.

1. At least one computer-readable storage device encoded with a speech synthesis program which causes a system for synthesizing speech from text to perform: determining a first speech segment sequence corresponding to an input text, by selecting speech segments from a speech segment database according to a first cost calculated based at least in part on a statistical model of prosody variations; determining prosody modification values for the first speech segment sequence, after the first speech segment sequence is selected, by using a second cost calculated based at least in part on the statistical model of prosody variations, wherein the first cost is different from the second cost; and applying the determined prosody modification values to the first speech segment sequence to produce a second speech segment sequence whose prosodic characteristics are different from prosodic characteristics of the first speech segment sequence.
2. The at least one computer-readable storage device of claim 1, wherein the first cost for determining the first speech segment sequence includes a spectrum continuity cost, a duration error cost, a ...

Publication date: 17-10-2013

Hands-Free List-Reading by Intelligent Automated Assistant

Number: US20130275138A1

Systems and methods for providing hands-free reading of content comprising: identifying a plurality of data items for presentation to a user, the plurality of data items associated with a domain-specific item type and sorted according to a particular order; based on the domain-specific item type, generating a speech-based overview of the plurality of data items; for each of the plurality of data items, generating a respective speech-based, item-specific paraphrase for the data item based on respective content of the data item; and providing, to a user through the speech-enabled dialogue interface, the speech-based overview, followed by the respective speech-based, item-specific paraphrases for at least a subset of the plurality of data items in the particular order.

1. A method for providing information through a speech-enabled dialogue interface, comprising: identifying a plurality of data items for presentation to a user, the plurality of data items associated with a domain-specific item type and sorted according to a particular order; based on the domain-specific item type, generating a speech-based overview of the plurality of data items; for each of the plurality of data items, generating a respective speech-based, item-specific paraphrase for the data item based on respective content of the data item; and providing, to a user through the speech-enabled dialogue interface, the speech-based overview, followed by the respective speech-based, item-specific paraphrases for at least a subset of the plurality of data items in the particular order.
2. The method of claim 1, further comprising: while providing the respective speech-based, item-specific paraphrases, inserting a pause between each pair of adjacent speech-based, item-specific paraphrases; and entering a listening mode to capture user input during the pause.
3. The method of claim 1, further comprising: while providing the respective speech-based, item-specific paraphrases in a sequential order, advancing a ...
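
A bare-bones rendering of the overview-then-paraphrase flow; the templates and the speak stub stand in for a real TTS pipeline and are assumptions for the demo.

    def speak(text):
        print("TTS>", text)

    def read_list(item_type, items):
        speak(f"You have {len(items)} {item_type} messages.")  # domain overview
        for i, item in enumerate(items, 1):                    # particular order
            speak(f"Message {i}, from {item['sender']}: {item['subject']}")

    read_list("email", [{"sender": "Ann", "subject": "lunch"},
                        {"sender": "Bob", "subject": "report"}])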

24-10-2013 publication date

Vehicle-Based Message Control Using Cellular IP

Number: US20130282375A1
Assignee:

Architecture for playing back personal text-based messages such as email and voicemail over a vehicle-based media system. The user can use a cell phone that registers over a cellular network to an IMS (IP multimedia subsystem) to obtain an associated IP address. The personal messages are then converted into audio signals using a remote text-to-voice (TTV) converter and transmitted to the phone based on the IP address. The phone then transmits the audio signals to the vehicle media system for playback using an unlicensed wireless technology (e.g., Bluetooth, Wi-Fi, etc.). Other alternative embodiments include transmitting converted message directly to the media system, via a satellite channel, converting the messages via a TTV converter on the cell phone, and streaming the converted messages to the phone and/or the media system for playback. 1. A method , comprising:receiving, by a text-to-audio converter component, from an internet protocol multimedia system being distinct from and in communication with the text-to-audio converter component, a registration request relating to a mobile communication device, the registration request being received following an association, by the internet protocol multimedia system, of an internet protocol address with the mobile communication device;associating the internet protocol address with the text-to-audio converter component;receiving, by the text-to-audio converter component, a text-based communication from the internet protocol multimedia system; andconverting, by the text-to-audio converter component, the text-based communication to an audio message.2. The method of claim 1 , wherein the text-to-audio converter component is included in the mobile communication device.3. The method of claim 2 , further comprising transmitting the audio message to a vehicle media system.4. The method of claim 1 , wherein the text-to-audio converter component is remote to the mobile communication device.5. The method of claim 4 , further ...

24-10-2013 publication date

FILE FORMAT, SERVER, VIEWER DEVICE FOR DIGITAL COMIC, DIGITAL COMIC GENERATION DEVICE

Number: US20130282376A1
Author: NONAKA Shunichiro
Assignee:

A viewer device for a digital comic comprising: an information acquisition unit that acquires a digital comic in a file format for a digital comic viewed on a viewer device, the file format including speech balloon information including information of a speech balloon region that indicates a region of a speech balloon, first text information indicating a dialogue within each speech balloon, the first text information being correlated with each speech balloon, and first display control information including positional information and a transition order of an anchor point so as to enable the image of the entire page to be viewed on a monitor of the viewer device in a scroll view; and a voice reproduction section that synthesizes a voice for reading the letters corresponding to the text information based on an attribute of the character, an attribute of the speech balloon or the dialogue, and outputs the voice.

1. A viewer device for a digital comic comprising: an information acquisition unit that acquires a digital comic in a file format for a digital comic viewed on a viewer device, the file format including a high-definition image of an entire page for each page of a comic, speech balloon information including information of a speech balloon region that indicates a region of a speech balloon in which a dialogue of a character of the comic is placed within the image, first text information indicating a dialogue within each speech balloon, the first text information being correlated with each speech balloon, and first display control information including positional information and a transition order of a predetermined anchor point so as to enable the image of the entire page to be viewed on a monitor of the viewer device in a scroll view; a display unit; an image display control unit that scroll-reproduces or panel-reproduces the image of each page on a screen of the display unit based on the display control information of the acquired digital comic; a letter display control ...
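To make the file-format description above concrete, here is a small illustrative data model, assuming per-balloon regions correlated with dialogue text plus anchor points carrying a transition order; all field names are invented for the sketch, not taken from the patent.

```python
# Illustrative data model for the described file format: per-page balloon
# regions with correlated dialogue text and anchor points carrying a
# transition order. Field names are assumptions for the sketch.
from dataclasses import dataclass

@dataclass
class Balloon:
    region: tuple          # (x, y, width, height) of the balloon
    dialogue: str          # first text information for this balloon
    speaker_attr: str      # character attribute used to pick a voice

@dataclass
class Anchor:
    position: tuple        # (x, y) anchor point on the page image
    order: int             # transition order for the scroll view

def reading_sequence(balloons, anchors):
    """Pair balloons with anchors in transition order (assumes one anchor
    per balloon, matched by index after sorting)."""
    ordered = sorted(anchors, key=lambda a: a.order)
    return [(a.position, b.dialogue, b.speaker_attr)
            for a, b in zip(ordered, balloons)]

if __name__ == "__main__":
    balloons = [Balloon((10, 10, 80, 40), "Run!", "boy"),
                Balloon((120, 60, 90, 40), "Wait for me!", "girl")]
    anchors = [Anchor((15, 15), 1), Anchor((125, 65), 2)]
    for pos, text, voice in reading_sequence(balloons, anchors):
        print(pos, f"({voice} voice)", text)
```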

31-10-2013 publication date

Realistic Speech Synthesis System

Number: US20130289998A1
Assignee: SRC Inc

A system and method for realistic speech synthesis which converts text into synthetic human speech with qualities appropriate to the context such as the language and dialect of the speaker, as well as expanding a speaker's phonetic inventory to produce more natural sounding speech.

14-11-2013 publication date

SYSTEM AND METHOD FOR AUDIBLY PRESENTING SELECTED TEXT

Number: US20130304474A1
Assignee:

Disclosed herein are methods for presenting speech from a selected text that is on a computing device. This method includes presenting text on a touch-sensitive display and having that text size within a threshold level so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user touch has been received, the computing device identifies and interprets the portion of text that is to be selected, and subsequently presents the text audibly to the user. 1. A method comprising:displaying, via a processor, text via a touch-sensitive display;receiving, from the touch-sensitive display, input identifying a portion of the text; andaudibly presenting the portion of the text.2. The method of claim 1 , wherein receiving the input further comprises receiving non-contiguous separate touches on the touch-sensitive display claim 1 , wherein the non-contiguous separate touches indicate a number of paragraphs of the text to be audibly presented as the portion of the text.3. The method of claim 1 , wherein the input comprises data associated with a first tap at a first location and a second tap at a second location claim 1 , and the portion of the text is identified as text displayed between the first location and the second location.4. The method of claim 1 , wherein audibly presenting the portion of the text occurs via a speaker associated with the touch-sensitive display.5. The method of claim 1 , wherein the touch-sensitive display is part of a mobile phone.6. The method of claim 1 , wherein audibly presenting the portion of the text comprises communicating pre-recorded phonemes combined together.7. The method of claim 1 , wherein the input further comprises an area of the touch-sensitive display indicated by user touch.8. A system comprising:a processor; anda computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising ...
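A rough sketch of the two-tap selection idea from the claims, assuming taps can be hit-tested to character offsets; the character-cell hit test below is a toy stand-in for real layout geometry.

```python
# Sketch of identifying the text between a first and second tap. Taps are
# reduced to character offsets via a hypothetical hit-testing helper; here
# we fake that mapping with a fixed character grid.

def offset_for_tap(text, tap_xy, chars_per_line=40):
    """Toy hit test: map an (x, y) tap in character cells to a text offset."""
    x, y = tap_xy
    return min(y * chars_per_line + x, len(text))

def select_between_taps(text, first_tap, second_tap):
    a = offset_for_tap(text, first_tap)
    b = offset_for_tap(text, second_tap)
    start, end = min(a, b), max(a, b)
    return text[start:end]

if __name__ == "__main__":
    page = "Disclosed herein are methods for presenting speech from text. " * 2
    portion = select_between_taps(page, (5, 0), (20, 1))
    print("would be audibly presented:", portion)
```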

21-11-2013 publication date

Electronic Apparatus

Number: US20130311187A1
Author: Nakamae Midori
Assignee: KABUSHIKI KAISHA TOSHIBA

An electronic apparatus comprises a storage module, a manipulation module, a voice output control module, and a display module. The storage module is configured to store book data. The manipulation module is configured to convert a manipulation of a user into an electrical signal, the voice output control module is configured to reproduce a voice by reading the book data in the storage module based on the manipulation, and the display module is configured to display the book data. When it is determined that a part to be reproduced includes an illustration or a figure, the user is urged to view the display module and the illustration or the figure is displayed at the display module.

1. An electronic apparatus comprising: a storage module configured to store book data; a manipulation module configured to convert a manipulation of a user into an electrical signal; a voice output control module configured to reproduce a voice by reading the book data in the storage module based on the manipulation; and a display module configured to display the book data, wherein when it is determined that a part to be reproduced includes an illustration or a figure, the user is urged to view the display module and the illustration or the figure is displayed at the display module. 2. The electronic apparatus of claim 1, wherein when it is determined that the user is not viewing the display module during voice reproduction of the book data, the user is urged to view the display module and the illustration or the figure is displayed at the display module. 3. The electronic apparatus of claim 1, further comprising: a control module, configured to store, in the storage module, a position of voice reproduction of the book data by the voice output control module, and to synchronize the position of the voice reproduction with a reproduction position in the book data. 4. The electronic apparatus of claim 1, further comprising: a control module; wherein a reproduction part in the book data is ...

21-11-2013 publication date

Text-to-speech device, speech output device, speech output system, text-to-speech methods, and speech output method

Number: US20130311188A1
Author: ADACHI Takuma
Assignee: Panasonic Corporation

An audio read-out device comprises an audio signal generator, a first information receiver, a first information transmitter, a first controller, and a mixed audio signal generator, and when the first information receiver receives audio output enablement information indicating that audio output is disabled, the first controller causes the mixed audio signal generator to generate a mixed audio signal composed of a broadcast audio signal and causes the first information transmitter to transmit the mixed audio signal until the first information receiver receives audio output enablement information indicating that audio output is enabled; and when the first information receiver receives audio output enablement information indicating that audio output is enabled, the first controller causes the mixed audio signal generator to generate a mixed audio signal obtained by mixing a read-out audio signal and a broadcast audio signal, and causes the first information transmitter to transmit the mixed audio signal. 1. An audio read-out device connected via a network to an audio output device that outputs a read-out audio signal , the audio read-out device comprising:an audio signal generator configured to generate the read-out audio signal from text information;a first information receiver configured to receive audio output enablement information from the audio output device via the network;a first information transmitter configured to transmit the read-out audio signal generated by the audio signal generator, to the audio output device via the network;a first controller configured to, when the first information receiver receives audio output enablement information indicating that audio output is disabled, cause the first information transmitter to wait to transmit the read-out audio signal until the first information receiver receives audio output enablement information indicating that audio output is enabled, and to, when the first information receiver receives audio output ...
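A minimal sketch of the gating behaviour described above, assuming per-frame enablement events and naive sample-addition mixing; all names are illustrative, not the device's actual interfaces.

```python
# Sketch of the gating behaviour: while the sink reports "output disabled",
# hold the read-out signal and pass broadcast audio only; once "enabled"
# arrives, transmit a mix of broadcast and read-out audio.

def mix(broadcast_frame, readout_frame=None):
    if readout_frame is None:
        return list(broadcast_frame)
    return [b + r for b, r in zip(broadcast_frame, readout_frame)]

def transmit_frames(broadcast, readout, enablement_events):
    """enablement_events[i] is True/False/None (None = no new event)."""
    enabled, out = False, []
    pending = list(readout)
    for i, frame in enumerate(broadcast):
        if enablement_events[i] is not None:
            enabled = enablement_events[i]
        ro = pending.pop(0) if enabled and pending else None
        out.append(mix(frame, ro))      # broadcast-only while disabled
    return out

if __name__ == "__main__":
    broadcast = [[0.1, 0.1]] * 4
    readout = [[0.5, 0.5], [0.4, 0.4]]
    events = [False, None, True, None]  # enabled from frame 2 onward
    for f in transmit_frames(broadcast, readout, events):
        print(f)
```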

12-12-2013 publication date

Method and System for Enhancing a Speech Database

Number: US20130332169A1
Assignee: AT&T INTELLECTUAL PROPERTY II, L.P.

A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.

1. A method comprising: receiving text as part of a text-to-speech process; selecting a speech segment associated with the text, wherein the speech segment is selected from a primary speech database which has been modified by: identifying primary speech segments in the primary speech database which do not meet a need of the text-to-speech process, wherein the primary speech segments comprise one of half-phones, half-phonemes, demi-syllables, and polyphones; identifying replacement speech segments which satisfy the need in a secondary speech database; and enhancing the primary speech database by substituting, in the primary database, the primary speech segments with the replacement speech segments; and generating speech corresponding to the text using the speech segment. 2. The method of claim 1, wherein the need is based on one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences. 3. The method of claim 1, wherein the primary speech segments are one of diphones, triphones, and phonemes. 4. The method of claim 1, wherein the primary speech database has been further modified by identifying boundaries of the primary speech segments. 5. ...
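As a toy illustration of the enhancement step, the sketch below swaps flagged primary-database units for matching secondary-database units; the unit labels and dictionary representation are assumptions.

```python
# Sketch of enhancing a primary unit database: units flagged as not meeting
# the need (e.g., the wrong dialect's pronunciation) are replaced by
# matching units from a secondary database.

def enhance(primary, secondary, needs_replacement):
    """primary/secondary: dicts mapping a unit label (e.g. a half-phone in
    context) to audio data; needs_replacement: labels that fail the need."""
    enhanced = dict(primary)
    for label in needs_replacement:
        if label in secondary:
            enhanced[label] = secondary[label]   # substitute in place
    return enhanced

if __name__ == "__main__":
    primary = {"t-ax_mid": "audio_A", "r-ol_fin": "audio_B"}
    secondary = {"r-ol_fin": "audio_B_rhotic"}
    print(enhance(primary, secondary, ["r-ol_fin"]))
```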

26-12-2013 publication date

DEVICE FOR AIDING COMMUNICATION IN THE AERONAUTICAL DOMAIN

Number: US20130346081A1
Assignee: Airbus (SAS)

The device comprises means for recording audio messages corresponding to all the incoming and outgoing audio communications, means for transcribing, in real time, each of said audio messages into a textual message, means for displaying, on at least one screen, each textual message thus generated, and means able to play back any recorded audio message.

2. The device as claimed in claim 1, which comprises moreover: playback means which are able to carry out audio playback of any recorded audio message; and activation means able to be activated by an operator to identify and trigger the playback by the playback means of a recorded audio message. 3. The device as claimed in claim 2, wherein said activation means comprise, for each textual message displayed on the screen, an associated sensitive area which is displayed jointly with the textual message with which it is associated, and which is able to be activated so as to trigger the playback of the audio message corresponding to said textual message. 4. The device as claimed in claim 1, which comprises means allowing an operator to copy at least part of a displayed textual message and to transmit it to a system of the aircraft. 5. The device as claimed in claim 1, which comprises moreover means for determining the time of emission of each audio message, and in that said display means display moreover on said screen, jointly with the textual message with which it is associated, the corresponding emission time. 6. The device as claimed in claim 1, which comprises means making it possible to access an automatic terminal information service, to transcribe into textual messages all the audio messages emitted by this service, to display said textual messages on said screen, and to play back any recorded audio message that may be listened to on request. 7. The device as claimed in claim 1, wherein said screen of the display means represents a dedicated single graphical interface. 8. The device as claimed in ...

02-01-2014 publication date

SYSTEM AND METHOD FOR DYNAMICALLY INTERACTING WITH A MOBILE COMMUNICATION DEVICE

Number: US20140006032A1
Author: Korn Jeffrey
Assignee:

Audio presentations of media content delivered onto a device are interrupted using commands not otherwise known to or programmed into a messaging application used to present the content to the user of the device. In one embodiment, an electronic message having textual content is received at the mobile device, where it is translated into an audio stream and presented (i.e., played back) to the user of the device within the messaging application. The user provides, and the application receives, a string of identical user commands that are not specifically defined or programmed in as commands within the messaging application, and playback of the audio stream is modified according to the received string of user commands.

1. A method for interrupting a presentation of a message delivered onto a mobile device, the method comprising the steps of: receiving an incoming electronic message at the mobile device, the incoming message comprising textual content; translating the textual content to an audio stream; initiating playback of the audio stream within a messaging application; receiving, by the messaging application, a string of substantially identical user commands, the commands not being specifically defined as commands within the messaging application; and triggering interruption of playback of the audio stream based on receiving the string of user commands. 2. The method of claim 1, wherein the electronic message comprises one of an electronic mail message, a text message, an SMS message, a news story, a broadcast message, a calendar event description, a web page, a web-based article, a web log (blog), a weather report, a digital text document, a task from a task list, or other structured electronic content. 3. The method of wherein the string of identical user commands comprises a repetition of utterances. 4. The method of wherein the utterances are monosyllabic. 5. The ...
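A small sketch of one way the "string of identical commands" trigger could be detected, assuming a recognizer that yields a list of utterance strings and an assumed threshold of three repeats.

```python
# Sketch of triggering an interruption on a run of identical, otherwise
# unmapped utterances (e.g., "stop stop stop"). The recognizer is stubbed;
# the threshold of three repeats is an assumed parameter.

def interrupt_on_repeats(utterances, min_repeats=3):
    run, last = 0, None
    for u in utterances:
        run = run + 1 if u == last else 1
        last = u
        if run >= min_repeats:
            return True        # trigger playback interruption
    return False

if __name__ == "__main__":
    heard = ["hey", "stop", "stop", "stop"]
    print("interrupt playback:", interrupt_on_repeats(heard))
```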

09-01-2014 publication date

METHOD AND APPARATUS FOR RECORDING AND PLAYING USER VOICE IN MOBILE TERMINAL

Number: US20140012583A1
Author: KWAK Byeonghoon, MOK Jieun
Assignee:

A method and an apparatus for recording and playing a user voice in a mobile terminal are provided. The method for recording and storing a user voice in a mobile terminal includes entering a page by executing an electronic book, identifying whether a user voice record file related to the page exists, generating a user voice record file related to the page by recording the text included in the page as a user voice if the user voice record file does not exist, and playing by synchronizing the user voice stored in the user voice record file with the text if the user voice record file exists. Accordingly, a user voice can be recorded corresponding to the text of a page when recording a specific record of an electronic book, and the text corresponding to the user voice being played can be highlighted by synchronizing the user voice and the text.

1. A method for recording and playing a user voice in a mobile terminal, the method comprising: entering a page by executing an electronic book; identifying whether a user voice record file related to the page exists; generating a user voice record file related to the page by recording the text included in the page as a user voice if the user voice record file does not exist; and playing by synchronizing the user voice stored in the user voice record file with the text if the user voice record file exists. 2. The method of claim 1, wherein the generating of the user voice record file comprises recording the text included in the page as a user voice and creating a synchronization file including text location information corresponding to each time section of the user voice record file. 3. The method of claim 1, wherein the generating of the user voice record file further comprises: identifying whether a touch input corresponding to a text location is detected if a record command for the text is received; and starting to record a user voice if the touch input is not detected within a predetermined time. 4. The method of claim 3, further comprising ...
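To illustrate the synchronization file from claim 2, here is a toy record layout pairing time sections of the recording with text spans so the matching text can be highlighted during playback; the field names are assumptions.

```python
# Sketch of a synchronization file pairing time sections of the recording
# with text offsets, so the matching text can be highlighted during
# playback. The record layout is an assumption for illustration.

sync_records = [
    {"start_s": 0.0, "end_s": 2.1, "text_span": (0, 18)},
    {"start_s": 2.1, "end_s": 4.6, "text_span": (18, 41)},
]

def span_at(time_s, records):
    for rec in records:
        if rec["start_s"] <= time_s < rec["end_s"]:
            return rec["text_span"]
    return None

if __name__ == "__main__":
    page_text = "Once upon a time, there was a small fox."
    span = span_at(3.0, sync_records)
    if span:
        print("highlight:", page_text[span[0]:span[1]])
```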

16-01-2014 publication date

ELECTRONIC DEVICE, INFORMATION PROCESSING APPARATUS, AND METHOD FOR CONTROLLING THE SAME

Number: US20140019136A1
Author: Tanaka Tomonori
Assignee:

The present invention provides a technology for enabling natural voice reproduction in which, depending on the gazed character position, the position of the voice output character follows the gazed character position without reacting to it excessively. To this end, in an electronic device provided with a display unit for displaying text on a screen, a voice outputting unit for outputting the text as voice, and a sight-line detection unit for detecting a sight-line direction of a user, a control unit changes the starting position at which the voice outputting unit starts voice output if the distance between the position of the current output character and the position of the current gazed character is a preset threshold or more.

1. An electronic device comprising: a display unit configured to display text on a screen; a voice outputting unit configured to output the text as voice; a sight-line detection unit configured to detect a sight-line direction of a user; and a control unit configured, assuming that a position of a character that the voice outputting unit is currently outputting as voice is defined as a position of a current output character, and a position of a character in the text that is present in the sight-line direction of the user detected by the sight-line detection unit is defined as a position of a current gazed character, to change a starting position at which the voice outputting unit starts voice output depending on a distance between the position of the current output character and the position of the current gazed character, the control unit including: a determination unit configured to determine whether or not the distance between the position of the current output character and the position of the current gazed character is a preset threshold or more; and a setting unit configured, if the determination unit determined that the distance is the threshold or more, to set the position of the current gazed character to the starting position at which the voice ...
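The core rule reduces to a small comparison; a sketch, assuming character positions are plain integer offsets and an arbitrary threshold of 25 characters.

```python
# Sketch of the gaze-following rule: jump the read-out position to the
# gazed character only when it drifts at least `threshold` characters away
# from the character currently being spoken.

def next_output_position(current_pos, gazed_pos, threshold=25):
    if abs(gazed_pos - current_pos) >= threshold:
        return gazed_pos   # re-anchor speech at the gazed character
    return current_pos     # small drift: keep reading, don't overreact

if __name__ == "__main__":
    print(next_output_position(current_pos=100, gazed_pos=110))  # stays at 100
    print(next_output_position(current_pos=100, gazed_pos=160))  # jumps to 160
```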

16-01-2014 publication date

Training and Applying Prosody Models

Number: US20140019138A1
Author: Stephens, James H., Jr.
Assignee: MORPHISM LLC

Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.

1.-14. (canceled) 15. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising: generating texts annotated with prosody information generated from audio using a speech recognition engine that performs the annotation during its operation; training prosody models with lexicons based on first segments of the texts with the prosody information; maintaining an inventory of the prosody models with lexicons; selecting a subset of multiple prosody models from the inventory of prosody models; associating prosody models in the subset of multiple prosody models with second segments of a text based on phrases in the text statistically associated with the lexicons of the prosody models; applying the associated prosody models to one of the second segments of the text to produce prosody annotations for the text; updating the associated prosody models' lexicons based on the phrases in the second segments of text; analyzing annotations of the prosody annotations to reconcile conflicting prosody annotations previously produced by multiple prosody models associated with the second segments of text; and synthesizing audible speech from the second segments of text and the reconciled prosody annotations. 16. The method of claim 15, wherein the prosody information comprises directives related to pitch, rate, and volume of the audio as measured by the speech recognition engine. 17. The method of claim 16, wherein the reconciliation of conflicting prosody ...

23-01-2014 publication date

Voice Outputting Method, Voice Interaction Method and Electronic Device

Number: US20140025383A1
Assignee:

A voice outputting method, a voice interaction method and an electronic device are described. The method includes acquiring a first content to be output; analyzing the first content to acquire first emotion information for expressing the emotion carried by the first content to be output; acquiring first voice data to be output corresponding to the first content; processing the first voice data to be output based on the first emotion information to generate second voice data to be output with second emotion information, wherein the second emotion information is used to express the emotion of the electronic device outputting the second voice data to be output, so as to enable the user to acquire the emotion of the electronic device, and wherein the first and the second emotion information are matched to and/or correlated to each other; and outputting the second voice data to be output.

1. A voice output method applied in an electronic device, characterized in that the method comprises: acquiring a first content to be output; analyzing the first content to be output to acquire a first emotion information for expressing the emotion carried by the first content to be output; acquiring a first voice data to be output corresponding to the first content to be output; processing the first voice data to be output based on the first emotion information to generate a second voice data to be output with a second emotion information, wherein the second emotion information is used to express the emotion of the electronic device outputting the second voice data to be output to enable the user to acquire the emotion of the electronic device, and wherein the first emotion information and the second emotion information are matched to/correlated to each other; outputting the second voice data to be output. 2. The method according to claim 1, characterized in that acquiring a first content to be output is: acquiring the voice data received via an instant message application; acquiring ...

20-02-2014 publication date

PROSODY EDITING APPARATUS AND METHOD

Number: US20140052446A1
Assignee: KABUSHIKI KAISHA TOSHIBA

According to one embodiment, a prosody editing apparatus includes a storage, a first selection unit, a search unit, a normalization unit, a mapping unit, a display, a second selection unit, a restoring unit and a replacing unit. The search unit searches the storage for one or more second prosodic patterns corresponding to attribute information that matches attribute information of the selected phrase. The mapping unit maps each of the normalized second prosodic patterns on a low-dimensional space. The restoring unit restores a prosodic pattern according to the selected coordinates. The replacing unit replaces the prosody of synthetic speech generated based on the selected phrase by the restored prosodic pattern.

1. A prosody editing apparatus comprising: a storage configured to store attribute information items of phrases and one or more first prosodic patterns corresponding to each of the attribute information items of the phrases, the attribute information items each indicating an attribute associated with a phrase, the first prosodic patterns each including parameters which indicate a prosody type of the phrase and expresses prosody of the phrase, the parameters each including elements not less than the number of phonemes of the phrase; a first selection unit configured to select a phrase including phonemes from text to obtain a selected phrase; a search unit configured to search the storage for one or more second prosodic patterns corresponding to an attribute information item that matches an attribute information item of the selected phrase to obtain as a prosodic pattern set, the second prosodic patterns being included in the first prosodic patterns; a normalization unit configured to normalize the second prosodic patterns respectively; a mapping unit configured to map each of the normalized second prosodic patterns on a low-dimensional space represented by one or more coordinates smaller than the number of the elements to generate mapping coordinates; a display ...
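A minimal sketch of the normalize, map, pick, and restore loop, using PCA (via numpy's SVD) as a stand-in for whatever low-dimensional mapping the apparatus actually uses; the F0 patterns are toy data.

```python
# Sketch of the edit loop: normalize F0 patterns to a common scale, project
# them to 2-D with PCA so they can be shown and picked on screen, then
# reconstruct a pattern from chosen 2-D coordinates.
import numpy as np

def normalize(p):
    p = np.asarray(p, dtype=float)
    return (p - p.mean()) / (p.std() + 1e-8)

patterns = np.stack([normalize(p) for p in [
    [110, 140, 130, 100], [120, 150, 120, 95], [100, 125, 135, 105]]])

mean = patterns.mean(axis=0)
# principal axes from SVD of the centered pattern matrix
_, _, vt = np.linalg.svd(patterns - mean, full_matrices=False)
axes = vt[:2]                               # 2-D editing space

coords = (patterns - mean) @ axes.T         # mapping coordinates to display
picked = coords[1] + np.array([0.2, -0.1])  # user picks nearby coordinates
restored = mean + picked @ axes             # restored prosodic pattern

print(np.round(restored, 3))                # replaces the synthetic prosody
```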

20-02-2014 publication date

SPEECH SYNTHESIS APPARATUS, METHOD, AND COMPUTER-READABLE MEDIUM

Number: US20140052447A1
Assignee: KABUSHIKI KAISHA TOSHIBA

According to one embodiment, a speech synthesis apparatus is provided with generation, normalization, interpolation and synthesis units. The generation unit generates a first parameter using a prosodic control dictionary of a target speaker and one or more second parameters using a prosodic control dictionary of one or more standard speakers based on language information for an input text. The normalization unit normalizes the one or more second parameters based on a normalization parameter. The interpolation unit interpolates the first parameter and the one or more normalized second parameters based on weight information to generate a third parameter, and the synthesis unit generates synthesized speech using the third parameter.

1. A speech synthesis apparatus comprising: a text analysis unit configured to analyze an input text and output language information; a dictionary storage unit configured to store a first prosodic control dictionary of a target speaker and a second prosodic control dictionary of one standard speaker or each of a plurality of standard speakers; a prosodic parameter generation unit configured to generate a first prosodic parameter using the first prosodic control dictionary and generate one or a plurality of second prosodic parameters using the second prosodic control dictionary, based on the language information; a normalization unit configured to normalize the one or the plurality of second prosodic parameters based on a normalization parameter; a prosodic parameter interpolation unit configured to interpolate the first prosodic parameter and the one or the plurality of normalized second prosodic parameters based on weight information to generate a third prosodic parameter; and a speech synthesis unit configured to generate synthesized speech in accordance with the third prosodic parameter. 2. The apparatus according to claim 1, further comprising a normalization parameter generation unit configured to generate the normalization parameter based on the ...
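The interpolation step can be sketched as a plain weighted sum; assuming equal-length parameter vectors and weights that sum to one, neither of which is specified here.

```python
# Sketch of the interpolation step: normalized standard-speaker prosodic
# parameters are blended with the target speaker's parameters using weight
# information to produce the third parameter.

def interpolate(target, standards, weights):
    """third = w0 * target + sum(wi * standard_i); weights sum to 1."""
    w_target, w_standards = weights[0], weights[1:]
    third = [w_target * t for t in target]
    for w, std in zip(w_standards, standards):
        for i in range(len(third)):
            third[i] += w * std[i]
    return third

if __name__ == "__main__":
    target_f0 = [120.0, 135.0, 118.0]            # target speaker (first param.)
    normalized_standards = [[128.0, 150.0, 126.0],
                            [115.0, 130.0, 112.0]]  # normalized second params.
    print(interpolate(target_f0, normalized_standards, [0.6, 0.2, 0.2]))
```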

27-02-2014 publication date

SYSTEM FOR TUNING SYNTHESIZED SPEECH

Number: US20140058734A1
Assignee:

An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats. 1. A method of tuning synthesized speech , comprising:synthesizing, by a text-to-speech engine, user supplied text to produce synthesized speech;receiving, by the text-to-speech engine, a user indication of segments of the user supplied text and/or the synthesized speech to skip during re-synthesis of the speech; andre-synthesizing, by the text-to-speech engine, the speech based on the user indicated segments to skip.2. A method of tuning synthesized speech as defined in claim 1 , further comprising receiving a user modification of duration cost factors associated with the synthesized speech to change the duration of the synthesized speech claim 1 , wherein re-synthesizing the speech includes re-synthesizing the speech based on the user modified duration cost factors.3. A method of tuning synthesized speech as defined in claim 2 , wherein receiving a user modification of duration cost factors includes modifying a search of speech units when the user supplied text is re-synthesized to favor shorter speech units in response to user marking of any speech units in the synthesized speech as too long and modifying the search of speech units to favor longer speech units in response to user marking of any speech units in the synthesized speech as too short.4. A method of tuning synthesized speech as defined in claim 1 , further comprising receiving a user modification of pitch cost factors associated with the synthesized speech ...

06-03-2014 publication date

SEGMENT INFORMATION GENERATION DEVICE, SPEECH SYNTHESIS DEVICE, SPEECH SYNTHESIS METHOD, AND SPEECH SYNTHESIS PROGRAM

Number: US20140067396A1
Author: Kato Masanori
Assignee:

A segment information generation device includes a waveform cutout unit that cuts out a speech waveform from natural speech at a time period not depending on a pitch frequency of the natural speech. A feature parameter extraction unit extracts a feature parameter of a speech waveform from the speech waveform cut out by the waveform cutout unit. A time domain waveform generation unit generates a time domain waveform based on the feature parameter.

1. A segment information generation device comprising: a waveform cutout unit that cuts out a speech waveform from natural speech at a time period not depending on a pitch frequency of the natural speech; a feature parameter extraction unit that extracts a feature parameter of a speech waveform from the speech waveform cut out by the waveform cutout unit; and a time domain waveform generation unit that generates a time domain waveform based on the feature parameter. 2. The segment information generation device according to claim 1, comprising: a period control unit that determines a time period to cut out a speech waveform from natural speech based on attribute information of the natural speech. 3. The segment information generation device according to claim 1, comprising: a spectrum shape change degree estimation unit that estimates a degree of change in spectrum shape indicating a degree of change in spectrum shape of natural speech; and a period control unit that determines a time period to cut out a speech waveform from the natural speech based on the degree of change in spectrum shape. 4. The segment information generation device according to claim 3, wherein when a degree of change in spectrum shape is determined to be small, the period control unit sets a time period to cut out a speech waveform from natural speech to be longer than a time period during normal time. 5. The segment information generation device according to claim 3, wherein when a degree of change in spectrum shape is determined to be large, ...
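A toy sketch of the pitch-independent cutout and regeneration, using RMS energy as a stand-in feature parameter and a fixed sine for regeneration; the real device extracts richer parameters than this.

```python
# Sketch of cutting out waveforms at a fixed time period (independent of
# pitch), extracting a simple feature parameter per cut, and regenerating a
# time-domain waveform from it.
import math

def cut_out(samples, period_samples):
    return [samples[i:i + period_samples]
            for i in range(0, len(samples) - period_samples + 1, period_samples)]

def feature(frame):
    return math.sqrt(sum(s * s for s in frame) / len(frame))   # RMS energy

def regenerate(features, period_samples, freq=0.05):
    out = []
    for amp in features:
        out.extend(amp * math.sin(2 * math.pi * freq * n)
                   for n in range(period_samples))
    return out

if __name__ == "__main__":
    speech = [math.sin(0.1 * n) for n in range(100)]   # fake natural speech
    frames = cut_out(speech, period_samples=20)        # fixed, pitch-agnostic
    params = [feature(f) for f in frames]
    print(len(regenerate(params, 20)), "samples regenerated from", params)
```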

06-03-2014 publication date

METHOD AND SYSTEM FOR REPRODUCTION OF DIGITAL CONTENT

Number: US20140067399A1
Assignee: MATOPY LIMITED

The present invention relates to a method and system of aurally reproducing visually structured content by associating specific audio formatting elements with visual formatting elements of the content. A method and system for reproducing visually structured content by associating abstract visual elements with visual formatting elements of the content is also described.

1. A method of aurally reproducing visually structured content by associating specific audio formatting elements with visual formatting elements of the content. 2. A method as claimed in including the step of aurally reproducing the content using the associated audio formatting elements. 3. A method as claimed in wherein aural reproduction of the content includes layering of audio related to multiple audio formatting element types. 4. A method as claimed in claim 3, wherein the audio formatting element types include background music, voice, sound effect, and audio effect. 5. A method as claimed in wherein a processor associates the audio formatting elements with visual formatting elements in accordance with a set of rules. 6. A method as claimed in wherein audio formatting elements are associated with visual formatting elements in accordance with a scoring method. 7. A method as claimed in wherein elements of content are ordered in accordance with a score assigned to each element using a scoring method. 8. A method as claimed in wherein the scoring method includes the step of calculating a score for each element of content using attributes of one or more visual formatting elements associated with that element of content. 9. A method as claimed in including the step of receiving input during aural reproduction to navigate within the content. 10. A method as claimed in wherein the input specifies navigation to different portions of the aurally reproduced content based upon visual formatting elements. 11. A method as claimed in wherein the input is a single user action. 12. A method as claimed in ...

06-03-2014 publication date

Phonetic information generating device, vehicle-mounted information device, and database generation method

Number: US20140067400A1
Author: Michihiro Yamazaki
Assignee: Mitsubishi Electric Corp

In a word string information DB, when phonetic information automatically generated from written notation information matches regular phonetic information, only the written notation information is registered, or, when the phonetic information automatically generated does not match the regular phonetic information, the written notation information and the regular phonetic information are registered. A word string information retrieving unit 2 retrieves information of a word string matching an input character string from the word string information DB, and, when regular phonetic information is not registered for the word string, a phonetic information generation determining unit 3 causes a phonetic information generating unit 4 to generate phonetic information and output this phonetic information to outside a phonetic information generating device, or, when the regular phonetic information is registered for the word string, outputs the regular phonetic information to outside the phonetic information generating device from a phonetic information output unit 5.
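The registration rule can be sketched in a few lines: store a reading only when automatic generation cannot reproduce it, and fall back to generation at lookup time. The toy G2P table is an assumption.

```python
# Sketch of the registration rule: store the regular reading only when the
# automatic grapheme-to-phoneme output disagrees with it, so matching
# readings cost no storage. auto_g2p() is a stand-in for the real generator.

def auto_g2p(written):
    table = {"Tokyo": "toukyou", "Osaka": "oosaka"}   # toy G2P
    return table.get(written, written.lower())

def build_db(entries):
    db = {}
    for written, regular in entries:
        # register the reading only if auto-generation cannot reproduce it
        db[written] = None if auto_g2p(written) == regular else regular
    return db

def lookup(db, written):
    regular = db.get(written)
    return regular if regular is not None else auto_g2p(written)

if __name__ == "__main__":
    db = build_db([("Tokyo", "toukyou"), ("Kochi", "kouchi")])
    print(db)                                  # {'Tokyo': None, 'Kochi': 'kouchi'}
    print(lookup(db, "Tokyo"), lookup(db, "Kochi"))
```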

27-03-2014 publication date

METHOD AND APPARATUS FOR PROVIDING SIGN INFORMATION

Number: US20140086491A1

A method and apparatus for providing sign information are disclosed. The sign information providing method includes: extracting a first sign from an input image, wherein the first sign is pre-defined; extracting, from the input image, a second sign representing information corresponding to the first sign around the location of the first sign; and providing at least one of information about the first sign and information about the second sign in the form of voice. Accordingly, a user may correctly recognize information expressed by a sign.

1. A sign information providing method comprising: extracting a first sign from an input image, wherein the first sign is pre-defined; extracting a second sign representing information corresponding to the first sign around the location of the first sign, from the input image; and providing at least one piece of information of information about the first sign and information about the second sign in the form of voice. 2. The sign information providing method of claim 1, wherein the extracting of the first sign comprises: extracting a first area at which the first sign is located; removing a noise signal included in the first area; extracting feature information from the first area from which the noise signal has been removed, and recognizing a plurality of first sign candidates based on the feature information; and extracting the first sign satisfying an authentication algorithm from among the plurality of first sign candidates. 3. The sign information providing method of claim 2, wherein the extracting of the first sign satisfying the authentication algorithm comprises extracting the first sign satisfying at least one of a Support Vector Machine (SVM)-based authentication algorithm and a Hidden Markov Model (HMM)-based authentication algorithm. 4. The sign information providing method of claim 1, wherein the extracting of the second sign from the input image comprises: extracting a plurality of second area candidates at ...

27-03-2014 publication date

AUTOMATED METHOD AND SYSTEM FOR OBTAINING USER-SELECTED INFORMATION ON A MOBILE COMMUNICATION DEVICE

Number: US20140088969A1
Assignee: Verna IP Holdings, LLC

A customized live tile application module can be configured in association with the mobile communication device in order to automatically vocalize the information preselected by a user in a multitude of languages. A text-to-speech application module can be integrated with the customized live tile application module to automatically vocalize the preselected information. The information can be obtained from a tile and/or a website integrated with a remote server and announced after a text-to-speech conversion process without opening the tile, if the tiles are selected for announcement of information by the device. The information can be obtained in real time. Such an approach automatically and instantly pushes a vocal alert with respect to the user-selected information on the mobile communication device, thereby permitting the user to continue multitasking. Information from tiles can also be rendered on second screens from a mobile device.

1. A method for obtaining user-selected information on mobile communication devices from active tiles displayed on the mobile devices and selected by users, comprising: associating a customized live tile application module with a first mobile communication device in order to selectively provide information preselected by a user for rendering as at least one of speech to a speaker integrated in the first mobile communication device, speech to a speaker wirelessly connected to the first mobile communication device, images on a flat panel display that is wirelessly connected to the first mobile communication device, images on a second mobile communication device that is wirelessly connected to the first mobile communication device, or speech on the second mobile communication device that is wirelessly connected to the first mobile communication device; obtaining said information with said first mobile communication device from a live tile displayed on the mobile device and selected by the user by retrieving said tile information from a ...

27-03-2014 publication date

METHOD AND DEVICE FOR USER INTERFACE

Number: US20140088970A1
Author: KANG Donghyun
Assignee: LG ELECTRONICS INC.

A method for user interface according to one embodiment of the present invention comprises the steps of: displaying text on a screen; receiving a character selection command of a user who selects at least one character included in a text, receiving a speech command of a user who designates a selected range in the text including at least one character, specifying the selected range according to the character selection command and the speech command; and a step for receiving an editing command of a user for the selected range. 1. A user interface method comprising:displaying a text on a screen;receiving a character selection command for selecting at least one character included in the text from a user;receiving a speech command for designating a selected range of the text including the at least one character from the user;specifying the selected range according to the character selection command and the speech command; andreceiving an editing command for the selected range from the user.2. The user interface method according to claim 1 , wherein the selected range corresponds to a word claim 1 , phrase claim 1 , sentence claim 1 , paragraph or page including the at least one character.3. The user interface method according to claim 1 , wherein the editing command corresponds to one of a copy command claim 1 , a cut command claim 1 , an edit command claim 1 , a transmit command and a search command for the selected range of the text.4. The user interface method according to claim 1 , wherein the character selection command is received through a touch gesture of the user claim 1 , applied to the at least one character.5. The user interface method according to claim 1 , wherein the character selection command is received through movement of a cursor displayed on the screen.6. The user interface method according to claim 5 , wherein the cursor is moved by user input using a gesture claim 5 , keyboard claim 5 , mouse or wireless remote controller.7. The user interface ...

10-04-2014 publication date

DYNAMIC SPEECH AUGMENTATION OF MOBILE APPLICATIONS

Number: US20140100852A1
Assignee: PeopleGo Inc.

Speech functionality is dynamically provided for one or more applications by a narrator application. A plurality of shared data items are received from the one or more applications, with each shared data item including text data that is to be presented to a user as speech. The text data is extracted from each shared data item to produce a plurality of playback data items. A text-to-speech algorithm is applied to the playback data items to produce a plurality of audio data items. The plurality of audio data items are played to the user.

1. A system that dynamically provides speech functionality to one or more applications, the system comprising: a narrator configured to receive a plurality of shared data items from the one or more applications, each shared data item comprising text data to be presented to a user as speech; an extractor, operably coupled to the narrator, configured to extract the text data from each shared data item, thereby producing a plurality of playback data items; a text-to-speech engine, operably coupled to the extractor, configured to apply a text-to-speech algorithm to the playback data items, thereby producing a plurality of audio data items; an inbox, operably coupled to the text-to-speech engine, configured to store the plurality of audio data items and an indication of a playback order; and a media player, operably connected to the inbox, configured to play the plurality of audio data items in the playback order. 2. The system of claim 1, wherein extracting the text data comprises applying at least one technique selected from the group consisting of: tag block recognition, image recognition on rendered documents, and probabilistic block filtering. 3. The system of claim 1, wherein the extractor is further configured to apply one or more filters to the text data, the one or more filters making the playback data items more suitable for application of the text-to-speech algorithm. 4. The system of claim 3, wherein the ...
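A compressed sketch of the narrator pipeline (share, extract, TTS, inbox, playback), with a crude regex tag stripper and a fake TTS standing in for the extractor and engine.

```python
# Sketch of the narrator pipeline: shared items -> text extraction ->
# text-to-speech -> ordered inbox -> playback. The extraction filter and
# fake TTS are illustrative stand-ins.
import re

def extract_text(shared_item):
    # crude tag-block removal standing in for the extractor
    return re.sub(r"<[^>]+>", " ", shared_item["body"]).strip()

def tts(text):
    return f"<audio:{text[:24]}...>"     # placeholder audio data item

def narrate(shared_items):
    playback = [extract_text(item) for item in shared_items]
    inbox = [tts(text) for text in playback]       # keeps arrival order
    for audio in inbox:                            # media player step
        print("playing", audio)

if __name__ == "__main__":
    narrate([{"body": "<p>Traffic is heavy on I-90 this morning.</p>"},
             {"body": "<h1>Weather</h1><p>Sunny, high of 21.</p>"}])
```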

06-01-2022 publication date

ELECTRONIC DEVICE AND OPERATION METHOD THEREOF

Number: US20220005459A1
Author: OH Junkwang, WOO Kyounggu
Assignee:

According to an embodiment, an electronic device may include: a display, a communication module comprising communication circuitry, a memory; and a processor operatively connected to the display, the communication module, and the memory. According to an embodiment, the memory may store instructions that, when executed, cause the processor to control the electronic device to: obtain unique information of an external electronic device and information associated with a user of the external electronic device through the communication module, generate a candidate group including at least one candidate based on the unique information and the information associated with the user of the external electronic device, provide an external server with at least part of the information associated with the user of the external electronic device and information associated with the candidate group, receive a reliability value indicating a degree of similarity between the information associated with the user of the external electronic device and the information associated with the candidate group, from the external server, and to display a user interface (UI) indicating authentication for the user of the external electronic device based on the reliability value on the display. 1. An electronic device comprising:a display;a communication module comprising communication circuitry;a memory; anda processor operatively connected to the display, the communication module, and the memory,wherein the memory stores instructions that, when executed, cause the processor to control the electronic device to:obtain unique information of an external electronic device and information associated with a user of the external electronic device through the communication module;generate a candidate group including at least one candidate based on the unique information and the information associated with the user of the external electronic device;provide an external server with at least part of the ...

05-01-2017 publication date

Transliteration work support device, transliteration work support method, and computer program product

Number: US20170004822A1
Assignee: Toshiba Corp

According to an embodiment, a transliteration work support apparatus includes an input unit, an extraction unit, a presentation unit, a reception unit, and a correction unit. The input unit receives document information. The extraction unit extracts, as a correction part, a surface expression of the document information that matches a correction pattern expressing, in one form, a plurality of surface expressions having the same regularity in their way of correction. The presentation unit presents a way of correction defined in accordance with the correction pattern used in the extraction of the correction part. The reception unit receives selection of the way of correction. The correction unit corrects the correction part based on the selected way of correction.

04-01-2018 publication date

SYSTEM AND METHODS FOR NUTRITION MONITORING

Number: US20180004913A1
Assignee:

An apparatus comprising a natural language processor, a mapper, a string comparator, a nutrient calculator, and a diet planning module, the diet planning module configured to generate a diet action control, the diet action control comprising instructions to operate the client device to perform a diet change recommendation on the client device, and apply the diet action control to the client device.

1. An apparatus comprising: a natural language processor to receive text from a client device and transform the text into a generated entity; a mapper to transform the generated entity into mapped data lists; a string comparator to transform the mapped data lists into a verified diet-specific control utilizing a nutrition control memory structure; a nutrient calculator to determine nutrition content from the verified diet-specific control; a diet planning module to generate a diet action control, the diet action control comprising instructions to operate the client device to perform a diet change recommendation on the client device and apply the diet action control to the client device; and a prompting module to: receive a prompt activation signal from the natural language processor; generate a prompt comprising instructions to operate the client device to display on a machine display of the client device an indication of a prompt item, the prompt item comprising an intent signal or a required entity; receive an unstructured input, the unstructured input enabling the natural language processor to transform the text into the generated entity; and send the prompt to the client device. 2. The apparatus of claim 1, further comprising a speech recognition module to: receive an audio from the client device; generate the text from the audio; and send the text to the natural language processor. 3. The apparatus of claim 1, wherein the natural language processor comprises: compare the text to the intent signal in an intent signal control memory structure; and generate the ...

07-01-2016 publication date

Voice Prompt Generation Combining Native and Remotely-Generated Speech Data

Number: US20160005393A1
Assignee:

An electronic device includes a processor and a memory coupled to the processor. The memory stores instructions that, when executed by the processor, cause the processor to perform operations including determining whether a text prompt received from a wireless device corresponds to first synthesized speech data stored at the memory. The operations include, in response to a determination that the text prompt does not correspond to the first synthesized speech data, determining whether a network is accessible. The operations include, in response to a determination that the network is accessible, sending a text-to-speech (TTS) conversion request to a server via the network. The operations further include, in response to receiving second synthesized speech data from the server, storing the second synthesized speech data at the memory.

1. An electronic device comprising: a processor; and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the processor to perform operations comprising: determining whether a text prompt received from a wireless device corresponds to first synthesized speech data stored at the memory; in response to a determination that the text prompt does not correspond to the first synthesized speech data, determining whether a network is accessible; in response to a determination that the network is accessible, sending a text-to-speech (TTS) conversion request to a server via the network; and in response to receiving second synthesized speech data from the server, storing the second synthesized speech data at the memory. 2. The electronic device of claim 1, wherein the operations further comprise determining whether the second synthesized speech data is received prior to expiration of a threshold time period. 3. The electronic device of claim 2, wherein the operations further comprise, in response to a determination that the second synthesized speech data is received prior to ...
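A sketch of the cache-then-network logic, assuming a blocking server call and an invented timeout; the class and method names are illustrative, not from the patent.

```python
# Sketch of the prompt flow: serve cached synthesized speech when the text
# prompt is known; otherwise, if the network is up, request server-side TTS
# with a deadline and cache the result.
import time

class PromptPlayer:
    def __init__(self, network_up, server_delay_s, timeout_s=2.0):
        self.cache = {}                 # text -> synthesized speech data
        self.network_up = network_up
        self.server_delay_s = server_delay_s
        self.timeout_s = timeout_s

    def request_server_tts(self, text):
        time.sleep(self.server_delay_s) # simulated round trip
        return f"<speech:{text}>"

    def get_speech(self, text):
        if text in self.cache:          # first synthesized speech data
            return self.cache[text]
        if not self.network_up:
            return None                 # caller may fall back, e.g. to a tone
        start = time.monotonic()
        data = self.request_server_tts(text)
        if time.monotonic() - start <= self.timeout_s:
            self.cache[text] = data     # store second synthesized speech data
            return data
        return None                     # arrived too late to use

if __name__ == "__main__":
    player = PromptPlayer(network_up=True, server_delay_s=0.1)
    print(player.get_speech("Door unlocked"))
    print(player.get_speech("Door unlocked"))  # now served from the cache
```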

04-01-2018 publication date

EMOTION TYPE CLASSIFICATION FOR INTERACTIVE DIALOG SYSTEM

Number: US20180005646A1
Author: Leung Max, Un Edward
Assignee:

Techniques for selecting an emotion type code associated with semantic content in an interactive dialog system. In an aspect, fact or profile inputs are provided to an emotion classification algorithm, which selects an emotion type based on the specific combination of fact or profile inputs. The emotion classification algorithm may be rules-based or derived from machine learning. A previous user input may be further specified as input to the emotion classification algorithm. The techniques are especially applicable in mobile communications devices such as smartphones, wherein the fact or profile inputs may be derived from usage of the diverse function set of the device, including online access, text or voice communications, scheduling functions, etc. 1. An apparatus for an interactive dialog system , the apparatus comprising:a semantic content generation block configured to generate an output statement informationally responsive to a user dialog input, the output statement comprising a computer-generated object to be displayed on a display device;a classification block configured to select, based on at least one fact or profile input, an emotion type code to be imparted to the computer-generated object, the emotion type code specifying one of a plurality of predetermined emotion types; anda visual generation block configured to generate a digital image representation of the computer-generated object, the digital image representation generated to have the predetermined emotion type specified by the emotion type code;wherein the at least one fact or profile input is derived from usage of a mobile communications device implementing the interactive dialog system.2. The apparatus of claim 1 , the digital image representation comprising displayed text having different font or text size depending on the predetermined emotion type specified by the emotion type code.3. The apparatus of claim 1 , the digital image representation comprising an emoticon having the predetermined ...

02-01-2020 publication date

VOICE SYNTHESIS METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM

Number: US20200005761A1
Author: Yang Jie
Assignee:

Provided are a voice synthesis method, an apparatus, a device, and a storage medium, involving obtaining text information and determining characters in the text information and a text content of each of the characters; performing character recognition on the text content of each of the characters to determine character attribute information of each of the characters; obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, where the speakers are pre-stored pronunciation objects having the character attribute information; and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information. This improves the pronunciation diversity of different characters in the synthesized voices, makes it easier for an audience to distinguish between different characters in the synthesized voices, and thereby improves the user experience.

1. A voice synthesis method, comprising: obtaining text information and determining characters in the text information and a text content of each of the characters; performing a character recognition on the text content of each of the characters, to determine character attribute information of each of the characters; obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, wherein the speakers are pre-stored speakers having the character attribute information; and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information. 2. The method according to claim 1, wherein the character attribute information comprises a basic attribute, and the basic attribute comprises at least one of a gender attribute and an age attribute; before the obtaining speakers in one-to-one correspondence with the characters ...
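A minimal sketch of the attribute-to-speaker assignment, assuming gender/age attributes from recognition and a small pre-stored speaker inventory; all names are invented.

```python
# Sketch of mapping recognized character attributes to pre-stored speakers
# so each character in the text gets a distinct synthetic voice.

SPEAKERS = {
    ("female", "child"): "voice_f_child",
    ("female", "adult"): "voice_f_adult",
    ("male", "adult"): "voice_m_adult",
}

def assign_speakers(characters):
    """characters: name -> {'gender': ..., 'age': ...} from recognition."""
    return {name: SPEAKERS.get((attrs["gender"], attrs["age"]),
                               "voice_default")
            for name, attrs in characters.items()}

def synthesize(dialogue, speaker_map):
    for name, line in dialogue:
        print(f"[{speaker_map[name]}] {line}")   # per-character voice

if __name__ == "__main__":
    chars = {"Mia": {"gender": "female", "age": "child"},
             "Father": {"gender": "male", "age": "adult"}}
    synthesize([("Mia", "Are we there yet?"),
                ("Father", "Almost.")], assign_speakers(chars))
```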

02-01-2020 publication date

ARTIFICIAL INTELLIGENCE (AI)-BASED VOICE SAMPLING APPARATUS AND METHOD FOR PROVIDING SPEECH STYLE

Number: US20200005763A1
Assignee: LG ELECTRONICS INC.

Disclosed is an artificial intelligence (AI)-based voice sampling apparatus for providing a speech style, including a rhyme encoder configured to receive a user's voice, extract a voice sample, and analyze a vocal feature included in the voice sample, a text encoder configured to receive text for reflecting the vocal feature, a processor configured to classify the vocal feature of the voice sample input to the rhyme encoder according to a label, extract an embedding vector representing the vocal feature from the label, and generate a speech style from the embedding vector and apply the generated speech style to the text, and a rhyme decoder configured to output synthesized voice data in which the speech style is applied to the text by the processor. 1. An artificial intelligence (AI)-based voice sampling apparatus for providing a speech style, the apparatus comprising: a rhyme encoder configured to receive a user's voice to extract a voice sample, and analyze a vocal feature included in the voice sample; a text encoder configured to receive text for reflecting the vocal feature; a processor configured to classify the vocal feature of the voice sample input to the rhyme encoder according to a label, extract an embedding vector representing the vocal feature from the label, and generate a speech style from the embedding vector and apply the generated speech style to the text; and a rhyme decoder configured to output synthesized voice data in which the speech style is applied to the text by the processor. 2. The apparatus of claim 1, wherein the rhyme encoder divides the voice sample by a predetermined label and extracts an embedding vector for the label. 3. The apparatus of claim 1, wherein the rhyme encoder extracts the embedding vector through a vocal feature including at least one of a speech rate, a pronunciation intonation, a pause interval, a pitch, or an intonation of the user included in the voice sample. 4. The apparatus of claim 3, wherein the extracting of the ...
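As a loose illustration of the label-then-embed idea (not the patent's actual networks), voice samples can be grouped by a prosodic label and a per-label embedding vector derived from their features; numpy and mean-pooling are assumptions of this sketch:

```python
# Sketch: derive one embedding vector per vocal-feature label.
import numpy as np

def embed_by_label(samples):
    """samples: list of (label, feature_vector) pairs -> {label: embedding}."""
    buckets = {}
    for label, feats in samples:
        buckets.setdefault(label, []).append(np.asarray(feats, dtype=float))
    # Use the mean feature vector per label as that label's embedding.
    return {label: np.mean(vecs, axis=0) for label, vecs in buckets.items()}

style = embed_by_label([("fast", [0.9, 0.2]), ("fast", [0.8, 0.4]),
                        ("slow", [0.2, 0.1])])
print(style["fast"])  # -> [0.85 0.3]
```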

14-01-2016 publication date

SPEECH SYNTHESIS DICTIONARY CREATION DEVICE, SPEECH SYNTHESIZER, SPEECH SYNTHESIS DICTIONARY CREATION METHOD, AND COMPUTER PROGRAM PRODUCT

Number: US20160012035A1
Assignee:

According to an embodiment, a device includes a table creator, an estimator, and a dictionary creator. The table creator is configured to create a table based on similarity between distributions of nodes of speech synthesis dictionaries of a specific speaker in respective first and second languages. The estimator is configured to estimate a matrix to transform the speech synthesis dictionary of the specific speaker in the first language to a speech synthesis dictionary of a target speaker in the first language, based on speech and a recorded text of the target speaker in the first language and the speech synthesis dictionary of the specific speaker in the first language. The dictionary creator is configured to create a speech synthesis dictionary of the target speaker in the second language, based on the table, the matrix, and the speech synthesis dictionary of the specific speaker in the second language. 1. A speech synthesis dictionary creation device comprising:a mapping table creator configured to create, based on similarity between distribution of nodes of a speech synthesis dictionary of a specific speaker in a first language and distribution of nodes of a speech synthesis dictionary of the specific speaker in a second language, a mapping table in which the distribution of nodes of the speech synthesis dictionary of the specific speaker in the first language is associated with the distribution of nodes of the speech synthesis dictionary of the specific speaker in the second language;an estimator configured to estimate a transformation matrix to transform the speech synthesis dictionary of the specific speaker in the first language to a speech synthesis dictionary of a target speaker in the first language, based on speech and a recorded text of the target speaker in the first language and the speech synthesis dictionary of the specific speaker in the first language; anda dictionary creator configured to create a speech synthesis dictionary of the target speaker ...
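A hypothetical sketch of the similarity computation behind such a mapping table: if each dictionary node is summarized as a diagonal Gaussian, first-language nodes can be mapped to their most similar second-language nodes, here by symmetric KL divergence (the Gaussian form and this particular similarity measure are assumptions of the example, not the patent's specified method):

```python
# Sketch: map L1 dictionary nodes to the most similar L2 nodes.
import numpy as np

def sym_kl(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal Gaussians."""
    kl12 = 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1)
    kl21 = 0.5 * np.sum(np.log(var1 / var2) + (var2 + (mu1 - mu2) ** 2) / var1 - 1)
    return kl12 + kl21

def build_mapping_table(nodes_l1, nodes_l2):
    """Each node is a (mean, variance) pair; returns index -> index."""
    table = {}
    for i, (mu1, var1) in enumerate(nodes_l1):
        dists = [sym_kl(mu1, var1, mu2, var2) for mu2, var2 in nodes_l2]
        table[i] = int(np.argmin(dists))
    return table

l1 = [(np.array([0.0]), np.array([1.0]))]
l2 = [(np.array([5.0]), np.array([1.0])), (np.array([0.1]), np.array([1.0]))]
print(build_mapping_table(l1, l2))  # -> {0: 1}
```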

09-01-2020 publication date

ADAPTIVE TEXT-TO-SPEECH OUTPUTS

Number: US20200013387A1
Assignee: Google LLC

In some implementations, a language proficiency of a user of a client device is determined by one or more computers. The one or more computers then determines a text segment for output by a text-to-speech module based on the determined language proficiency of the user. After determining the text segment for output, the one or more computers generates audio data including a synthesized utterance of the text segment. The audio data including the synthesized utterance of the text segment is then provided to the client device for output. 1. A method comprising: receiving, at data processing hardware, from a client device associated with a user, data indicating that a voice query was input to the client device by the user, and an indication of a language proficiency designated to the user, the language proficiency designated to the user comprising one of a first level of language proficiency or a second level of language proficiency different than the first level of language proficiency; generating, by the data processing hardware, audio data comprising a synthesized utterance of a particular text segment responsive to the voice query and based on the language proficiency designated to the user, the particular text segment comprising one of: a first text segment comprising first information responsive to the voice query when the language proficiency designated to the user comprises the first level of language proficiency, or a second text segment comprising second information responsive to the voice query when the language proficiency designated to the user comprises the second level of language proficiency, wherein at least a portion of the second information of the second text segment is different than the first information of the first text segment; and providing, by the data processing hardware, the audio data to the client device associated with the user. 2. The method of claim 1, further comprising, prior to generating the audio data comprising the ...
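A minimal sketch of the level-dependent selection step, assuming two pre-written responses keyed by invented proficiency labels:

```python
# Sketch: choose the text segment to synthesize by proficiency level.
RESPONSES = {
    "basic":    "Rain is likely today. Take an umbrella.",
    "advanced": "Expect intermittent showers this afternoon, with a 70% "
                "chance of precipitation; an umbrella is advisable.",
}

def select_text_segment(language_proficiency: str) -> str:
    """Pick the text segment for the user's designated proficiency level."""
    return RESPONSES["advanced" if language_proficiency == "advanced" else "basic"]

print(select_text_segment("basic"))
```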

09-01-2020 publication date

SYSTEM AND METHOD FOR ASSISTING COMMUNICATION THROUGH PREDICTIVE SPEECH

Number: US20200013410A1
Author: Bond Michael
Assignee:

A system and method for assisting communication through predictive speech is provided. A database includes commonly used words, phrases, and images, each associated with at least one context cue. A processor is configured to determine the user's context and display a number of possible initial words, phrases, or images associated with the determined context. A text field is updated with selected words, phrases, or images. The words, phrases, or literal equivalents of the images are audibly transmitted. 1. A system for assisting communication through predictive speech comprising: a user device comprising a display; a database comprising words, phrases, and images, wherein each of the words, phrases, and images are associated with one or more context cues; and an electronic storage device comprising software instructions, which when executed by a processor, configure the user device to: determine user context; display a number of possible initial phrases; monitor for user input selecting one of the number of initial phrases; display, at the user device, the selected initial phrase at a text field; query the database for words, phrases, or images associated with a context cue matching the determined user context; display the returned words, phrases, or images at a predictive field at the user device; monitor for user input selecting one or more of the displayed words, phrases, or images; update the displayed text field to input the selected words, phrases, or images; and audibly transmit the words, phrases, or literal equivalents of the images in the displayed text. 2. The system of claim 1, wherein: the context cues comprise other words, phrases, or images commonly used by the user such that words, phrases, or images selected more often by the user are displayed within the predictive field. 3. The system of wherein: the context cues comprise one or more complete sentences such that the words, phrases, or images displayed at the predictive field form ...
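An illustrative sketch of the context-cue lookup, assuming a toy phrase database; the schema and the ranking rule are inventions of this example:

```python
# Sketch: query stored phrases whose context cues match the user's context.
PHRASE_DB = [
    {"text": "I would like a coffee, please.", "cues": {"cafe", "morning"}},
    {"text": "Where is the restroom?",         "cues": {"cafe", "store"}},
    {"text": "Call my nurse, please.",         "cues": {"home", "medical"}},
]

def predict_phrases(user_context: set, limit: int = 5):
    """Return stored phrases whose context cues intersect the user context."""
    matches = [e for e in PHRASE_DB if e["cues"] & user_context]
    # Rank by number of matching cues, most specific first.
    matches.sort(key=lambda e: len(e["cues"] & user_context), reverse=True)
    return [e["text"] for e in matches[:limit]]

print(predict_phrases({"cafe", "morning"}))
```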

11-01-2018 publication date

MOBILE ELECTRONIC DEVICE AND OPERATION METHOD THEREFOR

Number: US20180013882A1
Assignee:

An operation method for a mobile electronic device is provided. The operation method includes: transmitting a calling phone number to a wireless audio product from an operating system of the mobile electronic device via wireless communication, wherein the mobile electronic device is wirelessly connected to the wireless audio product; transmitting the calling phone number to an application software of the mobile electronic device by the wireless audio product; searching a caller name corresponding to the calling phone number by the application software of the mobile electronic device; transmitting the caller name to the wireless audio product by the application software of the mobile electronic device via wireless communication; and playing the caller name by the wireless audio product. 1. An operation method for a mobile electronic device, comprising: transmitting a calling phone number to a wireless audio product from an operating system of the mobile electronic device via wireless communication, wherein the mobile electronic device is wirelessly connected to the wireless audio product; transmitting the calling phone number to an application software of the mobile electronic device by the wireless audio product; searching a caller name corresponding to the calling phone number by the application software of the mobile electronic device; transmitting the caller name to the wireless audio product by the application software of the mobile electronic device via wireless communication; and playing the caller name by the wireless audio product. 2. The operation method for a mobile electronic device according to claim 1, wherein the operating system of the mobile electronic device transmits the calling phone number to the wireless audio product via Bluetooth 3.0 hands free profile (HFP), wherein the wireless audio product is a Bluetooth audio product; and the wireless audio product transmits the calling phone number to the application software of the mobile electronic device ...

03-02-2022 publication date

ACCESSIBLE MULTIMEDIA CONTENT

Number: US20220035853A1
Assignee:

A method of generating accessible content is described. Embodiments of the method identify a plurality of channels for a multimedia communication session, generate a master timeline for the communication session, wherein the master timeline comprises a chronological ordering of events from each of the channels, and wherein each of the events is associated with event-specific audio data, and present the multimedia communication session to a user to enable the user to transition among the channels based on the master timeline. 1. A method for generating accessible content, comprising: identifying a plurality of channels for a multimedia communication session; generating a master timeline for the communication session, wherein the master timeline comprises a chronological ordering of events from each of the channels, and wherein each of the events is associated with event-specific audio data; and presenting the multimedia communication session to a user to enable the user to transition among the channels based on the master timeline. 2. The method of claim 1, further comprising: identifying the events from each of the plurality of channels; generating event data for each of the events, wherein the event data includes the event-specific audio data and a start time of the event; and generating channel metadata for each of the channels based on the event data, wherein the master timeline is generated based on the channel metadata. 3. The method of claim 2, wherein: the event data further includes at least one of an event ID, an event name, and an event duration. 4. The method of claim 1, further comprising: converting visual data from one or more of the channels to the event-specific audio data. 5. The method of claim 4, wherein: the conversion of the visual data comprises a text-to-speech conversion. 6. The method of claim 1, further comprising: playing a first audio file associated with a default channel of the plurality of channels; identifying an event from a secondary ...
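A small sketch of master-timeline construction: per-channel event lists, each event carrying its event-specific audio data, are merged into one chronological ordering. The field names mirror the claim language; the values are invented:

```python
# Sketch: merge per-channel events into a master timeline by start time.
import heapq

channels = {
    "slides": [{"start": 0.0, "name": "title slide", "audio": "slide1.wav"}],
    "chat":   [{"start": 3.5, "name": "question",    "audio": "chat1.wav"},
               {"start": 9.0, "name": "answer",      "audio": "chat2.wav"}],
}

def master_timeline(channels):
    """Merge already-sorted channel event lists into one chronological list."""
    tagged = ([dict(e, channel=ch) for e in evs] for ch, evs in channels.items())
    return list(heapq.merge(*tagged, key=lambda e: e["start"]))

for event in master_timeline(channels):
    print(event["start"], event["channel"], event["name"])
```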

15-01-2015 publication date

Voice synthesis device

Number: US20150019224A1
Assignee: Mitsubishi Electric Corp

A voice synthesis device according to the present invention regularly recognizes the contents of an utterance made by a passenger or the like, and uses a facility name or the like included in the utterance contents to specify the unabbreviated word corresponding to an abbreviation contained in that facility name. The voice synthesis device can therefore read the abbreviation out loud with a reading familiar to and appropriate for the passenger, without forcing the passenger to perform a burdensome operation such as registering the unabbreviated word corresponding to the abbreviation.

18-01-2018 publication date

SOUND CONTROL DEVICE, SOUND CONTROL METHOD, AND SOUND CONTROL PROGRAM

Number: US20180018957A1
Assignee:

A sound control device includes: a detection unit that detects a first operation on an operator and a second operation on the operator, the second operation being performed after the first operation; and a control unit that causes output of a second sound to be started, in response to the second operation being detected. The control unit causes output of a first sound to be started before causing the output of the second sound to be started, in response to the first operation being detected. 1. A sound control device comprising: a detection unit that detects a first operation on an operator and a second operation on the operator, the second operation being performed after the first operation; and a control unit that causes output of a second sound to be started, in response to the second operation being detected, wherein the control unit causes output of a first sound to be started before causing the output of the second sound to be started, in response to the first operation being detected. 2. The sound control device according to claim 1, wherein the operator accepts push-in by a user, the detection unit detects, as the first operation, that the operator has been pushed in by a first distance from a reference position, and the detection unit detects, as the second operation, that the operator has been pushed in by a second distance from the reference position, the second distance being longer than the first distance. 3. The sound control device according to claim 1, wherein the detection unit comprises first and second sensors provided in the operator, the first sensor detects the first operation, and the second sensor detects the second operation. 4. The sound control device according to claim 1, wherein the operator comprises a keyboard that accepts the first and second operations. 5. The sound control device according to claim 1, wherein the operator comprises a touch panel that accepts the first and second operations. 6. The sound control device according to claim 1, ...

21-01-2016 publication date

Method, Apparatus and System For Regenerating Voice Intonation In Automatically Dubbed Videos

Number: US20160021334A1
Author: Dvir Jacob, Rossano Boaz
Assignee:

A system and method for automatically dubbing a video in a first language into a second language, comprising: an audio/video pre-processor configured to provide separate original audio and video files of the same media; a text analysis unit configured to receive a first text file of the video's subtitles in the first language and a second text file of the video's sub-titles in the second language, and re-divide them into text sentences; a text-to-speech unit configured to receive the text sentences in the first and second languages from the text analysis unit and produce therefrom first and second standard TTS spoken sentences; a prosody unit configured to receive the first and second spoken sentences, the separated audio file and timing parameters and produce therefrom dubbing recommendations; and a dubbing unit configured to receive the second spoken sentence and the recommendations and produce therefrom an automatically dubbed sentence in the second language. 1. A system for automatically dubbing a video in a first language into a second language , comprising:a Text Analysis Unit configured to receive original subtitles text, timing data and target language selection and translate the subtitle into the target language;a TTS (Text To Speech) Generation Unit configured to generate a standard TTS audio of the translated subtitle text;a Prosody Analysis Unit configured to receive the timing of the TTS translated audio and the timing of the original subtitle and recommend adjustments to the final dubbed subtitle; anda Dubbing Unit configured to implement the recommendations on the TTS translated speech.2. A system for automatically dubbing a video in a first language into a second language , comprising:an audio/video pre-processor configured to provide separate original audio and video files of the same media;a text analysis unit configured to receive a first text file of the video's subtitles in the first language and a second text file of the video's subtitles in ...

17-01-2019 publication date

ADAPTIVE TEXT-TO-SPEECH OUTPUTS

Number: US20190019501A1
Assignee: Google LLC

In some implementations, a language proficiency of a user of a client device is determined by one or more computers. The one or more computers then determines a text segment for output by a text-to-speech module based on the determined language proficiency of the user. After determining the text segment for output, the one or more computers generates audio data including a synthesized utterance of the text segment. The audio data including the synthesized utterance of the text segment is then provided to the client device for output. 1. A method comprising:determining, by data processing hardware, a user context of a user of a client device, the user context indicating a level of complexity of speech that the user is likely able to comprehend;determining, by the data processing hardware, a particular text segment for text-to-speech output to the user, the particular text segment having a complexity score indicating a corresponding level of complexity associated with the particular text segment;modifying, by the data processing hardware, the particular text segment for the text-to-speech output to the user based on the complexity score of the particular text segment and the selected user context;generating, by the data processing hardware, audio data comprising a synthesized utterance of the modified particular text segment; andproviding, by the data processing hardware, the audio data comprising the synthesized utterance of the modified particular text segment to the client device.2. The method of claim 1 , further comprising claim 1 , prior to determining the particular text segment for text-to-speech output to the user claim 1 , receiving claim 1 , at the data processing hardware claim 1 , data indicating a voice query detected by the client device claim 1 ,wherein determining the particular text segment for text-to-speech output comprises generating the particular text segment as a response to the voice query, andwherein providing the audio data comprises ...

21-01-2021 publication date

GENERATING AND TRAINING NEW WAKE WORDS

Number: US20210020162A1
Assignee:

The disclosed technology relates to a process for automatically training a machine learning algorithm to recognize a custom wake word. By using different text-to-speech services, an input providing a custom wake word can be used to generate different speech samples covering different variations in how the custom wake word can be pronounced. These samples are automatically generated and are subsequently used to train the wake word detection algorithm that will be used by the computing device to recognize and detect when the custom wake word is uttered by any user nearby a computing device for the purposes of initiating a virtual assistant. In a further embodiment, "white-listed" words (e.g., different words that are pronounced similarly to the custom wake word) are also identified and trained in order to minimize the occurrence of erroneously initiating the virtual assistant. 1. A method for automatically creating a wake word detection algorithm, the method comprising: receiving a user input from a user associated with a custom wake word, wherein the user input includes one or more words that will be spoken by the user in a vicinity of a computing device, and wherein the custom wake word is used to invoke a virtual assistant associated with the computing device; generating a plurality of samples associated with the user input, wherein the plurality of samples are generated using a plurality of text-to-speech services; training a machine learning model for the custom wake word using the plurality of samples; and deploying a wake word detection algorithm that is the result of the machine learning model for the custom wake word to the computing device, wherein the wake word detection algorithm facilitates the computing device in recognizing when the custom wake word is spoken by the user. 2. The method of claim 1, wherein the text-to-speech services generate samples that modify how the user input would be spoken based on different pitches. 3. ...
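A sketch of the sample-generation loop, assuming hypothetical TTS service callables (no real TTS API is implied): the same custom wake word is rendered across services and pitch/rate variants to build training data:

```python
# Sketch: synthesize pronunciation variants of a custom wake word.
def synthesize_variants(wake_word, tts_services,
                        pitches=(-2, 0, 2), rates=(0.9, 1.0, 1.1)):
    """Return a list of synthesized audio samples for the custom wake word."""
    samples = []
    for service in tts_services:        # e.g. different vendors' voices
        for pitch in pitches:           # semitone shifts
            for rate in rates:          # speaking-rate multipliers
                samples.append(service(wake_word, pitch=pitch, rate=rate))
    return samples

# Stand-in "service" so the sketch runs without any real TTS dependency.
fake_service = lambda text, pitch, rate: f"<audio:{text}|p{pitch}|r{rate}>"
print(len(synthesize_variants("hey nova", [fake_service])))  # -> 9
```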

22-01-2015 publication date

METHOD AND SYSTEM FOR TEXT-TO-SPEECH SYNTHESIS WITH PERSONALIZED VOICE

Number: US20150025891A1
Assignee: NUANCE COMMUNICATIONS, INC.

A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input of speech in the form of an audio communication from an input speaker and generating a voice dataset for the input speaker. The method includes receiving a text input at the same device as the audio input and synthesizing the text from the text input to synthesized speech including using the voice dataset to personalize the synthesized speech to sound like the input speaker. In addition, the method includes analyzing the text for expression and adding the expression to the synthesized speech. The audio communication may be part of a video communication and the audio input may have an associated visual input of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input. 1. A method for text-to-speech synthesis, comprising: receiving, at a first device, incidental audio speech data over a first network communication link from a second device, wherein the incidental audio speech data is produced during an audio communication in which an operator of the second device participates and is representative of the operator's speaking characteristics; generating, by the first device, a voice dataset for the operator based, at least in part, on the incidental audio speech data; receiving, at the first device, text data from the second device over a second network communication link subsequent to receiving the incidental audio speech data; converting, by the first device, the text data to synthesized speech; and personalizing the synthesized speech to sound like the operator using, at least in part, the voice dataset. This application is a continuation of U.S. patent application Ser. No. 11/688264, filed on Mar. 20, 2007, entitled Method and System for Text- ...

26-01-2017 publication date

Method and Device for Editing Singing Voice Synthesis Data, and Method for Analyzing Singing

Number: US20170025115A1
Assignee:

A singing voice synthesis data editing method includes adding, to singing voice synthesis data, a piece of virtual note data placed immediately before a piece of note data having no contiguous preceding piece of note data, the singing voice synthesis data including: multiple pieces of note data for specifying a duration and a pitch at which each note that is in a time series, representative of a melody to be sung, is voiced; multiple pieces of lyric data associated with at least one of the multiple pieces of note data; and a sequence of sound control data that directs sound control over a singing voice synthesized from the multiple pieces of lyric data, and obtaining the sound control data that directs sound control over the singing voice synthesized from the multiple pieces of lyric data, and that is associated with the piece of virtual note data. 1. A singing voice synthesis data editing method comprising:adding to singing voice synthesis data a piece of virtual note data placed immediately before a piece of note data having no contiguous preceding piece of note data, the singing voice synthesis data including: multiple pieces of note data for specifying a duration and a pitch at which each note that is in a time series, representative of a melody to be sung, is voiced; multiple pieces of lyrics data associated with at least one of the multiple pieces of note data; and a sequence of sound control data that directs sound control over a singing voice synthesized from the multiple pieces of lyrics data; andobtaining sound control data that directs sound control over the singing voice synthesized from the multiple pieces of lyrics data, and that is associated with the piece of virtual note data.2. The singing voice synthesis data editing method according to claim 1 ,wherein the adding a piece of virtual note data includes adding, as the piece of virtual note data, a piece of note data having a time length corresponding to a time difference between the note-on timing ...

24-01-2019 publication date

Location- and Interest-Based Social Media Platform

Number: US20190026293A1
Author: Ruiz Richard, Shah Sharvil
Assignee:

A social media platform is provided having location-based or interest-based or interest location-based searching or viewing capabilities, wherein the user may select a specific location or specific interest or both to view or post. The user of the provided social media platform has the option of privacy settings and of tailored or customized feeds based upon the selected location, interest, or both, for viewing or posting. The user may also access the posted content from the social media platform in a video or audio format. The user further has an option to conduct wild card searches for any terms within real-time feeds or content generated according to a selected location or selected interest or both. 1. A method for searching or viewing on a social media platform having location-based capabilities in a general user interface comprising: i. selecting a geographical region from a menu with various locations provided to a user; ii. displaying relevant content as feeds in a personalized view panel generated based upon selection of a specific location from said menu for said user to view or post; iii. searching for any desired content within real-time feeds displayed in said personalized view panel; iv. integrating with text-to-speech features thereby accessing said displayed content in a video or audio format in said personalized view panel; v. providing single or plurality of advertisements to said user in an advertisement panel generated based upon location selected by said user. 2. The method of claim 1, wherein selecting a geographical region from a menu with various locations comprises a city view, a state view, a country view, or a world view for selection by said user to search, view or post content relevant to said geographical region. 3. The method of claim 1, wherein said user can select one or several locations from said menu to search, view or post content relevant to said selected one or several locations. 4. The method of claim 1, wherein ...

28-01-2021 publication date

Image Analysis for Results of Textual Image Queries

Number: US20210026900A1
Assignee:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for analyzing images for generating query responses. One of the methods includes determining, using a textual query, an image category for images responsive to the textual query, and an output type that identifies a type of requested content; selecting, using data that associates a plurality of images with a corresponding category, a subset of the images that each belong to the image category, each image in the plurality of images belonging to one of the two or more categories; analyzing, using the textual query, data for the images in the subset of the images to determine images responsive to the textual query; determining a response to the textual query using the images responsive to the textual query; and providing, using the output type, the response to the textual query for presentation. 1.-24. (canceled) 25. An image query processing method, comprising: determining, by a computing system comprising one or more computing devices, using a user query, an image category for images responsive to the user query; selecting, by the computing system, using data that associates a plurality of images with a corresponding category, a subset of the images that each belong to the image category, each image in the plurality of images belonging to one of two or more categories; analyzing, by the computing system, using the user query, data for the images in the subset of the images to determine images responsive to the user query; determining, by the computing system, a response to the user query using the images responsive to the user query; and providing, by the computing system, the response to the user query for presentation. 26. The method of claim 25, comprising: determining, by the computing system, using the user query, one or more key phrases for the user query; and wherein analyzing, using the user query, data for the images in the subset of the images to determine images ...

17-02-2022 publication date

INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD

Number: US20220051673A1
Author: OHMURA Junki
Assignee: Sony Group Corporation

The present technology relates to an information processing apparatus and an information processing method capable of providing a dialog response more appropriately. 1. An information processing apparatus comprising a processing unit that acquires a capability for each device that outputs a dialog response, generates the dialog response corresponding to the acquired capability from a dialog response frame used as a seed at a time of generation of the dialog response, and deploys the generated dialog response to each device. 2. The information processing apparatus according to claim 1, wherein the capability includes interface information regarding an interface of the device and capability information regarding ability of the interface. 3. The information processing apparatus according to claim 1, wherein the dialog response frame includes a frame described in conformity to a specification of a general-purpose dialog response. 4. The information processing apparatus according to claim 3, wherein the processing unit applies the dialog response frame to a conversion template including a template for conversion for each combination of the capabilities to generate the dialog response. 5. The information processing apparatus according to claim 1, wherein the processing unit converts the dialog response corresponding to the capability into the dialog response corresponding to another capability. 6. The information processing apparatus according to claim 5, wherein the processing unit converts the dialog response by using a rule-based conversion algorithm or a machine learning-based conversion algorithm. 7. The information processing apparatus according to claim 1, wherein the processing unit selects the capability on a basis of context information regarding context, and generates the dialog response corresponding to the selected capability. 8. The information processing apparatus according to claim 1, wherein the processing unit selects the capability corresponding to accessibility, ...

04-02-2021 publication date

ADVERTISEMENT PROCESSING APPARATUS AND ADVERTISEMENT PROCESSING METHOD FOR ADVERTISEMENT PROCESSING APPARATUS

Number: US20210035550A1
Author: IKUMI Tomonori
Assignee:

According to an embodiment, an advertisement processing apparatus detects words or phrases from advertisement information. The advertisement processing apparatus determines, for each of the detected words or phrases, meaning of an advertisement represented by the corresponding words or phrases. The advertisement processing apparatus generates template data of sales talk on a basis of a combination of the determined meaning of the advertisement. The advertisement processing apparatus creates a sentence being sales talk from the generated template data and the detected words or phrases. The advertisement processing apparatus outputs the created sentence. 1. An advertisement processing apparatus , comprising:an input interface that inputs information of an advertisement;an output interface that outputs data regarding a sentence; and detect words or phrases from the information of an advertisement input by the input interface,', 'determine, for each of the detected words or phrases, meaning of an advertisement represented by the corresponding words or phrases,', 'generate template data of sales talk on a basis of a combination of the determined meaning of the advertisement,', 'create a sentence being sales talk from the generated template data and the detected words or phrases, and', 'output data regarding the created sentence via the output interface., 'a processor configured to2. The advertisement processing apparatus according to claim 1 , further comprising:a storage device that stores a meaning information database, the meaning information database being a set of data records in which the words or phrases and meaning information are associated with each other, the meaning information being information for classifying the words or phrases in accordance with meaning of the respective words or phrases, whereinthe processor determines, with reference to the meaning information database stored in the storage device, meaning information for each of the detected words or ...

04-02-2021 publication date

Controlling Expressivity In End-to-End Speech Synthesis Systems

Number: US20210035551A1
Assignee: Google LLC

A system for generating an output audio signal includes a context encoder, a text-prediction network, and a text-to-speech (TTS) model. The context encoder is configured to receive one or more context features associated with current input text and process the one or more context features to generate a context embedding associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech. The TTS model is configured to process the current input text and the style embedding to generate an output audio signal of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding. 1. A system comprising: a context encoder configured to: receive the current input text from the text source, the text source comprising sequences of text to be synthesized into expressive speech; receive one or more context features associated with current input text to be synthesized into expressive speech, each context feature derived from a text source of the current input text; and process the one or more context features to generate a context embedding associated with the current input text; a text-prediction network in communication with the context encoder and configured to: receive the current input text from the text source; receive the context embedding associated with the current input text from the context encoder; and process the current input text and the context embedding associated with the current input text to predict, as output, a style embedding for the current input text, the style embedding specifying a specific prosody and/or style for synthesizing the current input text into expressive speech; and receive the style ...

09-02-2017 publication date

Two Way (+) Language Translation Communication Technology

Number: US20170039190A1
Author: Ricardo Joseph
Assignee:

The two way plus verbal communication technology will allow two or more people to communicate with one another in real time by converting the spoken word from one communicator (sender) to the other communicator (receiver), translating between both parties' primary or preferred languages in real time. 1. A method for two-way translation via voice communication device that integrates (a.) Speech to text technology, (b.) Text to translation technology, and (c.) Translated text to speech technology. a. The first step would be the speech to text technology, where a person would speak into the receiver and the words would be translated into text (for example English speech into English text). b. The second step would be that the text would then be translated into the text of another language (the native or preferred language of the receiver). For example, the English text would be translated into the Spanish equivalent text as set by the receiving party. c. The final step would be the translated text being translated from text to speech. For example, the text that was translated into Spanish would be converted from Spanish text to Spanish speech to the receiving party. d. The completed process would be as follows: 1. English speech to English text (sender's primary language speech to primary language text), 2. English text translated into Spanish text (sender's primary language text is translated into the receiver's preferred language), and 3. Translated Spanish text to translated Spanish speech (receiver's preferred language text translated into receiver's preferred language speech). 2. Software that allows at a minimal capacity to display conversations conducted through the aforementioned methods in the user's preferred language. 3. Extended software capability to allow for the translation to and from the user between themselves and others engaged in conversation. 4. A method for allowing either party to set their preferred language. 5. A method for determining if either ...

04-02-2021 publication date

GENERATING PROCESS FLOW MODELS USING UNSTRUCTURED CONVERSATION BOTS

Number: US20210036974A1
Assignee:

In an example computer-implemented method, unstructured interactions between an unstructured conversation bot and a plurality of users are logged. A process flow model is generated based on the logged unstructured interactions. Instructions based on the process flow model are presented to a user in real time via the conversation bot. 1. A system , comprising a processor to:log unstructured interactions between an unstructured conversation bot and a plurality of users;generate a process flow model based on the logged unstructured interactions; andpresent, via the conversation bot, instructions to a user in real time based on the process flow model.2. The system of claim 1 , wherein the conversation bot comprises a voice based interface on a mobile device.3. The system of claim 1 , wherein the processor is to generate and present views of the process flow model from different perspectives.4. The system of claim 3 , wherein the views comprise a ground level view and a hierarchical top down view.5. The system of claim 1 , wherein data included in the logged unstructured interactions comprises location data and time stamps.6. The system of claim 1 , wherein the processor is to log individual deviations from the process flow model for each of the plurality of users and generate an individual model for each of the users based on the logged individual deviations.7. The system of claim 1 , wherein the processor is to log deviations from the process flow model and iteratively update the process flow model based on a deviation that is more efficient than other deviations of the logged deviations.8. A computer-implemented method claim 1 , comprising:logging, via a processor, unstructured interactions between an unstructured conversation bot and a plurality of users;generating, via the processor, a process flow model based on the logged unstructured interactions; andpresenting, via the conversation bot, instructions to a user in real time based on the process flow model.9. The ...

12-02-2015 publication date

Machine And Method To Assist User In Selecting Clothing

Number: US20150043822A1
Assignee: K-NFB READING TECHNOLOGY, INC.

A device to convey information to a user regarding clothing. The device receives data that specifies a clothing mode to use for processing an image, accesses a knowledge base to provide data to configure the computer program product for the clothing mode, the data including data specific to the clothing mode, and receives an image or images of an article of clothing. The device processes the image or images to identify patterns in the image corresponding to items of clothing based on information obtained from the knowledge base. 1.-27. (canceled) 28. A method of operating a portable electronic device, the method comprising: receiving by a processor device in the portable electronic device an image that captures a scene; retrieving by the processor device a template that includes a layout of a machine; processing by the processor device the image of the scene to recognize a pattern of controls by comparing the layout in the template to the recognized pattern in the scene, and to recognize a gesturing item in the image that indicates a user-initiated gesture pointing to a portion of the pattern in the image; determining by the processing device the control pointed to by the user; and causing by the processor device, the portable electronic device to operate in a transaction mode. 29. The method of claim 28, wherein a directed reading mode is selected by the user according to the command determined from the gesturing. 30. The method of further comprising: applying by the processor the pattern-recognition processing to the image to detect the gesturing over a control in the image and applying optical character recognition processing to determine text in the image; and applying the text to speech processing to announce the text to the user. 31. The method of further comprising: processing by the processor the retrieved stored template that has a stored layout of controls on the machine; and processing by the processor the image according to the template to navigate the user through use of the ...

24-02-2022 publication date

SOUND MODIFICATION OF SPEECH IN AUDIO SIGNALS OVER MACHINE COMMUNICATION CHANNELS

Number: US20220059071A1
Assignee:

Apparatus, systems, articles of manufacture, and methods to modify sound of speech in an audio signal are disclosed. An example apparatus includes processor circuitry to execute instructions to: identify a first portion of a keyword in the speech of the audio signal during generation of the speech; determine a waveform to replace a second portion of the keyword; and transform the keyword into a different word by introducing the waveform into the audio signal. 1. An apparatus to modify sound of speech in an audio signal, the apparatus comprising: memory; instructions in the apparatus; and processor circuitry to execute the instructions to: identify a first portion of a keyword in the speech during generation of the speech; determine a waveform to replace a second portion of the keyword; and transform the keyword into a different word by introducing the waveform into the audio signal. 2. The apparatus of claim 1, wherein the processor circuitry is to: identify an attribute of the speech; and adjust the waveform based on the attribute. 3. The apparatus of claim 2, wherein the attribute is a volume. 4. The apparatus of claim 2, wherein the attribute is a vocal register. 5. The apparatus of claim 2, wherein the attribute is a prosody. 6. The apparatus of claim 2, wherein the attribute is a speaking rate. 7. The apparatus of claim 1, wherein the processor circuitry is to: identify text of the different word based on the keyword; convert the text to speech; and determine the waveform based on the converted text to speech. 8. The apparatus of claim 1, wherein the processor circuitry is to: determine a source phoneme sequence of the keyword; identify a target phoneme sequence based on the source phoneme sequence; and build the waveform based on the target phoneme sequence. 9. The apparatus of claim 8, wherein the processor circuitry is to implement a neural network to maintain characteristics of a voice speaking the keyword in the speech signal ...

01-05-2014 publication date

SINGLE INTERFACE FOR LOCAL AND REMOTE SPEECH SYNTHESIS

Number: US20140122080A1
Assignee: IVONA Software Sp. z.o.o.

Features are disclosed for providing a consistent interface for local and distributed text to speech (TTS) systems. Some portions of the TTS system, such as voices and TTS engine components, may be installed on a client device, and some may be present on a remote system accessible via a network link. Determinations can be made regarding which TTS system components to implement on the client device and which to implement on the remote server. The consistent interface facilitates connecting to or otherwise employing the TTS system through use of the same methods and techniques regardless of which TTS system configuration is implemented. 1. A non-transitory computer storage medium which stores an executable code module that directs a client computing device to perform a process comprising: receiving, via a first interface, a first request to generate a first audio presentation of a first text input, the first request indicating a first voice with which to generate the first audio presentation; selecting a second interface using a characteristic of the client computing device, wherein the second interface is an interface to a local text-to-speech module; using the second interface to generate the first audio presentation; receiving, via the first interface, a second request to generate a second audio presentation of a second text input, the second request indicating a second voice with which to generate the second audio presentation; selecting a third interface using the characteristic of the client computing device, wherein the third interface is an interface to a remote text-to-speech server; and using the third interface to generate the second audio presentation. 2. The non-transitory computer storage medium of claim 1, wherein the characteristic comprises one or more of: a presence of a network connection; a latency of the network connection; a presence of data corresponding to the first voice on the client computing device; or a type of application requesting the ...
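A toy sketch of the consistent-interface idea: callers always invoke one speak() entry point, and a characteristic of the client device decides whether a local engine or a remote server fulfills the request. Both engines are stubs invented for this example:

```python
# Sketch: one caller-facing interface hiding local vs. remote TTS backends.
class LocalTTS:
    def synthesize(self, text, voice):  # on-device engine stub
        return f"[local:{voice}] {text}"

class RemoteTTS:
    def synthesize(self, text, voice):  # network server stub
        return f"[remote:{voice}] {text}"

def speak(text, voice, device):
    """Single entry point; the backend choice is hidden from the caller."""
    local_voices = device.get("installed_voices", set())
    if voice in local_voices or not device.get("network", False):
        return LocalTTS().synthesize(text, voice)
    return RemoteTTS().synthesize(text, voice)

print(speak("Hello", "en_a", {"installed_voices": {"en_a"}, "network": True}))
print(speak("Hello", "en_b", {"installed_voices": {"en_a"}, "network": True}))
```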

01-05-2014 publication date

Automated text to speech voice development

Number: US20140122081A1
Assignee: Ivona Software Sp zoo

A group of users may be presented with text and a synthesized speech recording of the text. The users can listen to the synthesized speech recording and submit feedback regarding errors or other issues with the synthesized speech. A system of one or more computing devices can analyze the feedback, modify the voice or language rules, and recursively test the modifications. The modifications may be determined through the use of machine learning algorithms or other automated processes.

01-05-2014 publication date

APPARATUS AND METHOD FOR GENERATION OF PROSODY ADJUSTED SOUND RESPECTIVE OF A SENSORY SIGNAL AND TEXT-TO-SPEECH SYNTHESIS

Number: US20140122082A1
Assignee: VIVOTEXT LTD.

A method for generation of a prosody adjusted digital sound. The method comprises receiving at least a sensory signal from at least one sensor; generating a digital sound respective of an input text content and a text-to-speech content retrieved from a memory unit; and modifying the generated digital sound respective of the at least the sensory signal to create the prosody adjusted digital sound. 1. An apparatus for generating prosody adjusted sound, comprising: a memory unit for maintaining at least a library that contains information to be used for text-to-speech conversion, the memory unit further maintains executable instructions; at least one sensor; and a processing unit connected to the memory unit and to the at least one sensor, the processing unit is configured to execute the instructions, thereby causing the apparatus to: convert a text content into speech content respective of the library, and generate a prosody adjusted digital sound respective of the speech content and at least a sensory signal received from the at least one sensor. 2. The apparatus of claim 1, further comprising: a digital-to-analog converter (DAC) configured to receive the prosody adjusted digital sound and to generate an analog signal therefrom. 3. The apparatus of claim 1, wherein the at least one sensor is any one of: a physical sensor, a virtual sensor. 4. The apparatus of claim 3, wherein the physical sensor is any one of: a temperature sensor, a global positioning system (GPS), a pressure sensor, a light intensity sensor, an image analyzer, a sound sensor, an ultrasound sensor, a speech recognizer, a moistness sensor. 5. The apparatus of claim 3, wherein the virtual sensor is a data receiving component communicatively connected to a global network through an interface. 6. The apparatus of claim 5, wherein the interface is further configured to provide connectivity through a local network between the apparatus ...
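For illustration, a sketch of how a sensory signal might be mapped to prosody controls before modifying the generated digital sound; the sensor kinds and scaling rules here are assumptions, not the patent's specified mappings:

```python
# Sketch: map a sensory signal to prosody parameters for the output sound.
def prosody_from_sensor(sensor):
    """Return pitch/rate/volume multipliers derived from a sensory signal."""
    params = {"pitch": 1.0, "rate": 1.0, "volume": 1.0}
    if sensor.get("kind") == "ambient_noise":        # physical sensor
        params["volume"] += min(sensor["level"] / 100.0, 0.5)
    elif sensor.get("kind") == "weather_feed":       # virtual sensor
        if sensor.get("condition") == "storm":
            params["rate"] = 0.9                     # slow down the reading
            params["pitch"] = 0.95
    return params

print(prosody_from_sensor({"kind": "ambient_noise", "level": 60}))
```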

06-02-2020 publication date

AUDIO SEGMENTATION METHOD BASED ON ATTENTION MECHANISM

Number: US20200043473A1
Assignee: KOREA ELECTRONICS TECHNOLOGY INSTITUTE

An audio segmentation method based on an attention mechanism is provided. The audio segmentation method according to an embodiment obtains a mapping relationship between an “inputted text” and an “audio spectrum feature vector for generating an audio signal”, the audio spectrum feature vector being automatically synthesized by using the inputted text, and segments an inputted audio signal by using the mapping relationship. Accordingly, high quality can be guaranteed and the effort, time, and cost can be noticeably reduced through audio segmentation utilizing the attention mechanism. 1. An audio segmentation method comprising:receiving an input of an audio signal;receiving an input of a text regarding the audio signal;obtaining a mapping relationship between the “inputted text” and an “audio spectrum feature vector for generating an audio signal regarding the text”, the audio spectrum feature vector being automatically synthesized by using the inputted text; andsegmenting the inputted audio signal by using the mapping relationship.2. The method of claim 1 , wherein the obtaining comprises obtaining the mapping relationship from an AI module which learns the mapping relationship between the “inputted text” and the “audio spectrum feature vector” claim 1 , in an AI mechanism which automatically synthesizes an audio spectrum feature vector for generating an audio signal regarding a text using an inputted text.3. The method of claim 2 , wherein the mapping relationship is a map indicating degrees of mapping between respective “labels forming the inputted text” and respective “audio spectrum features forming the audio spectrum feature vector.”4. The method of claim 1 , further comprising post-processing the obtained mapping relationship claim 1 ,wherein the segmenting comprises segmenting the inputted audio signal by using the post-processed mapping relationship.5. The method of claim 4 , wherein the post-processing comprises mapping the respective “audio spectrum ...
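A rough sketch of how such an attention map can drive segmentation: take the dominant text label for each audio frame and cut wherever that label changes. The toy attention matrix below stands in for the model's real alignment; the frame duration is an invented constant:

```python
# Sketch: derive segment boundaries from a text-to-audio attention map.
import numpy as np

def segment_from_attention(attention, frame_sec=0.0125):
    """attention: (n_frames, n_labels) weights -> list of (label, start, end)."""
    best = attention.argmax(axis=1)          # dominant label per frame
    segments, start = [], 0
    for i in range(1, len(best) + 1):
        if i == len(best) or best[i] != best[start]:
            segments.append((int(best[start]), start * frame_sec, i * frame_sec))
            start = i
    return segments

att = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])
print(segment_from_attention(att))  # label 0 then label 1, with times
```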

18-02-2021 publication date

MULTI-SPEAKER NEURAL TEXT-TO-SPEECH

Number: US20210049999A1
Assignee: Baidu USA LLC

Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker. 1. A computer-implemented method for training a text-to-speech (TTS) system to synthesize human speech from text, comprising: converting an input text, which is a transcription corresponding to training audio comprising utterances of a speaker, to phonemes corresponding to the input text and training audio; using the training audio, the phonemes corresponding to the input text, and at least a portion of a speaker identifier input indicating an identity of the speaker corresponding to the training audio to train a segmentation model to output segmented utterances by identifying phoneme boundaries in the training audio by aligning it with the corresponding phonemes; using the phonemes corresponding to the input text, the segmented utterances obtained from the segmentation model, and at least a portion of the speaker identifier input indicating an identity of the speaker to train a duration model to output phoneme durations of the phonemes in the segmented utterances; using the training audio, the phonemes corresponding to the input text, one or more frequency profiles of the training audio, and the at least a portion of speaker identifier input indicating an identity of the speaker to ...
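A schematic sketch of a low-dimensional trainable speaker embedding: each speaker ID indexes a small vector that conditions the model, so one network can render many voices. The sizes and the additive conditioning site are illustrative choices for this example, not the paper's exact architecture:

```python
# Sketch: condition a shared model on a per-speaker embedding vector.
import numpy as np

rng = np.random.default_rng(0)
N_SPEAKERS, EMB_DIM, HID = 108, 16, 32
speaker_emb = rng.normal(0, 0.1, size=(N_SPEAKERS, EMB_DIM))  # trainable in a real system
W = rng.normal(0, 0.1, size=(EMB_DIM, HID))

def condition(hidden, speaker_id):
    """Bias a hidden activation with the speaker's projected embedding."""
    return hidden + speaker_emb[speaker_id] @ W

h = np.zeros(HID)
print(condition(h, speaker_id=3)[:4])  # same text, speaker-specific offset
```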

18-02-2016 publication date

SYSTEM AND METHOD FOR UNIFIED NORMALIZATION IN TEXT-TO-SPEECH AND AUTOMATIC SPEECH RECOGNITION

Number: US20160049144A1
Assignee:

A system, method and computer-readable storage devices are provided for using a single set of normalization protocols and a single language lexicon (or dictionary) for both TTS and ASR. The system receives input (which is either text to be converted to speech or ASR training text), then normalizes the input. The system produces, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output which is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes corresponding to the input, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text corresponding to the input, the system trains both an acoustic model and a language model for use in future speech recognition. 1. A method comprising: receiving input; normalizing the input, to yield normalized input; generating, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output comprising one of phonemes corresponding to the input and text corresponding to the input; when the output comprises the phonemes corresponding to the input, generating speech by performing prosody generation and unit selection synthesis using the phonemes; and when the output comprises the text corresponding to the input, training both an acoustic model and a language model for use in future speech recognition. 2. The method of claim 1, wherein the training of the acoustic model and the language model uses the phonemes. 3. The method of claim 2, wherein the acoustic model and the language model are used in generating future speech and recognizing future speech. 4. The method of claim 1, further comprising outputting the speech to a user as part of a dialog system. 5. The method of claim 1, wherein the dictionary comprises syllable boundaries. 6. The method of claim 1, ...
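A compact sketch of the shared-normalization idea: one rule set and one dictionary feed both directions, emitting phonemes for TTS and normalized text for ASR training. The rules and mini-lexicon below are invented for illustration:

```python
# Sketch: a single normalizer and lexicon serving both TTS and ASR.
import re

RULES = [(re.compile(r"\bDr\."), "doctor"),
         (re.compile(r"\b(\d+)%"), r"\1 percent")]
LEXICON = {"doctor": "D AA K T ER", "smith": "S M IH TH"}

def normalize(text):
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text.lower()

def process(text, mode):
    words = normalize(text).split()
    if mode == "tts":                   # phonemes for synthesis
        return [LEXICON.get(w, "<oov>") for w in words]
    return " ".join(words)              # normalized text for ASR training

print(process("Dr. Smith", "tts"))      # ['D AA K T ER', 'S M IH TH']
print(process("Dr. Smith", "asr"))      # 'doctor smith'
```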

18-02-2016 publication date

MOBILE TERMINAL DEVICE

Number: US20160049145A1
Assignee: KYOCERA CORPORATION

A mobile terminal device is provided that is able to automatically set suitable field break positions in accordance with the situation, realize a skip operation and a back-skip operation through specific operations, efficiently utilize a readout function, and improve convenience for the user. It has an operation unit for instructing a readout function, a memory storing text, a text-to-speech unit for converting text data stored in the memory to speech data at the time of readout, an audio output unit for outputting the speech data, and a control unit for recognizing predetermined breaks in the text to be read out when outputting the speech data at the audio output unit, and for performing control, when there is a predetermined instruction from the operation unit, so as to output the words as speech data by the audio output unit from either the break position before or the break position after the readout target text at the point in time of the instruction. 1. A mobile terminal device comprising: a display unit for displaying an image, and an output unit for outputting an audio, wherein when a remaining level of a battery supplying power becomes a predetermined value or less, the output unit outputs an audio alarm and speech indicating that the remaining level of the battery has become the predetermined value or less, and the display unit displays information concerning a ratio of the remaining level of the battery. 2. The mobile terminal device as set forth in claim 1, comprising: an operational unit operated by a user, wherein when the operational unit receives a readout instruction of the information concerning the ratio of the remaining level of the battery displayed on the display unit, the output unit outputs the speech of the information concerning the ratio of the remaining level of the battery. 3. A control method comprising: when a remaining level of a battery supplying power becomes a predetermined value or less, a step of outputting an audio alarm and speech indicating that the ...
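
The control unit's behavior can be pictured as simple position bookkeeping: find predetermined break positions in the text, and on a skip or back-skip instruction resume output from the break after or before the current readout point. A hypothetical sketch, assuming sentence punctuation as the predetermined breaks:

```python
import re

class ReadoutController:
    """Tracks the readout position and jumps between predetermined breaks."""

    def __init__(self, text: str):
        # Predetermined breaks: here, sentence boundaries (an assumption).
        self.breaks = [0] + [m.end() for m in re.finditer(r"[.!?]\s*", text)]
        self.text = text
        self.pos = 0  # current readout character offset

    def skip(self) -> str:
        """Jump to the break position after the current readout point."""
        nxt = [b for b in self.breaks if b > self.pos]
        self.pos = nxt[0] if nxt else len(self.text)
        return self.text[self.pos:]

    def back_skip(self) -> str:
        """Jump to the break position before the current readout point."""
        prev = [b for b in self.breaks if b < self.pos]
        self.pos = prev[-1] if prev else 0
        return self.text[self.pos:]

ctl = ReadoutController("First sentence. Second sentence. Third sentence.")
ctl.pos = 20                  # pretend readout is mid-way through
print(ctl.back_skip()[:15])   # resumes from "Second sentence"
```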

16-02-2017 publication date

Computer-Implemented System And Method For Performing Distributed Speech Recognition

Number: US20170047070A1
Author: Odinak Gilad
Assignee:

A computer-implemented system and method for performing distributed speech recognition is provided. Audio data is collected. A main grammar and secondary grammars are simultaneously provided for the audio data. Each secondary grammar includes an independent grammar. Speech recognition is simultaneously performed on the audio data using each secondary grammar. A new grammar is constructed for the audio data based on the main grammar template using results of the speech recognition. Further speech recognition is performed on the audio data using the new grammar. 1. A computer-implemented system for performing distributed speech recognition, comprising: audio data; a grammar module to simultaneously provide a main grammar and secondary grammars for the audio data, wherein each secondary grammar comprises an independent grammar; a speech recognition module to simultaneously perform speech recognition on the audio data using each secondary grammar; a new grammar module to construct a new grammar for the audio data based on the main grammar template using results of the speech recognition; and a further speech recognition module to perform further speech recognition on the audio data using the new grammar. 2. A system according to claim 1, further comprising: a prompt module to transmit prompts to a telephony interface, wherein the prompts comprise call information provided in at least one of a file and a script; and a receipt module to receive from a caller the audio data in reply to one or more of the transmitted prompts. 3. A system according to claim 2, further comprising: a selection module to select at least one of the prompts for playback to the caller. 4. A system according to claim 2, further comprising at least one of: a playback module to automatically play the prompt to the caller when the file comprises speech; and a conversion module to convert the prompt into speech for playback to the caller when the prompt comprises text. 5. A system according to claim 1, further ...
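
The two-pass structure of claim 1 maps naturally onto concurrent recognition, one pass per secondary grammar, then a second pass against a grammar built from the main grammar template. A toy sketch; `recognize` is a stand-in for a grammar-constrained recognizer, and the template format is an assumption, not a real API:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize(audio: str, grammar: list[str]) -> str:
    """Stand-in for a grammar-constrained recognizer (hypothetical)."""
    return next((phrase for phrase in grammar if phrase in audio), "")

def distributed_recognition(audio: str, main_template: str,
                            secondary_grammars: list[list[str]]) -> str:
    # Pass 1: perform speech recognition with every secondary grammar simultaneously.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda g: recognize(audio, g), secondary_grammars))
    # Construct a new grammar from the main grammar template and pass-1 results.
    new_grammar = [main_template.format(slot=r) for r in partials if r]
    # Pass 2: further speech recognition against the newly constructed grammar.
    return recognize(audio, new_grammar)

# audio stands in for the collected audio data (here, its transcript for simplicity).
print(distributed_recognition("pay my gas bill",
                              main_template="pay my {slot}",
                              secondary_grammars=[["gas", "water"], ["bill", "meter"]]))
# -> "pay my gas"
```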

03-03-2022 publication date

Speech Recognition Using Unspoken Text and Speech Synthesis

Number: US20220068255A1
Assignee: Google LLC

A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from a set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
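
A minimal sketch of the adversarial loss term in PyTorch: a discriminator scores non-synthetic (spoken) frames against the synthetic frames produced by the TTS model, and the resulting term updates the TTS parameters. The discriminator architecture and the hinge-style discriminator loss are assumptions, not the patented formulation:

```python
import torch
import torch.nn as nn

# Hypothetical discriminator over 80-dim mel-spectrogram frames.
disc = nn.Sequential(nn.Linear(80, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

def adversarial_loss_term(synthetic: torch.Tensor, non_synthetic: torch.Tensor) -> torch.Tensor:
    """Term indicative of the disparity between spoken and synthesized frames."""
    return (disc(non_synthetic).detach() - disc(synthetic)).mean()

def discriminator_loss(synthetic: torch.Tensor, non_synthetic: torch.Tensor) -> torch.Tensor:
    # Hinge loss (an assumption): score real frames high, synthetic frames low.
    return (torch.relu(1 - disc(non_synthetic)) +
            torch.relu(1 + disc(synthetic.detach()))).mean()

real = torch.randn(32, 80)                      # frames from a spoken training utterance
fake = torch.randn(32, 80, requires_grad=True)  # stands in for GAN-based TTS output
adversarial_loss_term(fake, real).backward()    # gradient flows back into the TTS model
```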

08-05-2014 publication date

Method, System, and Relevant Devices for Playing Sent Message

Number: US20140129228A1
Author: Lai Yizhe
Assignee: Huawei Technologies Co., Ltd.

A method and a system for playing a message, applicable to the field of communications technologies, are provided. The message playing method includes: receiving, by a receiving terminal, a message that includes a user identifier and text information; obtaining a speech identifier and an image identifier corresponding to the user identifier; generating or obtaining a speech animation stream according to a speech characteristic parameter indicated by the speech identifier, an image characteristic parameter indicated by the image identifier, and the text information; and playing the speech animation stream. In this way, the text information in the message can be played as a speech animation stream according to the user identifier, the text information in the message can be presented vividly, and the message can be presented in a personalized manner according to the speech identifier and the image identifier corresponding to the user identifier. 1. A message playing method, applicable to a terminal device, comprising: receiving a message that comprises a user identifier and text information; obtaining a speech identifier and an image identifier corresponding to the user identifier, wherein the speech identifier is used to indicate a speech characteristic parameter and the image identifier is used to indicate an image characteristic parameter; generating or obtaining a speech animation stream according to the speech characteristic parameter indicated by the speech identifier, the image characteristic parameter indicated by the image identifier, and the text information; and playing the speech animation stream. 2. The method according to claim 1, wherein before receiving the message, the method further comprises: providing a setup interface used to receive a correspondence between the user identifier, and the speech identifier and the image identifier; receiving the correspondence between the user identifier, and the speech identifier and the image identifier from ...
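
On the receiving terminal, the method reduces to a lookup from the sender's user identifier to its speech and image identifiers, followed by synthesis of the text with the indicated characteristic parameters. A hypothetical sketch; the correspondence tables, field names, and downstream hand-off are all invented for illustration:

```python
from dataclasses import dataclass

# Hypothetical correspondence tables, set up in advance via the setup interface.
USER_TO_IDS = {"alice@example.com": ("voice_female_soft", "avatar_cartoon_01")}
SPEECH_PARAMS = {"voice_female_soft": {"pitch": 1.2, "rate": 0.9}}
IMAGE_PARAMS = {"avatar_cartoon_01": {"mouth_shapes": "viseme_set_a"}}

@dataclass
class Message:
    user_id: str
    text: str

def play_message(msg: Message) -> dict:
    """Resolve the sender's identifiers and assemble a speech-animation stream."""
    speech_id, image_id = USER_TO_IDS.get(msg.user_id,
                                          ("voice_default", "avatar_default"))
    return {
        "speech": SPEECH_PARAMS.get(speech_id, {}),   # speech characteristic parameter
        "image": IMAGE_PARAMS.get(image_id, {}),      # image characteristic parameter
        "text": msg.text,       # fed to TTS + lip-sync rendering downstream
    }

print(play_message(Message("alice@example.com", "See you at 6!")))
```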

25-02-2021 publication date

Development of Voice and Other Interaction Applications

Number: US20210056951A1
Assignee:

Among other things, a developer of an interaction application for an enterprise can create items of content to be provided to an assistant platform for use in responses to requests of end-users. The developer can deploy the interaction application using defined items of content and an available general interaction model including intents and sample utterances having slots. The developer can deploy the interaction application without requiring the developer to formulate any of the intents, sample utterances, or slots of the general interaction model. 1. A machine-based method comprising presenting a user interface enabling a developer to create speech markup language strings conforming to a speech markup language definition applied by a corresponding interaction assistant platform, the user interface enabling the user to create markup language strings using plain text and graphical elements and without requiring the user to select or enter any formal expressions of markup elements of the speech markup language definition. 2. The method of claim 1 in which the user interface presents controls for entering text to be spoken to an end user by an interaction assistant. 3. The method of claim 1 in which the user interface presents controls corresponding to elements of the speech markup language strings associated with effects to be applied or added to one or more words of text to be spoken to an end user by an interaction assistant. 4. The method of claim 1 in which the user interface presents controls corresponding to properties of elements of the speech markup language strings. 5. The method of claim 1 in which the user interface presents controls corresponding to selectable values of properties of elements of the speech markup language strings. 6. The method of claim 1 in which the user interface presents controls comprising icons graphically representative of effects to be applied to one or more words of text to be spoken to an end user by an interaction assistant, properties of the effects, or ...
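
Under the hood, the claimed interface assembles speech markup strings from plain text plus graphical effect selections, so the developer never types markup elements. A hypothetical sketch emitting SSML-style strings; the effect names and their mapping to markup elements are assumptions:

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from UI effect controls to markup elements/properties.
EFFECTS = {
    "whisper": ('<amazon:effect name="whispered">', "</amazon:effect>"),
    "slow":    ('<prosody rate="slow">', "</prosody>"),
    "loud":    ('<prosody volume="loud">', "</prosody>"),
}

def build_markup(segments: list[tuple[str, list[str]]]) -> str:
    """segments: (plain text, effects chosen via graphical controls)."""
    parts = []
    for text, effects in segments:
        open_tags = "".join(EFFECTS[e][0] for e in effects)
        close_tags = "".join(EFFECTS[e][1] for e in reversed(effects))
        parts.append(f"{open_tags}{escape(text)}{close_tags}")
    return f"<speak>{' '.join(parts)}</speak>"

print(build_markup([("Welcome back", []),
                    ("this part is a secret", ["whisper", "slow"])]))
# <speak>Welcome back <amazon:effect name="whispered"><prosody rate="slow">
# this part is a secret</prosody></amazon:effect></speak>
```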

25-02-2021 publication date

SYSTEMS AND METHODS FOR TRANSPOSING SPOKEN OR TEXTUAL INPUT TO MUSIC

Number: US20210056952A1
Assignee:

Described herein are real-time musical translation devices (RETMs) and methods of use thereof. Exemplary uses of RETMs include optimizing the understanding and/or recall of an input message for a user and improving a cognitive process in a user. 1. A method of transforming textual input to a musical score, comprising: receiving text input; transliterating the text input into a standardized phonemic representation of the text input; determining, for the phonemic text input, a plurality of spoken pause lengths and a plurality of spoken phoneme lengths; mapping the plurality of spoken pause lengths to a respective plurality of sung pause lengths; mapping the plurality of spoken phoneme lengths to a respective plurality of sung phoneme lengths; generating, from the plurality of sung pause lengths and the plurality of sung phoneme lengths, a timed text input; generating a plurality of matching metrics for each of a respective plurality of portions of the timed text input against a plurality of melody segments; and generating a patterned musical message from the timed text input and the plurality of melody segments based at least in part on the plurality of matching metrics. 2. The method of claim 1, wherein the method is performed in real-time or in near-real-time, and further comprises causing the patterned musical message to be played audibly on a transducer. 3. The method of claim 1, wherein the patterned musical message is expected to optimize, for a user, at least one of an understanding of the input message and a recall of the input message. 4. The method of claim 1, further comprising providing to a user a visual image relating to the patterned musical message aimed at enhancing comprehension and learning. 5. The method of claim 1, wherein the patterned musical message is presented to a user having a cognitive impairment, a behavioral impairment, or a learning impairment. 6. The method of claim 5, wherein the user has a ...
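
Claim 1 is essentially: phonemize, time the phonemes and pauses as spoken, stretch those timings to sung lengths, then score portions of the timed text against melody segments. A toy sketch of the two mappings and a matching metric; the stretch factor, beat quantization, and the metric itself are assumptions:

```python
# Hypothetical spoken durations (seconds) for phonemes and pauses of the input.
spoken = [("HH", 0.08), ("EH", 0.12), ("L", 0.07), ("OW", 0.15), ("<pause>", 0.20)]

SUNG_STRETCH = 2.5   # assumption: sung units are held ~2.5x longer than spoken
BEAT = 0.25          # assumption: quantize to sixteenth notes at 60 bpm

def to_sung_lengths(units):
    """Map spoken phoneme/pause lengths to quantized sung lengths."""
    return [(u, max(BEAT, round(d * SUNG_STRETCH / BEAT) * BEAT)) for u, d in units]

def matching_metric(timed_units, melody_segment):
    """Lower is better: total timing mismatch against a melody's note lengths."""
    if len(timed_units) != len(melody_segment):
        return float("inf")
    return sum(abs(d - note_len)
               for (_, d), note_len in zip(timed_units, melody_segment))

timed = to_sung_lengths(spoken)
melodies = {"seg_a": [0.25, 0.25, 0.25, 0.5, 0.5],
            "seg_b": [0.25, 0.5, 0.25, 0.25, 0.5]}
best = min(melodies, key=lambda k: matching_metric(timed, melodies[k]))
print(timed, "->", best)   # picks seg_a, whose note lengths match exactly
```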

26-02-2015 publication date

SPEECH PROCESSING SYSTEM AND METHOD

Number: US20150058019A1
Author: CHEN Langzhou
Assignee: KABUSHIKI KAISHA TOSHIBA

A method of training an acoustic model for a text-to-speech system is provided. 1. A method of training an acoustic model for a text-to-speech system, the method comprising: receiving speech data, said speech data comprising data corresponding to different values of a first speech factor, and wherein said speech data is unlabelled, such that for a given item of speech data, the value of said first speech factor is unknown; clustering said speech data according to the value of said first speech factor into a first set of clusters; and estimating a first set of parameters to enable the acoustic model to accommodate speech for the different values of the first speech factor, wherein said clustering and said first parameter estimation are jointly performed according to a common maximum likelihood criterion. 2. A method according to claim 1, wherein each of the first set of clusters comprises at least one sub-cluster, and wherein said first set of parameters are weights to be applied such that there is one weight per sub-cluster, and wherein said weights are dependent on said first speech factor. 3. A method according to claim 1, wherein said first set of parameters are constrained likelihood linear regression transforms which are dependent on said first speech factor. 4. A method according to claim 1, wherein the first speech factor is speaker and said speech data further comprises speech data from one or more speakers speaking with neutral speech. 5. A method according to claim 1, wherein the first speech factor is expression. 6. A method according to claim 5, further comprising: receiving text data corresponding to said received speech data; extracting expressive features from the speech data and forming an expressive feature synthesis vector constructed in a second space; and training a machine learning algorithm, the training input of the machine learning algorithm being an expressive linguistic feature vector and the training output the expressive feature synthesis ...
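
The joint clustering and parameter estimation under a common maximum likelihood criterion can be pictured as an EM-style loop: responsibilities for the unlabelled speech factor and the per-cluster parameters are updated in turn, each step increasing the same likelihood. A toy sketch with 1-D Gaussians standing in for the acoustic model; the data, cluster count, and model family are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Unlabelled data: the value of the speech factor (e.g., speaker) is unknown.
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

K = 2
means, weights = np.array([-1.0, 1.0]), np.full(K, 1.0 / K)
for _ in range(50):
    # E-step: soft cluster responsibilities under the current parameters.
    ll = -0.5 * (data[:, None] - means[None, :]) ** 2 + np.log(weights)
    resp = np.exp(ll - ll.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters; both steps increase one common likelihood.
    means = (resp * data[:, None]).sum(axis=0) / resp.sum(axis=0)
    weights = resp.mean(axis=0)

print(np.round(means, 2), np.round(weights, 2))  # recovers ~[-2, 3], ~[0.5, 0.5]
```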

15-05-2014 publication date

Voice synthesizing method and voice synthesizing apparatus

Number: US20140136207A1
Assignee: Yamaha Corp

A voice synthesizing apparatus includes: a first receiver configured to receive first utterance control information generated by detecting a start of a manipulation on a manipulating member by a user; a first synthesizer configured to synthesize, in response to reception of the first utterance control information, a first voice corresponding to a first phoneme in a phoneme sequence of a voice to be synthesized, and to output the first voice; a second receiver configured to receive second utterance control information generated by detecting a completion of the manipulation on the manipulating member or a manipulation on a different manipulating member; and a second synthesizer configured to synthesize, in response to reception of the second utterance control information, a second voice including at least the first phoneme and a succeeding phoneme subsequent to the first phoneme of the voice to be synthesized, and to output the second voice.
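
The two-receiver design lets synthesis begin at the start of a key manipulation with only the first phoneme (masking latency) and complete on the second control signal with the full sequence. A hypothetical sketch of that event flow; the phoneme sequence, handler names, and the `play` stand-in are invented:

```python
PHONEME_SEQUENCE = ["s", "a", "k", "u", "r", "a"]   # phoneme sequence to synthesize

def play(phonemes: list[str]) -> None:
    print("synthesizing:", " ".join(phonemes))       # stand-in for the synthesizer

def on_first_utterance_control(seq: list[str]) -> None:
    """Start of manipulation detected: output a first voice for the first phoneme."""
    play(seq[:1])

def on_second_utterance_control(seq: list[str]) -> None:
    """Completion detected: output the first phoneme plus its succeeding phonemes."""
    play(seq)

on_first_utterance_control(PHONEME_SEQUENCE)    # fires as the key press begins
on_second_utterance_control(PHONEME_SEQUENCE)   # fires when the manipulation completes
```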
