Total found: 20039. Displayed: 100.
Publication date: 19-01-2012

Intelligent Automated Assistant

Number: US20120016678A1
Assignee: Apple Inc

An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.

Publication date: 09-02-2012

System and method for synthetic voice generation and modification

Number: US20120035933A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
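The per-category selection policy described in this abstract can be sketched roughly as follows. The data layout, the `policy` dictionary format, and all names are assumptions for illustration, not the patented implementation.

```python
# Rough sketch of policy-based voice-unit selection from a combined
# database of two text-to-speech voices. All names and the policy
# format (phonetic category -> source voice) are assumptions.

def combine_databases(voice_a, voice_b):
    """Merge two voice databases, tagging each unit with its source."""
    return ([{**u, "voice": "A"} for u in voice_a] +
            [{**u, "voice": "B"} for u in voice_b])

def select_units(combined, category, policy):
    """Pick units of one phonetic category from the voice the policy names."""
    preferred = policy.get(category)
    return [u for u in combined
            if u["category"] == category and u["voice"] == preferred]

voice_a = [{"phone": "aa", "category": "vowel"},
           {"phone": "s", "category": "fricative"}]
voice_b = [{"phone": "ae", "category": "vowel"},
           {"phone": "f", "category": "fricative"}]

# Example policy: vowels come from voice A, fricatives from voice B.
policy = {"vowel": "A", "fricative": "B"}
combined = combine_databases(voice_a, voice_b)
selected = select_units(combined, "vowel", policy)
```

The same structure extends to the multi-speaker and multi-style databases the abstract mentions by widening the `voice` tag.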

Publication date: 22-03-2012

Mobile business client

Number: US20120072489A1
Assignee: Individual

The subject matter herein relates to computer software and client-server based applications and, more particularly, to a mobile business client. Some embodiments include one or more device-agnostic application interaction models and one or more device specific transformation services. Some such embodiments provide one or more of systems, methods, and software embodied at least in part in a device specific transformation service to transform channel agnostic application interaction models to and from device or device surrogate specific formats.

Publication date: 03-05-2012

Speech Morphing Communication System

Number: US20120109627A1
Author: Fathy Yassa
Assignee: Fathy Yassa

A networked communication system is described. The communication system including an automatic speech recognizer configured to receive a speech signal from a client over a network and to convert the speech signal into a text sequence. The communication system also including a speech analyzer configured to receive the speech signal. The speech analyzer configured to extract paralinguistic characteristics from the speech signal. In addition, the communication system includes a speech output device coupled with the automatic speech recognizer and the speech analyzer. The speech output device configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics.

Publication date: 28-06-2012

Method and System for Construction and Rendering of Annotations Associated with an Electronic Image

Number: US20120166175A1
Author: Sunil Kumar KOPPARAPU
Assignee: Tata Consultancy Services Ltd

A method and system for construction and rendering of annotations associated with an electronic image is disclosed. The system comprises a first data repository for storing the electronic image, which has a plurality of pixels, with one or more pixels annotated at a plurality of levels, which contain descriptive characteristics of the pixel, in ascending magnitude, such that the descriptive characteristics at a subsequent level are with reference to descriptive characteristics of one or more pixels surrounding the pixel. The system comprises a second data repository for storing the annotations. An image display module is configured to display the electronic image. A pixel and level identification module is configured to receive pixel and level selection details from a user-interface. An annotation retrieval module is configured to retrieve annotations corresponding to the pixel and level selection from the second repository and renders the retrieved annotations for the electronic image.

Publication date: 05-07-2012

Multi-lingual text-to-speech system and method

Number: US20120173241A1

A multi-lingual text-to-speech system and method processes a text to be synthesized via an acoustic-prosodic model selection module and an acoustic-prosodic model mergence module, and obtains a phonetic unit transformation table. In an online phase, the acoustic-prosodic model selection module, according to the text and a phonetic unit transcription corresponding to the text, uses at least a set of controllable accent weighting parameters to select a transformation combination and find a second and a first acoustic-prosodic model. The acoustic-prosodic model mergence module merges the two acoustic-prosodic models into a merged acoustic-prosodic model, according to the at least one controllable accent weighting parameter, processes all transformations in the transformation combination and generates a merged acoustic-prosodic model sequence. A speech synthesizer and the merged acoustic-prosodic model sequence are further applied to synthesize the text into an L1-accent L2 speech.

Publication date: 05-07-2012

Method and apparatus for annotating a document

Number: US20120173959A1
Assignee: Individual

To facilitate the use of audio files for annotation purposes, an audio file format, which includes audio data for playback purposes, is augmented with a parallel data channel of line identifiers, or with a map associating time codes for the audio data with line numbers on the original document. The line number-time code information in the audio file is used to navigate within the audio file, and also to associate bookmark links and captured audio annotation files with line numbers of the original text document. An annotation device may provide an output document wherein links to audio and/or text annotation files are embedded at corresponding line numbers. Also, a navigation index may be generated, having links to annotation files and associated document line numbers, as well as bookmark links to selected document line numbers.
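The map associating time codes with line numbers can be sketched as a sorted lookup. The class and method names below are invented for illustration; the patent does not specify this structure.

```python
import bisect

# Sketch of a map associating audio time codes with document line
# numbers, used both to find the line being read at a playback
# position and to seek the audio for a given line. Names are assumed.

class LineTimeMap:
    def __init__(self, entries):
        # entries: (start_time_seconds, line_number) pairs sorted by time
        self.times = [t for t, _ in entries]
        self.lines = [ln for _, ln in entries]

    def line_at(self, t):
        """Document line being read at playback time t."""
        i = bisect.bisect_right(self.times, t) - 1
        return self.lines[max(i, 0)]

    def seek_time(self, line_number):
        """Audio position at which the given line starts."""
        return self.times[self.lines.index(line_number)]

m = LineTimeMap([(0.0, 1), (2.5, 2), (6.1, 3)])
```

`line_at` supports bookmark links into the audio; `seek_time` supports navigating the audio from a line of the original document.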

Publication date: 19-07-2012

Extracting text for conversion to audio

Number: US20120185253A1
Assignee: Microsoft Corp

Embodiments are disclosed that relate to converting markup content to an audio output. For example, one disclosed embodiment provides, in a computing device, a method including partitioning a markup document into a plurality of content panels, and forming a subset of content panels by filtering the plurality of content panels based upon geometric and/or location-based criteria of each panel relative to an overall organization of the markup document. The method further includes determining a document object model (DOM) analysis value for each content panel of the subset of content panels, identifying a set of content panels determined to contain text body content by filtering the subset of content panels based upon the DOM analysis value of each of the content panels of the subset of content panels, and converting text in a selected content panel determined to contain text body content to an audio output.

Publication date: 09-08-2012

Recognition dictionary creating device, voice recognition device, and voice synthesizer

Number: US20120203553A1
Author: Yuzo Maruta
Assignee: Mitsubishi Electric Corp

A recognition dictionary creating device includes a user dictionary in which a phoneme label string of an inputted voice is registered and an interlanguage acoustic data mapping table in which a correspondence between phoneme labels in different languages is defined, and refers to the interlanguage acoustic data mapping table to convert the phoneme label string registered in the user dictionary and expressed in a language set at the time of creating the user dictionary into a phoneme label string expressed in another language which the recognition dictionary creating device has switched.
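The mapping-table conversion described above can be sketched as a simple per-language-pair lookup. The table contents and language codes below are invented for illustration; the actual interlanguage acoustic data is not given in the abstract.

```python
# Sketch of converting a phoneme label string from one language's
# label set to another via a mapping table. The table contents and
# language codes are invented for illustration.

MAPPING = {
    ("en", "ja"): {"l": "r", "th": "s", "v": "b"},
}

def convert_labels(labels, src_lang, dst_lang):
    """Re-express a phoneme label string in another language's labels;
    labels with no mapped counterpart are kept unchanged."""
    table = MAPPING[(src_lang, dst_lang)]
    return [table.get(p, p) for p in labels]

# A user-dictionary entry registered while English was the set language:
entry = ["h", "e", "l", "o"]
converted = convert_labels(entry, "en", "ja")
```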

Publication date: 23-08-2012

Apparatus, and associated method, for selecting information delivery manner using facial recognition

Number: US20120212629A1
Assignee: Research in Motion Ltd

An apparatus, and an associated method, selects a manner by which to deliver received information at a wireless, or other electronic, device. A facial recognition indication is obtained and analyzed. Responsive to the analysis of the facial recognition indication, selection is made of the manner by which to deliver the information. If the facial recognition indication indicates the recipient to exhibit a serious demeanor, the information is provided in aural form, thereby to permit delivery of the information without requiring the recipient to read, or otherwise view, the information.

Publication date: 23-08-2012

Hearing assistance system for providing consistent human speech

Number: US20120215532A1
Assignee: Apple Inc

Broadly speaking, the embodiments disclosed herein describe an apparatus, system, and method that allows a user of a hearing assistance system to perceive consistent human speech. The consistent human speech can be based upon user specific preferences.

Publication date: 30-08-2012

Network apparatus and methods for user information delivery

Number: US20120221412A1
Author: Robert F. Gazdzinski
Assignee: Individual

A network apparatus useful for providing directions and other information to a user of a client device in wireless communication therewith. In one embodiment, the apparatus includes one or more wireless interfaces and a network interface for communication with a server. User speech inputs in the form of digitized representations are received by the apparatus and used by the server as the basis for retrieving information including graphical representations of location or entities that the user wishes to find.

Publication date: 20-09-2012

Apparatus and method for supporting reading of document, and computer readable medium

Number: US20120239390A1
Assignee: Toshiba Corp

According to one embodiment, an apparatus for supporting reading of a document includes a model storage unit, a document acquisition unit, a feature information extraction unit, and an utterance style estimation unit. The model storage unit is configured to store a model trained on a correspondence relationship between first feature information and an utterance style. The first feature information is extracted from a plurality of sentences in a training document. The document acquisition unit is configured to acquire a document to be read. The feature information extraction unit is configured to extract second feature information from each sentence in the document to be read. The utterance style estimation unit is configured to compare the second feature information of a plurality of sentences in the document to be read with the model, and to estimate an utterance style of each sentence of the document to be read.

Publication date: 04-10-2012

Techniques for style transformation

Number: US20120251016A1
Assignee: Intel Corp

Techniques to stylistically transform source text are disclosed. A source text and information about an output channel may be received. The source text may be stylistically transformed based on the information about the output channel. The stylistically transformed source text may be output. Other embodiments are described and claimed.

Publication date: 01-11-2012

Remote healthcare system and healthcare method using the same

Number: US20120278072A1
Assignee: SAMSUNG ELECTRONICS CO LTD

A remote healthcare system includes a healthcare staff terminal which includes an input part configured to input text to be transmitted to a patient by a healthcare staff member, and a first transmitter-receiver part configured to transmit the text and a qualifier of the healthcare staff member; a server which includes a second transmitter-receiver part configured to receive the text and the qualifier of the healthcare staff member transmitted from the healthcare staff terminal, an acoustic source database having an acoustic source of the healthcare staff member stored therein, and a converter configured to change the text into voice using the stored acoustic source of the healthcare staff member; and a patient terminal which includes a third transmitter-receiver part configured to receive the voice converted from the text and the text transmitted by the second transmitter-receiver part of the server, and an output part configured to output the voice to the patient who is managed by the healthcare staff member.

Publication date: 15-11-2012

Electronic Holder for Reading Books

Number: US20120290304A1
Author: Khaled Jafar Al-Hasan
Assignee: Individual

A book support and optical scanner assembly for converting printed text to an audio output includes a support for supporting an open book and a pair of optical scanners adapted to scan opposite pages. The assembly also includes means for moving the scanners from the top to the bottom of a page. Further, both scanners can be rotated off of the book for turning a page. In addition, the assembly includes a text to audio converter for converting the scanned text into spoken words and in one embodiment a translator to translate the scanned text into a pre-selected language.

Publication date: 29-11-2012

Methods and apparatus for correcting recognition errors

Number: US20120304057A1
Assignee: Nuance Communications Inc

Techniques for error correction using a history list comprising at least one misrecognition and correction information associated with each of the at least one misrecognitions indicating how a user corrected the associated misrecognition. The techniques include converting data input from a user to generate a text segment, determining whether at least a portion of the text segment appears in the history list as one of the at least one misrecognitions, if the at least a portion of the text segment appears in the history list as one of the at least one misrecognitions, obtaining the correction information associated with the at least one misrecognition, and correcting the at least a portion of the text segment based, at least in part, on the correction information.
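The history-list lookup described above can be sketched as follows. The flat string-to-string structure is an assumption; the patent only says the history pairs misrecognitions with correction information.

```python
# Sketch of the history-list correction described above: each entry
# pairs a known misrecognition with the correction the user applied.
# The flat string-to-string structure is an assumption.

history = {
    "wreck a nice beach": "recognize speech",
    "their going": "they're going",
}

def correct(text, history):
    """Replace any portion of the recognized text that appears in the
    history list with its stored correction."""
    for wrong, right in history.items():
        if wrong in text:
            text = text.replace(wrong, right)
    return text

fixed = correct("it is hard to wreck a nice beach", history)
```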

Publication date: 06-12-2012

System and Method for Enhancing Locative Response Abilities of Autonomous and Semi-Autonomous Agents

Number: US20120306741A1
Author: Kalyan M. Gupta
Assignee: KNEXUS RESEARCH Corp

A computer system and method according to the present invention can receive multi-modal inputs such as natural language, gesture, text, sketch and other inputs in order to simplify and improve locative question answering in virtual worlds, among other tasks. The components of an agent as provided in accordance with one embodiment of the present invention can include one or more sensors, actuators, and cognition elements, such as interpreters, executive function elements, working memory, long term memory and reasoners for responses to locative queries, for example. Further, the present invention provides, in part, a locative question answering algorithm, along with the command structure, vocabulary, and the dialog that an agent is designed to support in accordance with various embodiments of the present invention.

Publication date: 06-12-2012

Voice Synthesis Apparatus

Number: US20120310651A1
Author: Keijiro Saino
Assignee: Yamaha Corp

A voice signal is synthesized using a plurality of phonetic piece data each indicating a phonetic piece containing at least two phoneme sections corresponding to different phonemes. In the apparatus, a phonetic piece adjustor forms a target section from first and second phonetic pieces so as to connect the first and second phonetic pieces to each other such that the target section includes a rear phoneme section of the first piece and a front phoneme section of the second piece, and expands the target section by a target time length to form an adjustment section such that a central part is expanded at an expansion rate higher than that of front and rear parts of the target section, to thereby create synthesized phonetic piece data having the target time length. A voice synthesizer creates a voice signal from the synthesized phonetic piece data.
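The non-uniform expansion described above can be sketched as a weighted distribution of the extra time. The 2:1:1 weighting is an assumption for illustration, not the expansion rates the patent actually uses.

```python
# Sketch of the non-uniform expansion described above: a target
# section is stretched to a target length with its central part
# expanded at a higher rate than the front and rear parts. The
# 2:1:1 weighting is an assumption.

def expand_durations(front, center, rear, target_len):
    """Distribute the extra time so the center grows twice as fast
    as each edge; returns new (front, center, rear) durations."""
    extra = target_len - (front + center + rear)
    w_front, w_center, w_rear = 1.0, 2.0, 1.0
    total_w = w_front + w_center + w_rear
    return (front + extra * w_front / total_w,
            center + extra * w_center / total_w,
            rear + extra * w_rear / total_w)

# A 40 ms target section stretched to 60 ms:
new_front, new_center, new_rear = expand_durations(10.0, 20.0, 10.0, 60.0)
```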

Publication date: 27-12-2012

Method for producing ammonium tungstate aqueous solution

Number: US20120328506A1

A method for producing an ammonium tungstate aqueous solution includes the steps of: adding sulfuric acid to a solution containing tungstate ions; bringing the solution having the sulfuric acid added therein, into contact with an anion exchange resin; and bringing the anion exchange resin into contact with an aqueous solution containing ammonium ions.

Publication date: 27-12-2012

Mobile wireless communications device for hearing and/or speech impaired user

Number: US20120329518A1
Author: Neeraj Garg
Assignee: Research in Motion Ltd

A mobile wireless communications device includes a housing and transceiver carried by the housing for transmitting and receiving radio frequency (RF) signals carrying communications data of speech. A processor is coupled to the transceiver for processing the communications data as speech that is transmitted and received to and from the transceiver. A keyboard and display are carried by the housing and connected to the processor. A speech-to-text and text-to-speech module converts communications data as speech received from the transceiver to text that is displayed on the display, and converts text that is typed by a user on the keyboard into communications data as speech to be transmitted from the transceiver as an RF signal.

Publication date: 27-12-2012

System and method for translation

Number: US20120330643A1
Authors: John Frei, Yan Auerbach
Assignee: SPEECHTRANS Inc

A system and method of improving a translation system are disclosed, in which the method may include presenting initial text in a source language and a corresponding translation text sequence in a target language, to a user on a computing device; prompting the user to propose alternative text for at least a portion of the translation text sequence; receiving proposed alternative translation text from the user; assigning a rating, by the user, to the proposed alternative translation text; and storing the received proposed translation text in a database.

Publication date: 27-12-2012

Method, system and processor-readable media for automatically vocalizing user pre-selected sporting event scores

Number: US20120330666A1
Assignee: Verna IP Holdings LLC

A method and system for vocalizing user-selected sporting event scores. A customized spoken score application module can be configured in association with a device. A real-time score can be preselected by a user from an existing sporting event website for automatically vocalizing the score in a multitude of languages utilizing a speech synthesizer and a translation engine. An existing text-to-speech engine can be integrated with the spoken score application module and controlled by the application module to automatically vocalize the preselected scores listed on the sporting event site. The synthetically-voiced, real-time score can be transmitted to the device at a predetermined time interval. Such an approach automatically and instantly pushes the real time vocal alerts thereby permitting the user to continue multitasking without activating the pre-selected vocal alerts.

Publication date: 27-12-2012

Speech synthesizer, navigation apparatus and speech synthesizing method

Number: US20120330667A1
Assignee: HITACHI LTD

Included in a speech synthesizer, a natural language processing unit divides text data, input from a text input unit, into a plurality of components (particularly, words). An importance prediction unit estimates an importance level of each component according to the degree of how much each component contributes to understanding when a listener hears synthesized speech. Then, the speech synthesizer determines a processing load based on the device state when executing synthesis processing and the importance level. Included in the speech synthesizer, a synthesizing control unit and a wave generation unit reduce the processing time for a phoneme with a low importance level by curtailing its processing load (relatively degrading its sound quality), allocate a part of the processing time, made available by this reduction, to the processing time of a phoneme with a high importance level, and generate synthesized speech in which important words are easily audible.
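The reallocation idea above can be sketched as an importance-proportional budget. The proportional model and all names are assumptions for illustration, not Hitachi's implementation.

```python
# Sketch of importance-weighted processing allocation: time saved on
# low-importance words is reallocated to high-importance ones. The
# proportional-budget model is an assumption for illustration.

def allocate_budget(words, total_budget):
    """Split a processing-time budget across words in proportion to
    their predicted importance levels."""
    total_importance = sum(imp for _, imp in words)
    return {w: total_budget * imp / total_importance for w, imp in words}

# Importance levels predicted per word (higher = more important):
words = [("turn", 3), ("left", 5), ("at", 1), ("the", 1), ("station", 5)]
budget = allocate_budget(words, 150.0)  # e.g. 150 ms available in total
```

Low-importance function words ("at", "the") get a small share of the budget, leaving more synthesis time for the words that carry the meaning.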

Publication date: 10-01-2013

Mobile computing apparatus and method of reducing user workload in relation to operation of a mobile computing apparatus

Number: US20130013314A1
Assignee: TomTom International BV

A mobile computing apparatus comprises a processing resource arranged to support, when in use, an operational environment, the operational environment supporting receipt of textual content, a workload estimator arranged to estimate a cognitive workload for a user, and a text-to-speech engine. The text-to-speech engine is arranged to translate at least part of the received textual content to a signal reproducible as audible speech in accordance with a predetermined relationship between the amount of the textual content to be translated and a cognitive workload level in a range of cognitive workload levels, the range of cognitive workload levels comprising at least one cognitive workload level between end values.

Publication date: 24-01-2013

User Profile Based Speech To Text Conversion For Visual Voice Mail

Number: US20130023287A1
Assignee: AT&T MOBILITY II LLC

Messages in a message system are converted from one format to another format in accordance with preferred message formats and/or conditions. Message formats can include text messages, multimedia messages, visual voicemail messages, and/or other audio/visual messages. Based on conditions such as recipient device location or velocity, and on a preferred message format, a message can be converted into an appropriate transmission format and transmitted and/or communicated to the recipient in its appropriate format (e.g., text, multimedia, audio, etc.).

Publication date: 14-03-2013

Apparatus and method for generating vocal organ animation

Number: US20130065205A1
Author: Bong-Rae Park
Assignee: CLUSOFT CO Ltd

The present disclosure relates to an apparatus and method for generating a vocal organ animation very similar to a pronunciation pattern of a native speaker in order to support foreign language pronunciation education. The present disclosure checks an adjacent phonetic value in phonetic value constitution information, extracts a detail phonetic value based on the adjacent phonetic value, extracts pronunciation pattern information corresponding to the detail phonetic value and pronunciation pattern information corresponding to a transition section allocated between detail phonetic values, and performs interpolation on the extracted pronunciation pattern information, thereby generating a vocal organ animation.

Publication date: 14-03-2013

Parametric speech synthesis method and system

Number: US20130066631A1
Authors: Fengliang Wu, Zhenhua Wu
Assignee: Goertek Inc

The present invention provides a parametric speech synthesis method and a parametric speech synthesis system. The method comprises sequentially processing each frame of speech of each phone in a phone sequence of an input text as follows: for a current phone, extracting a corresponding statistic model from a statistic model library and using model parameters of the statistic model that correspond to the current frame of the current phone as rough values of currently predicted speech parameters; according to the rough values and information about a predetermined number of speech frames occurring before the current time point, obtaining smoothed values of the currently predicted speech parameters; according to global mean values and global standard deviation ratios of the speech parameters obtained through statistics, performing global optimization on the smoothed values of the speech parameters to generate necessary speech parameters; and synthesizing the generated speech parameters to obtain a frame of speech synthesized for the current frame of the current phone. With this solution, the RAM capacity needed for speech synthesis will not increase with the length of the synthesized speech, and the time length of the synthesized speech is no longer limited by the RAM.
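The global-optimization step described above can be sketched as a variance-restoring re-scaling. The exact formula is not given in the abstract; the form below, which re-expands deviations from the global mean by a global standard-deviation ratio, is an assumption.

```python
# Sketch of the global-optimization step described above: smoothing
# flattens the predicted parameter trajectory, so deviations from the
# global mean are re-expanded by a global standard-deviation ratio.
# This variance-restoring form is an assumed reading of the abstract.

def global_optimize(smoothed, global_mean, stddev_ratio):
    """Re-scale smoothed parameter values about the global mean."""
    return [global_mean + stddev_ratio * (v - global_mean) for v in smoothed]

params = global_optimize([4.0, 5.0, 6.0], global_mean=5.0, stddev_ratio=1.5)
```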

Publication date: 21-03-2013

Alarm method and apparatus in portable terminal

Number: US20130070575A1
Assignee: SAMSUNG ELECTRONICS CO LTD

An alarm method in a portable terminal is provided, including setting an alarm by setting an alarm time, setting output information to be output at the alarm time, and setting an output device for receiving and displaying the output information; outputting a preset alarm sound when the set alarm time arrives, and obtaining the output information to be output at the alarm time; and transmitting the obtained output information to the set output device.

Publication date: 28-03-2013

Methods and Apparatus for Rapid Acoustic Unit Selection From a Large Speech Corpus

Number: US20130080176A1
Assignee: AT&T INTELLECTUAL PROPERTY II, L.P.

A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. The number of possible sequential pairs of acoustic units makes such caching prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. The system synthesizes a large body of speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.

1. A method comprising: when, while synthesizing speech, an acoustic unit sequential pair does not have an associated concatenation cost in a concatenation cost database: assigning a default value as the associated concatenation cost; and updating the concatenation cost database by synthesizing a body of speech, identifying the acoustic unit sequential pair in the body of speech, and recording a respective concatenation cost in the concatenation cost database.
2. The method of claim 1, further comprising synthesizing the speech using the respective concatenation cost.
3. The method of claim 1, wherein recording the respective concatenation cost comprises: assigning a value to each acoustic unit in the acoustic unit sequential pair; and determining a difference associated with the value assigned to each acoustic unit, to yield the respective concatenation cost.
4. The method of claim 1, wherein the concatenation cost database contains a portion of all possible concatenation costs associated with ...
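The cache-with-default behavior of claim 1 can be sketched as follows. The cost values and the pending-measurement list are invented for illustration.

```python
# Sketch of claim 1 above: look up the concatenation cost of an
# acoustic-unit pair; on a cache miss, return a default cost and
# remember the pair so a real cost can be measured later from a
# synthesized corpus. Cost values here are invented.

DEFAULT_COST = 1.0

def concat_cost(cache, pair, pending):
    """Cached concatenation cost for a sequential pair of acoustic
    units, falling back to a default on a miss."""
    if pair in cache:
        return cache[pair]
    pending.append(pair)  # to be measured and recorded later
    return DEFAULT_COST

cache = {("ax", "n"): 0.12, ("n", "d"): 0.34}
pending = []
hit = concat_cost(cache, ("ax", "n"), pending)
miss = concat_cost(cache, ("zh", "k"), pending)
```

Because fewer than 1% of possible pairs occur in practice, the cache stays small while serving almost every lookup.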

Publication date: 04-04-2013

SPEECH SAMPLES LIBRARY FOR TEXT-TO-SPEECH AND METHODS AND APPARATUS FOR GENERATING AND USING SAME

Number: US20130085759A1
Assignee: VIVOTEXT LTD.

A method for converting text into speech with a speech sample library is provided. The method comprises converting an input text to a sequence of triphones; determining musical parameters of each phoneme in the sequence of triphones; detecting, in the speech sample library, speech segments having at least the determined musical parameters; and concatenating the detected speech segments.

1. A method for converting text into speech with a speech sample library, comprising: converting an input text to a sequence of triphones; determining musical parameters of each phoneme in the sequence of triphones; detecting, in the speech sample library, speech segments having at least the determined musical parameters; and concatenating the detected speech segments.
2. The method of claim 1, further comprising: adjusting the musical parameters of speech segments prior to concatenating the speech segments.
3. The method of claim 1, wherein the at least one musical parameter is any one of: a pitch curve, a pitch perception, a duration, and a volume.
4. The method of claim 3, wherein a value of a musical vector is an index indicative of a sub range in which its respective at least one musical parameter lies.
5. The method of claim 1, wherein the sequence of triphones includes overlapping triphones.
6. The method of claim 2, wherein determining the musical parameters of each phoneme in the sequence of triphones further includes: providing a set of numerical targets for each of the musical parameters.
7. The method of claim 6, wherein detecting the speech segments having at least the determined musical parameters further includes: searching the speech sample library for at least one of a central phoneme, a phonemic context, and a musical index indicating at least one range of at least one of the musical parameters within which at least one of the numerical targets lies.
8. The method of claim 1, wherein each of the speech segments comprises at ...
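The "sequence of overlapping triphones" of claim 5 can be sketched as each phoneme taken with its left and right neighbours. The "sil" padding symbol is an assumption.

```python
# Sketch of a sequence of overlapping triphones: each phoneme is
# taken together with its left and right neighbours. The "sil"
# padding symbol for utterance boundaries is an assumption.

def to_triphones(phonemes, pad="sil"):
    """Convert a phoneme list into overlapping (left, center, right) triples."""
    padded = [pad] + phonemes + [pad]
    return [tuple(padded[i:i + 3]) for i in range(len(phonemes))]

tri = to_triphones(["k", "ae", "t"])
```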

Publication date: 04-04-2013

TRAINING AND APPLYING PROSODY MODELS

Number: US20130085760A1
Author: James H. Stephens, Jr.
Assignee: MORPHISM LLC

Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles. 126-. (canceled)27. A computer-implementable method for synthesizing audible speech , with varying prosody , from textual content , the method comprising:maintaining an inventory of prosody models with lexicons,selecting a subset of multiple prosody models from the inventory of prosody models;associating prosody models in the subset of multiple prosody models with different segments of a text based on phrases in the text statistically associated with the lexicons of the prosody models;applying the associated prosody models to the different segments of the text to produce prosody annotations for the text;considering annotations of the prosody annotations to reconcile conflicting prosody annotations due to multiple prosody models associated with a segment of the text; andsynthesizing audible speech from the text and the reconciled prosody annotations.28. The method of claim 27 , wherein the reconciling is based on a reconciliation policy.29. The method of claim 28 , wherein the reconciliation policy considers the annotations of the prosody annotations that comprise a prosody model identifier and a prosody model confidence for the prosody annotation.30. The method of claim 29 , wherein annotations of the prosody annotations are represented by markup elements that indicate the scope of the tagged text.31. The method of claim 30 , wherein the reconciliation eliminates conflicting annotations that result from applications of multiple models.32. The method of claim 31 , wherein the selecting is based on input parameters.33. 
The ...
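The reconciliation policy described in the claims above compares, for each text segment, the prosody model identifier and model confidence attached to each annotation. A minimal sketch in Python, assuming a simple highest-confidence-wins policy; the annotation structure (segment spans, model names, confidence scores) is invented for illustration and is not the patented implementation:

```python
def reconcile(annotations):
    """Keep, for each text segment, only the prosody annotation whose
    model reported the highest confidence (one possible reconciliation
    policy for conflicting annotations)."""
    best = {}
    for ann in annotations:
        seg = ann["segment"]
        # Conflicts arise when several prosody models tagged the same
        # segment; prefer the annotation from the more confident model.
        if seg not in best or ann["confidence"] > best[seg]["confidence"]:
            best[seg] = ann
    return list(best.values())

annotations = [
    {"segment": (0, 5), "model": "news", "pitch": "high", "confidence": 0.6},
    {"segment": (0, 5), "model": "casual", "pitch": "low", "confidence": 0.9},
    {"segment": (6, 12), "model": "news", "pitch": "flat", "confidence": 0.7},
]
print(reconcile(annotations))
```

Here the two annotations on segment (0, 5) conflict, and the "casual" model wins on confidence.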

Publication date: 18-04-2013

FACILITATING TEXT-TO-SPEECH CONVERSION OF A USERNAME OR A NETWORK ADDRESS CONTAINING A USERNAME

Number: US20130096920A1
Assignee: RESEARCH IN MOTION LIMITED

To facilitate text-to-speech conversion of a username, a first or last name of a user associated with the username may be retrieved, and a pronunciation of the username may be determined based at least in part on whether the name forms at least part of the username. To facilitate text-to-speech conversion of a domain name having a top level domain and at least one other level domain, a pronunciation for the top level domain may be determined based at least in part upon whether the top level domain is one of a predetermined set of top level domains. Each other level domain may be searched for one or more recognized words therewithin, and a pronunciation of the other level domain may be determined based at least in part on an outcome of the search. The username and domain name may form part of a network address such as an email address, URL or URI. 1. A method , for a wireless communication device , for text-to-speech conversion of a network address , the method comprising:determining that a part of a username in the network address comprises one of a recognized word from a spoken language, a first name, and a last name; andgenerating a representation of a pronunciation of the part, pronounced as a whole.2. The method of claim 1 , wherein the part comprises the entire username.3. The method of claim 1 , wherein determining comprises searching the username for the part.4. The method of claim 1 , wherein the network address is an email address.5. The method of claim 4 , wherein the email address contains an ‘@’ symbol claim 4 , and wherein the username corresponds to the portion of the email address preceding the ‘@’ symbol.6. The method of claim 5 , further comprising retrieving the username as the portion of the email address preceding the ‘@’ symbol.7. The method of claim 4 , wherein determining comprises identifying the part as one of a first name and a last name included in a display name received in conjunction with the email address.8. The method of claim 1 , ...
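The claimed behavior — pronouncing a username part as a whole word when it matches a known first or last name, and otherwise spelling it out — can be sketched as follows. The contact set, the handling of separators, and the "dot" rendering are assumptions for illustration, not the patented algorithm:

```python
KNOWN_NAMES = {"john", "smith", "maria"}  # stand-in for the user's contact data

def speak_username(email):
    """Return a speakable form of the username portion of an email
    address (the part preceding '@'): name-like parts are pronounced
    as whole words, anything else is spelled out letter by letter."""
    username = email.split("@", 1)[0]
    words = []
    for part in username.replace("_", ".").split("."):
        if part.lower() in KNOWN_NAMES:
            words.append(part)                    # pronounce as a word
        else:
            words.append(" ".join(part.upper()))  # spell it out
    return " dot ".join(words)

print(speak_username("john.x42@example.com"))  # -> 'john dot X 4 2'
```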

Publication date: 25-04-2013

Mobile voice platform architecture with remote service interfaces

Number: US20130102295A1
Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC

A mobile voice platform for providing a user speech interface to computer-based services includes a mobile device having a processor, communication circuitry that provides access to the computer-based services, an operating system, and one or more applications that are run using the operating system and that utilize one or more of the computer-based services via the communication circuitry. The mobile voice platform includes at least one non-transient digital storage medium storing a program module having computer instructions that, upon execution by the processor, receives speech recognition results representing user speech that has been processed using automated speech recognition, determines a desired computer-based service based on the speech recognition results, accesses a remotely-stored service interface associated with the desired service, initiates the desired service using the service interface, receives a service result from the desired service, and provides a text-based service response for conversion to a speech response to be provided to the user.

Publication date: 02-05-2013

FACILITATING TEXT-TO-SPEECH CONVERSION OF A DOMAIN NAME OR A NETWORK ADDRESS CONTAINING A DOMAIN NAME

Number: US20130110512A1
Assignee: RESEARCH IN MOTION LIMITED

A method and apparatus of facilitating text-to-speech conversion of a domain name are provided. At a processor of a computing device, a pronunciation of a top level domain of a network address is determined by one or more of: generating a phonetic representation of each character in the top level domain pronounced individually; and generating a tokenized representation of each individual character of the top level domain suitable for interpretation by a text-to-speech engine. For each other level domain of the network address, at the processor, a pronunciation of the other level domain is determined based on one or more recognized words within the other level domain. 1. A method comprising: determining, at a processor of a computing device, a pronunciation of a top level domain of a network address by one or more of: generating a phonetic representation of each character in the top level domain pronounced individually; and generating a tokenized representation of each individual character of the top level domain suitable for interpretation by a text-to-speech engine; and, for each other level domain of the network address, determining, at the processor, a pronunciation of the other level domain based on one or more recognized words within the other level domain. 2. The method of claim 1, wherein the determining the pronunciation of a top level domain of a network address further comprises determining whether said top level domain is one of a set of top level domains. 3. The method of claim 2, wherein the set represents top level domains that are pronounced as a whole. 4. The method of claim 1, wherein the determining the pronunciation of a top level domain of a network address further comprises one or more of: generating a phonetic representation of the top level domain pronounced as a whole; and generating a tokenized representation of the top level domain as a whole suitable for interpretation by a text-to-speech engine. 5. The method of claim 1, wherein the ...
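A rough sketch of the two pronunciation strategies for a top-level domain: spoken as a whole when it belongs to a predetermined set, otherwise spelled character by character. The set of whole-word TLDs below is an assumed example, not taken from the patent:

```python
SPOKEN_WHOLE_TLDS = {"com", "net", "org", "gov", "edu"}  # assumed set

def tld_pronunciation(domain):
    """Return a speakable form of a domain's top-level domain:
    well-known TLDs are spoken as a word ('com'); anything else gets
    a tokenized per-character representation ('c z')."""
    tld = domain.rsplit(".", 1)[-1].lower()
    if tld in SPOKEN_WHOLE_TLDS:
        return tld
    return " ".join(tld)  # spell out each character individually

print(tld_pronunciation("example.com"))  # -> 'com'
print(tld_pronunciation("example.cz"))   # -> 'c z'
```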

Publication date: 09-05-2013

Message and vehicle interface integration system and method

Number: US20130117021A1
Author: Frances H James
Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC

A method and system uses an integration application to extract an information feature from a message and to provide the information feature to a vehicle interface device which acts on the information feature to provide a service. The extracted information feature may be automatically acted upon, or may be outputted for review, editing, and/or selection before being acted on. The vehicle interface device may include a navigation system, infotainment system, telephone, and/or a head unit. The message may be received by the vehicle interface device or from a portable or remote device in linked communication with the vehicle interface device. The message may be a voice-based or text-based message. The service may include placing a call, sending a message, or providing navigation instructions using the information feature. An off-board or back-end service provider in communication with the integration application may extract and/or transcribe the information feature and/or provide a service.

Publication date: 09-05-2013

Personalized Vocabulary for Digital Assistant

Number: US20130117022A1
Assignee: Apple Inc

Methods, systems, and computer readable storage medium related to operating an intelligent digital assistant are disclosed. A text string is obtained from a speech input received from a user. The received text string is interpreted to derive a representation of user intent based at least in part on a plurality of words associated with a user and stored in memory associated with the user, the plurality of words including words from a plurality of user interactions with an automated assistant. At least one domain, a task, and at least one parameter for the task, are identified based at least in part on the representation of user intent. The identified task is performed. An output is provided to the user, where the output is related to the performance of the task.

Publication date: 09-05-2013

Apparatus and method for representing an image in a portable terminal

Number: US20130117025A1
Author: Hyunmi Park, Sanghyuk Koh
Assignee: SAMSUNG ELECTRONICS CO LTD

An apparatus for displaying an image in a portable terminal includes a camera to photograph the image, a touch screen to display the image and to allow selecting an object area of the displayed image, a memory to store the image, a controller to detect at least one object area within the image when displaying the image of the camera or the memory and to recognize object information of the detected object area to be converted into a voice, and an audio processing unit to output the voice.

Publication date: 09-05-2013

SPEECH SYNTHESIZER, SPEECH SYNTHESIS METHOD, AND SPEECH SYNTHESIS PROGRAM

Number: US20130117026A1
Author: Kato Masanori
Assignee: NEC Corporation

State duration creation means creates a state duration indicating a duration of each state in a hidden Markov model, based on linguistic information and a model parameter of prosody information. Duration correction degree computing means derives a speech feature from the linguistic information, and computes a duration correction degree which is an index indicating a degree of correcting the state duration, based on the derived speech feature. State duration correction means corrects the state duration based on a phonological duration correction parameter and the duration correction degree, the phonological duration correction parameter indicating a correction ratio of correcting a phonological duration. 1.-10. (canceled) 11. A speech synthesizer comprising: a state duration creation unit for creating a state duration indicating a duration of each state in a hidden Markov model, based on linguistic information and a model parameter of prosody information; a duration correction degree computing unit for deriving a speech feature from the linguistic information, and computing a duration correction degree based on the derived speech feature, the duration correction degree being an index indicating a degree of correcting the state duration; and a state duration correction unit for correcting the state duration based on a phonological duration correction parameter and the duration correction degree, the phonological duration correction parameter indicating a correction ratio of correcting a phonological duration. 12. The speech synthesizer according to claim 11, wherein the duration correction degree computing unit estimates a temporal change degree of the speech feature derived from the linguistic information, and computes the duration correction degree based on the estimated temporal change degree. 13.
The speech synthesizer according to claim 12 , wherein the duration correction degree computing unit estimates a temporal change degree of a spectrum or a pitch from ...
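The state duration correction described above combines a phonological duration correction parameter (a correction ratio) with a duration correction degree. One plausible blending rule, assumed here purely for illustration — the patent does not commit to this formula — is geometric interpolation, d' = d * ratio**degree, so degree 0 leaves the duration unchanged and degree 1 applies the full ratio:

```python
def correct_state_durations(durations, ratio, degree):
    """Scale each HMM state duration by the phonological correction
    ratio, attenuated by the duration correction degree in [0, 1].
    The blending formula is an illustrative assumption."""
    factor = ratio ** degree  # geometric interpolation between 1 and ratio
    return [d * factor for d in durations]

print(correct_state_durations([3.0, 5.0, 2.0], ratio=1.2, degree=0.5))
```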

Publication date: 16-05-2013

VIDEO GENERATION BASED ON TEXT

Number: US20130124206A1
Author: Rezvani Behrooz, ROUHI Ali
Assignee: Seyyer, Inc.

Techniques for generating a video sequence of a person based on a text sequence, are disclosed herein. Based on the received text sequence, a processing device generates the video sequence of a person to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence. The emotional expressions in the visual portion of the video sequence are simulated based a priori knowledge about the person. For instance, the a priori knowledge can include photos or videos of the person captured in real life. 1. A method comprising:inputting a text sequence at a processing device; andgenerating, by the processing device, a video sequence of a person based on the text sequence to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence.2. The method of claim 1 , wherein the processing device is a mobile device claim 1 , the text sequence is inputted from a second mobile device via a Short Message Service (SMS) channel claim 1 , and said generating a video sequence of a person comprises generating claim 1 , by the mobile device claim 1 , a video sequence of a person based on shared information stored on the mobile device and the second mobile device.3. The method of claim 1 , wherein the text sequence includes a set of words including at least one word claim 1 , and wherein the video sequence is generated such that the person appears to utter the words in the video sequence.4. The method of claim 1 , wherein the text sequence includes a text representing an utterance claim 1 , and wherein the video sequence is generated such that the person appears to utter the utterance in the video sequence.5. 
The method of claim 1 , wherein the text sequence includes a word and an indicator for the word claim 1 , the indicator indicates an emotional expression of the person at a time ...

Publication date: 23-05-2013

System and Method for Generating Challenge Items for CAPTCHAs

Number: US20130132093A1
Author: GROSS JOHN NICHOLAS

Challenge items for an audible-based electronic challenge system are generated using a variety of techniques to identify optimal candidates. The challenge items are intended for use in a computing system that discriminates between humans and text-to-speech (TTS) systems. 1.-19. (canceled) 20. A method embodied in a computer readable medium of selecting challenge data to be used for accessing data and/or resources of a computing system comprising: (a) providing data identifying a first set of diphones to be assessed by a computing system, wherein each of said first set of diphones represents a sound associated with an articulation of a pair of phonemes in a natural language; (b) generating a plurality of articulation scores using the computing system based on measuring acoustical characteristics of a machine text to speech (TTS) system articulation of each of said first set of diphones; and (c) selecting challenge text including words and phrases from the natural language using the computing system based on said plurality of articulation scores; wherein said challenge text is useable by an utterance-based challenge system for discriminating between humans and machines. 21. The method of further including a step: processing input speech by an entity using said challenge item database to distinguish between a human and a machine synthesized voice. 22. A method embodied in a computer readable medium of selecting challenge data to be used for accessing data and/or resources of a computing system comprising: a) selecting a candidate challenge item which includes text words and/or visual images; b) measuring first acoustical characteristics of a computer synthesized utterance when articulating challenge content associated with said candidate challenge item; c) measuring second acoustical characteristics of a human utterance when articulating said challenge content; d) generating a challenge item score based on measuring a difference in said first and second acoustical ...
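Scoring candidate challenge words by the articulation scores of their constituent diphones, then picking the hardest-to-articulate candidates, might look like the following sketch. The diphone inventory, its scores, and the phonetic decompositions are all invented placeholders:

```python
# Toy diphone articulation scores: higher = harder for a TTS system
# to articulate convincingly (values invented for illustration).
DIPHONE_SCORES = {"th-r": 0.9, "s-k": 0.4, "a-b": 0.2, "o-i": 0.7}

def word_score(diphones):
    """Score a candidate challenge word as the mean of its diphone scores."""
    return sum(DIPHONE_SCORES.get(d, 0.0) for d in diphones) / len(diphones)

def select_challenges(candidates, n=1):
    """Pick the n candidates whose diphone makeup is hardest for TTS."""
    return sorted(candidates, key=lambda c: word_score(c[1]), reverse=True)[:n]

candidates = [("three", ["th-r", "o-i"]), ("ask", ["a-b", "s-k"])]
print(select_challenges(candidates))
```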

Publication date: 06-06-2013

Voice over IP method for developing interactive voice response system

Number: US20130142317A1
Author: Douglas F. Reynolds
Assignee: AT&T INTELLECTUAL PROPERTY I LP

A personal interactive voice response service node includes a memory that stores executable instructions, and a processor that executes the executable instructions. The personal interactive voice response service node accepts personalized instructions to define interactive voice response service node prompts on behalf of an individual assigned a communications address serviced by the personal interactive voice response service node. The personal interactive voice response service node accepts personalized instructions to define actions to take in response to selections of corresponding interactive voice response service node prompts. When executed by the processor, the executable instructions cause the personal interactive voice response service node to identify a selection of an interactive voice response service node prompt and execute an action associated with the selection identified.

Publication date: 06-06-2013

Enhanced voice conferencing with history

Number: US20130144603A1
Assignee: ELWHA LLC

Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to enhance voice conferencing among multiple speakers. Some embodiments of the AEFS enhance voice conferencing by recording and presenting voice conference history information based on speaker-related information. The AEFS receives data that represents utterances of multiple speakers who are engaging in a voice conference with one another. The AEFS then determines speaker-related information, such as by identifying a current speaker, locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS records conference history information (e.g., a transcript) based on the determined speaker-related information. The AEFS then informs a user of the conference history information, such as by presenting a transcript of the voice conference and/or related information items on a display of a conferencing device associated with the user.

Publication date: 06-06-2013

SYSTEMS AND METHODS DOCUMENT NARRATION

Number: US20130144625A1
Assignee: K-NFB READING TECHNOLOGY, INC.

Disclosed are techniques and systems to provide a narration of a text in multiple different voices. In some aspects, systems and methods described herein can include receiving a user-based selection of a first portion of words in a document where the document has a pre-associated first voice model and overwriting the association of the first voice model, by the one or more computers, with a second voice model for the first portion of words. 1. A computer implemented method, comprising: receiving a user-based selection of a first portion of words in a document, at least a portion of the document being displayed on a user interface on a display device, the document being pre-associated with a first voice model; applying, by the one or more computers, in response to the user-based selection of the first portion of words, a first set of indicia to the user-selected first portion of words in the document; and overwriting the association of the first voice model, by the one or more computers, with a second voice model for the first portion of words. 2. The method of wherein the words in the first portion of words are narrated using the second voice model and at least some of the other words in the document are narrated using the first voice model. 3. The method of claim 1, wherein the method further comprises: associating, by the one or more computers, the first voice model with the document, prior to receiving the user-based selection of the first portion of words. 4. The method of claim 1, wherein the words in the first portion of words are narrated using the second voice model and remaining words in the document are narrated using the first voice model. 5. The method of claim 1, wherein the first voice model comprises a default voice model. 6. The method of claim 1, further comprising: applying, in response to a user-based selection of a second portion of words in the document, a second highlighting indicium to the user-selected second portion of words in the document; ...
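The overwrite step — a document pre-associated with a default voice model, with a user-selected portion of words reassigned to a second voice model — can be sketched as below. The Narrator class and its word-indexed selection are assumptions for illustration, not the patented implementation:

```python
class Narrator:
    """Associate a default voice model with every word of a document and
    let a user-selected word range override it with a second voice model."""
    def __init__(self, words, default_voice):
        self.words = words
        self.voices = [default_voice] * len(words)  # pre-associated model

    def set_voice(self, start, end, voice):
        # Overwrite the voice-model association for words start..end-1.
        for i in range(start, end):
            self.voices[i] = voice

    def narration_plan(self):
        return list(zip(self.words, self.voices))

doc = Narrator(["Once", "upon", "a", "time"], default_voice="narrator")
doc.set_voice(1, 3, "child")  # user selected words 1..2
print(doc.narration_plan())
```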

Publication date: 13-06-2013

SYSTEM AND METHOD FOR SINGING SYNTHESIS CAPABLE OF REFLECTING TIMBRE CHANGES

Number: US20130151256A1

Herein provided is a system for singing synthesis capable of reflecting not only pitch and dynamics changes but also timbre changes of a user's singing. A spectral transform surface generating section temporally concatenates all the spectral transform curves estimated by a second spectral transform curve estimating section to define a spectral transform surface. A synthesized audio signal generating section generates a transform spectral envelope at each instant of time by scaling a reference spectral envelope based on the spectral transform surface. Then, the synthesized audio signal generating section generates an audio signal of a synthesized singing voice reflecting timbre changes of an input singing voice, based on the transform spectral envelope and a fundamental frequency contained in a reference singing voice source data. 1. A system for singing synthesis capable of reflecting voice timbre changes comprising: an audio signal storing section operable to store an audio signal of an input singing voice; a singing voice source database in which singing voice source data on K sorts of different singing voices, K being an integer of one or more, and singing voice source data on the same singing voice with J sorts of voice timbres, J being an integer of two or more, are accumulated; a singing synthesis parameter data estimating section operable to estimate singing synthesis parameter data representing the audio signal of the input singing voice with a plurality of parameters including at least a pitch parameter and a dynamics parameter; a singing synthesis parameter data storing section operable to store the singing synthesis parameter data; a lyrics data storing section operable to store lyrics data corresponding to the audio signal of the input singing voice; and a singing voice synthesizing section operable to output an audio signal of a synthesized singing voice, based on at least the singing voice source data on one sort of singing voice ...

Publication date: 11-07-2013

METHODS AND APPARATUS FOR FORMANT-BASED VOICE SYNTHESIS

Number: US20130179167A1
Assignee: NUANCE COMMUNICATIONS, INC.

In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method. 1.-36. (canceled) 37. A method of processing a voice signal to extract information to facilitate training a speech synthesis model for use with a formant-based text-to-speech synthesizer, the method comprising acts of: detecting a plurality of candidate features in the voice signal; grouping different combinations of the plurality of candidate features into a plurality of candidate feature sets; forming a plurality of voice waveforms, each of the plurality of voice waveforms formed, at least in part, by processing a respective one of the plurality of candidate feature sets; performing at least one comparison between the voice signal and each of the plurality of voice waveforms; selecting at least one of the plurality of candidate feature sets based, at least in part, on the at least one comparison with the voice signal; and training the speech synthesis model based, at least in part, on the selected at least one of the plurality of candidate feature sets. 38. The method of claim 37, further comprising an act of converting the voice signal into a same format as the plurality of voice waveforms prior to performing the at least one comparison. 39.
The method of claim 37 , wherein forming the plurality of voice waveforms includes forming the plurality of voice waveforms in a same format as the ...
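The selection step — forming a waveform from each candidate feature set and comparing each against the original voice signal — can be sketched with a mean-squared-error comparison. The stand-in "synthesizer" here (a gain applied to a fixed ramp) is purely illustrative; a real system would synthesize from formant parameters:

```python
def mse(a, b):
    """Mean squared error between two equal-length sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def select_feature_set(voice_signal, candidate_sets, synthesize):
    """Form a waveform from each candidate feature set and keep the set
    whose waveform is closest to the original voice signal."""
    return min(candidate_sets, key=lambda fs: mse(voice_signal, synthesize(fs)))

# Stand-in synthesizer: a "feature set" is just a gain applied to a ramp.
signal = [0.0, 0.5, 1.0, 1.5]
synthesize = lambda gain: [gain * t for t in (0.0, 0.5, 1.0, 1.5)]
print(select_feature_set(signal, [0.8, 1.0, 1.3], synthesize))  # -> 1.0
```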

Publication date: 18-07-2013

DEVICE FOR SUPPLEMENTING VOICE AND METHOD FOR CONTROLLING THE SAME

Number: US20130185077A1
Assignee: INHA-INDUSTRY PARTNERSHIP INSTITUTE

A device for supplementing a voice includes: a sensing unit sensing a bio-signal corresponding to a first vibration of vocalization and generating a first signal corresponding to the bio-signal; a vibration unit generating a second vibration using the first signal; and a power unit supplying a power to the sensing unit and the vibration unit. 1. A device for supplementing a voice , comprising:a switching unit outputting a signal;a vibration generating unit generating a vibration according to the signal; anda power unit supplying a power to the switching unit and the vibration generating unit.2. The device according to claim 1 , wherein the vibration generating unit generates the vibration using one of an electromagnetic induction between a magnet and a coil and a piezoelectric phenomenon.3. The device according to claim 1 , wherein the power unit includes a battery.4. The device according to claim 3 , wherein the power unit is recharged by one of an electromagnetic induction method claim 3 , a human body communication method claim 3 , an electromagnetic resonance method claim 3 , an electromagnetic wave method claim 3 , an ultrasonic method and a solar heat method.5. The device according to claim 1 , wherein the power unit claim 1 , the vibration generating unit and the switching unit are installed at one of an upper portion of a larynx claim 1 , a cervical portion claim 1 , a vocal cord and a subcutaneous tissue adjacent to the vocal cord.6. The device according to claim 1 , wherein the power unit claim 1 , the vibration generating unit and the switching unit wrapped by a capsule are installed at one of an upper portion of a larynx claim 1 , a cervical portion claim 1 , a vocal cord and a subcutaneous tissue adjacent to the vocal cord.7. 
The device according to claim 1 , wherein the vibration generating unit is installed at one of an upper portion of a larynx claim 1 , a cervical portion claim 1 , a vocal cord and a subcutaneous tissue adjacent to the vocal cord ...

Publication date: 25-07-2013

Speech synthesis method and apparatus for electronic system

Number: US20130191130A1
Assignee: ASUSTeK Computer Inc

A speech synthesis method for an electronic system and a speech synthesis apparatus are provided. In the speech synthesis method, a speech signal file including text content is received. The speech signal file is analyzed to obtain prosodic information of the speech signal file. The text content and the corresponding prosodic information are automatically tagged to obtain a text tag file. A speech synthesis file is obtained by synthesizing a human voice profile and the text tag file.

Publication date: 25-07-2013

Computerized information and display apparatus

Number: US20130191750A1
Author: Robert F. Gazdzinski
Assignee: West View Research LLC

Apparatus useful for obtaining and displaying information. In one embodiment, the apparatus includes a network interface, display device, and speech recognition apparatus configured to receive user speech input and enable performance of various tasks via a remote entity, such as obtaining desired information relating to directions, sports, finance, weather, or any number of other topics. The downloaded may also, in one variant, be transmitted to a personal user device, such as via a data interface.

Publication date: 01-08-2013

Explicit character filtering of ambiguous text entry

Number: US20130194191A1
Assignee: Nuance Communications Inc

The present invention relates to a method and apparatus for explicit filtering in ambiguous text entry. The invention provides embodiments including various explicit text entry methodologies, such as 2-key and long pressing. The invention also provides means for matching words in a database using build around methodology, stem locking methodology, word completion methodology, and n-gram searches.
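A toy version of explicit filtering over ambiguous (T9-style) key entry: candidate words must fit the key sequence, except at positions the user entered explicitly (for example, by long-pressing a key). This is a sketch of the general idea, not Nuance's implementation; the keypad map and word list are illustrative:

```python
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

def matches(word, keys, explicit):
    """True if `word` fits the ambiguous key sequence `keys`, subject to
    explicitly entered characters (position -> letter)."""
    if len(word) != len(keys):
        return False
    for i, (ch, key) in enumerate(zip(word, keys)):
        if i in explicit:
            if ch != explicit[i]:   # explicit entry pins this position
                return False
        elif ch not in KEYPAD[key]:  # ambiguous entry: any letter on the key
            return False
    return True

words = ["good", "home", "gone", "hood"]
keys = "4663"
# Without filtering, all four words match; explicitly fixing position 0
# to 'h' narrows the candidate list.
print([w for w in words if matches(w, keys, {})])
print([w for w in words if matches(w, keys, {0: "h"})])
```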

Publication date: 08-08-2013

ELECTRONIC APPARATUS AND FUNCTION GUIDE METHOD THEREOF

Number: US20130204623A1
Assignee: YAMAHA CORPORATION

In an electronic apparatus having a plurality of functions, a connecting unit connects the electronic apparatus to an external device which presents text information in a form recognizable by a visually impaired user. A function selection unit selects a function to be executed. A storage unit stores a table defining correspondence between the plurality of functions and a plurality of text files each containing text information. A text file selection unit selects a text file corresponding to the selected function with reference to the table. An acquisition unit acquires file information from the selected text file. A transmission unit transmits the acquired file information to the external device. 1. An electronic apparatus having a plurality of functions , comprising:a connecting unit that connects the electronic apparatus to an external device that has a presenting unit for presenting text information in a form desired by a user;a function selection unit that selects a function to be executed;a storage unit that stores matching information defining correspondence between the plurality of functions and a plurality of text files each containing text information;a text file selection unit that selects a text file corresponding to the function selected by the function selection unit with reference to the matching information;an acquisition unit that acquires file information from the selected text file; anda transmission unit that transmits the acquired file information to the external device connected by the connecting unit.2. 
The electronic apparatus according to claim 1 , wherein the function selection unit selects a function in either of a first manipulation mode or a second manipulation mode claim 1 , the electronic apparatus further comprising a control unit that executes the selected function when the function is selected by the function selection unit in the first manipulation mode and that does not execute the selected function when the function is selected by ...
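The table-driven lookup described above (selected function, matched against a table, resolves to a text file whose information is transmitted to the external presenting device) reduces to a small mapping. File names and contents here are invented placeholders:

```python
# Matching information: each selectable function maps to a text file
# containing its guide text (names are illustrative).
FUNCTION_GUIDE_TABLE = {
    "record": "guide_record.txt",
    "playback": "guide_playback.txt",
}

def guide_text_for(function, read_file):
    """Resolve the selected function to its guide text file and acquire
    the file information to transmit to the external device."""
    filename = FUNCTION_GUIDE_TABLE[function]
    return read_file(filename)

files = {"guide_record.txt": "Press REC to start recording."}
print(guide_text_for("record", files.get))
```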

Publication date: 08-08-2013

CONTEXTUAL CONVERSION PLATFORM FOR GENERATING PRIORITIZED REPLACEMENT TEXT FOR SPOKEN CONTENT OUTPUT

Number: US20130204624A1
Author: Ben-Ezri Daniel
Assignee:

A contextual conversion platform, and method for converting text-to-speech, are described that can convert content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which can be generated a spoken content input. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform. 1. A method, comprising: at a computer comprising a computer program to implement processing operations: receiving data related to content of a target; filtering the data to locate a target term; accessing one or more tables in a repository, the one or more tables comprising entries with a substitution unit corresponding to the target term, the entries arranged according to a prioritized scheme that defines a position for the substitution unit in the tables; and generating an output comprising data that represents the substitution unit to be utilized by a text-to-speech generator to generate spoken content, wherein the position of the substitution unit in the one or more tables is assigned based on a specificity characteristic that describes the relative inclusivity of the substitution unit as compared to other substitution units in the one or more tables. 2. The method of claim 1, further comprising: breaking the content into at least one contextual unit that includes the target term; and inserting the substitution unit in the contextual unit in place of the target term. 3. The method of claim 1, further comprising: identifying a context cue in the data, the context cue identifying characteristics of the target; and selecting a table from the one or more tables in which the substitution unit is compatible with the characteristics of target. 4. The method of claim 1, further comprising: ...
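The prioritized-table lookup might be sketched as follows: a context cue selects a table, and entries in that table (ordered most specific first) substitute spoken forms for target terms. The tables, the context cues, and the example expansions are assumptions for illustration:

```python
# Substitution entries ordered most-specific first (the "prioritized
# scheme"); values and context cues are illustrative assumptions.
SUBSTITUTIONS = {
    "address": [("Dr.", "Drive"), ("St.", "Street")],
    "default": [("Dr.", "Doctor"), ("St.", "Saint")],
}

def expand(text, context):
    """Replace target terms with substitution units from the table
    matching the identified context cue."""
    table = SUBSTITUTIONS.get(context, SUBSTITUTIONS["default"])
    for target, spoken in table:
        text = text.replace(target, spoken)
    return text

print(expand("10 Main St.", "address"))          # -> '10 Main Street'
print(expand("St. Mary met Dr. Lee", "default"))
```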

Publication date: 15-08-2013

SYSTEM AND METHOD FOR MAKING AN ELECTRONIC HANDHELD DEVICE MORE ACCESSIBLE TO A DISABLED PERSON

Number: US20130211837A1
Assignee: RESEARCH IN MOTION LIMITED

An electronic handheld device is described having an options module for providing a user with at least one option in the handheld device, each option associated with an enabling mode of operation of the handheld device. The device also includes an enabling module for implementing, in response to a particular option being selected by a user, an associated enabling mode of operation. Each enabling mode of operation makes the handheld device more accessible to a person having a corresponding disability.
1. A method of operating an electronic handheld device, comprising: displaying text on the display of the device; and when an input associated with the text is received at the device, producing speech output associated with the text.
2. The method of claim 1, further comprising operating the device in an enabling mode.
3. The method of claim 1, wherein the input associated with the text is a rolling input over the text on the display.
4. The method of claim 1, wherein the input associated with the text is a keystroke.
5. The method of claim 1, wherein the input associated with the text is a menu selection.
6. The method of claim 1, wherein the speech output associated with the text corresponds to the text.
7. The method of claim 1, wherein the speech output associated with the text includes information or options associated with the text.
8. A method of operating an electronic handheld device, comprising: displaying a graphic on the display; and when an input associated with the graphic is received at the device, producing speech output associated with the graphic.
9. The method of claim 8, further comprising operating the device in an enabling mode.
10. The method of claim 8, wherein the input associated with the graphic is a rolling input over the graphic on the display.
11. The method of claim 8, wherein the input associated with the graphic is a keystroke.
12. The method of claim 8, wherein the input associated with the graphic is a menu selection.
13. The ...

Publication date: 15-08-2013

Apparatus and method for emotional voice synthesis

Number: US20130211838A1
Assignee: ACRIIL Inc

The present disclosure provides an emotional voice synthesis apparatus and an emotional voice synthesis method. The emotional voice synthesis apparatus includes a word dictionary storage unit for storing emotional words in an emotional word dictionary after classifying the emotional words into items each containing at least one of an emotion class, similarity, positive or negative valence, and sentiment strength; voice DB storage unit for storing voices in a database after classifying the voices according to at least one of emotion class, similarity, positive or negative valence and sentiment strength in correspondence to the emotional words; emotion reasoning unit for inferring an emotion matched with the emotional word dictionary with respect to at least one of each word, phrase, and sentence of document including text and e-book; and voice output unit for selecting and outputting a voice corresponding to the document from the database according to the inferred emotion.
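The two lookups the abstract describes, an emotional word dictionary and a voice database keyed by the same attributes, can be sketched as below. All dictionary entries, attribute tuples, and database names are hypothetical:

```python
# Hypothetical emotional word dictionary: word -> (emotion class, valence, strength).
EMOTION_DICT = {
    "wonderful": ("joy", "positive", 0.9),
    "terrible": ("sadness", "negative", 0.8),
}
# Hypothetical voice DB keyed by (emotion class, valence).
VOICE_DB = {("joy", "positive"): "bright_voice.db", ("sadness", "negative"): "soft_voice.db"}

def infer_emotion(sentence):
    """Infer the sentence emotion from its strongest matching emotional word."""
    hits = [EMOTION_DICT[w] for w in sentence.lower().split() if w in EMOTION_DICT]
    return max(hits, key=lambda h: h[2]) if hits else ("neutral", "neutral", 0.0)

emo = infer_emotion("What a wonderful day")
print(VOICE_DB.get(emo[:2], "default_voice.db"))  # → bright_voice.db
```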

Publication date: 15-08-2013

Feature sequence generating device, feature sequence generating method, and feature sequence generating program

Number: US20130211839A1
Inventor: Masanori Kato
Assignee: NEC Corp

Spread level parameter correcting means 501 receives a contour parameter as information representing the contour of a feature sequence (a sequence of features of a signal considered as the object of generation) and a spread level parameter as information representing the level of a spread of the distribution of the features in the feature sequence. The spread level parameter correcting means 501 corrects the spread level parameter based on a variation of the contour parameter represented by a sequence of the contour parameters. Feature sequence generating means 502 generates the feature sequence based on the contour parameters and the corrected spread level parameters.
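A toy numeric reading of the correction step: adjust the spread (variance) parameter where the contour parameter sequence varies quickly. Whether the correction shrinks or grows the spread, and the `alpha` gain, are assumptions made for this sketch:

```python
# Sketch: correct each spread level parameter based on the local variation of the
# contour parameter sequence (direction and gain of the correction are assumed).
def correct_spread(contour, spread, alpha=0.5):
    corrected = []
    for i, s in enumerate(spread):
        prev = contour[max(i - 1, 0)]
        nxt = contour[min(i + 1, len(contour) - 1)]
        variation = abs(nxt - prev) / 2.0            # local contour variation
        corrected.append(s / (1.0 + alpha * variation))  # shrink spread where contour moves fast
    return corrected

contour = [1.0, 1.2, 2.0, 2.1]
spread = [0.4, 0.4, 0.4, 0.4]
print([round(v, 3) for v in correct_spread(contour, spread)])
```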

Publication date: 15-08-2013

METHOD AND DEVICE FOR PROCESSING VOCAL MESSAGES

Number: US20130211845A1
Inventor: IMPARATO Ciro
Assignee: LA VOCE.NET DI CIRO IMPARATO

A method for automatically generating at least one voice message with the desired voice expression, starting from a prestored voice message, including assigning a vocal category to one word or to groups of words of the prestored message, computing, based on a vocal category/vocal parameter correlation table, a predetermined level of each one of the vocal parameters, emitting said voice message, with the vocal parameter levels computed for each word or group of words.
1. Method for automatically generating at least one voice message with the desired vocal expression, starting from a prestored voice message, comprising the steps of: assigning a vocal category to one word or to groups of words of the prestored message; computing, based on a vocal category/vocal parameter correlation table, a predetermined level of each one of the vocal parameters; emitting said voice message, with the vocal parameter levels computed for each word or group of words.
2. The method according to claim 1, wherein such vocal categories are chosen among friendship, trust, confidence, passion, apathy and anger.
3. The method according to claim 1, wherein such vocal parameters are chosen among volume, tone, time, rhythm.
4. Method for automatically decoding a message being listened to, in order to detect its vocal expression and the emotion of the person who recorded the voice message, comprising the steps of: assigning a level of each one of the vocal parameters to each word or group of words of the message being listened to; extracting, based on a vocal category/vocal parameter correlation table, the vocal categories of such words or groups of words starting from such vocal parameters assigned in the preceding step; determining the vocal expression of said voice message, based on the analysis of such extracted vocal categories.
5. The method according to claim 4, wherein such vocal categories are chosen among ...
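The vocal category/vocal parameter correlation table can be pictured as a small mapping applied per word group. The categories come from the claims, but the numeric levels and names here are invented:

```python
# Hypothetical category -> parameter-level table (volume, tone, tempo, rhythm).
CATEGORY_TABLE = {
    "friendship": {"volume": 0.6, "tone": 0.7, "tempo": 0.5, "rhythm": 0.6},
    "anger":      {"volume": 0.9, "tone": 0.3, "tempo": 0.8, "rhythm": 0.9},
}

def levels_for(groups):
    """groups: list of (words, category) -> list of (words, parameter levels)."""
    return [(words, CATEGORY_TABLE[cat]) for words, cat in groups]

msg = [("hello there", "friendship"), ("never again", "anger")]
for words, params in levels_for(msg):
    print(words, params["volume"])  # → hello there 0.6 / never again 0.9
```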

Publication date: 22-08-2013

APPARATUS FOR TEXT-TO-SPEECH DELIVERY AND METHOD THEREFOR

Number: US20130218567A1
Assignee: TomTom International B.V.

A method and apparatus for determining the manner in which a processor-enabled device should produce sounds from data is described. The device ideally synthesizes sounds digitally and reproduces pre-recorded sounds, together with an audible delivery thereof. A memory stores a database of a plurality of data, at least some of which is in the form of text-based indicators, and one or more pre-recorded sounds. The device is further capable of repeatedly determining one or more physical conditions, e.g. current GPS location, which is compared with one or more reference values provided in memory such that a positive result of the comparison gives rise to an event requiring a sound to be produced by the device.
1.-16. (canceled)
18. The navigation device of claim 20, wherein each of the one or more events is associated with an event type.
19. The navigation device of claim 21, wherein the event type is one of: a Route instruction, a Point of Interest (POI), Toll charge, Traffic condition, Short Message Service (SMS), Flash message, Weather, Tutorial, Warning, Tip and Signpost notification.
20. The navigation device of claim 21, wherein the audible notification for each of the one or more events is based on the event type associated with each of the one or more events.
21. The navigation device of claim 21, wherein the event type is associated with a priority, and wherein the processor is further operable to provide the audible notification of the one or more events by order of the priority associated with each of the one or more events.
22. The navigation device of claim 21, wherein the processor is further operable to provide an audible notification for the one or more events based on input received from a displayable set of event types which can be selected or de-selected, depending on user preference.
23.
The navigation device of further comprising a Global ...
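The priority-ordered delivery of audible notifications described in the claims might be sketched as a simple sort over pending events; the numeric priorities below are invented for illustration:

```python
# Hypothetical event-type priorities: lower number = spoken first.
EVENT_PRIORITY = {"Warning": 0, "Route instruction": 1, "Traffic condition": 2, "POI": 3, "Tip": 4}

def order_events(events):
    """events: list of (event_type, message), returned in announcement order."""
    return sorted(events, key=lambda e: EVENT_PRIORITY.get(e[0], 99))

pending = [("POI", "Fuel station ahead"), ("Warning", "Sharp bend"), ("Route instruction", "Turn left")]
print([m for _, m in order_events(pending)])  # → ['Sharp bend', 'Turn left', 'Fuel station ahead']
```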

Publication date: 22-08-2013

SPEECH SYNTHESIS DEVICE, SPEECH SYNTHESIS METHOD, AND COMPUTER PROGRAM PRODUCT

Number: US20130218568A1
Assignee: KABUSHIKI KAISHA TOSHIBA

According to an embodiment, a speech synthesis device includes a first storage, a second storage, a first generator, a second generator, a third generator, and a fourth generator. The first storage is configured to store therein first information obtained from a target uttered voice. The second storage is configured to store therein second information obtained from an arbitrary uttered voice. The first generator is configured to generate third information by converting the second information so as to be close to a target voice quality or prosody. The second generator is configured to generate an information set including the first information and the third information. The third generator is configured to generate fourth information used to generate a synthesized speech, based on the information set. The fourth generator configured to generate the synthesized speech corresponding to input text using the fourth information. 1. A speech synthesis device comprising:a first storage configured to store therein first information obtained from a target uttered voice;a second storage configured to store therein second information obtained from an arbitrary uttered voice;a first generator configured to generate third information by converting the second information so as to be close to a target voice quality or prosody;a second generator configured to generate an information set including the first information and the third information;a third generator configured to generate fourth information used to generate a synthesized speech, based on the information set; anda fourth generator configured to generate the synthesized speech corresponding to input text using the fourth information.2. 
The device according to claim 1 ,wherein the first information and the second information are stored together with attribute information thereof, andthe second generator generates the information set by adding the first information and the entire or a portion of the third information, the ...

Publication date: 22-08-2013

TEXT-TO-SPEECH USER'S VOICE COOPERATIVE SERVER FOR INSTANT MESSAGING CLIENTS

Number: US20130218569A1
Assignee: NUANCE COMMUNICATIONS, INC.

A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author. 118-. (canceled)19. A method comprising:analyzing text within a body of a first user's text instant message to determine text-to-speech synthesis control parameters that are to be used to produce a synthesized audible representation of the text within the body of the text instant message;extracting, from text-to-speech synthesis control parameters that are associated with the first user and comprise one or more voice synthesis control parameters which determine distinctive intelligible characteristics representative of the first user, a subset of the text-to-speech synthesis control parameters associated with the first user, the subset corresponding to the text-to-speech synthesis control parameters determined during the analyzing as those that are to be used to produce the synthesized audible representation of the text within the body of the text instant message;sending the text instant message along with the subset of text-to-speech synthesis control parameters to a second user's device, the subset of text-to-speech synthesis control parameters being attached to the 
...
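The server-side extraction of only the parameters needed for the words in a given message can be sketched as a dictionary filter; the voice store, its keys, and the byte payloads are hypothetical:

```python
# Hypothetical per-word voice samples stored for a given author.
VOICE_STORE = {"hello": b"h-sample", "world": b"w-sample", "bye": b"b-sample"}

def subset_for_message(text):
    """Return only the stored samples needed to synthesize this message's words."""
    words = set(text.lower().split())
    return {w: VOICE_STORE[w] for w in words if w in VOICE_STORE}

payload = subset_for_message("Hello world")
print(sorted(payload))  # → ['hello', 'world']
```

Shipping only this subset with the message keeps the payload small while still letting the receiving client's concatenative engine approximate the author's voice.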

Publication date: 29-08-2013

Methods employing phase state analysis for use in speech synthesis and recognition

Number: US20130226569A1
Assignee: Lessac Tech Inc

A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.

Publication date: 29-08-2013

SPEECH SYNTHESIS APPARATUS AND METHOD

Number: US20130226584A1
Inventor: Kagoshima Takehiko
Assignee: KABUSHIKI KAISHA TOSHIBA

A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of an input phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by concatenating the speech unit waveforms acquired. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied onto a buffer by one access, wherein a data quantity of the at least two speech unit waveforms is less than or equal to a size of the buffer. 120-. (canceled)21. An apparatus for synthesizing a speech of an input phoneme sequence , comprising:a selector configured to select a plurality of speech units for synthesizing the speech of the input phoneme sequence by referring to the speech unit information stored in an information memory;an acquisition unit configured to acquire a speech unit waveform corresponding to each speech unit of the plurality of speech units from a plurality of speech unit waveforms stored in a waveform memory; anda concatenation unit configured to synthesize the speech by concatenating the speech unit waveform acquired by the acquisition unit;wherein the acquisition unit copies at least two speech unit waveforms from a continuous region of the waveform memory onto a buffer by one access, a data quantity of the speech unit waveforms being less than or equal to a size of the buffer, the speech unit waveforms corresponding to at least two speech units included in the plurality of speech units.22. The apparatus according to claim 21 , further comprising:a text input unit by which text to be converted to the input phoneme sequence is input.23. The apparatus according to claim 21 , further comprising:at least one of a speaker and a head phone by which the speech is output. 
...
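The one-access acquisition idea, batching speech unit waveforms that are contiguous in the waveform memory and that together fit the buffer, might look like this planning routine (offsets, lengths, and the buffer size are invented):

```python
# Sketch: greedily batch waveform reads so units that are contiguous in storage
# and jointly no larger than the buffer are copied in a single access.
def plan_accesses(units, buffer_size):
    """units: list of (offset, length) in storage order -> list of per-access batches."""
    batches, batch, used, end = [], [], 0, None
    for off, ln in units:
        contiguous = (end is None or off == end)
        if batch and (not contiguous or used + ln > buffer_size):
            batches.append(batch)      # flush: start a new access
            batch, used = [], 0
        batch.append((off, ln))
        used += ln
        end = off + ln
    if batch:
        batches.append(batch)
    return batches

print(plan_accesses([(0, 40), (40, 30), (100, 20)], buffer_size=80))
# → [[(0, 40), (40, 30)], [(100, 20)]]
```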

Publication date: 05-09-2013

Automatic Sound Level Control

Number: US20130231921A1
Assignee: AT&T Intellectual Property I, L.P.

A method includes identifying, at a computing device, a plurality of words in data. Each of the plurality of words corresponds to a particular word of a written language. The method includes determining a sound output level based on a location of the computing device. The method includes generating sound data based on the sound output level and the plurality of words identified in the data.
1. A method comprising: identifying, at a computing device, a plurality of words in data, wherein each of the plurality of words corresponds to a particular word of a written language; determining a sound output level based at least in part on a location of the computing device; and generating sound data based on the sound output level and the plurality of words identified in the data.
2. The method of claim 1, further comprising determining a noise level external to the computing device, wherein the sound output level is based on the noise level external to the computing device.
3. The method of claim 2, wherein determining the noise level external to the computing device includes receiving sound data from one or more sound input devices of the computing device.
4. The method of claim 1, wherein the data includes image data, and wherein at least one of the plurality of words is identified in the image data.
5. The method of claim 4, wherein the at least one of the plurality of words is identified in the image data using optical character recognition.
6. The method of claim 1, wherein the data is accessed from a data file.
7. The method of claim 6, wherein the data file is in a portable document format.
8. The method of claim 1, further comprising outputting one or more sounds from the computing device based on the sound data.
9. The method of claim 1, further comprising accessing sound configuration data from a memory of the computing device, the sound configuration data including sound data corresponding to one or more locations.
10. The method of ...
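One plausible way to combine a location profile with a measured ambient noise level into a sound output level, with all thresholds and the location table invented for the sketch:

```python
# Hypothetical per-location base output levels (0.0 .. 1.0).
LOCATION_BASE = {"library": 0.2, "office": 0.5, "street": 0.8}

def sound_output_level(location, noise_db):
    """Base level from the location profile, boosted by ambient noise above 40 dB."""
    base = LOCATION_BASE.get(location, 0.5)
    boost = max(0.0, (noise_db - 40) / 100.0)
    return min(1.0, base + boost)

print(sound_output_level("street", 70))  # → 1.0
```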

Publication date: 05-09-2013

METHOD AND APPARATUS FOR GENERATING SYNTHETIC SPEECH WITH CONTRASTIVE STRESS

Number: US20130231935A1
Assignee: NUANCE COMMUNICATIONS, INC.

Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings. 16-. (canceled)7. A method for use with a speech-enabled application , the method comprising:receiving, from the speech-enabled application, input comprising a plurality of text strings;identifying a first portion of a first text string of the plurality of text strings as differing from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string as not differing from a corresponding second portion of the second text string;assigning contrastive stress to the identified first portion of the first text string, but not to the identified second portion of the first text string;generating, using at least one computer system, speech synthesis output to render the plurality of text strings as speech having the assigned contrastive stress; andproviding the speech synthesis output for the speech-enabled application.8. 
The method of claim 7 , wherein the identifying comprises identifying the first portion of the first text string ...
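The identification step, marking the portion of the first text string that differs from the corresponding portion of the second for contrastive stress, can be sketched as a word-level diff. The `<stress>` markup is a placeholder for the example, not the patent's actual output format:

```python
# Sketch: mark words of `first` that differ from the aligned word in `second`.
def assign_contrastive_stress(first, second):
    a, b = first.split(), second.split()
    return " ".join(
        f"<stress>{wa}</stress>" if i >= len(b) or wa != b[i] else wa
        for i, wa in enumerate(a)
    )

print(assign_contrastive_stress("flight 102 to Boston", "flight 102 to Denver"))
# → flight 102 to <stress>Boston</stress>
```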

Publication date: 19-09-2013

System and Method of Providing a Spoken Dialog Interface to a Website

Number: US20130246069A1
Assignee: AT&T Intellectual Property II LP

Disclosed is a method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes selecting anchor texts within a website based on a term density, weighting those anchor texts based on a percent of salient words to total words, and incorporating the weighted anchor texts into a live spoken dialog interface, the weights determining a level of incorporation into the live spoken dialog interface.
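The anchor-text weighting step, a percent of salient words to total words, can be computed directly; the stopword list below stands in for whatever salience criterion the trained system would actually use:

```python
# Assumed non-salient words for the sketch; a real system would use a learned
# or curated salience model rather than this tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "to", "and"}

def anchor_weight(anchor_text):
    """Weight = salient words / total words for one website anchor text."""
    words = anchor_text.lower().split()
    salient = [w for w in words if w not in STOPWORDS]
    return len(salient) / len(words)

print(anchor_weight("track the status of an order"))  # → 0.5
```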

Publication date: 26-09-2013

Method and Apparatus for Converting Text Information

Number: US20130251121A1
Inventors: Jiong Chen, Xiaohai Zhang
Assignee: Huawei Device Co Ltd

The present invention provides a method and an apparatus for converting text information. The method includes: receiving, by a first terminal, a call or data from a second terminal; obtaining, by the first terminal, according to a mapping relationship between identification information of the second terminal and voice characteristic parameters of an user of the second terminal, the voice characteristic parameters of the user of the second terminal corresponding to the identification information of the second terminal when the first terminal is in a working mode of text-to-voice conversion; and converting, by the first terminal, related text information about the call or data to audio information with the voice characteristic parameters of the user of the second terminal.
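The mapping from a caller's identification information to stored voice characteristic parameters might be sketched as below; the parameter names and values are hypothetical:

```python
# Hypothetical per-caller voice characteristic parameters keyed by caller ID.
VOICE_PARAMS = {"+15551234": {"pitch": 180.0, "rate": 1.1}}
DEFAULT = {"pitch": 120.0, "rate": 1.0}

def params_for_caller(caller_id, tts_mode=True):
    """Return the second terminal user's voice parameters when in text-to-voice mode."""
    if not tts_mode:
        return None
    return VOICE_PARAMS.get(caller_id, DEFAULT)

print(params_for_caller("+15551234")["pitch"])  # → 180.0
```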

Publication date: 26-09-2013

SPEECH DIALOGUE SYSTEM, TERMINAL APPARATUS, AND DATA CENTER APPARATUS

Number: US20130253926A1
Inventor: Takahashi Jun
Assignee: FUJITSU LIMITED

A speech dialogue system includes a data center apparatus and a terminal apparatus. The data center apparatus acquires answer information for request information obtained in a speech recognition process for speech data from a terminal apparatus, creates a scenario including the answer information, creates first synthesized speech data concerning the answer information, transmits the first synthesized speech data to the terminal apparatus, and transmits the scenario to the terminal apparatus while the first synthesized speech data is being created. The terminal apparatus creates second synthesized speech data concerning the answer information in the received scenario, receives the first synthesized speech data, selects one of the first synthesized speech data and the second synthesized speech data based on a determination result regarding whether the reception of the first synthesized speech data is completed, and reproduces speech.
1. A speech dialogue system comprising: a data center apparatus that receives speech data of speech sound transmitted from a terminal apparatus, applies a speech recognition process to the speech data to acquire request information expressed by the speech data, acquires answer information for the request information from an information source, creates a scenario including the answer information, creates first synthesized speech data expressing synthesized speech that generates sound of the answer information, transmits the first synthesized speech data created in the creating the first synthesized speech data to the terminal apparatus, and transmits the scenario to the terminal apparatus while the first synthesized speech data is being created in the creating the first synthesized speech data; and a terminal apparatus that acquires input of the speech sound to convert the speech sound to speech data expressing the speech sound, transmits the speech data of the speech sound to the data ...
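The terminal's choice between the server's (first) and its own (second) synthesized speech reduces to a completeness check on the reception; a minimal sketch with placeholder audio payloads:

```python
# Sketch of the terminal-side fallback: play the data center's synthesis when it
# arrived completely, otherwise the terminal's own synthesis of the scenario text.
def choose_speech(server_audio_complete, server_audio, local_audio):
    return server_audio if server_audio_complete else local_audio

print(choose_speech(False, b"hi-quality", b"local-tts"))  # → b'local-tts'
```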

Publication date: 26-09-2013

Voice Control For Asynchronous Notifications

Number: US20130253928A1
Assignee: GOOGLE INC.

A computing device may receive an incoming communication and, in response, generate a notification that indicates that the incoming communication can be accessed using a particular application on the communication device. The computing device may further provide an audio signal indicative of the notification and automatically activate a listening mode. The computing device may receive a voice input during the listening mode, and an input text may be obtained based on speech recognition performed upon the voice input. A command may be detected in the input text. In response to the command, the computing device may generate an output text that is based on at least the notification and provide a voice output that is generated from the output text via speech synthesis. The voice output identifies at least the particular application.
1.-20. (canceled)
21. A method comprising: providing, by a computing device, an audio signal indicative of at least one notification in a plurality of notifications and automatically activating a listening mode on the computing device, wherein the plurality of notifications relate to a plurality of incoming communications and a plurality of different applications on the computing device, such that each notification in the plurality of notifications indicates that a respective incoming communication in the plurality of incoming communications can be accessed using a respective application in the plurality of different applications; receiving a voice input during the listening mode; obtaining an input text based on speech recognition performed upon the voice input; detecting a command in the input text; generating an output text in response to detecting the command in the input text, wherein the output text is based on the plurality of notifications; and providing, by the computing device, a voice output that identifies each of the different applications, wherein the voice output is generated from the output text via speech synthesis.
22.
The method of ...

Publication date: 26-09-2013

SOCIAL BROADCASTING USER EXPERIENCE

Number: US20130253934A1
Assignee: JELLI, INC.

A method of providing user participation in a social broadcast environment is disclosed. A network communication is received from a user of a broadcast that includes a preference data indicating a preference of the user that a promoted content be included in the broadcast. Via a responsive network communication, a feedback data is provided to the user that includes a predicted future time at which the promoted content may be included in the broadcast.
1.-21. (canceled)
22. A method of promoting user participation in a radio broadcast, comprising: determining based on an attribution criterion that a user of the radio broadcast is to receive credit at least in part for 1) designating a track currently being played on air during a radio broadcast, and 2) causing the track currently being played to be taken off the radio broadcast via a power-up used by the user and applied to the track currently being played; using a processor to process a user profile data associated with the user to generate an audio signature including an audio identifier of the user, wherein the audio identifier is speech synthesized from the user profile data; and including the audio signature in the radio broadcast after the track is taken off the radio broadcast.
23. A method as recited in claim 22, wherein the power-up is a bomb.
24. A method as recited in claim 22, wherein the power-up is a virus.
25. A method as recited in claim 22, wherein the user designates the track via one or more of: a form on a social broadcasting portal, a widget on a third party site, a form on a third party site, and a mobile gateway.
26. A method as recited in claim 22, wherein the user is a host of a group of plurality of users of the radio broadcast.
27. A method as recited in claim 26, wherein the power-up includes a proxy from another member of the group.
28. A method as recited in claim 27, wherein the user profile data includes an affiliation of the group.
29. A method as ...

Publication date: 03-10-2013

Voice analysis apparatus, voice synthesis apparatus, voice analysis synthesis system

Number: US20130262098A1

A speech analysis apparatus is provided. An F0 extraction part extracts a pitch value from speech information. A spectrum extraction part extracts spectrum information from the speech information. An MVF extraction part extracts a maximum voiced frequency and allows boundary information for respectively filtering a harmonic component and a non-harmonic component to be obtained. According to the speech analysis apparatus, speech synthesis apparatus, and speech analysis synthesis system of the present invention, speech that is closer to the original voice and is more natural may be synthesized. Also, speech may be represented with less data capacity.

Publication date: 03-10-2013

Text to speech method and system

Number: US20130262109A1
Assignee: Toshiba Corp

A text-to-speech method for simulating a plurality of different voice characteristics includes dividing inputted text into a sequence of acoustic units; selecting voice characteristics for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model having a plurality of model parameters provided in clusters each having at least one sub-cluster and describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio with the selected voice characteristics. A parameter of a predetermined type of each probability distribution is expressed as a weighted sum of parameters of the same type using voice characteristic dependent weighting. In converting the sequence of acoustic units to a sequence of speech vectors, the voice characteristic dependent weights for the selected voice characteristics are retrieved for each cluster such that there is one weight per sub-cluster.
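The weighted-sum idea, one voice-characteristic-dependent weight per sub-cluster applied to a model parameter, can be shown on toy numbers (the cluster means and weights are invented):

```python
# Toy weighted sum: a model parameter (here a mean vector) is a weighted sum of
# sub-cluster parameters, with one weight per sub-cluster per voice characteristic.
CLUSTER_MEANS = [[1.0, 3.0], [2.0, 4.0]]               # [sub-cluster][dimension]
WEIGHTS = {"calm": [0.8, 0.2], "excited": [0.1, 0.9]}  # one weight per sub-cluster

def combined_mean(voice):
    w = WEIGHTS[voice]
    dims = len(CLUSTER_MEANS[0])
    return [sum(w[c] * CLUSTER_MEANS[c][d] for c in range(len(w))) for d in range(dims)]

print([round(v, 2) for v in combined_mean("calm")])  # → [1.2, 3.2]
```

Switching voice characteristics then means swapping only the small weight vector, not the full set of cluster parameters.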

Publication date: 03-10-2013

PLAYBACK CONTROL APPARATUS, PLAYBACK CONTROL METHOD, AND PROGRAM

Number: US20130262118A1
Assignee: SONY CORPORATION

A playback control apparatus includes a playback controller configured to control playback of first content and second content. The first content is to output first sound which is generated based on text information using speech synthesis processing. The second content is to output second sound which is generated not using the speech synthesis processing. The playback controller causes an attribute of content to be played back to be displayed on the screen, the attribute indicating whether or not the content is to output sound which is generated based on text information using speech synthesis processing. 1. A playback control apparatus comprising:a playback controller configured to control playback of first content and second content, the first content is to output first sound which is generated based on text information using speech synthesis processing, the second content is to output second sound which is generated not using the speech synthesis processing,wherein the playback controller causes an attribute of content to be played back to be displayed on the screen, the attribute indicating whether or not the content is to output sound which is generated based on text information using speech synthesis processing.2. The playback control apparatus according to claim 1 , wherein the playback controller further causes a display portion claim 1 , associated with sound output at that time claim 1 , to be displayed in a highlighted state.3. The playback control apparatus according to claim 1 , wherein the playback controller further changes a speaker or background music claim 1 , which is in part of the sound claim 1 , in accordance with content of the text information used in generating sound.4. 
The playback control apparatus according to claim 1 , wherein a text-to-speech function for generating sound based on the text information using the speech synthesis processing is configured to be turned on or off, and the playback controller causes the first content ...

03-10-2013 publication date

TEXT TO SPEECH SYSTEM

Number: US20130262119A1
Assignee: KABUSHIKI KAISHA TOSHIBA

A text-to-speech method configured to output speech having a selected speaker voice and a selected speaker attribute, including: inputting text; dividing the inputted text into a sequence of acoustic units; selecting a speaker for the inputted text; selecting a speaker attribute for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model; and outputting the sequence of speech vectors as audio with the selected speaker voice and a selected speaker attribute. The acoustic model includes a first set of parameters relating to speaker voice and a second set of parameters relating to speaker attributes, which parameters do not overlap. The selecting a speaker voice includes selecting parameters from the first set of parameters and the selecting the speaker attribute includes selecting the parameters from the second set of parameters. 1. A text-to-speech method configured to output speech having a selected speaker voice and a selected speaker attribute ,said method comprising:inputting text;dividing said inputted text into a sequence of acoustic units;selecting a speaker for the inputted text;selecting a speaker attribute for the inputted text;converting said sequence of acoustic units to a sequence of speech vectors using an acoustic model; andoutputting said sequence of speech vectors as audio with said selected speaker voice and a selected speaker attribute,wherein said acoustic model comprises a first set of parameters relating to speaker voice and a second set of parameters relating to speaker attributes, wherein the first and second set of parameters do not overlap, and wherein selecting a speaker voice comprises selecting parameters from the first set of parameters which give the speaker voice and selecting the speaker attribute comprises selecting the parameters from the second set which give the selected speaker attribute.2. A method according to claim 1 , wherein there are a plurality of sets of ...
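The non-overlapping parameter sets described above can be illustrated with a toy sketch (speaker names, attribute names, and parameter values are all hypothetical): because the voice parameters and the attribute parameters are disjoint, any speaker can be freely combined with any attribute.

```python
# First set: parameters that give the speaker voice (hypothetical values).
SPEAKER_PARAMS = {
    "speaker_a": {"f0_base": 220.0, "spectral_tilt": -6.0},
    "speaker_b": {"f0_base": 120.0, "spectral_tilt": -9.0},
}

# Second set: parameters that give the speaker attribute; its keys do not
# overlap with the first set, so the two selections are independent.
ATTRIBUTE_PARAMS = {
    "whisper": {"breathiness": 0.9},
    "neutral": {"breathiness": 0.1},
}

def select_model(speaker, attribute):
    """Combine one selection from each (disjoint) parameter set."""
    assert not (SPEAKER_PARAMS[speaker].keys()
                & ATTRIBUTE_PARAMS[attribute].keys()), "sets must not overlap"
    return {**SPEAKER_PARAMS[speaker], **ATTRIBUTE_PARAMS[attribute]}

print(select_model("speaker_a", "whisper"))
```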

03-10-2013 publication date

SPEECH SYNTHESIS DEVICE AND SPEECH SYNTHESIS METHOD

Number: US20130262120A1
Assignee: Panasonic Corporation

A speech synthesis device includes: a mouth-opening-degree generation unit which generates, for each of phonemes generated from input text, a mouth-opening-degree corresponding to oral-cavity volume, using information generated from the text and indicating the type and position of the phoneme within the text, such that the generated mouth-opening-degree is larger for a phoneme at the beginning of a sentence in the text than for a phoneme at the end of the sentence; a segment selection unit which selects, for each of the generated phonemes, segment information corresponding to the phoneme from among pieces of segment information stored in a segment storage unit and including phoneme type, mouth-opening-degree, and speech segment data, based on the type of the phoneme and the generated mouth-opening-degree; and a synthesis unit which generates synthetic speech of the text, using the selected pieces of segment information and pieces of prosody information generated from the text. 1. A speech synthesis device that generates synthetic speech of text that has been input , the speech synthesis device comprising:a prosody generation unit configured to generate, for each of phonemes generated from the text, a piece of prosody information by using the text;a mouth opening degree generation unit configured to generate, for each of the phonemes generated from the text, a mouth opening degree corresponding to an oral cavity volume, using information generated from the text and indicating a type of the phoneme and a position of the phoneme within the text, the mouth opening degree to be generated being larger for a phoneme positioned at a beginning of a sentence in the text than for a phoneme positioned at an end of the sentence;a segment storage unit in which pieces of segment information are stored, each of the pieces of segment information including a phoneme type, information on a mouth opening degree, and speech segment data;a segment selection unit configured to select, 
for ...
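A minimal sketch of the selection step described above (the database contents and the degree scale are invented): for each target phoneme, pick the stored segment whose phoneme type matches and whose mouth-opening degree is closest to the degree generated for that position in the sentence.

```python
# Toy segment database: phoneme type, mouth-opening degree, waveform id.
SEGMENT_DB = [
    {"phoneme": "a", "opening": 0.9, "wave": "a_wide"},
    {"phoneme": "a", "opening": 0.3, "wave": "a_narrow"},
    {"phoneme": "o", "opening": 0.6, "wave": "o_mid"},
]

def select_segment(phoneme, target_opening):
    """Match on phoneme type, then minimize the mouth-opening-degree gap."""
    candidates = [s for s in SEGMENT_DB if s["phoneme"] == phoneme]
    return min(candidates, key=lambda s: abs(s["opening"] - target_opening))

# Sentence-initial phonemes get a larger generated degree than sentence-final
# ones, so the same phoneme can resolve to different segments by position.
print(select_segment("a", 0.8)["wave"])
print(select_segment("a", 0.2)["wave"])
```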

03-10-2013 publication date

SOUND SYNTHESIZING APPARATUS

Number: US20130262121A1
Assignee: YAMAHA CORPORATION

A sound synthesizing apparatus includes a processor coupled to a memory. The processor configured to execute computer-executable units comprising: an information acquirer adapted to acquire synthesis information which specifies a duration and an utterance content for each unit sound; a prolongation setter adapted to set whether prolongation is permitted or inhibited for each of a plurality of phonemes corresponding to the utterance content of the each unit sound; and a sound synthesizer adapted to generate a synthesized sound corresponding to the synthesis information by connecting a plurality of sound fragments corresponding to the utterance content of the each unit sound. The sound synthesizer prolongs a sound fragment corresponding to the phoneme the prolongation of which is permitted in accordance with the duration of the unit sound. 1. A sound synthesizing method comprising:acquiring synthesis information which specifies a duration and an utterance content for each unit sound;setting whether prolongation is permitted or inhibited for each of a plurality of phonemes corresponding to the utterance content of the each unit sound; andgenerating a synthesized sound corresponding to the synthesis information by connecting a plurality of sound fragments corresponding to the utterance content of the each unit sound,wherein in the generating process, a sound fragment corresponding to the phoneme the prolongation of which is permitted, among a plurality of phonemes corresponding to the utterance content of the each unit sound, is prolonged in accordance with the duration of the unit sound.2. The sound synthesizing method according to claim 1 , wherein in the setting process, whether the prolongation of each of the phonemes is permitted or inhibited is set in response to an instruction from a user.3.
The sound synthesizing method according to claim 2 , further comprising:displaying a set image which provides a plurality of phonemes corresponding to the utterance ...

10-10-2013 publication date

SPEECH SYNTHESIS SYSTEM, SPEECH SYNTHESIS PROGRAM PRODUCT, AND SPEECH SYNTHESIS METHOD

Number: US20130268275A1
Assignee:

Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values. 1. At least one computer-readable storage device encoded with a speech synthesis program which causes a system for synthesizing speech from text to perform:determining a first speech segment sequence corresponding to an input text, by selecting speech segments from a speech segment database according to a first cost calculated based at least in part on a statistical model of prosody variations;determining prosody modification values for the first speech segment sequence, after the first speech segment sequence is selected, by using a second cost calculated based at least in part on the statistical model of prosody variations, wherein the first cost is different from the second cost; andapplying the determined prosody modification values to the first speech segment sequence to produce a second speech segment sequence whose prosodic characteristics are different from prosodic characteristics of the first speech segment sequence.2. 
The at least one computer readable storage device of claim 1 , wherein the first cost for determining the first speech segment sequence includes a spectrum continuity cost, a duration error cost, a ...
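The two-path idea, fix the segment sequence first and then search for prosody modification values that minimize a second cost, can be caricatured in one dimension (the cost weights and the F0 offset grid are invented): the second cost trades closeness to the statistical prosody model against the size of the modification, so the search prefers the smallest change that still fits the model.

```python
def modification_cost(offset, f0, model_f0, w_model=1.0, w_effort=2.0):
    """Second-path cost: model mismatch plus a penalty on the modification."""
    return w_model * (f0 + offset - model_f0) ** 2 + w_effort * offset ** 2

def best_offset(f0, model_f0, grid):
    """Grid search for the modification value minimizing the second cost."""
    return min(grid, key=lambda o: modification_cost(o, f0, model_f0))

grid = [i / 10 for i in range(-20, 21)]  # candidate F0 offsets
print(best_offset(100.0, 101.0, grid))
```

With the weights above, the minimizer stops short of the full 1.0 correction, which is the "minimum modification" behavior the abstract describes.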

17-10-2013 publication date

WARNING SYSTEM WITH SYNTHESIZED VOICE DIAGNOSTIC ANNOUNCEMENT CAPABILITY FOR FIELD DEVICES

Number: US20130275137A1
Author: Balhareth Hamad S.
Assignee: Saudi Arabian Oil Company

Field devices, including sensors and final elements, are provided with a speech synthesizer and optionally a speech control chip, to sound audible voice maintenance and fault alarms to alert field personnel and, optionally, a voice message upon manual activation of a pushbutton or other switch directing them how to perform the maintenance task or clear the fault. 1. An instrumentation and control field device comprising:a processor;a memory coupled to the processor;a speech synthesizer to produce sounds, including allophones;a speaker;an amplifier;at least one pushbutton;an alarm routine stored in the memory and adapted to be executed by the processor upon a device alert, causing an audible alarm to be sounded by the amplifier and speaker to attract the attention of personnel in the vicinity of the field device;a status routine stored in the memory and adapted to be executed in the processor upon manual actuation of a switch that causes a text status to be received by the speech synthesizer, the speech synthesizer converting the text status to a spoken status and forwarding the spoken status to be sounded by the amplifier and speaker; andan optional instructional routine stored in the memory and adapted to be executed in the processor following enunciation of the text status by the speech synthesizer and transmitting a text instruction to the speech synthesizer which converts the text instruction to a spoken instruction and forwards the spoken instruction to be sounded by the amplifier and speaker.2. The instrumentation and control field device of in which the instruction is transmitted upon the manual actuation of a switch.3. The instrumentation and control field device of claim 1 , wherein the audible alarm is a prerecorded sound file.4. The instrumentation and control field device of claim 2 , wherein the audible alarm is selected from a plurality of sounds and the sound is selected based upon the nature of the device alert.5. The instrumentation and control ...

17-10-2013 publication date

Hands-Free List-Reading by Intelligent Automated Assistant

Number: US20130275138A1
Assignee:

Systems and methods for providing hands-free reading of content comprising: identifying a plurality of data items for presentation to a user, the plurality of data items associated with a domain-specific item type and sorted according to a particular order; based on the domain-specific item type, generating a speech-based overview of the plurality of data items; for each of the plurality of data items, generating a respective speech-based, item-specific paraphrase for the data item based on respective content of the data item; and providing, to a user through the speech-enabled dialogue interface, the speech-based overview, followed by the respective speech-based, item-specific paraphrases for at least a subset of the plurality of data items in the particular order. 1. A method for providing information through a speech-enabled dialogue interface , comprising:identifying a plurality of data items for presentation to a user, the plurality of data items associated with a domain-specific item type and sorted according to a particular order;based on the domain-specific item type, generating a speech-based overview of the plurality of data items;for each of the plurality of data items, generating a respective speech-based, item-specific paraphrase for the data item based on respective content of the data item; andproviding, to a user through the speech-enabled dialogue interface, the speech-based overview, followed by the respective speech-based, item-specific paraphrases for at least a subset of the plurality of data items in the particular order.2. The method of claim 1 , further comprising:while providing the respective speech-based, item-specific paraphrases, inserting a pause between each pair of adjacent speech-based, item-specific paraphrases; andentering a listening mode to capture user input during the pause.3. The method of claim 1 , further comprising:while providing the respective speech-based, item-specific paraphrases in a sequential order, advancing a ...
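The reading flow above, a domain-specific overview followed by one item-specific paraphrase per item in order, might look like this as a plan of utterances (the item type, field names, and paraphrase wording are invented; the pause/listen handling between paraphrases is omitted):

```python
def speech_plan(item_type, items):
    """Overview first, then one paraphrase per item, in the given order."""
    plan = [f"You have {len(items)} new {item_type} messages."]
    for i, item in enumerate(items, start=1):
        plan.append(f"Message {i}, from {item['sender']}: {item['summary']}")
    return plan

emails = [
    {"sender": "Ada", "summary": "lunch on Friday"},
    {"sender": "Bob", "summary": "quarterly report attached"},
]
for line in speech_plan("email", emails):
    print(line)
```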

24-10-2013 publication date

Vehicle-Based Message Control Using Cellular IP

Number: US20130282375A1
Assignee:

Architecture for playing back personal text-based messages such as email and voicemail over a vehicle-based media system. The user can use a cell phone that registers over a cellular network to an IMS (IP multimedia subsystem) to obtain an associated IP address. The personal messages are then converted into audio signals using a remote text-to-voice (TTV) converter and transmitted to the phone based on the IP address. The phone then transmits the audio signals to the vehicle media system for playback using an unlicensed wireless technology (e.g., Bluetooth, Wi-Fi, etc.). Other alternative embodiments include transmitting converted message directly to the media system, via a satellite channel, converting the messages via a TTV converter on the cell phone, and streaming the converted messages to the phone and/or the media system for playback. 1. A method , comprising:receiving, by a text-to-audio converter component, from an internet protocol multimedia system being distinct from and in communication with the text-to-audio converter component, a registration request relating to a mobile communication device, the registration request being received following an association, by the internet protocol multimedia system, of an internet protocol address with the mobile communication device;associating the internet protocol address with the text-to-audio converter component;receiving, by the text-to-audio converter component, a text-based communication from the internet protocol multimedia system; andconverting, by the text-to-audio converter component, the text-based communication to an audio message.2. The method of claim 1 , wherein the text-to-audio converter component is included in the mobile communication device.3. The method of claim 2 , further comprising transmitting the audio message to a vehicle media system.4. The method of claim 1 , wherein the text-to-audio converter component is remote to the mobile communication device.5. The method of claim 4 , further ...

24-10-2013 publication date

FILE FORMAT, SERVER, VIEWER DEVICE FOR DIGITAL COMIC, DIGITAL COMIC GENERATION DEVICE

Number: US20130282376A1
Author: NONAKA Shunichiro
Assignee:

A viewer device for a digital comic comprising: an information acquisition unit that acquires a digital comic in a file format for a digital comic viewed on a viewer device, the file format including speech balloon information including information of a speech balloon region that indicates a region of a speech balloon, first text information indicating a dialogue within each speech balloon, the first text information being correlated with each speech balloon, and first display control information including positional information and a transition order of an anchor point so as to enable the image of the entire page to be viewed on a monitor of the viewer device in a scroll view; and a voice reproduction section that synthesizes a voice for reading the letter corresponding to the text information based on an attribute of the character, an attribute of the speech balloon or the dialogue, and outputs the voice. 1. A viewer device for a digital comic comprising:an information acquisition unit that acquires a digital comic in a file format for a digital comic viewed on a viewer device, the file format including a high-definition image of an entire page for each page of a comic,speech balloon information including information of a speech balloon region that indicates a region of a speech balloon in which a dialogue of a character of the comic is placed within the image,first text information indicating a dialogue within each speech balloon, the first text information being correlated with each speech balloon, andfirst display control information including positional information and a transition order of a predetermined anchor point so as to enable the image of the entire page to be viewed on a monitor of the viewer device in a scroll view;a display unit;an image display control unit that scroll-reproduces or panel-reproduces the image of each page on a screen of the display unit based on the display control information of the acquired digital comic;a letter display control
...

24-10-2013 publication date

COMMUNICATION DEVICE TRANSFORMING TEXT MESSAGE INTO SPEECH

Number: US20130282377A1
Assignee:

The application discloses a communication device and method of processing a text message in the communication device. An aspect of the present application is a method of processing a text message in a communication device, the method including receiving a text message from an external sender, receiving a request to transform the text message into voice data, transforming the received text message into voice data according to the request, and transmitting the voice data to an external sound reproduction device through a wireless communication module. 1. A method of processing a text message in a communication device , the method comprising:storing voice information corresponding to a counterpart during a telephone conversation with the counterpart if the voice information of the counterpart has not been stored;receiving text messages from the counterpart;determining whether to transform the received text messages into voice data according to control information, wherein the control information corresponds to a condition for transforming the text messages into the voice data;selectively transforming a text message among the text messages into the voice data using the stored voice information of the counterpart according to a result of the determining step; andtransmitting the transformed voice data to a sound reproduction device.2. The method of claim 1 , wherein the counterpart is a person whose telephone number has been stored.3. The method of claim 1 , further comprising:recording the voice of the counterpart during the telephone conversation with the counterpart.4. The method of claim 3 , further comprising:sampling the recorded voice of the counterpart to be stored as the voice information of the counterpart.5.
A communication device comprising:a storage configured to store voice information;a receiver configured to receive text messages from a counterpart; anda controller configured to:store the voice information in the storage which correspond to the counterpart ...

31-10-2013 publication date

Realistic Speech Synthesis System

Number: US20130289998A1
Assignee: SRC Inc

A system and method for realistic speech synthesis which converts text into synthetic human speech with qualities appropriate to the context such as the language and dialect of the speaker, as well as expanding a speaker's phonetic inventory to produce more natural sounding speech.

14-11-2013 publication date

SPEECH GENERATION DEVICE WITH A HEAD MOUNTED DISPLAY UNIT

Number: US20130300636A1
Assignee: DYNAVOX SYSTEMS LLC

A speech generation device is disclosed. The speech generation device may include a head mounted display unit having a variety of different components that enhance the functionality of the speech generation device. The speech generation device may further include computer-readable medium that, when executed by a processor, instruct the speech generation device to perform desired functions. 1. A speech generation device , comprising:a head mounted display unit configured as an item to be worn on a user's head, the head mounted display unit including a display device for displaying one or more images within a field of view of the user;a speaker for generating audio outputs; anda processing unit communicatively coupled to the head mounted display unit and the speaker, the processing unit including a processor and a related computer-readable medium storing instructions executable by the processor,wherein the instructions stored on the computer-readable medium configure the speech generation device to generate text-to-speech output.2. (canceled)3. The speech generation device of claim 1 , wherein the head mounted display unit comprises a frame and one or more lenses secured within a portion of the frame, the lenses being configured to be disposed within the field of view of the user.4. The speech generation device of claim 3 , wherein the display device is configured to project the one or more images onto the one or more lenses.5. (canceled)6. (canceled)7. (canceled)8. (canceled)9. The speech generation device of claim 1 , wherein the head mounted display unit comprises a frame, the display device being secured to a portion of the frame at a location within the field of view of the user.10. The speech generation device of claim 9 , wherein the display device is configured as a lens of the head mounted display unit.11. (canceled)12. (canceled)13. (canceled)14. The speech generation device of claim 9 , wherein the display device comprises a non- ...

14-11-2013 publication date

METHOD AND SYSTEM FOR OPERATING COMMUNICATION SERVICE

Number: US20130304457A1
Assignee: SAMSUNG ELECTRONICS CO. LTD.

An operation method capable of adaptively operating at least one of a Speech To Text (STT) service and a Text To Speech (TTS) service according to setting or user operation and a system thereof are provided. The method includes requesting a specific type of a communication service connection to a reception side terminal by a transmission side terminal, and performing an operation of at least one of a speech to text service providing speech recognition based text and a text to speech service converting the text into speech data between the reception side terminal and the transmission side terminal, and includes one of recognizing speech data provided from the transmission side terminal and converting the speech data into a text based on a first speech process supporting device connected to the transmission side terminal. 1. A system for operating a communication service , the system comprising:a transmission side terminal for requesting connection of the communication service, and for converting reception data or user input data according to at least one of preset input/output modes and input/output modes determined by a user after the communication service is connected;a reception side terminal for receiving a communication service connection acceptance request according to the request for the connection of the communication service, for starting the communication service using the at least one of the preset input/output modes and the input/output modes determined by the user when the communication service connection acceptance request is accepted, for converting the reception data and the user input data or receiving converted data by transmitting the user input data to a speech process supporting device for converting the user input data according to a type of the reception data and the input/output modes, for displaying conversion data corresponding to the reception data, and for transmitting conversion data corresponding to the user input data; anda speech process ...

14-11-2013 publication date

SYSTEM AND METHOD FOR AUDIBLY PRESENTING SELECTED TEXT

Number: US20130304474A1
Assignee:

Disclosed herein are methods for presenting speech from a selected text that is on a computing device. This method includes presenting text on a touch-sensitive display and having that text size within a threshold level so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user touch has been received, the computing device identifies and interprets the portion of text that is to be selected, and subsequently presents the text audibly to the user. 1. A method comprising:displaying, via a processor, text via a touch-sensitive display;receiving, from the touch-sensitive display, input identifying a portion of the text; andaudibly presenting the portion of the text.2. The method of claim 1 , wherein receiving the input further comprises receiving non-contiguous separate touches on the touch-sensitive display, wherein the non-contiguous separate touches indicate a number of paragraphs of the text to be audibly presented as the portion of the text.3. The method of claim 1 , wherein the input comprises data associated with a first tap at a first location and a second tap at a second location, and the portion of the text is identified as text displayed between the first location and the second location.4. The method of claim 1 , wherein audibly presenting the portion of the text occurs via a speaker associated with the touch-sensitive display.5. The method of claim 1 , wherein the touch-sensitive display is part of a mobile phone.6. The method of claim 1 , wherein audibly presenting the portion of the text comprises communicating pre-recorded phonemes combined together.7. The method of claim 1 , wherein the input further comprises an area of the touch-sensitive display indicated by user touch.8.
A system comprising:a processor; anda computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising ...
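The two-tap selection in claim 3 reduces to a slice between the two locations; with character indices standing in for touch coordinates (a deliberate simplification of mapping screen positions to text), it might look like:

```python
def select_between_taps(text, first_tap, second_tap):
    """Portion of text displayed between the first and second tap locations."""
    start, end = sorted((first_tap, second_tap))
    return text[start:end]

text = "Tap once here and once there to hear this aloud."
# Tap order does not matter; the span between the two locations is selected,
# and the resulting portion would then be handed to the TTS engine.
print(select_between_taps(text, 18, 4))
```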

21-11-2013 publication date

Electronic Apparatus

Number: US20130311187A1
Author: Nakamae Midori
Assignee: KABUSHIKI KAISHA TOSHIBA

An electronic apparatus comprises a storage module, a manipulation module, a voice output control module, and a display module. The storage module is configured to store book data. The manipulation module is configured to convert a manipulation of a user into an electrical signal while the voice output control module is configured to reproduce a voice by reading the book data in the storage module based on the manipulation, and the display module is configured to display the book data. When it is determined that a part to be reproduced includes an illustration or a figure, the user is urged to view the display module and the illustration or the figure is displayed at the display module. 1. An electronic apparatus comprising:a storage module configured to store book data;a manipulation module configured to convert a manipulation of a user into an electrical signal;a voice output control module configured to reproduce a voice by reading the book data in the storage module based on the manipulation; anda display module configured to display the book data,wherein when it is determined that a part to be reproduced includes an illustration or a figure, the user is urged to view the display module and the illustration or the figure is displayed at the display module.2. The electronic apparatus of claim 1 , wherein when it is determined that the user is not viewing the display module during voice reproduction of the book data, the user is urged to view the display module and the illustration or the figure is displayed at the display module.3. The electronic apparatus of claim 1 , further comprising:a control module, configured to store, in the storage module, a position of voice reproduction of the book data by the voice output control module, and to synchronize the position of the voice reproduction with a reproduction position in the book data.
The electronic apparatus of claim 1 , further comprising:a control module;wherein a reproduction part in the book data is ...

21-11-2013 publication date

Text-to-speech device, speech output device, speech output system, text-to-speech methods, and speech output method

Number: US20130311188A1
Author: ADACHI Takuma
Assignee: Panasonic Corporation

An audio read-out device comprises an audio signal generator, a first information receiver, a first information transmitter, a first controller, and a mixed audio signal generator, and when the first information receiver receives audio output enablement information indicating that audio output is disabled, the first controller causes the mixed audio signal generator to generate a mixed audio signal composed of a broadcast audio signal and causes the first information transmitter to transmit the mixed audio signal until the first information receiver receives audio output enablement information indicating that audio output is enabled; and when the first information receiver receives audio output enablement information indicating that audio output is enabled, the first controller causes the mixed audio signal generator to generate a mixed audio signal obtained by mixing a read-out audio signal and a broadcast audio signal, and causes the first information transmitter to transmit the mixed audio signal. 1. An audio read-out device connected via a network to an audio output device that outputs a read-out audio signal , the audio read-out device comprising:an audio signal generator configured to generate the read-out audio signal from text information;a first information receiver configured to receive audio output enablement information from the audio output device via the network;a first information transmitter configured to transmit the read-out audio signal generated by the audio signal generator, to the audio output device via the network;a first controller configured to, when the first information receiver receives audio output enablement information indicating that audio output is disabled, cause the first information transmitter to wait to transmit the read-out audio signal until the first information receiver receives audio output enablement information indicating that audio output is enabled, and to, when the first information receiver receives audio output ...
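The gating behavior in the abstract condenses to one decision (the frame representation and mixing-by-addition are my simplifications, not the patent's signal path): while the output device reports that audio output is disabled, the mixed signal carries broadcast audio only; once it reports enabled, the read-out audio is mixed in.

```python
def mixed_audio(audio_output_enabled, broadcast_frames, readout_frames):
    """Mix read-out audio into the broadcast only when output is enabled."""
    if not audio_output_enabled:
        return list(broadcast_frames)  # hold back the read-out audio
    return [b + r for b, r in zip(broadcast_frames, readout_frames)]

# Device reports disabled: broadcast-only mix is transmitted.
print(mixed_audio(False, [1, 2, 3], [10, 20, 30]))
# Device reports enabled: read-out signal is mixed in.
print(mixed_audio(True, [1, 2, 3], [10, 20, 30]))
```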

21-11-2013 publication date

Voice processing apparatus

Number: US20130311189A1
Assignee: Yamaha Corp

In a voice processing apparatus, a processor performs generating a converted feature by applying a source feature of source voice to a conversion function, generating an estimated feature based on a probability that the source feature belongs to each element distribution of a mixture distribution model that approximates distribution of features of voices having different characteristics, generating a first conversion filter based on a difference between a first spectrum corresponding to the converted feature and an estimated spectrum corresponding to the estimated feature, generating a second spectrum by applying the first conversion filter to a source spectrum corresponding to the source feature, generating a second conversion filter based on a difference between the first spectrum and the second spectrum, and generating target voice by applying the first conversion filter and the second conversion filter to the source spectrum.
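The "estimated feature" step, an expectation of component means under the probability that the source feature belongs to each element distribution, can be shown with one-dimensional unit-variance Gaussians (the two-component model and all numbers are invented; a real system would use multivariate features and full covariances):

```python
import math

MEANS = [0.0, 5.0]    # element-distribution means (toy 1-D mixture model)
PRIORS = [0.5, 0.5]   # mixture weights

def posteriors(x):
    """P(component | x) for unit-variance 1-D Gaussians."""
    likes = [p * math.exp(-0.5 * (x - m) ** 2) for m, p in zip(MEANS, PRIORS)]
    total = sum(likes)
    return [l / total for l in likes]

def estimated_feature(x):
    """Posterior-weighted sum of component means."""
    return sum(p * m for p, m in zip(posteriors(x), MEANS))

# Near a component mean the estimate snaps to that mean; halfway between
# the two components the estimate is their average.
print(estimated_feature(0.1), estimated_feature(2.5), estimated_feature(4.9))
```

In the apparatus above, the gap between the spectrum of this estimated feature and the converted feature's spectrum is what drives the first conversion filter.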

05-12-2013 publication date

SPEECH SYNTHESIS SYSTEM, SPEECH SYNTHESIS METHOD AND SPEECH SYNTHESIS PROGRAM

Number: US20130325477A1
Assignee: NEC Corporation

A speech synthesis system includes: a training database storing training data, which is a set of features extracted from speech waveform data; a feature space division unit which divides a feature space, a space relating to the training data, into partial spaces; a sparse or dense state detection unit which detects a sparse or dense state for each partial space of the divided feature space, generates sparse or dense information indicating the sparse or dense state, and outputs the sparse or dense information; and a pronunciation information correcting unit which corrects pronunciation information used for speech synthesis based on the outputted sparse or dense information. 1. A speech synthesis system comprising: a training database storing training data which comprises a set of features extracted from speech waveform data; a feature space division unit which divides a feature space, which comprises a space relating to the training data which said training database stores, into partial spaces; a sparse or dense state detection unit which detects a sparse or dense state for each partial space of the feature space divided by said feature space division unit, generates sparse or dense information which comprises information indicating the sparse or dense state, and outputs the sparse or dense information; and a pronunciation information correcting unit which corrects pronunciation information which is used for speech synthesis based on the sparse or dense information outputted from said sparse or dense state detection unit. 2. The speech synthesis system according to claim 1, further comprising: a prosody training unit which trains a prosody model in the partial space which comprises the feature space divided by said feature space division unit and generates a prosody generation model; a prosody generation model storage unit which stores the prosody generation model generated by said prosody training unit and the sparse or ...
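The sparse-or-dense detection over divided partial spaces can be illustrated for a one-dimensional feature space (a simplified sketch; the bin count and `min_count` threshold are assumptions, not values from the patent):

```python
def sparse_or_dense(features, num_bins, lo, hi, min_count=2):
    """Divide a 1-D feature space [lo, hi) into equal partial spaces and
    mark each one 'dense' or 'sparse' by how many training features fall in it."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for f in features:
        idx = min(int((f - lo) / width), num_bins - 1)  # clamp the top edge
        counts[idx] += 1
    return ['dense' if c >= min_count else 'sparse' for c in counts]
```

The correcting unit would then bias pronunciation choices toward dense partial spaces, where the trained models are reliable.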

Publication date: 12-12-2013

SMART PHONE WITH SELF-TRAINING, LIP-READING AND EYE-TRACKING CAPABILITIES

Number: US20130332160A1
Author: Posa John G.
Assignee:

Smartphones and other portable electronic devices include self-training, lip-reading, and/or eye-tracking capabilities. In one disclosed method, an eye-tracking application is operative to use the video camera of the device to track the eye movements of the user while text is being entered or read on the display. If it is determined that the user is moving at a rate of speed associated with motor vehicle travel, as through GPS or other methods, a determination is made as to whether the user is engaged in a text-messaging session, and if the user is looking away from the device during the text-messaging session, assumptions may be made about texting while driving, including corrective actions. 1. A method of training a smart phone or other portable electronic device having a microphone, a display, a keyboard, an audio output and a memory, comprising the steps of: receiving words spoken by a user through the microphone; utilizing a speech-to-text algorithm to convert the spoken words into raw text; displaying the raw text on the display; correcting errors in the text using the keyboard; storing, in the memory, data representative of the spoken words in conjunction with the corrected text; and using the stored information to train the device so as to increase the likelihood that when the same word or words are spoken in the future the corrected text will be generated. 2. The method of claim 1, wherein the spoken words are part of a phone conversation, with the raw text being displayed whether or not the user wishes to correct the text. 3. The method of claim 1, including the step of suggesting words for the user to speak, either using the display or through the audio output. 4. A method of training a smart phone or other portable electronic device having a microphone, a camera and a memory, comprising the steps of: watching a user's lips with the camera as they speak or mouth-out words; storing, in the memory, data representative of the words in ...
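The self-training loop of claim 1, remembering which corrected text goes with which raw transcription, can be reduced to a small correction memory (an illustrative sketch; the class and method names are hypothetical):

```python
class CorrectionMemory:
    """Store (raw transcription -> user-corrected text) pairs and prefer a
    stored correction when the same raw text is produced again."""

    def __init__(self):
        self.corrections = {}

    def record(self, raw, corrected):
        # Keyed case-insensitively so future recognitions match.
        self.corrections[raw.lower()] = corrected

    def transcribe(self, raw):
        # Fall back to the recognizer's raw output when nothing is stored.
        return self.corrections.get(raw.lower(), raw)
```

A real implementation would key on acoustic data rather than the raw text string, but the training feedback loop is the same.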

Publication date: 12-12-2013

Method and System for Enhancing a Speech Database

Number: US20130332169A1
Assignee: AT&T INTELLECTUAL PROPERTY II, L.P.

A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis. 1. A method comprising: receiving text as part of a text-to-speech process; selecting a speech segment associated with the text, wherein the speech segment is selected from a primary speech database which has been modified by: identifying primary speech segments in the primary speech database which do not meet a need of the text-to-speech process, wherein the primary speech segments comprise one of half-phones, half-phonemes, demi-syllables, and polyphones; identifying replacement speech segments which satisfy the need in a secondary speech database; and enhancing the primary speech database by substituting, in the primary database, the primary speech segments with the replacement speech segments; and generating speech corresponding to the text using the speech segment. 2. The method of claim 1, wherein the need is based on one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences. 3. The method of claim 1, wherein the primary speech segments are one of diphones, triphones, and phonemes. 4. The method of claim 1, wherein the primary speech database has been further modified by identifying boundaries of the primary speech segments. 5. ...
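The database-enhancement step, substituting identified primary segments with replacements from a secondary database, can be sketched with plain dictionaries keyed by segment label (a hypothetical structure; real unit inventories store labeled waveform units, not strings):

```python
def enhance(primary, secondary, needed):
    """Substitute, in the primary unit inventory, segments that do not meet
    the need (e.g. a dialect difference) with replacements from the
    secondary inventory, leaving all other segments untouched."""
    enhanced = dict(primary)
    for label in needed:
        if label in secondary:  # only substitute when a replacement exists
            enhanced[label] = secondary[label]
    return enhanced
```

Synthesis then selects units from the enhanced inventory exactly as before; only the coverage has changed.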

Publication date: 12-12-2013

METHOD AND SYSTEM FOR PROCESSING CONTENT

Number: US20130332170A1
Author: Melamed Gal
Assignee:

Provided are a method and system for processing user input and web-based content by transforming content to metadata and by using a plurality of vocabularies, including specific vocabularies (e.g. location-dependent, culture-dependent, personalized, non-formal, and more), and other methods to process voice or non-voice content. 1. A method for processing content, carried out using an electronic processor, the method comprising: transforming non-voice content to metadata; mapping the non-voice content to the metadata; transmitting the metadata to a connected device, said connected device being configured to determine a single metadata object to use as input to a text-to-speech system; converting the metadata to a format suitable for submitting to the text-to-speech system; submitting the converted metadata to the text-to-speech system; and presenting the non-voice content as speech. 2. The method according to claim 1, comprising extracting the non-voice content from a network. 3. (canceled) 4. The method according to claim 2, wherein the network comprises a social network, an instant messaging textual representation service or a combination thereof. 5. The method according to wherein the non-voice content comprises informal text. 6. The method according to comprising: extracting non-voice content from a web resource; identifying informal text within the non-voice content; and transforming the identified informal text to metadata prior to converting the metadata into a format suitable for submitting to a text-to-speech system. 7. The method according to claim 6, wherein transforming the identified informal text to metadata comprises: tagging the informal text in a platform-specific manner to obtain tagged data; and transforming the tagged data to metadata. 8. The method according to claim 7, further comprising: detecting the language of the tagged data; detecting misspelled content; correcting spelling mistakes in the misspelled content; detecting informal text content; and transforming the informal text content to a format ...

Publication date: 26-12-2013

Real-time message sentiment awareness

Number: US20130346067A1
Author: Dhruv A. Bhatt
Assignee: International Business Machines Corp

Provided are techniques for determining a sentiment of an electronic message. The electronic message is parsed to identify one or more sub-constructs. For at least one of the sub-constructs that is not false-positive, a sentiment indicator is assigned from a set of types of sentiment indicators, and a score is assigned for the sentiment indicator. A final score is obtained for at least one type of sentiment indicator in the electronic message by summing scores for that type of sentiment indicator. Based on the final score for the at least one type of sentiment indicator, a sentiment of the electronic message is identified.
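The scoring scheme, assigning an indicator and score per sub-construct, summing per indicator type, then classifying, can be sketched with word-level sub-constructs (the lexicons and weights below are invented for illustration, and false-positive filtering is omitted):

```python
# Hypothetical scored indicator lexicons; a real system would be far larger.
POSITIVE = {'great': 2, 'good': 1, 'love': 2}
NEGATIVE = {'bad': -1, 'awful': -2, 'hate': -2}

def message_sentiment(message):
    """Parse the message into word sub-constructs, assign each a scored
    indicator, sum per indicator type, and report the overall sentiment."""
    pos = neg = 0
    for word in message.lower().split():
        word = word.strip('.,!?')
        pos += POSITIVE.get(word, 0)
        neg += NEGATIVE.get(word, 0)
    total = pos + neg  # final score across indicator types
    return 'positive' if total > 0 else 'negative' if total < 0 else 'neutral'
```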

Publication date: 26-12-2013

DEVICE FOR AIDING COMMUNICATION IN THE AERONAUTICAL DOMAIN

Number: US20130346081A1
Assignee: Airbus (SAS)

The device comprises means for recording audio messages corresponding to all the incoming and outgoing audio communications, means for transcribing, in real time, each of said audio messages into a textual message, means for displaying, on at least one screen, each textual message thus generated, and means able to play back any recorded audio message. 2. The device as claimed in claim 1, which comprises moreover: playback means which are able to carry out audio playback of any recorded audio message; and activation means able to be activated by an operator to identify and trigger the playback by the playback means of a recorded audio message. 3. The device as claimed in claim 2, wherein said activation means comprise, for each textual message displayed on the screen, an associated sensitive area which is displayed jointly with the textual message with which it is associated, and which is able to be activated so as to trigger the playback of the audio message corresponding to said textual message. 4. The device as claimed in claim 1, which comprises means allowing an operator to copy at least part of a displayed textual message and to transmit it to a system of the aircraft. 5. The device as claimed in claim 1, which comprises moreover means for determining the time of emission of each audio message, and in that said display means display moreover on said screen, jointly with the textual message with which it is associated, the corresponding emission time. 6. The device as claimed in claim 1, which comprises means making it possible to access an automatic terminal information service, to transcribe into textual messages all the audio messages emitted by this service, to display said textual messages on said screen, and to play back any recorded audio message that may be listened to on request. 7. The device as claimed in claim 1, wherein said screen of the display means represents a dedicated single graphical interface. 8. The device as claimed in ...

Publication date: 02-01-2014

SOUND SYNTHESIS METHOD AND SOUND SYNTHESIS APPARATUS

Number: US20140006031A1
Assignee:

A sound synthesis apparatus connected to a display device includes a processor configured to: display a lyric on a screen of the display device; input a pitch based on an operation of a user, after the lyric has been displayed on the screen; and output a piece of waveform data representing a singing sound of the displayed lyric based on the inputted pitch. 1. A sound synthesis method using an apparatus connected to a display device, the sound synthesis method comprising: a first step of displaying a lyric on a screen of the display device; a second step of inputting a pitch based on an operation of a user, after the first step is completed; and a third step of outputting a piece of waveform data representing a singing sound of the displayed lyric based on the inputted pitch. 2. The sound synthesis method according to claim 1, further comprising: a fourth step of storing a piece of phrase data representing a sound corresponding to the lyric displayed on the screen into a storage in the apparatus, the piece of phrase data being constituted by a plurality of pieces of syllable data, wherein in the third step, pitch conversion based on the inputted pitch is performed on each of the plurality of pieces of syllable data which constitutes the piece of phrase data to generate and output the piece of waveform data representing the singing sound with the pitch. 3. The sound synthesis method according to claim 2, wherein every time the pitch is inputted in the second step, a sequence of syllable data is read among the plurality of pieces of syllable data stored in the storage and the pitch conversion based on the inputted pitch is performed on the sequence of syllable data. 4. The sound synthesis method according to claim 2, wherein the lyric displayed on the screen in the first step is constituted by a plurality of syllables, the sound synthesis method further comprising: a fifth step of selecting a syllable among the lyric displayed on the screen, wherein ...
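The per-pitch syllable conversion of claims 2 and 3 can be sketched with equal-temperament pitch factors (a simplification; `sing`, the MIDI note numbers, and the base pitch are assumptions, and real pitch conversion resamples stored syllable waveforms rather than returning a factor):

```python
def pitch_factor(base_midi, target_midi):
    # Equal-temperament resampling ratio for shifting one pitch to another:
    # one octave (12 semitones) doubles the factor.
    return 2 ** ((target_midi - base_midi) / 12)

def sing(syllables, pitches, base_midi=60):
    """For each input pitch, take the next syllable of the displayed lyric
    and attach the conversion factor that would retune its stored waveform."""
    return [(syl, pitch_factor(base_midi, p)) for syl, p in zip(syllables, pitches)]
```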

Publication date: 02-01-2014

SYSTEM AND METHOD FOR DYNAMICALLY INTERACTING WITH A MOBILE COMMUNICATION DEVICE

Number: US20140006032A1
Author: Korn Jeffrey
Assignee:

Audio presentations of media content delivered onto a device are interrupted using commands not otherwise known to or programmed into a messaging application used to present the content to the user of the device. In one embodiment, an electronic message having textual content is received at the mobile device, where it is translated into an audio stream and presented (i.e., played back) to the user of the device within the messaging application. The user provides, and the application receives, a string of identical user commands that are not specifically defined or programmed in as commands within the messaging application, and playback of the audio stream is modified according to the received string of user commands. 1. A method for interrupting a presentation of a message delivered onto a mobile device, the method comprising the steps of: receiving an incoming electronic message at the mobile device, the incoming message comprising textual content; translating the textual content to an audio stream; initiating playback of the audio stream within a messaging application; receiving, by the messaging application, a string of substantially identical user commands, the commands not being specifically defined as commands within the messaging application; and triggering interruption of playback of the audio stream based on receiving the string of user commands. 2. The method of claim 1, wherein the electronic message comprises one of an electronic mail message, a text message, an SMS message, a news story, a broadcast message, a calendar event description, a web page, a web-based article, a web log (blog), a weather report, a digital text document, a task from a task list, or other structured electronic content. 3. The method of wherein the string of identical user commands comprises a repetition of utterances. 4. The method of wherein the utterances are monosyllabic. 5. The ...
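Detecting "a string of substantially identical user commands" can be sketched as a run-length check over recognized utterances (a hypothetical function; the repeat threshold is an assumption, not a value from the patent):

```python
def should_interrupt(utterances, min_repeats=3):
    """Return True once a run of identical utterances (case-insensitive),
    e.g. 'stop stop stop', reaches min_repeats in length."""
    run = 0
    last = None
    for u in utterances:
        run = run + 1 if u.lower() == last else 1  # extend or restart the run
        last = u.lower()
        if run >= min_repeats:
            return True
    return False
```

Because the trigger is pure repetition, the utterance itself need not be a word the messaging application was programmed to recognize.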

Publication date: 09-01-2014

METHOD AND APPARATUS FOR RECORDING AND PLAYING USER VOICE IN MOBILE TERMINAL

Number: US20140012583A1
Author: KWAK Byeonghoon, MOK Jieun
Assignee:

A method and an apparatus for recording and playing a user voice in a mobile terminal are provided. The method for recording and storing a user voice in a mobile terminal includes entering a page by executing an electronic book, identifying whether a user voice record file related to the page exists, generating a user voice record file related to the page by recording a text included in the page as a user voice if the user voice record file does not exist, and playing by synchronizing the user voice stored in the user voice record file with the text if the user voice record file exists. Accordingly, a user voice can be recorded corresponding to a text of a page when recording a specific record of an electronic book, and the text corresponding to the user voice being played can be highlighted by synchronizing the user voice and the text. 1. A method for recording and playing a user voice in a mobile terminal, the method comprising: entering a page by executing an electronic book; identifying whether a user voice record file related to the page exists; generating a user voice record file related to the page by recording a text included in the page as a user voice if the user voice record file does not exist; and playing by synchronizing the user voice stored in the user voice record file with the text if the user voice record file exists. 2. The method of claim 1, wherein the generating of the user voice record file comprises recording a text included in the page as a user voice and a synchronization file including text location information corresponding to each time section of the user voice record file. 3. The method of claim 1, wherein the generating of the user voice record file further comprises: identifying whether a touch input corresponding to a text location is detected if a record command for the text is received; and starting to record a user voice if the touch input is not detected before a predetermined time elapses. 4. The method of claim 3, further comprising ...
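The synchronization file of claim 2, text location information per time section, can be modeled as a lookup from playback time to the text span to highlight (a minimal sketch with an assumed tuple layout of `(start_sec, end_sec, (text_from, text_to))`):

```python
def text_at(sync, t):
    """Given a synchronization table of (start_sec, end_sec, (text_from, text_to))
    entries, return the character range to highlight at playback time t,
    or None when t falls outside every recorded time section."""
    for start, end, span in sync:
        if start <= t < end:
            return span
    return None
```

During playback the terminal would call this on each timer tick and highlight the returned character range of the page text.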

Publication date: 09-01-2014

PROSODY GENERATOR, SPEECH SYNTHESIZER, PROSODY GENERATING METHOD AND PROSODY GENERATING PROGRAM

Number: US20140012584A1
Assignee: NEC Corporation

There is provided a prosody generator that generates prosody information for implementing highly natural speech synthesis without unnecessarily collecting large quantities of learning data. A data dividing means divides into subspaces the data space of a learning database, an assembly of learning data indicative of the feature quantities of speech waveforms. A density information extracting means extracts density information indicative of the density state, in terms of information quantity, of the learning data in each of the subspaces divided by the data dividing means. A prosody information generating method selecting means selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics. 1. A prosody generator comprising: a data dividing unit which divides into subspaces the data space of a learning database as an assembly of learning data indicative of the feature quantities of speech waveforms; a density information extracting unit which extracts density information indicative of the density state in terms of information quantity of the learning data in each of the subspaces divided by the data dividing unit; and a prosody information generating method selecting unit which selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics. 2. The prosody generator according to claim 1, further comprising a prosody generation model preparing unit which prepares a prosody generation model representative of relations between speech and the prosody information by use of a learning ...
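The density-based choice between the statistical first method and the rule-based second method can be sketched per subspace (the count threshold is an invented stand-in for the patent's density criterion):

```python
def choose_method(subspace_counts, threshold=10):
    """Pick, per subspace, the statistical method where learning data is
    dense enough to train a reliable model, and the heuristic rule-based
    method where it is sparse."""
    return ['statistical' if c >= threshold else 'rule-based'
            for c in subspace_counts]
```

This is the point of the design: large learning corpora are only needed where the statistical path is actually used.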

Publication date: 16-01-2014

ELECTRONIC DEVICE, INFORMATION PROCESSING APPARATUS, AND METHOD FOR CONTROLLING THE SAME

Number: US20140019136A1
Author: Tanaka Tomonori
Assignee:

The present invention provides a technology for enabling natural voice reproduction in which, depending on a gazed character position, the position of a voice output character follows, but does not excessively react to, the gazed character position. Therefore, in an electronic device provided with a display unit for displaying text on a screen, a voice outputting unit for outputting the text as voice, and a sight-line detection unit for detecting a sight-line direction of a user, a control unit changes a starting position at which the voice outputting unit starts voice output if a distance between the position of the current output character and the position of the current gazed character is a preset threshold or more. 1. An electronic device comprising: a display unit configured to display text on a screen; a voice outputting unit configured to output the text as voice; a sight-line detection unit configured to detect a sight-line direction of a user; and a control unit configured, assuming that a position of a character that the voice outputting unit is currently outputting as voice is defined as a position of a current output character, and a position of a character in the text that is present in the sight-line direction of the user detected by the sight-line detection unit is defined as a position of a current gazed character, to change a starting position at which the voice outputting unit starts voice output depending on a distance between the position of the current output character and the position of the current gazed character, the control unit including: a determination unit configured to determine whether or not the distance between the position of the current output character and the position of the current gazed character is a preset threshold or more; and a setting unit configured, if the determination unit determined that the distance is the threshold or more, to set the position of the current gazed character to the starting position at which the voice ...
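The threshold rule of claim 1, follow the gaze only when it has drifted far enough from the current output position, reduces to a few lines (hypothetical names; positions are character indices into the displayed text):

```python
def next_output_position(output_pos, gazed_pos, threshold):
    """Follow the gazed character only when it is at least `threshold`
    characters away from the current voice-output position."""
    if abs(gazed_pos - output_pos) >= threshold:
        return gazed_pos   # jump: restart read-out at the gazed character
    return output_pos      # ignore small gaze wander; keep reading
```

The threshold is what keeps the read-out from "excessively reacting" to normal eye movement while following genuine jumps.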

Publication date: 16-01-2014

METHOD, SYSTEM AND SERVER FOR SPEECH SYNTHESIS

Number: US20140019137A1
Author: KITAGISHI Ikuo
Assignee: Yahoo Japan Corporation

A speech synthesis system synthesizes speech using a reading text and a speech dictionary set, and includes a server apparatus. The server apparatus includes an interface unit open to the public; a speech input reception unit that receives an input of speech from an external terminal through the interface unit to generate a speech dictionary set; a registration information reception unit that receives registration information relating to a speech owner who inputs the speech from the external terminal through the interface unit; a speech dictionary set maintaining unit that maintains a speech dictionary set generated from the speech of which the input has been received in association with the registration information of a person inputting the speech; and a speech dictionary set selecting unit that allows selection of a speech dictionary set maintained in the speech dictionary set maintaining unit from the external terminal through the interface unit. 1. A speech synthesis system that synthesizes speech using a reading text and a speech dictionary set, the speech synthesis system comprising a server apparatus including: an interface unit that is open to the public; a speech input reception unit that receives an input of speech from an external terminal through the interface unit to generate a speech dictionary set; a registration information reception unit that receives registration information relating to a speech owner who is a person inputting the speech from the external terminal through the interface unit; a speech dictionary set maintaining unit that maintains a speech dictionary set generated from the speech of which the input has been received in association with the registration information of the person inputting the speech; and a speech dictionary set selecting unit that allows selection of the speech dictionary set maintained in the speech dictionary set maintaining unit from the external terminal through the interface unit. 2. The speech synthesis system ...

Publication date: 16-01-2014

Blood glucose meter with simplified programmable voice function

Number: US20140019139A1
Assignee: PRODIGY DIABETES CARE LLC

A blood glucose meter with a simplified programmable voice function, including: a microprocessor; a memory that is both programmable and re-programmable coupled to the microprocessor; and an audio output device coupled to the microprocessor and the memory; wherein a language algorithm and a plurality of language components specific to a language selected by a user are disposed within the memory; and wherein the language algorithm and the plurality of language components are utilized to provide an audio output through the audio output device in the language selected by the user. The language algorithm is operable for determining which language components are utilized to provide the audio output and in what order based on the language selected by the user. Optionally, the audio output is generated by the microprocessor and the memory using a pulse-width modulation scheme and/or the like.
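The language algorithm, choosing which stored audio components to play, and in what order, for the selected language, can be sketched with a component table (the table entries, component names, and the `'xx'` ordering are invented for illustration; the meter would play stored audio clips rather than return strings):

```python
# Hypothetical per-language component sequences stored in the meter's memory.
COMPONENTS = {
    'en': ['your_glucose_is', '{value}', 'mg_dl'],
    'es': ['su_glucosa_es', '{value}', 'mg_dl'],
    # A language may order the same components differently.
    'xx': ['{value}', 'mg_dl', 'your_glucose_is'],
}

def build_announcement(language, value):
    """Select which stored audio components to play, and in what order,
    for the user's language, substituting the measured value."""
    return [c.format(value=value) if '{value}' in c else c
            for c in COMPONENTS[language]]
```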
