Total found: 62825. Displayed: 100.

Publication date: 19-01-2012

Method and device for audio signal classification

Number: US20120016677A1
Assignee: Huawei Technologies Co Ltd

The present invention discloses a method and a device for audio signal classification, and relates to the field of communications technologies, which solve a problem of high complexity of type classification of audio signals in the prior art. In the present invention, after an audio signal to be classified is received, a tonal characteristic parameter of the audio signal to be classified, where the tonal characteristic parameter of the audio signal to be classified is in at least one sub-band, is obtained, and a type of the audio signal to be classified is determined according to the obtained characteristic parameter. The present invention is mainly applied to an audio signal classification scenario, and implements audio signal classification through a relatively simple method.

Publication date: 19-01-2012

Intelligent Automated Assistant

Number: US20120016678A1
Assignee: Apple Inc

An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.

Publication date: 26-01-2012

Noise canceller and noise cancellation program

Number: US20120020489A1
Inventor: Tomohiro Narita
Assignee: Mitsubishi Electric Corp

A directivity control unit 10 calculates a main beam signal with its directivity turned toward an object sound direction and a sub-beam signal with its blind spot turned toward the object sound direction from output signals of a plurality of microphones 2 and 3 through signal processing, and a frequency analyzing unit 20 converts them to spectra. A sound source decision unit 30 decides on whether a sound source is voice, stationary noise or unstationary noise from the spectra of the main beam signal and sub-beam signal and outputs as a sound source decision result, and calculates the average spectrum which is a statistic of noise for the main beam signal. An interfering sound removing unit 50 subtracts the average spectrum from the spectrum of the main beam signal to remove interfering sounds.
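As an illustration only (not taken from the patent), the final subtraction step described above can be sketched in Python. The function names, the zero floor, and the exponential averaging rule are assumptions made for this sketch; the abstract only states that an average noise spectrum is subtracted from the main beam spectrum.

```python
def spectral_subtract(main_spectrum, noise_average, floor=0.0):
    """Subtract an average noise magnitude spectrum from the main-beam spectrum, bin by bin."""
    if len(main_spectrum) != len(noise_average):
        raise ValueError("spectra must have the same number of bins")
    # Clamp at `floor` (an assumption) so that no bin ends up with a negative magnitude.
    return [max(m - n, floor) for m, n in zip(main_spectrum, noise_average)]


def update_noise_average(noise_average, frame_spectrum, alpha=0.9):
    """Update the running noise average from a frame judged to be noise.

    Exponential smoothing with factor `alpha` is an assumption of this sketch.
    """
    return [alpha * a + (1.0 - alpha) * f for a, f in zip(noise_average, frame_spectrum)]
```

In such a scheme the average would typically be updated only on frames that the sound source decision marks as noise, and the subtraction applied to every frame.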

Publication date: 26-01-2012

Speech to Text Conversion

Number: US20120022867A1
Assignee: Google LLC

Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device.
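The weighted combination of base language models can be illustrated with a minimal Python sketch. It assumes that each base model is a plain word-to-probability table and that the weights have already been derived from the contextual metadata; both are simplifications chosen for this example and are not stated in the abstract.

```python
def interpolate_models(base_models, weights):
    """Linearly interpolate word-probability tables using one weight per base model."""
    if len(base_models) != len(weights):
        raise ValueError("need exactly one weight per base model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # The interpolated vocabulary is the union of all base vocabularies.
    vocabulary = set().union(*base_models)
    return {
        word: sum((w / total) * model.get(word, 0.0) for model, w in zip(base_models, weights))
        for word in vocabulary
    }
```

Normalizing the weights keeps the result a probability distribution whenever each base table is one.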

Publication date: 26-01-2012

Dynamic Range Improvement Technique

Number: US20120022877A1
Inventor: Larry Joseph Kirn
Assignee: Individual

Apparatus and methods are disclosed for detecting and progressively attenuating specific frequencies prevalent in an audio signal. In contrast to conventional wide-band enhancement techniques over long time frames, narrow bandwidths and short attenuation times employed are commensurate with resonances and timing typical of speech. Apparent dynamic range is therefore increased through attenuation of longer-duration elements with declining informational contribution.

Publication date: 09-02-2012

Speech search device and speech search method

Number: US20120036159A1
Assignee: Toyohashi University of Technology NUC

Provided are a speech search device, the search speed of which is very fast, the search performance of which is also excellent, and which performs fuzzy search, and a speech search method. Not only the fuzzy search is performed, but also the distance between phoneme discrimination features included in speech data is calculated to determine the similarity with respect to the speech using both a suffix array and dynamic programming, and an object to be searched for is narrowed by means of search keyword division based on a phoneme and search thresholds relative to a plurality of the divided search keywords, the object to be searched for is repeatedly searched for while increasing the search thresholds in order, and whether or not there is the keyword division is determined according to the length of the search keywords, thereby implementing speech search, the search speed of which is very fast and the search performance of which is also excellent.

Publication date: 16-02-2012

Teaching aid

Number: US20120040315A1
Inventor: Peter Lawrence King
Assignee: UNICUS INVESTMENTS Pty Ltd

The present invention relates to the field of voice and speech recognition. In one form, the invention relates to a teaching aid adapted to teach reading and spelling via a voice and/or speech recognition system adapted to assist persons having dyslexia. The invention also provides a mechanism to train a speech recognition system without the need for the user to read verbose passages of text.

Publication date: 23-02-2012

Sound source separation apparatus and sound source separation method

Number: US20120045066A1
Assignee: Honda Motor Co Ltd

A sound source separation apparatus includes a transfer function storage unit that stores a transfer function from a sound source, a sound change detection unit that generates change state information indicating a change of the sound source on the basis of an input signal input from a sound input unit, a parameter selection unit that calculates an initial separation matrix on the basis of the change state information generated by the sound change detection unit, and a sound source separation unit that separates the sound source from the input signal input from the sound input unit using the initial separation matrix calculated by the parameter selection unit.

Publication date: 23-02-2012

System, method and apparatus with environmental noise cancellation

Number: US20120045074A1
Assignee: C Media Electronics Inc

Disclosed herein are system, method and apparatus with environmental noise cancellation. The instant disclosure is particularly adapted to a receiver module having at least two inputs. The two inputs respectively receive a main audio portion and the audio with majority of environmental noise. The system firstly calibrates the audio signals to reduce the error caused by the difference between the two inputs. An adaptive beamforming technology and a speech extractor are respectively used to extract the environmental noise portion with less main audio and the main audio portion with less noise. After a process of time-to-frequency domain transformation, a non-linear noise suppression technology is introduced into estimating the environmental noise and acquiring a gain. After noise suppression processed with the gain, a sequence of audio signals is output after a frequency-to-time domain transformation.

Publication date: 23-02-2012

Method and Apparatus for Telephonically Accessing and Navigating the Internet

Number: US20120047216A1
Assignee: Ben Franklin Patent Holding LLC

A method for accessing and browsing the Internet through the use of a telephone and the associated DTMF signals is disclosed. The preferred embodiment provides a system that converts the information content of a web page from text to speech (voice signals), signals the hyperlink selections of a web page in an audio manner, and allows selection of the hyperlinks through the use of DTMF signals generated from a telephone keypad. Upon receiving a DTMF signal corresponding to a hyperlink, the corresponding web page is fetched and again delivered to the user via one of the available delivery methods such as voice, fax-on-demand, electronic mail, or regular mail.
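A minimal Python sketch of the hyperlink selection idea follows. The digit assignment (keys 1 to 9 in page order), the prompt wording, and all function names are hypothetical; the abstract does not specify how digits are mapped to links.

```python
def build_link_menu(links):
    """Assign DTMF digits "1".."9" to the first nine (label, url) hyperlinks of a page."""
    return {str(i): link for i, link in enumerate(links[:9], start=1)}


def announce(menu):
    """Return the prompts that a text-to-speech stage would read out to the caller."""
    return [f"Press {digit} for {label}" for digit, (label, _url) in menu.items()]


def select_link(menu, dtmf_digit):
    """Return the URL bound to the received DTMF digit, or None if no link is bound."""
    entry = menu.get(dtmf_digit)
    return entry[1] if entry else None
```

The URL returned by `select_link` would then be fetched and rendered through whichever delivery method is in use.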

Publication date: 15-03-2012

System for extraction of reverberant content of an audio signal

Number: US20120063608A1
Assignee: Harman International Industries Inc

A reverberant characteristic of an acoustic space is superimposed on an audio signal that is received by an apparatus. The apparatus decomposes the audio signal into an estimated original dry signal component and an estimated reverberant characteristic of the acoustic space. Estimation of the original dry signal component and the reverberant characteristic of the acoustic space is based on determination of an estimated impulse response of the acoustic space from the received audio signal. Once the audio signal is decomposed, the estimated original dry signal component and the estimated reverberant characteristic of the acoustic space may be independently modified by the apparatus. The modified or unmodified estimated original dry signal component and estimated reverberant characteristic of the acoustic space may be combined by the apparatus to produce one or more adjusted frequency spectra.

Publication date: 15-03-2012

System and method for pronunciation modeling

Number: US20120065975A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.

Publication date: 19-04-2012

Automatically providing a user with substitutes for potentially ambiguous user-defined speech commands

Number: US20120095765A1
Assignee: Nuance Communications Inc

A method for alleviating ambiguity issues of new user-defined speech commands. An original command for a user-defined speech command can be received. It can then be determined if the original command is likely to be confused with a set of existing speech commands. When confusion is unlikely, the original command can be automatically stored. When confusion is likely, a substitute command that is unlikely to be confused with existing commands can be automatically determined. The substitute can be presented as an alternative to the original command and can be selectively stored as the user-defined speech command.
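The store-or-substitute decision can be sketched in Python as follows. Text similarity from the standard library's `difflib.SequenceMatcher` is used here as a stand-in for acoustic confusability, which is an assumption of this sketch; the threshold value and the function names are likewise hypothetical.

```python
from difflib import SequenceMatcher


def is_confusable(candidate, existing_commands, threshold=0.8):
    """Return True if the candidate is textually close to any existing command."""
    return any(
        SequenceMatcher(None, candidate.lower(), command.lower()).ratio() >= threshold
        for command in existing_commands
    )


def choose_command(original, substitutes, existing_commands, threshold=0.8):
    """Keep the original when confusion is unlikely, otherwise pick the first safe substitute."""
    if not is_confusable(original, existing_commands, threshold):
        return original
    for substitute in substitutes:
        if not is_confusable(substitute, existing_commands, threshold):
            return substitute
    return None  # no safe alternative found; the caller decides what to do next
```

A real system would more plausibly compare phoneme sequences or recognizer scores than spellings, but the control flow would be the same.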

Publication date: 10-05-2012

System for voice control of a medical implant

Number: US20120116774A1
Inventor: Peter Forsell
Assignee: MILUX HOLDING SA

An implantable system (11) for control of and communication with an implant (17) in a body, comprising a command input device (12) and a processing device (13) coupled thereto, the processing device (13) being adapted to generate input to a command generator (16) which is comprised in the system (11) coupled to the processing device (13) and which is adapted to generate and communicate commands to the medical implant (17) in response to input received from the processing device (13), the system (11) further comprising a memory unit (15) connected to at least one of said devices in the system (11) for storing a memory bank of commands. The command input device (12) is adapted to receive commands from a user as voice commands, and the processing device (13) comprises a filter adapted to filter voice commands against high frequency losses and frequency distortion caused by the mammal body (10).

Publication date: 17-05-2012

System and method for providing enhanced audio in a video environment

Number: US20120120270A1
Assignee: Cisco Technology Inc

A method is provided in one example and includes receiving audio data at a microphone array that includes a plurality of microphones. The microphone array is provisioned at a first endpoint, which includes a camera element configured to capture video data associated with a video session involving the first endpoint and a second endpoint. The method also includes formatting the audio data into a time division multiplex (TDM) stream, and communicating the stream to a port for a subsequent communication over a network and to the second endpoint.
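Formatting multi-microphone audio into a time division multiplex stream can be illustrated with a short Python sketch. It assumes one sample per microphone per time slot and equal-length channels; the actual framing used in the patent is not described in the abstract.

```python
def to_tdm_stream(channels):
    """Interleave equal-length per-microphone sample lists into one TDM stream."""
    if len({len(channel) for channel in channels}) > 1:
        raise ValueError("all channels must have the same number of samples")
    stream = []
    # Each frame carries one sample from every microphone, in channel order.
    for frame in zip(*channels):
        stream.extend(frame)
    return stream


def from_tdm_stream(stream, num_channels):
    """Recover the per-microphone sample lists at the receiving side."""
    return [stream[i::num_channels] for i in range(num_channels)]
```

For two microphones with samples `[1, 2, 3]` and `[10, 20, 30]`, the interleaved stream alternates between the two channels sample by sample.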

Publication date: 24-05-2012

Spatial noise suppression for a microphone array

Number: US20120128176A1
Assignee: Microsoft Corp

A noise reduction system and a method of noise reduction include utilizing an array of microphones to receive sound signals from stationary sound sources and a user that is speaking. Positions of the stationary sound sources relative to the array of microphones are estimated using sound signals emitted from the sound sources at an earlier time. Noise is suppressed in an audio signal based at least in part on the estimated positions of the stationary sound sources. A position of the user relative to the array of microphones can also be estimated.

Publication date: 24-05-2012

Speech determination apparatus and speech determination method

Number: US20120130711A1
Inventor: Takaaki Yamabe
Assignee: JVCKenwood Corp

A signal portion per frame is extracted from an input signal, thus generating a per-frame signal. The per-frame signal in the time domain is converted into a per-frame signal in the frequency domain, thereby generating a spectral pattern of spectra. It is determined whether an energy ratio is higher than a threshold level. The energy ratio is a ratio of each spectral energy to subband energy in a subband that involves the spectrum. The subband is involved in subbands into which a frequency band is separated with a specific bandwidth. It is determined whether the per-frame signal is a speech segment, based on a result of the determination. Average energy is derived in the frequency direction for the spectra in the spectral pattern in each subband. Subband energy is derived per subband by averaging the average energy in the time domain.
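The energy-ratio test can be sketched in Python under simplifying assumptions: subbands have a fixed width in bins, the subband energies have already been averaged over time, and a frame counts as speech when at least `min_peaks` bins exceed the threshold. These choices and all names are assumptions of this sketch, not details from the abstract.

```python
def subband_energies(spectral_energy, subband_width):
    """Average the per-bin energy inside each fixed-width subband (frequency direction)."""
    bands = [
        spectral_energy[i:i + subband_width]
        for i in range(0, len(spectral_energy), subband_width)
    ]
    return [sum(band) / len(band) for band in bands]


def is_speech_frame(spectral_energy, subband_energy, subband_width, ratio_threshold, min_peaks=1):
    """Flag a frame as speech when enough bins stand out against their subband energy."""
    peaks = 0
    for k, energy in enumerate(spectral_energy):
        reference = subband_energy[k // subband_width]  # the subband that contains bin k
        if reference > 0 and energy / reference > ratio_threshold:
            peaks += 1
    return peaks >= min_peaks
```

In a running system, `subband_energy` would be the time average of `subband_energies(...)` over recent frames, so that a spectral peak is compared against the long-term level of its band.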

Publication date: 31-05-2012

System and Method for Selective Enhancement Of Speech Signals

Number: US20120134522A1
Assignee: Individual

A system and method for selectively enhancing an audio signal to make sounds, particularly speech sounds, more distinguishable. The system and method are designed to divide an input auditory signal into a plurality of spectral channels having associated unenhanced signals and perform enhancement processing on a first subset of the spectral channels and not perform enhancement processing on a second subset of the spectral channels. The enhancement processing is performed by determining an output gain for at least the first subset of spectral channels based on a time-varying history of energy of the unenhanced signals associated with each channel in the first subset of the spectral channels and applying the output gain for each of the first subset of the spectral channels to the unenhanced signals to form enhanced signals associated with each of the first subset of the spectral channels. The system and method are then designed to combine the plurality of enhanced signals associated with each of the first subset of the spectral channels and the unenhanced signals associated with each of the second subset of the spectral channels to form a selectively enhanced output auditory signal.

Publication date: 14-06-2012

Telephone or other device with speaker-based or location-based sound field processing

Number: US20120150542A1
Inventor: Wei Ma
Assignee: National Semiconductor Corp

A method includes obtaining audio data representing audio content from at least one speaker. The method also includes spatially processing the audio data to create at least one sound field, where each sound field has a spatial characteristic that is unique to a specific speaker. The method further includes generating the at least one sound field using the processed audio data. The audio data could represent audio content from multiple speakers, and generating the at least one sound field could include generating multiple sound fields around a listener. The spatially processing could include performing beam forming to create multiple directional beams, and generating the multiple sound fields around the listener could include generating the directional beams with different apparent origins around the listener. The method could further include separating the audio data based on speaker, where each sound field is associated with the audio data from one of the speakers.

Publication date: 14-06-2012

Method and system for reconstructing speech from an input signal comprising whispers

Number: US20120150544A1
Assignee: NANYANG TECHNOLOGICAL UNIVERSITY

A system for reconstructing speech from an input signal comprising whispers is disclosed. The system comprises an analysis unit configured to analyse the input signal to form a representation of the input signal; an enhancement unit configured to modify the representation of the input signal to adjust a spectrum of the input signal, wherein the adjusting of the spectrum of the input signal comprises modifying a bandwidth of at least one formant in the spectrum to achieve a predetermined spectral energy distribution and amplitude for the at least one formant; and a synthesis unit configured to reconstruct speech from the modified representation of the input signal.

Publication date: 21-06-2012

Sound processing apparatus and recording medium storing a sound processing program

Number: US20120155674A1
Inventor: Naoshi Matsuo
Assignee: Fujitsu Ltd

A sound processing apparatus includes a first calculator that calculates first power based on a first signal received by a first microphone that is among the first microphone and a second microphone; a second calculator that calculates second power based on a second signal received by the second microphone; a gain calculator that calculates a gain on the basis of the ratio of the first power to the second power; and a multiplier that processes the second signal using the gain calculated by the gain calculator.
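The calculator, gain calculator, and multiplier chain can be sketched in Python. Mean-square power, the direct use of the power ratio as the gain, and the cap at `max_gain` are assumptions of this sketch; the abstract only says that the gain is calculated on the basis of the ratio.

```python
def mean_power(signal):
    """Mean-square power of one block of samples."""
    if not signal:
        raise ValueError("signal block must not be empty")
    return sum(s * s for s in signal) / len(signal)


def ratio_gain(first_signal, second_signal, max_gain=1.0, eps=1e-12):
    """Gain derived from the ratio of first-microphone power to second-microphone power."""
    ratio = mean_power(first_signal) / (mean_power(second_signal) + eps)
    return min(ratio, max_gain)  # capping the gain is an assumption of this sketch


def apply_gain(signal, gain):
    """Multiplier stage: scale the second microphone's signal by the gain."""
    return [gain * s for s in signal]
```

The small `eps` term only guards against division by zero when the second block is silent.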

Publication date: 05-07-2012

Apparatus and method for voice command recognition based on a combination of dialog models

Number: US20120173244A1
Assignee: SAMSUNG ELECTRONICS CO LTD

Provided are a voice command recognition apparatus and method capable of figuring out the intention of a voice command input through a voice dialog interface, by combining a rule based dialog model and a statistical dialog model rule. The voice command recognition apparatus includes a command intention determining unit configured to correct an error in recognizing a voice command of a user, and an application processing unit configured to check whether the final command intention determined in the command intention determining unit comprises the input factors for execution of an application.

Publication date: 19-07-2012

Device and method for controlling damping of residual echo

Number: US20120183133A1
Assignee: Limes Audio AB

The present invention relates to a device, such as a communication device, comprising an adaptive foreground filter configured to calculate a first echo estimation signal based on a first input signal, and an adaptive background filter being more rapidly adapting than the foreground filter and configured to calculate a second echo estimation signal based on said first input signal. Embodiments of the device further comprise damping control means for controlling damping of an echo-cancelled output signal. The device in various embodiments includes that the damping control means is configured to calculate a maximum echo estimation signal using both the first and the second echo estimation signals, and control the damping of the echo-cancelled output signal based on said maximum echo estimation signal and/or a signal derived from said maximum echo estimation signal.

Publication date: 19-07-2012

Sound signal processing apparatus, sound signal processing method, and program

Number: US20120183149A1
Inventor: Atsuo Hiroe
Assignee: Sony Corp

An apparatus including a direction estimation unit detecting one or more direction points indicating a sound source direction of a sound signal for each of blocks divided in a predetermined time unit, and a direction tracking unit connecting the direction points to each other between the blocks and detecting a section in which a sound is active.

Publication date: 19-07-2012

Method and system for creating a voice recognition database for a mobile device using image processing and optical character recognition

Number: US20120183221A1
Assignee: Denso Corp, Denso International America Inc

A method and system for controlling a mobile device from a head unit using voice control is disclosed. The head unit receives a graphical representation of a current user interface screen of the mobile device. The head unit then scans the graphical representation of the current user interface screen to determine the locations of potential input mechanisms. The potential input mechanisms are scanned using optical character recognition and voice commands are determined for the input mechanisms. The determined voice commands and their respective locations on the user interface screens are stored in a voice recognition database, which is queried with uttered voice commands during voice recognition.

Publication date: 16-08-2012

Method And Background Estimator For Voice Activity Detection

Number: US20120209604A1
Inventor: Martin Sehlstedt
Assignee: Individual

The present invention relates to a method and a background estimator in a voice activity detector for updating a background noise estimate for an input signal. The input signal for a current frame is received and it is determined whether the current frame of the input signal comprises non-noise. Further, an additional determination is performed whether the current frame of the non-noise input comprises noise by analyzing characteristics at least related to correlation and energy level of the input signal, and the background noise estimate is updated if it is determined that the current frame comprises noise.

Publication date: 16-08-2012

Method and apparatus for information extraction from interactions

Number: US20120209606A1
Assignee: Nice Systems Ltd

Obtaining information from audio interactions associated with an organization. The information may comprise entities, relations or events. The method comprises: receiving a corpus comprising audio interactions; performing audio analysis on audio interactions of the corpus to obtain text documents; performing linguistic analysis of the text documents; matching the text documents with one or more rules to obtain one or more matches; and unifying or filtering the matches.

Publication date: 23-08-2012

Methods and apparatus for formatting text for clinical fact extraction

Number: US20120212337A1
Assignee: Nuance Communications Inc

An original text that is a representation of a narration of a patient encounter provided by a clinician may be received and re-formatted to produce a formatted text. One or more clinical facts may be extracted from the formatted text. A first fact of the clinical facts may be extracted from a first portion of the formatted text, and the first portion of the formatted text may be a formatted version of a first portion of the original text. A linkage may be maintained between the first fact and the first portion of the original text.

Publication date: 23-08-2012

Sound Recognition Operation Apparatus and Sound Recognition Operation Method

Number: US20120215537A1
Inventor: Yoshihiro Igarashi
Assignee: Individual

According to one embodiment, a sound recognition operation apparatus includes a sound detection module, a keyword detection module, an audio mute module, and a transmission module. The sound detection module is configured to detect sound. The keyword detection module is configured to detect a particular keyword using voice recognition when the sound detection module detects sound. The audio mute module is configured to transmit an operation signal for muting audio sound when the keyword detection module detects the keyword. The transmission module is configured to recognize the voice command after the keyword is detected by the keyword detection module, and transmit an operation signal corresponding to the voice command.

Publication date: 30-08-2012

Network apparatus and methods for user information delivery

Number: US20120221412A1
Inventor: Robert F. Gazdzinski
Assignee: Individual

A network apparatus useful for providing directions and other information to a user of a client device in wireless communication therewith. In one embodiment, the apparatus includes one or more wireless interfaces and a network interface for communication with a server. User speech inputs in the form of digitized representations are received by the apparatus and used by the server as the basis for retrieving information including graphical representations of location or entities that the user wishes to find.

Publication date: 13-09-2012

Wireless synchronization of data and software components over a wireless network compatible to ieee802.11 standard(s) for mobile devices

Number: US20120230315A1
Assignee: Flexiworld Technologies Inc

Wireless synchronization of data and software components over IEEE802.11 standard(s) are herein disclosed and enabled. An information apparatus, which includes a wireless communication unit compatible with IEEE802.11, may access a wireless local area network (WLAN). To set up the wireless synchronization, the user connects the information apparatus to a wireless output device over a wired connection (e.g., USB) and selects the wireless output device. Information associated with the wireless output device is saved in the mobile information apparatus for enabling wireless synchronization. Next, the user connects the mobile information apparatus to the WLAN, and, depending on the availability of the wireless output device in the network, the information apparatus may lock a wireless connection to the wireless output device for wireless synchronization. A client application in the mobile information apparatus and output controller software in the wireless output device may be required to facilitate the wireless synchronization over the WLAN.

Publication date: 27-09-2012

Methods and apparatus for formatting text for clinical fact extraction

Number: US20120245926A1
Assignee: Nuance Communications Inc

An original text that is a representation of a narration of a patient encounter provided by a clinician may be received and re-formatted to produce a formatted text. One or more clinical facts may be extracted from the formatted text. A first fact of the clinical facts may be extracted from a first portion of the formatted text, and the first portion of the formatted text may be a formatted version of a first portion of the original text. A linkage may be maintained between the first fact and the first portion of the original text.

Publication date: 04-10-2012

Frame mapping approach for cross-lingual voice transformation

Number: US20120253781A1
Assignee: Microsoft Corp

Frame mapping-based cross-lingual voice transformation may transform a target speech corpus in a particular language into a transformed target speech corpus that remains recognizable, and has the voice characteristics of a target speaker that provided the target speech corpus. A formant-based frequency warping is performed on the fundamental frequencies and the linear predictive coding (LPC) spectrums of source speech waveforms in a first language to produce transformed fundamental frequencies and transformed LPC spectrums. The transformed fundamental frequencies and the transformed LPC spectrums are then used to generate warped parameter trajectories. The warped parameter trajectories are further used to transform the target speech waveforms in the second language to produce transformed target speech waveform with voice characteristics of the first language that nevertheless retain at least some voice characteristics of the target speaker.

Publication date: 04-10-2012

Multi-mode audio codec and celp coding adapted therefore

Number: US20120253797A1

In an embodiment, bitstream elements of sub-frames are encoded differentially to a global gain value so that a change of the global gain value results in an adjustment of an output level of the decoded representation of the audio content. Concurrently, the differential coding saves bits. Even further, the differential coding enables the lowering of the burden of globally adjusting the gain of an encoded bitstream. In another embodiment, a global gain control across CELP coded frames and transform coded frames is achieved by co-controlling the gain of the codebook excitation of the CELP codec, along with a level of the transform or inverse transform of the transform coded frames. In another embodiment, the gain value determination in CELP coding is performed in the weighted domain of the excitation signal.
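The first embodiment, differential coding of sub-frame gains against a global gain, can be illustrated with a small Python sketch. Integer dB steps and the choice of the rounded mean as the global gain are assumptions made for this example; the abstract does not state how the global value is chosen.

```python
def encode_gains(subframe_gains):
    """Code integer sub-frame gains as small offsets from one global gain value."""
    if not subframe_gains:
        raise ValueError("need at least one sub-frame gain")
    # Using the rounded mean as the global gain is an assumption of this sketch.
    global_gain = round(sum(subframe_gains) / len(subframe_gains))
    deltas = [gain - global_gain for gain in subframe_gains]
    return global_gain, deltas


def decode_gains(global_gain, deltas):
    """Rebuild the sub-frame gains from the global gain and the coded offsets."""
    return [global_gain + delta for delta in deltas]
```

Because only the offsets are stored per sub-frame, changing the single global value shifts the output level of every sub-frame at once without touching the rest of the bitstream, and the small offsets need fewer bits than absolute gains.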

Publication date: 04-10-2012

System and method for rapid customization of speech recognition models

Number: US20120253799A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.

Publication date: 04-10-2012

Location-Based Conversational Understanding

Number: US20120253802A1
Assignee: Microsoft Corp

Location-based conversational understanding may be provided. Upon receiving a query from a user, an environmental context associated with the query may be generated. The query may be interpreted according to the environmental context. The interpreted query may be executed and at least one result associated with the query may be provided to the user.

Publication date: 11-10-2012

Accelerometer vector controlled noise cancelling method

Number: US20120259628A1
Inventor: Georg Siotis
Assignee: SONY ERICSSON MOBILE COMMUNICATIONS AB

A telecommunication device is disclosed, comprising: a microphone array comprising a plurality of microphones, wherein each microphone receives an analogue acoustic signal; a position sensing device for determining how the telecommunication device is positioned in three-dimensions with respect to a user's mouth; at least one analogue/digital converter for converting each analogue acoustic signal into a digital signal; a digital signal processor for performing signal processing on the received digital signals comprising a controller, a plurality of delay circuits for delaying each received signal based on an input from the controller and a plurality of preamplifiers for adjusting the gain of each received signal based on a gain input from the controller, wherein the controller selects the appropriate delay and gain values applied to each received signal to remove noise from the received signals based on the determined position of the telecommunication device. A method for creating and controlling a location of a virtual microphone near a telecommunication device so as to reduce background noise in a speech signal is also disclosed.

Publication date: 18-10-2012

Apparatus and method for processing voice command

Number: US20120265536A1
Assignee: Hyundai Motor Co

Disclosed is a technique for processing voice commands. In particular, the disclosed technique increases a voice recognition rate without performing a process of inputting separate voice commands: it updates a voice command table based on interaction with a user, storing similar commands input by the user once the user has confirmed those commands as similar commands.

Publication date: 08-11-2012

Photo-realistic synthesis of image sequences with lip movements synchronized with speech

Number: US20120284029A1
Inventors: Frank Soong, Lijuan Wang
Assignee: Microsoft Corp

Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.

Publication date: 15-11-2012

Method and apparatus for processing multi-channel de-correlation for cancelling multi-channel acoustic echo

Number: US20120288100A1
Inventor: Nam-gook CHO
Assignee: SAMSUNG ELECTRONICS CO LTD

Provided are a method and apparatus for multi-channel de-correlation processing for cancelling a multi-channel acoustic echo. The method includes: dividing an input multi-channel audio signal into units of frames to form multi-channel audio signals in units of frames; analyzing eigen values and eigen vectors related to the multi-channel audio signals by using the multi-channel audio signals in units of frames every time contents are modified; and separating the multi-channel audio signals in units of frames into a plurality of signal component spaces by using the analyzed eigen values and eigen vectors.

Publication date: 15-11-2012

Transform-Domain Codebook In A CELP Coder And Decoder

Number: US20120290295A1
Inventor: Vaclav Eksler
Assignee: VoiceAge Corp

A codebook arrangement for use in coding an input sound signal includes first and second codebook stages. The first codebook stage includes one of a time-domain CELP codebook and a transform-domain codebook. The second codebook stage follows the first codebook stage and includes the other of the time-domain CELP codebook and the transform-domain codebook. A codebook stage that includes an adaptive codebook may be provided before the first codebook stage. A selector may be provided to select an order of the time-domain CELP codebook and the transform-domain codebook in the first and second codebook stages, respectively, as a function of characteristics of the input sound signal. The selector may also be responsive to both the characteristics of the input sound signal and a bit rate of the codec using the codebook arrangement to bypass the second codebook stage. The codebook arrangement can be used in a coder of an input sound signal.

Publication date: 22-11-2012

Method and apparatus for reducing noise pumping due to noise suppression and echo control interaction

Number: US20120294453A1

An input signal is processed through noise suppression (NS) and echo control (EC) via a multipath model that reduces noise pumping effects while maintaining EC performance. A copy of a “noisy” input signal is sent to an EC component before the noisy signal is sent to a NS component, which processes the signal first, when there is a consistent noise level for estimation. The copy of the pre-processing noisy signal is sent to the EC component along with a “clean” or “noise-suppressed” signal output from the NS component. The EC component analyzes the noisy signal as if the EC was the first component in the signal chain to determine what actions to take. The EC component then applies these actions to the clean signal received from the NS component.

Publication date: 22-11-2012

Method and system for quickly recognizing and responding to user intents and questions from natural language input using intelligent hierarchical processing and personalized adaptive semantic interface

Number: US20120296638A1
Inventor: Ashish Patwa
Assignee: Individual

In embodiments of the present invention, capabilities are described for understanding and responding to user intents and questions quickly, wherein the understanding is based on supervised system learning, intelligent layered semantic and syntactic information processing, and a personalized adaptive semantic interface. Supervised system learning creates a reference pattern set for the intent repository and possible question categories. Each layer in the layered processing increases the probability of intent/question recognition. The personalized adaptive voice interface learns from the user's interactions over time by enriching the pattern sets and personal index for successfully resolved user intents and questions. Collectively, these technologies improve the response time for correctly recognizing and responding to the user's intents and questions.

Publication date: 29-11-2012

Number-assistant voice input system, number-assistant voice input method for voice input system and number-assistant voice correcting method for voice input system

Number: US20120303368A1
Inventor: Ting Ma
Assignee: Mitac International Corp

The present invention discloses a number-assistant voice input system, a number-assistant voice input method for a voice input system and a number-assistant voice correcting method for a voice input system, which apply software to drive a voice input system of an electronic device to provide a voice input logic circuit module. The voice input logic circuit module defines the pronunciation of numbers 1 to 26 as the paths to respectively input letters A to Z in the voice input system and allows users to selectively input or correct a letter by reading a number from 1 to 26 instead of a letter from A to Z.
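The number-to-letter mapping described above can be sketched as a simple lookup. The function names and the zero-based `position` argument below are hypothetical; the abstract only states that the numbers 1 to 26 select the letters A to Z for input or correction.

```python
import string

# Hypothetical lookup table: spoken number 1..26 -> letter A..Z.
NUMBER_TO_LETTER = {i + 1: letter for i, letter in enumerate(string.ascii_uppercase)}


def letters_from_numbers(numbers):
    """Translate a sequence of recognized numbers into the letters they select."""
    return "".join(NUMBER_TO_LETTER[n] for n in numbers)


def correct_letter(text, position, number):
    """Replace the letter at the zero-based position with the one selected by number."""
    return text[:position] + NUMBER_TO_LETTER[number] + text[position + 1:]
```

For example, reading "three, one, twenty" would enter "CAT", and reading "two" while correcting the first letter would turn it into "BAT".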

Publication date: 06-12-2012

Method And Apparatus For Voice Activity Determination

Number: US20120310641A1
Assignee: Nokia Oyj

In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector for making a first voice activity detection decision based at least in part on the voice activity of a first audio signal received from a first microphone. The apparatus also comprises a second voice activity detector for making a second voice activity detection decision based at least in part on an estimate of a direction of the first audio signal and an estimate of a direction of a second audio signal received from a second microphone. The apparatus further comprises a classifier for making a third voice activity detection decision based at least in part on the first and second voice activity detection decisions.

Publication date: 13-12-2012

Voice recognition grammar selection based on context

Number: US20120316878A1
Assignee: Google LLC

The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.

Publication date: 27-12-2012

Information processing device, information processing system, information processing method, and information processing program

Number: US20120330659A1
Inventor: Kazuhiro Nakadai
Assignee: Honda Motor Co Ltd

An information processing device includes a display data creating unit configured to create display data including characters representing the content of an utterance based on a sound and a symbol surrounding the characters and indicating a first direction, and an image combining unit configured to determine the position of the display data based on a display position of an image representing a sound source of the utterance, and to combine the display data and the image of the sound source so that an orientation in which the sound is radiated is matched with the first direction.

Publication date: 03-01-2013

Automatic Language Model Update

Number: US20130006640A1
Assignee: Google LLC

A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.

Publication date: 03-01-2013

Devices and Methods for Identifying a Prompt Corresponding to a Voice Input in a Sequence of Prompts

Number: US20130006643A1
Assignee: Individual

This is directed to processing voice inputs received by an electronic device while prompts are provided. In particular, this is directed to providing a sequence of prompts to a user (e.g., voice over prompts) while monitoring for a voice input. When the voice input is received, a characteristic time stamp can be identified for the voice input, and can be compared to periods or windows associated with each of the provided prompts. The electronic device can then determine that the prompt corresponding to a window that includes the characteristic time stamp was the prompt to which the user wished to apply the voice input. The device can process the voice input to extract a user instruction, and apply the instruction to the identified prompt (e.g., and perform an operation associated with the prompt).
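The window comparison described above might look like the following minimal sketch. The tuple layout, the half-open interval test, and the sample prompt names are assumptions; the abstract does not say how the characteristic time stamp is chosen or how windows are bounded.

```python
def find_prompt(prompt_windows, timestamp):
    """Return the prompt whose window contains the time stamp, or None.

    prompt_windows: list of (prompt, start_seconds, end_seconds) tuples.
    """
    for prompt, start, end in prompt_windows:
        if start <= timestamp < end:
            return prompt
    return None


# Hypothetical sequence of spoken prompts and the windows assigned to them.
windows = [
    ("Play album", 0.0, 2.5),
    ("Shuffle songs", 2.5, 5.0),
    ("Call back", 5.0, 7.5),
]
```

A voice input whose characteristic time stamp is 3.1 seconds would be applied to "Shuffle songs"; one arriving after the last window matches no prompt.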

Publication date: 07-02-2013

Apparatus and method for recognizing voice

Number: US20130035938A1
Inventor: Ho Young JUNG

The present invention includes a hierarchical search process. The hierarchical search process includes three steps. In a first step, a word boundary is determined using a recognition method of determining a following word dependent on a preceding word, and a word boundary detector. In a second step, word unit based recognition is performed in each area by dividing an input voice into a plurality of areas based on the determined word boundary. Finally, in a third step, a language model is applied to induce an optimal sentence recognition result with respect to a candidate word that is determined for each area. The present invention may improve the voice recognition performance, and particularly, the sentence unit based consecutive voice recognition performance.

Publication date: 14-02-2013

Methods and apparatuses for echo cancelation with beamforming microphone arrays

Number: US20130039504A1
Assignee: ClearOne Communications Inc

Embodiments include methods and apparatuses for sensing acoustic waves for a conferencing application. A conferencing apparatus includes a plurality of microphones oriented to cover a corresponding plurality of direction vectors and to develop a corresponding plurality of microphone signals. A processor is operably coupled to the plurality of microphones. The processor is configured to perform a beamforming operation to combine the plurality of microphone signals to a plurality of combined signals that is greater in number than one and less in number than the plurality of microphone signals. The processor is also configured to perform an acoustic echo cancelation operation on the plurality of combined signals to generate a plurality of combined echo-canceled signals and select one of the plurality of combined echo-canceled signals for transmission.

Publication date: 14-02-2013

Spatio-temporal speech enhancement technique based on generalized eigenvalue decomposition

Number: US20130041659A1
Assignee: Individual

Described herein is a speech enhancement method using microphone arrays and a new iterative technique for enhancing noisy speech signals under low signal-to-noise-ratio (SNR) environments. Included is the processing of observed noisy speech both in the spatial- and the temporal-domains to enhance the desired signal component speech and an iterative technique to compute the generalized eigenvectors of the multichannel data derived from the microphone array. The entire processing is done on the spatio-temporal correlation coefficient sequence of the observed data in order to avoid large matrix-vector multiplications. Also described is a speech enhancement system having two stages. In the first stage, the noise component of the observed signal is whitened, and in the second stage a spatio-temporal power method is used to extract the most dominant speech component. In both the stages, the filters are adapted using the multichannel spatio-temporal correlation coefficients of the data.

Publication date: 21-02-2013

Method, System and Computer Program Product for Suppressing Noise Using Multiple Signals

Number: US20130046535A1
Assignee: Texas Instruments Inc

In response to a first envelope within a kth frequency band of a first channel, a speech level within the kth frequency band of the first channel is estimated. In response to a second envelope within the kth frequency band of a second channel, a noise level within the kth frequency band of the second channel is estimated. A noise suppression gain for a time frame n is computed in response to the estimated speech level for a preceding time frame, the estimated noise level for the preceding time frame, the estimated speech level for the time frame n, and the estimated noise level for the time frame n. An output channel is generated in response to multiplying the noise suppression gain for the time frame n and the first channel.
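The abstract names the four inputs to the gain for time frame n but not the formula that combines them. The sketch below is one possible instance only: it blends the signal-to-noise ratios of the preceding and current frames and applies a Wiener-style gain. The blending weight `alpha`, the `eps` guard, and the Wiener form are all assumptions, not the patented computation.

```python
def suppression_gain(speech_prev, noise_prev, speech_now, noise_now, alpha=0.5):
    """Gain for frame n from speech/noise levels of frames n-1 and n (assumed rule)."""
    eps = 1e-12  # guards against division by zero when a noise estimate is 0
    snr_prev = speech_prev / (noise_prev + eps)
    snr_now = speech_now / (noise_now + eps)
    snr = alpha * snr_prev + (1.0 - alpha) * snr_now  # smooth across the two frames
    return snr / (1.0 + snr)  # Wiener-style gain, always in [0, 1)


def apply_gain(gain, first_channel_frame):
    """Generate the output frame by multiplying the gain with the first channel."""
    return [gain * x for x in first_channel_frame]
```

Using the preceding frame's estimates alongside the current ones keeps the gain from jumping abruptly between frames, which is the usual motivation for this kind of smoothing.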

Publication date: 21-02-2013

Transcript editor

Number: US20130047059A1
Assignee: Avid Technology Inc

A transcript editor enables text-based editing of time-based media that includes spoken dialog. It involves an augmented transcript that includes timing metadata that associates words and phrases within the transcript with corresponding temporal locations within the time-based media where the text is spoken, and editing the augmented transcript without the need for playback of the time-based media. After editing, the augmented transcript is processed by a media editing system to automatically generate an edited version of the time-based media that only includes the segments of the time-based media that include the speech corresponding to the edited augmented transcript.
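To make the idea of an augmented transcript concrete, the sketch below attaches start and end times to each word and derives the media segments that remain after a text edit. The tuple layout, the function name, and the sample words are hypothetical; the abstract does not describe the actual metadata format.

```python
def segments_for_transcript(edited_words):
    """Merge the time spans of the kept words into media segments.

    edited_words: list of (word, start_seconds, end_seconds) in edited order.
    Words that were contiguous in the source media collapse into one segment.
    """
    segments = []
    for _word, start, end in edited_words:
        if segments and abs(segments[-1][1] - start) < 1e-9:
            segments[-1] = (segments[-1][0], end)  # extend the current segment
        else:
            segments.append((start, end))  # a cut in the media starts a new one
    return segments


transcript = [("we", 0.0, 0.4), ("um", 0.4, 0.9), ("should", 0.9, 1.3), ("go", 1.3, 1.6)]
edited = [w for w in transcript if w[0] != "um"]  # text edit: delete the filler word
```

Deleting "um" in the text yields two segments, one before and one after the removed word, which a media editing system could then assemble without the editor ever playing the media back.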

Publication date: 14-03-2013

Apparatus and method for generating vocal organ animation

Number: US20130065205A1
Inventor: Bong-Rae Park
Assignee: CLUSOFT CO Ltd

The present disclosure relates to an apparatus and method for generating a vocal organ animation very similar to a pronunciation pattern of a native speaker in order to support foreign language pronunciation education. The present disclosure checks an adjacent phonetic value in phonetic value constitution information, extracts a detail phonetic value based on the adjacent phonetic value, extracts pronunciation pattern information corresponding to the detail phonetic value and pronunciation pattern information corresponding to a transition section allocated between detail phonetic values, and performs interpolation on the extracted pronunciation pattern information, thereby generating a vocal organ animation.

Publication date: 14-03-2013

Speech enhancement method

Number: US20130066626A1
Inventor: Hsien Cheng Liao

A speech enhancement method is disclosed. The method includes the steps of: receiving a plurality of frames of sound signals by a microphone array; calculating an inter-aural time difference for each frequency band of each frame of the sound signals corresponding to at least one two-microphone set of the microphone array; calculating a plurality of values of cumulative histograms according to the calculated inter-aural time differences; determining a first inter-aural time difference threshold according to the calculated value of the cumulative histograms; and filtering the plurality of frames of sound signals according to the first inter-aural time difference threshold.
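The steps above can be sketched for a single frame and a single two-microphone set as follows. The abstract does not state how the threshold is read off the cumulative histogram, so the `fraction` rule, the bin width, and the choice to zero rejected bands are assumptions made only for illustration.

```python
def itd_threshold(itds, bin_width, fraction):
    """Smallest |ITD| bin edge at which the cumulative histogram reaches fraction."""
    n_bins = int(max(abs(t) for t in itds) / bin_width) + 1
    counts = [0] * n_bins
    for t in itds:
        counts[int(abs(t) / bin_width)] += 1
    cumulative = 0
    for i, count in enumerate(counts):
        cumulative += count
        if cumulative >= fraction * len(itds):
            return (i + 1) * bin_width
    return n_bins * bin_width


def filter_bands(band_values, itds, threshold):
    """Keep bands whose |ITD| is within the threshold and zero the others."""
    return [v if abs(t) <= threshold else 0.0 for v, t in zip(band_values, itds)]
```

Bands with a small inter-aural time difference are assumed here to come from the target direction, so they pass unchanged while the remaining bands are suppressed.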

Publication date: 14-03-2013

Audio transcription generator and editor

Number: US20130066630A1
Inventor: Kenneth D. Roe
Assignee: Kenneth D. Roe

A system for correcting errors in automatically generated audio transcriptions includes an audio recorder, a computerized transcription generator, a voice recording, a collection of link data, transcription text, an audio player, a system of cross linking, and a text editor including a text display with a cursor. The system permits a user to correct transcription errors using techniques of jump to position; show position; and track playback.

Publication date: 14-03-2013

Parametric speech synthesis method and system

Number: US20130066631A1
Inventors: Fengliang Wu, Zhenhua Wu
Assignee: Goertek Inc

The present invention provides a parametric speech synthesis method and a parametric speech synthesis system. The method comprises sequentially processing each frame of speech of each phone in a phone sequence of an input text as follows: for a current phone, extracting a corresponding statistic model from a statistic model library and using model parameters of the statistic model that correspond to the current frame of the current phone as rough values of currently predicted speech parameters; according to the rough values and information about a predetermined number of speech frames occurring before the current time point, obtaining smoothed values of the currently predicted speech parameters; according to global mean values and global standard deviation ratios of the speech parameters obtained through statistics, performing global optimization on the smoothed values of the speech parameters to generate necessary speech parameters; and synthesizing the generated speech parameters to obtain a frame of speech synthesized for the current frame of the current phone. With this solution, the RAM capacity needed for speech synthesis does not increase with the length of the synthesized speech, and the duration of the synthesized speech is no longer limited by the RAM.

Publication date: 14-03-2013

Echo Cancelling-Codec

Number: US20130066638A1
Assignee: QNX Software Systems Ltd

Echo-cancellation is utilized in terminal devices such as speakerphones to compensate for acoustic echoes and interaction of the audio signal with the surrounding environment. An echo-cancelling codec incorporates encoding, decoding and acoustic echo-cancellation in a single device, enabling processing to be utilized that reduces processing and memory resources. The configuration enables processing information to also be shared between encoding, decoding and acoustic echo-cancellation functions to optimize operational characteristics. The acoustic echo cancelling codec interfaces between the amplitude signal domain, speaker and microphone, and an encoded data domain, a data interface, reducing component requirements required to provide echo-cancellation and coding functions.

Publication date: 21-03-2013

Voice controlled cell phone

Number: US20130072237A1
Inventor: Pradeep Ramdeo
Assignee: Individual

A hands-free voice-controlled cell phone includes an antenna, a transceiver coupled to the antenna, a processor coupled to the transceiver, a microphone coupled to the processor and a speaker coupled to the processor. A memory unit is within the processor. A first program within the memory unit is for converting a voice message from the microphone made by a person using the cell phone into a text message for a deaf person. The processor can output the text message to the transceiver and out of the antenna, to allow another person using a remote second cell phone to receive the text message. The first program for converting the voice message into a text message includes voice-to-text software.

Publication date: 11-04-2013

Mobile device context information using speech detection

Number: US20130090926A1
Assignee: Qualcomm Inc

Systems and methods for speech detection in association with a mobile device are described herein. A method described herein for identifying presence of speech associated with a mobile device includes obtaining a plurality of audio samples from the mobile device while the mobile device operates in a mode distinct from a voice call operating mode, generating spectrogram data from the plurality of audio samples, and determining whether the plurality of audio samples include information indicative of speech by classifying the spectrogram data.

Publication date: 18-04-2013

Voice-Activated Pulser

Number: US20130093445A1
Inventor: David Edward Newman
Assignee: ZANAVOX

A voice-activated pulser can trigger an oscilloscope or a meter, upon a simple voice command, thereby enabling hands-free signal measurements. The pulser can also be used to control the circuit under test, activating it or changing parameters, all under voice control. The pulser includes numerous switch-selectable output modes that allow users to generate complex, tightly-controlled diagnostic sequences, all activated upon a voice command and hands-free. The invention includes a fast, robust command-interpretation protocol that completely eliminates the expense and complexity of word recognition. Visual indicators display the device status and various operating modes, and also confirm each output pulse. The device receives voice commands directly through an internal microphone, or through a detachable headset, and confirms each command with an acoustical signal in the headset.

Publication date: 18-04-2013

Method and device for phase-sensitive processing of sound signals

Number: US20130094664A1
Inventor: Dietmar Ruwisch
Assignee: Individual

A method and device for phase-sensitive processing of sound signals of at least one sound source may include arranging two microphones at a distance d from each other, capturing sound signals with both microphones, generating associated microphone signals, and processing the sound signals of the microphones. During a calibration mode, a calibration-position-specific, frequency-dependent phase difference vector φ0(f) between the associated calibration microphone signals may be calculated from their frequency spectra for the calibration position. Then, during an operating mode, a signal spectrum S of a signal to be output is calculated by multiplication of at least one of the two frequency spectra of the current microphone signals with a spectral filter function F.

Publication date: 18-04-2013

Recognizing device, computer-readable recording medium, recognizing method, generating device, and generating method

Number: US20130096918A1
Inventor: Shouji Harada
Assignee: Fujitsu Ltd

A recognizing device includes a memory and a processor coupled to the memory. The memory stores words included in a sentence and positional information indicating a position of the words in the sentence. The processor executes a process including comparing an input voice signal with reading information of a character string that connects a plurality of words stored in the memory to calculate a similarity; calculating a connection score indicating a proximity between the plurality of connected words based on positional information of the words stored in the memory; and determining a character string corresponding to the voice signal based on the similarity and the connection score.

Publication date: 25-04-2013

Mobile voice platform architecture with remote service interfaces

Number: US20130102295A1
Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC

A mobile voice platform for providing a user speech interface to computer-based services includes a mobile device having a processor, communication circuitry that provides access to the computer-based services, an operating system, and one or more applications that are run using the operating system and that utilize one or more of the computer-based services via the communication circuitry. The mobile voice platform includes at least one non-transient digital storage medium storing a program module having computer instructions that, upon execution by the processor, receive speech recognition results representing user speech that has been processed using automated speech recognition, determine a desired computer-based service based on the speech recognition results, access a remotely-stored service interface associated with the desired service, initiate the desired service using the service interface, receive a service result from the desired service, and provide a text-based service response for conversion to a speech response to be provided to the user.

Publication date: 09-05-2013

Personalized Vocabulary for Digital Assistant

Number: US20130117022A1
Assignee: Apple Inc

Methods, systems, and computer readable storage medium related to operating an intelligent digital assistant are disclosed. A text string is obtained from a speech input received from a user. The received text string is interpreted to derive a representation of user intent based at least in part on a plurality of words associated with a user and stored in memory associated with the user, the plurality of words including words from a plurality of user interactions with an automated assistant. At least one domain, a task, and at least one parameter for the task, are identified based at least in part on the representation of user intent. The identified task is performed. An output is provided to the user, where the output is related to the performance of the task.

Publication date: 23-05-2013

System and method for crowd-sourced data labeling

Number: US20130132080A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until at least one of an accuracy threshold is reached and m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses.
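The incremental collection loop described above can be sketched as follows. The abstract says only that the accuracy threshold is based on characteristics of the responses, so measuring it as the share of the most common label, requiring at least two responses, and returning the majority label are assumptions.

```python
from collections import Counter


def collect_label(responses, accuracy_threshold, m):
    """Consume responses one at a time until agreement is high enough or m arrive.

    Returns (output_label, number_of_responses_used).
    """
    received = []
    for response in responses:
        received.append(response)
        _label, votes = Counter(received).most_common(1)[0]
        agreement = votes / len(received)
        # Assumed rule: stop on enough agreement (needs two or more responses) or at m.
        if (len(received) >= 2 and agreement >= accuracy_threshold) or len(received) >= m:
            break
    if not received:
        return None, 0
    return Counter(received).most_common(1)[0][0], len(received)
```

Stopping as soon as the workers agree avoids paying for responses that would not change the output, while the cap `m` bounds the cost when they keep disagreeing.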

Publication date: 30-05-2013

Method and system for generating search network for voice recognition

Number: US20130138441A1

Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers.

Publication date: 06-06-2013

Communication device with reduced noise speech coding

Number: US20130143618A1
Inventor: Nambirajan Seshadri
Assignee: Broadcom Corp

A communication device includes memory, an input interface, a processing module, and a transmitter. The processing module receives a digital signal from the input interface, wherein the digital signal includes a desired digital signal component and an undesired digital signal component. The processing module identifies one of a plurality of codebooks based on the undesired digital signal component. The processing module then identifies a codebook entry from the one of the plurality of codebooks based on the desired digital signal component to produce a selected codebook entry. The processing module then generates a coded signal based on the selected codebook entry, wherein the coded signal includes a substantially unattenuated representation of the desired digital signal component and an attenuated representation of the undesired digital signal component. The transmitter converts the coded signal into an outbound signal in accordance with a signaling protocol and transmits it.

Publication date: 06-06-2013

System and method for continuous multimodal speech and gesture interaction

Number: US20130144629A1
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing multimodal input. A system configured to practice the method continuously monitors an audio stream associated with a gesture input stream, and detects a speech event in the audio stream. Then the system identifies a temporal window associated with a time of the speech event, and analyzes data from the gesture input stream within the temporal window to identify a gesture event. The system processes the speech event and the gesture event to produce a multimodal command. The gesture in the gesture input stream can be directed to a display, but is remote from the display. The system can analyze the data from the gesture input stream by calculating an average of gesture coordinates within the temporal window.
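The last step above, averaging gesture coordinates inside a temporal window around the speech event, might be sketched like this. The symmetric window of `half_width` seconds and the `(time, x, y)` sample layout are assumptions; the abstract does not define the shape of the window.

```python
def gesture_in_window(gesture_samples, speech_time, half_width):
    """Average the (x, y) gesture coordinates sampled near the speech event.

    gesture_samples: list of (time_seconds, x, y) tuples from the gesture stream.
    Returns None when no sample falls inside the window.
    """
    xs, ys = [], []
    for t, x, y in gesture_samples:
        if speech_time - half_width <= t <= speech_time + half_width:
            xs.append(x)
            ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Hypothetical pointing samples; the speech event is detected at 1.1 seconds.
samples = [(0.0, 10, 10), (1.0, 100, 200), (1.2, 110, 220), (3.0, 500, 500)]
```

Averaging smooths out hand jitter, so the multimodal command is resolved against one stable screen position rather than any single noisy sample.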

Publication date: 13-06-2013

Apparatus, System, and Method For Distinguishing Voice in a Communication Stream

Number: US20130151248A1
Author: Forrest Baker, IV
Assignee: Noguar LC

An apparatus for distinguishing a voice is described. In one embodiment, the apparatus includes a server with a communication interface, a frame generator, and a sound analyzer. The communication interface processes an incoming communication stream with an echo canceller to cancel echo in the communication stream. The frame generator operates on a processor and generates a plurality of frames from the communication stream. Each of the plurality of frames contains data for a period of time from the communication stream. The frame generator also assigns a frame value to each of the plurality of frames. The sound analyzer determines a status of the communication stream by analyzing the frame values of the plurality of frames.
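A rough Python sketch of the frame generator and sound analyzer follows. The abstract does not say how a frame value is computed or how the status is decided, so the mean absolute amplitude, the thresholds, and the "voice"/"silence" labels below are all assumptions made for illustration.

```python
from typing import List, Sequence


def make_frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Split the stream into consecutive, non-overlapping frames of equal length."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def frame_value(frame: Sequence[float]) -> float:
    """Assumed frame value: mean absolute amplitude of the frame."""
    return sum(abs(s) for s in frame) / len(frame)


def stream_status(values: Sequence[float], threshold: float = 0.1,
                  min_active: float = 0.5) -> str:
    """Assumed analyzer: report 'voice' when enough frames exceed the threshold."""
    if not values:
        return "silence"
    active = sum(1 for v in values if v > threshold)
    return "voice" if active / len(values) >= min_active else "silence"
```

A caller would compute `[frame_value(f) for f in make_frames(samples, frame_len)]` and pass the result to `stream_status`.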

Publication date: 13-06-2013

Generic virtual personal assistant platform

Number: US20130152092A1
Author: Osher Yadgar
Assignee: SRI International Inc

A method for assisting a user with one or more desired tasks is disclosed. For example, an executable, generic language understanding module and an executable, generic task reasoning module are provided for execution in the computer processing system. A set of run-time specifications is provided to the generic language understanding module and the generic task reasoning module, comprising one or more models specific to a domain. A language input is then received from a user, an intention of the user is determined with respect to one or more desired tasks, and the user is assisted with the one or more desired tasks, in accordance with the intention of the user.

Publication date: 04-07-2013

Electronic apparatus and method for controlling the same

Number: US20130169525A1
Assignee: SAMSUNG ELECTRONICS CO LTD

An electronic apparatus and a control method thereof are provided. The apparatus displays first voice guide information indicating voice commands available to control the electronic apparatus and, if a command to control an external device connected to the electronic apparatus is received, changes the first voice guide information to display second voice guide information indicating voice commands available to control the external device.

Publication date: 18-07-2013

Echo removing apparatus, echo removing method, program and recording medium

Number: US20130185064A1
Author: Mitsuhiro Suzuki
Assignee: Sony Corp

Provided is an echo removing apparatus including a transmission path estimate update processing unit and an output selection unit. A fixed section of the transmission path estimate is updated based on an error from an echo estimate determined using all of the fixed section, the holding section, and the update section. These sections are updated depending on whether an estimate obtained by adding the fixed section and the holding section is better than an estimate of the fixed section alone in every fixed period. Only when the estimate is better is the holding section added to the fixed section cumulatively and the update section substituted into the holding section. Depending on whether an estimate is better, an error from the echo estimate determined using all these sections or the fixed section alone is selected as an output.

Publication date: 25-07-2013

Computerized information and display apparatus

Number: US20130191750A1
Author: Robert F. Gazdzinski
Assignee: West View Research LLC

Apparatus useful for obtaining and displaying information. In one embodiment, the apparatus includes a network interface, display device, and speech recognition apparatus configured to receive user speech input and enable performance of various tasks via a remote entity, such as obtaining desired information relating to directions, sports, finance, weather, or any number of other topics. The downloaded information may also, in one variant, be transmitted to a personal user device, such as via a data interface.

Publication date: 01-08-2013

Multi-Channel Audio Processing

Number: US20130195276A1
Author: Pasi Ojala
Assignee: Nokia Oyj

A method including: receiving at least a first input audio channel and a second input audio channel; and using an inter-channel prediction model to form at least an inter-channel direction of reception parameter.

Publication date: 08-08-2013

Enhanced context awareness for speech recognition

Number: US20130204622A1
Author: Mika Grundstrom, Wenhui Lu
Assignee: Nokia Oyj

A method comprising establishing a call connection (200) between at least a first and a second terminal; monitoring (202), by at least the first terminal, a conversation during the call in order to detect (204) at least one predetermined context-related keyword repeated in at least the first and the second terminal; and in response to detecting (210) at least one repeated predetermined context-related keyword, providing an indication (212, 214, 216) about the detected context-related keyword to a user of at least the first terminal, said indication enabling opening an application linked to said context-related keyword.

Publication date: 15-08-2013

Wind suppression/replacement component for use with electronic systems

Number: US20130211830A1
Assignee: AliphCom LLC

Techniques associated with an acoustic vibration sensor are described, including a first detector that receives a first signal and a second detector that receives a second signal and a third signal, wherein the first signal comprises a skin surface microphone signal, a static equalization filter coupled to the first detector and configured to generate an equalized first signal, a voice activity detector coupled to the first detector, and a wind detector coupled to the second detector, the wind detector configured to correlate the second signal and the third signal and to derive from the correlation a plurality of wind metrics associated with a wind noise, the wind detector is further configured to determine a magnitude associated with the wind noise, to determine whether to suspend an activity of the system, and to determine a duration of time that the magnitude associated with the wind noise exceeds a threshold.

Publication date: 22-08-2013

Apparatus and method for modifying an audio signal using envelope shaping

Number: US20130216053A1
Author: Sascha Disch

An apparatus for modifying an audio signal has an envelope shape determiner, a filterbank processor, a signal processor, a combiner and an envelope shaper. The envelope shape determiner determines envelope shape coefficients based on a frequency domain audio signal representing a time domain input audio signal and the filterbank processor generates a plurality of bandpass signals in a subband domain based on the frequency domain audio signal. Further, the signal processor modifies a subband domain bandpass signal of the plurality of subband domain bandpass signals based on a predefined modification target. The combiner combines at least a subset of the plurality of subband domain bandpass signals containing the modified subband domain bandpass signal to obtain a time domain audio signal. Further, the envelope shaper is operative to obtain a shaped audio signal.

Publication date: 22-08-2013

Enhanced Media Playback with Speech Recognition

Number: US20130218565A1
Author: Paritosh Patel
Assignee: Nuance Communications Inc

A method for enhancing a media file to enable speech-recognition of spoken navigation commands can be provided. The method can include receiving a plurality of textual items based on subject matter of the media file and generating a grammar for each textual item, thereby generating a plurality of grammars for use by a speech recognition engine. The method can further include associating a time stamp with each grammar, wherein a time stamp indicates a location in the media file of a textual item corresponding with a grammar. The method can further include associating the plurality of grammars with the media file, such that speech recognized by the speech recognition engine is associated with a corresponding location in the media file.
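The association between grammars, time stamps, and media locations can be sketched in Python as follows. This is only an illustration: each "grammar" is reduced to a set of accepted phrases, and the function names and data layout are hypothetical rather than taken from the patent.

```python
from typing import Dict, List, Optional, Tuple


def build_grammars(items: List[Tuple[str, float]]) -> List[Dict]:
    """Create one simplified grammar per textual item, tagged with its time stamp.

    Each item is (text, seconds_into_media). A real speech grammar would be
    richer; here a grammar is just the set of phrases it accepts.
    """
    return [{"phrases": {text.lower()}, "timestamp": ts} for text, ts in items]


def locate(grammars: List[Dict], recognized: str) -> Optional[float]:
    """Map recognized speech to the media location of the matching grammar."""
    spoken = recognized.strip().lower()
    for grammar in grammars:
        if spoken in grammar["phrases"]:
            return grammar["timestamp"]
    return None
```

A player could then seek to the returned time stamp when the user speaks the name of a chapter or scene.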

Publication date: 12-09-2013

Method of facial image reproduction and related device

Number: US20130236102A1
Assignee: CyberLink Corp

To modify a facial feature region in a video bitstream, the video bitstream is received and a feature region is extracted from the video bitstream. An audio characteristic, such as frequency, rhythm, or tempo is retrieved from an audio bitstream, and the feature region is modified according to the audio characteristic to generate a modified image. The modified image is outputted.

Publication date: 19-09-2013

Automatic realtime speech impairment correction

Number: US20130246058A1
Assignee: International Business Machines Corp

Automatic correction of a user's speech impairment may include obtaining the audio signal of a given user's speech, and analyzing the obtained audio signal to identify artifacts caused by the user's impairment. The obtained audio signal may be modified by eliminating the identified artifacts from it. The modified audio signal may be provided, e.g., to be played or broadcast or transmitted.

Publication date: 19-09-2013

System and Method for Robust Estimation and Tracking the Fundamental Frequency of Pseudo Periodic Signals in the Presence of Noise

Number: US20130246062A1
Assignee: VOCALZOOM SYSTEMS Ltd

Method and system for tracking fundamental frequencies of pseudo-periodic signals in the presence of noise that include receiving a time-frequency representation of signals measured in a predefined environment; estimating and tracking a fundamental frequency of a respective pseudo-periodic signal at each time frame of the time-frequency representation by tracking detections of harmonious frequencies in the time-frequency representation over time; and outputting each respective estimated fundamental frequency associated with the pseudo-periodic signal of each respective time frame.

Publication date: 19-09-2013

Method of enabling voice input for a visually based interface

Number: US20130246920A1
Assignee: Research in Motion Ltd

A method of enabling voice input for a graphical user interface (GUI) based application on an electronic device. The method includes: obtaining required properties of one or more user interface objects of the GUI-based application, wherein the one or more user interface objects include one or more input objects; receiving a voice input; extracting from the voice input one or more elements; associating the one or more elements with the one or more input objects; identifying, based on said associating, an input object having a required property which is not satisfied; and outputting, based on the required property, audio output for a prompt for a further voice input.

Publication date: 03-10-2013

Text to speech method and system

Number: US20130262109A1
Assignee: Toshiba Corp

A text-to-speech method for simulating a plurality of different voice characteristics includes dividing inputted text into a sequence of acoustic units; selecting voice characteristics for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model having a plurality of model parameters provided in clusters each having at least one sub-cluster and describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio with the selected voice characteristics. A parameter of a predetermined type of each probability distribution is expressed as a weighted sum of parameters of the same type using voice characteristic dependent weighting. In converting the sequence of acoustic units to a sequence of speech vectors, the voice characteristic dependent weights for the selected voice characteristics are retrieved for each cluster such that there is one weight per sub-cluster.

Publication date: 24-10-2013

Mixer with adaptive post-filtering

Number: US20130279718A1
Assignee: QNX Software Systems Ltd

A noise reduction system includes multiple transducers that generate time domain signals. A transforming device transforms the time domain signals into frequency domain signals. A signal mixing device mixes the frequency domain signals according to a mixing ratio. Frequency domain signals are rotated in phase to generate phase rotated signals. A post-processing device attenuates portions of the output based on coherence levels of the signals.

Publication date: 24-10-2013

Systems and methods for audio signal processing

Number: US20130282373A1
Assignee: Qualcomm Inc

A method for restoring a processed speech signal by an electronic device is described. The method includes obtaining at least one audio signal. The method also includes performing bin-wise voice activity detection based on the at least one audio signal. The method further includes restoring the processed speech signal based on the bin-wise voice activity detection.

Publication date: 31-10-2013

Reduced-delay subband signal processing system and method

Number: US20130287226A1
Author: Yair Kerner
Assignee: Conexant Systems LLC

A method for signal processing, receiving a time domain signal having a sample-rate Fs and generating N time domain signal bands, each having a bandwidth equal to Fs/N. Receiving the N signal bands and transforming a first time domain signal band to a frequency domain at a first resolution and a second time domain signal band to the frequency domain at a second resolution, where the first resolution may be different from the second resolution. Determining one or more first filter coefficients using the frequency domain components from the first signal band and one or more second filter coefficients using the frequency domain components from the second signal band. Transforming the first and second filter coefficients from the frequency domain to a time domain. Applying the first and second time domain filter coefficients to the first and second time domain signals, respectively.

Publication date: 31-10-2013

Negative Example (Anti-Word) Based Performance Improvement For Speech Recognition

Number: US20130289987A1
Assignee: Interactive Intelligence Inc

A system and method are presented for negative example based performance improvements for speech recognition. The presently disclosed embodiments address identified false positives and the identification of negative examples of keywords in an Automatic Speech Recognition (ASR) system. Various methods may be used to identify negative examples of keywords. Such methods may include, for example, human listening and learning possible negative examples from a large domain specific text source. In at least one embodiment, negative examples of keywords may be used to improve the performance of an ASR system by reducing false positives.

Publication date: 07-11-2013

Physician and clinical documentation specialist workflow integration

Number: US20130297347A1
Assignee: Nuance Communications Inc

A medical documentation system and a CDI system may be linked together, or integrated, so there is a tie between the two systems that allows for a much more efficient and effective CDI process. In one disclosed embodiment, a medical documentation system transmits to a CDI system a structured data set including at least some information relating to one or more medical facts the medical documentation system automatically extracted from text documenting a patient encounter.

Publication date: 12-12-2013

Adjusting audio beamforming settings based on system state

Number: US20130329908A1
Assignee: Apple Inc

Audio beamforming is a technique in which sounds received from two or more microphones are combined to isolate a sound from background noise. A variety of audio beamforming spatial patterns exist. The patterns can be fixed or adapted over time, and can even vary by frequency. The different patterns can achieve varying levels of success for different types of sounds. To improve the performance of audio beamforming, a system can select a mode beam pattern based on a detected running application and/or device settings. The system can use the mode beam pattern to configure an audio beamforming algorithm. The configured audio beamforming algorithm can be used to generate processed audio data from multiple audio signals. The system can then send the processed audio data to the running application.
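Selecting a mode beam pattern from system state can be sketched as a simple lookup. The application names, pattern names, and the `speakerphone_on` setting below are hypothetical placeholders; the abstract does not list concrete applications, patterns, or settings.

```python
from typing import Dict

# Hypothetical mapping from running application to a preferred beam pattern.
BEAM_PATTERNS: Dict[str, str] = {
    "dictation": "cardioid",
    "video_call": "hypercardioid",
    "voice_memo": "omnidirectional",
}
DEFAULT_PATTERN = "cardioid"


def select_beam_pattern(running_app: str, speakerphone_on: bool = False) -> str:
    """Pick a mode beam pattern from the running application and a device setting."""
    # Assumed rule: a device setting can override the per-application choice.
    if speakerphone_on:
        return "omnidirectional"
    return BEAM_PATTERNS.get(running_app, DEFAULT_PATTERN)
```

The returned pattern name would then be used to configure the beamforming algorithm before audio is processed for the running application.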

Publication date: 26-12-2013

Method for denoising an acoustic signal for a multi-microphone audio device operating in a noisy environment

Number: US20130343558A1
Assignee: PARROT SA

This method comprises steps of: a) partitioning (10, 16) the spectrum of the noisy signal into a HF part and a LF part; b) operating denoising processes in a differentiated manner for each of the two parts of the spectrum with, for the HF part, a denoising by prediction of the useful signal from one sensor to the other between sensors of a first sub-array (R1), by means of a first adaptive algorithm estimator (14), and, for the LF part, a denoising by prediction of the noise from one sensor to the other between sensors of a second sub-array (R2), by means of a second adaptive algorithm estimator (18); c) reconstructing the spectrum by combining together (22) the signals delivered after denoising of the two parts of the spectrum, respectively; and d) selectively reducing the noise (24) by an Optimized Modified Log-Spectral Amplitude gain, OM-LSA, process.

Publication date: 02-01-2014

Audio signal processing device calibration

Number: US20140003635A1
Assignee: Qualcomm Inc

A method includes, while operating an audio processing device in a use mode, retrieving first direction of arrival (DOA) data corresponding to a first audio output device from a memory of the audio processing device and generating a first null beam directed toward the first audio output device based on the first DOA data. The method also includes retrieving second DOA data corresponding to a second audio output device from the memory of the audio processing device and generating a second null beam directed toward the second audio output device based on the second DOA data. The first DOA data and the second DOA data are stored in the memory during operation of the audio processing device in a calibration mode.

Publication date: 02-01-2014

Computer implemented methods and apparatus for selectively interacting with a server to build a local dictation database for speech recognition at a device

Number: US20140006028A1
Author: Minzhi Hu
Assignee: Salesforce com Inc

Disclosed are methods, apparatus, systems, and computer-readable storage media for selectively interacting with a server to build a local dictation database for speech recognition at a device. In some implementations, a computing device receives an audio sample. The computing device may determine that the received audio sample does not match any of one or more existing audio samples stored in the local dictation database of the computing device. The received audio sample may be transmitted to a remote server for detection of one or more words indicated by the received audio sample. The computing device may receive data identifying the one or more words, and update the local dictation database to store the received audio sample in association with the one or more words.
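The "check locally, otherwise ask the server and remember the answer" flow can be sketched in Python as below. The class name, the `remote_recognize` callback, and the use of an exact SHA-256 match are assumptions made to keep the sketch short; the abstract does not say how audio samples are compared, and a real system would likely use an acoustic similarity measure.

```python
import hashlib
from typing import Callable, Dict


class LocalDictationDB:
    """Local store of audio samples and the words they were recognized as."""

    def __init__(self, remote_recognize: Callable[[bytes], str]) -> None:
        # remote_recognize stands in for the round trip to the remote server.
        self._remote_recognize = remote_recognize
        self._store: Dict[str, str] = {}

    def recognize(self, sample: bytes) -> str:
        # Assumption: samples match only when their bytes are identical.
        key = hashlib.sha256(sample).hexdigest()
        if key in self._store:
            return self._store[key]
        # No local match: send the sample to the server, then update the database.
        words = self._remote_recognize(sample)
        self._store[key] = words
        return words
```

Repeated samples are then resolved on the device, and the server is contacted only for samples the local database has not seen.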

Publication date: 06-02-2014

System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain

Number: US20140037095A1
Assignee: Intellisis Corp

A system and method may be configured to process an audio signal. The system and method may track pitch, chirp rate, and/or harmonic envelope across the audio signal, may reconstruct sound represented in the audio signal, and/or may segment or classify the audio signal. A transform may be performed on the audio signal to place the audio signal in a frequency chirp domain that enhances the sound parameter tracking, reconstruction, and/or classification.

Publication date: 06-02-2014

Speech recognition models based on location indicia

Number: US20140039888A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speech recognition using models that are based on where, within a building, a speaker makes an utterance are disclosed. The methods, systems, and apparatus include actions of receiving data corresponding to an utterance, and obtaining location indicia for an area within a building where the utterance was spoken. Further actions include selecting one or more models for speech recognition based on the location indicia, wherein each of the selected one or more models is associated with a weight based on the location indicia. Additionally, the actions include generating a composite model using the selected one or more models and the respective weights of the selected one or more models. And the actions also include generating a transcription of the utterance using the composite model.
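One simple way to picture the composite model is a weighted mixture of per-location word distributions. The sketch below assumes each model is a unigram probability table and that the composite is a normalized linear interpolation; both are illustrative assumptions, since the abstract does not specify the model type or the combination rule.

```python
from typing import Dict


def composite_model(
    models: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Linearly interpolate the selected models using location-based weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined: Dict[str, float] = {}
    for name, weight in weights.items():
        for word, prob in models[name].items():
            # Each model contributes in proportion to its normalized weight.
            combined[word] = combined.get(word, 0.0) + (weight / total) * prob
    return combined
```

With location indicia suggesting the speaker is most likely in the kitchen, the kitchen model would receive the larger weight and dominate the composite used for transcription.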

Publication date: 13-02-2014

System and Method for Improving Speech Recognition Accuracy Using Textual Context

Number: US20140046663A1
Author: Dan Melamed
Assignee: AT&T INTELLECTUAL PROPERTY I LP

Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data.

Publication date: 20-02-2014

Coding through combination of code vectors

Number: US20140052440A1
Assignee: Nokia Oyj

It is inter alia disclosed to determine a single code vector index based on combining at least two code vector indexes, each code vector index being associated with a code vector of a respective codebook.
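The abstract does not say how the code vector indexes are combined. One common and reversible option is a mixed-radix encoding, sketched below in Python as an assumption rather than as the patented method; the function names are hypothetical.

```python
from typing import List, Sequence


def combine_indexes(indexes: Sequence[int], sizes: Sequence[int]) -> int:
    """Pack one index per codebook into a single index (mixed-radix encoding)."""
    combined = 0
    for index, size in zip(indexes, sizes):
        if not 0 <= index < size:
            raise ValueError("index out of range for its codebook")
        combined = combined * size + index
    return combined


def split_index(combined: int, sizes: Sequence[int]) -> List[int]:
    """Recover the per-codebook indexes from the single combined index."""
    indexes = []
    for size in reversed(sizes):
        combined, index = divmod(combined, size)
        indexes.append(index)
    return list(reversed(indexes))
```

For codebooks of sizes 4 and 8, the pair of indexes (2, 5) packs into 2 * 8 + 5 = 21, and `split_index(21, [4, 8])` recovers the original pair.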

Publication date: 06-03-2014

Displaying additional data about outputted media data by a display device for a speech search command

Number: US20140067402A1
Author: Yongsin Kim
Assignee: LG ELECTRONICS INC

A speech search method performed by a display device, the method including outputting media data including audio data, receiving a speech search command for additional data about the outputted media data from a user, the speech search command including at least one query word, determining whether the at least one query word matches a query term that is full and searchable, when the at least one query word matches the query term that is full and searchable, performing a search for the additional data using the query term, and when the at least one query word does not match the query term that is full and searchable, determining the query term from a predetermined amount of the audio data prior to receiving the speech search command and performing the search for the additional data using the query term.
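The branching logic in this abstract can be sketched in Python under two simplifying assumptions: the recent audio has already been transcribed to text, and "full and searchable" means membership in a known set of terms. The function name and the choice of the most recently mentioned term as the fallback are hypothetical.

```python
from typing import Optional, Sequence, Set


def resolve_query_term(
    query_words: Sequence[str],
    recent_transcript: str,
    searchable_terms: Set[str],
) -> Optional[str]:
    """Return the query term to search for, falling back to recent audio content."""
    query = " ".join(query_words).lower()
    # Case 1: the spoken query is itself a full, searchable term.
    if query in searchable_terms:
        return query
    # Case 2: the query is incomplete (for example "what is that"), so look for
    # the searchable term mentioned most recently before the command.
    text = recent_transcript.lower()
    best_term, best_pos = None, -1
    for term in searchable_terms:
        pos = text.rfind(term)
        if pos > best_pos:
            best_term, best_pos = term, pos
    return best_term
```

The resolved term would then be passed to the search for additional data about the media being output.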

Publication date: 13-03-2014

Navigation apparatus

Number: US20140074473A1
Assignee: Mitsubishi Electric Corp

A navigation apparatus is capable of providing a user not only with guidance but with the full operational transition defined by the guidance, operational procedure, operation screen, and recognition vocabulary, while altering that operational transition in accordance with the user's recognition vocabulary comprehension level. It can thus make it more likely that a user with a low recognition vocabulary comprehension level achieves a task, and make operation more comfortable for a user with a high recognition vocabulary comprehension level, thereby providing all users with the optimum operational transition.
