Total found: 3751. Showing 100.
Publication date: 16-02-2012

Methods and apparatus for embedding watermarks

Number: US20120039504A1
Author: Venugopal Srinivasan
Assignee: Individual

Methods and apparatus for embedding a watermark are disclosed. An example method disclosed herein to embed a watermark in a compressed data stream comprises obtaining a set of transform coefficients included in the compressed data stream, the set of transform coefficients having a respective first set of mantissa codes and a respective set of exponents, the first set of mantissa codes associated with a respective set of mantissa step sizes, identifying a first transform coefficient from the set of transform coefficients having a smallest magnitude among the set of transform coefficients, determining a second set of mantissa codes based on the first transform coefficient and the set of step sizes, and replacing the first set of mantissa codes included in the compressed data stream with the second set of mantissa codes to embed the watermark without uncompressing the compressed data stream.
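
The claim language is dense, so a toy sketch may help. The following Python fragment illustrates the general family of techniques (quantization-index modulation over quantized transform coefficients); it is a simplified stand-in, not the patented method, and the coefficients, step sizes, and anchor heuristic are all invented for illustration.

```python
import numpy as np

def embed_bit_qim(coeff: float, step: float, bit: int) -> float:
    """Quantization-index modulation: snap coeff to the nearest quantizer
    cell whose index parity encodes the watermark bit."""
    q = np.round(coeff / step)            # nearest quantizer index
    if int(q) % 2 != bit:                 # parity must match the bit
        q += 1 if coeff / step > q else -1
    return q * step

def embed_watermark(coeffs, steps, bits):
    """Embed one bit per coefficient by rewriting its quantized value
    (a stand-in for replacing mantissa codes in the compressed stream)."""
    # The smallest-magnitude coefficient loosely mirrors the claim's
    # anchor coefficient; here it only bounds the minimum step used.
    anchor = np.min(np.abs(coeffs))
    out = []
    for c, s, b in zip(coeffs, steps, bits):
        out.append(embed_bit_qim(c, max(s, anchor * 1e-3), b))
    return np.array(out)

coeffs = np.array([0.82, -0.41, 0.07, 1.30])
steps = np.array([0.05, 0.05, 0.02, 0.10])
print(embed_watermark(coeffs, steps, [1, 0, 1, 0]))
```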

Publication date: 06-02-2014

System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain

Number: US20140037095A1
Assignee: Intellisis Corp

A system and method may be configured to process an audio signal. The system and method may track pitch, chirp rate, and/or harmonic envelope across the audio signal, may reconstruct sound represented in the audio signal, and/or may segment or classify the audio signal. A transform may be performed on the audio signal to place the audio signal in a frequency chirp domain that enhances the sound parameter tracking, reconstruction, and/or classification.

Publication date: 13-03-2014

Method and System for Building a Phonotactic Model for Domain Independent Speech Recognition

Number: US20140074476A1
Author: Giuseppe Riccardi
Assignee: AT&T Intellectual Property II LP

The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
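
As a rough illustration of what a phonotactic model is, the minimal Python sketch below builds an add-one-smoothed phone bigram model from toy phone strings. The patent's morpheme detection and per-interaction model update are not reproduced; the corpus and phone symbols are invented.

```python
from collections import Counter, defaultdict

def train_phonotactic_bigrams(phone_sequences):
    """Build a bigram phone model: P(next_phone | phone), add-one smoothed."""
    counts = defaultdict(Counter)
    for seq in phone_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    vocab = {p for seq in phone_sequences for p in seq}
    model = {}
    for a in vocab:
        total = sum(counts[a].values()) + len(vocab)
        model[a] = {b: (counts[a][b] + 1) / total for b in vocab}
    return model

# toy phone strings, roughly "collect" and "call"
corpus = [["k", "ax", "l", "eh", "k", "t"], ["k", "ao", "l"]]
model = train_phonotactic_bigrams(corpus)
print(model["k"]["ax"])  # probability of "ax" following "k"
```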

Publication date: 07-01-2016

AUDIO COMMAND INTENT DETERMINATION SYSTEM AND METHOD

Number: US20160004501A1
Assignee: HONEYWELL INTERNATIONAL INC.

Methods and apparatus are provided for generating aircraft cabin control commands from verbal speech onboard an aircraft. An audio command supplied to an audio input device is processed. Each word of the processed audio command is compared to words stored in a vocabulary map to determine a word type of each word. Each determined word type is processed to determine if an intent of the audio command is discernable. If the intent is discernable, an aircraft cabin control command is generated based on the discerned intent. If a partial intent is discernable, feedback is generated.

1. A method of generating aircraft cabin control commands from verbal speech onboard an aircraft, comprising the steps of: processing an audio command supplied to an audio input device, the audio command including at least one word; comparing each word of the processed audio command to words stored in a vocabulary map to determine a word type of each word, the vocabulary map comprising a predetermined set of word types; and processing each determined word type to determine if an intent of the audio command is discernable; if the intent is discernable, generating an aircraft cabin control command based on the discerned intent; and generating feedback if no or only a partial intent of the audio command is discernable.

2. The method of claim 1, wherein the step of processing each determined word type to determine if the intent of the audio command is discernable comprises: determining if the audio command includes at least a context word type and an action word type; identifying an anchor node in a normalized intent rules tree structure that corresponds to the context word type; determining if the action word type is associated with the anchor node and, if so, determining the intent therefrom.

3. The method of claim 2, wherein the normalized intent rules tree structure comprises: a root node, the root node associated with the aircraft; a plurality of context nodes, each context node corresponding to a ...
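
A minimal sketch of the vocabulary-map-plus-rules-tree idea, with a hypothetical word-type map, tree, and intent names (all invented for illustration, not taken from the patent):

```python
# Hypothetical word-type map and intent rules tree: a context word anchors
# a node; an action word attached to that node yields a full intent.
VOCAB_MAP = {"cabin": "context", "lights": "context", "dim": "action", "on": "action"}

INTENT_TREE = {  # context node -> action word -> intent
    "lights": {"dim": "CABIN_LIGHTS_DIM", "on": "CABIN_LIGHTS_ON"},
}

def discern_intent(command: str):
    words = command.lower().split()
    types = {w: VOCAB_MAP.get(w) for w in words}
    contexts = [w for w, t in types.items() if t == "context"]
    actions = [w for w, t in types.items() if t == "action"]
    for c in contexts:
        node = INTENT_TREE.get(c)         # anchor node for the context word
        if node:
            for a in actions:
                if a in node:
                    return node[a]        # full intent: generate the command
    if contexts or actions:
        return "FEEDBACK_PARTIAL_INTENT"  # partial intent: ask the user
    return "FEEDBACK_NO_INTENT"

print(discern_intent("turn the cabin lights on"))
```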

Publication date: 04-01-2018

PATTERN RECOGNITION DEVICE, PATTERN RECOGNITION METHOD, AND COMPUTER PROGRAM PRODUCT

Number: US20180005087A1

According to an embodiment, a pattern recognition device is configured to divide an input signal into a plurality of elements, convert the divided elements into feature vectors having the same dimensionality to generate a set of feature vectors, and evaluate the set of feature vectors using a recognition dictionary including models corresponding to respective classes, to output a recognition result representing a class or a set of classes to which the input signal belongs. The models each include sub-models each corresponding to one of possible division patterns in which a signal to be classified into a class corresponding to the model can be divided into a plurality of elements. A label expressing a model including a sub-model conforming to the set of feature vectors, or a set of labels expressing a set of models including sub-models conforming to the set of feature vectors, is output as the recognition result.

1. A pattern recognition device comprising: a division unit configured to divide an input signal into a plurality of elements; a feature extracting unit configured to convert the divided elements into feature vectors having the same dimensionality, and generate a set of feature vectors; and a recognition unit configured to evaluate the set of feature vectors using a recognition dictionary, and output a recognition result representing a class or a set of classes to which the input signal belongs, wherein the recognition dictionary includes models corresponding to respective classes, the models each include sub-models each corresponding to one of possible division patterns in which a signal to be classified into a class corresponding to the model can be divided into a plurality of elements, each sub-model has a state corresponding to each element divided based on a division pattern corresponding to the sub-model, the state being expressed by a function of labels representing a feature vector and the state, and the recognition unit outputs, as the recognition result, a ...

Publication date: 13-01-2022

Capturing device of remote warning sound component and method thereof

Number: US20220013101A1
Author: Li-Min Sun, Yi-Chang Liu
Assignee: AWNT Ltd

The present disclosure relates to a capturing device for a remote warning sound component, in which an audio pick-up device receives a remote sound signal in a remote range, and a processor generates a warning sound component by amplifying a sound feature point audio in a sound component according to warning voiceprint data, and generates non-warning sound components by suppressing or shielding the sound feature point audio in the other sound components according to non-warning voiceprint information. The processor then combines the warning sound component and the non-warning sound components to generate an output sound signal, allowing a speaker to output the output sound signal. Accordingly, the capturing device of the present disclosure instantly provides a warning sound received from a remote range (e.g. the sound of a car engine) and outputs it to put the user in an early alert state, thereby reducing the probability that an incident occurs.

Publication date: 07-01-2021

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM

Number: US20210005177A1

Implemented are an apparatus and a method for detecting misrecognition of a user speech on the basis of a subsequent interaction. The apparatus includes a voice recognition section that executes a voice recognition process on a user speech and a learning processing section that executes a process of updating a degree of confidence on the basis of an interaction made between a user and the information processing apparatus after the user speech. The degree of confidence is an evaluation value indicating the reliability of a voice recognition result of the user speech. The voice recognition section generates degree-of-confidence data for recognition of the user speech, in which plural user speech candidates based on the voice recognition result of the user speech are associated with degrees of confidence, i.e., evaluation values each indicating the reliability of the corresponding user speech candidate. The learning processing section updates the degree-of-confidence values in that data by analyzing context consistency or subject consistency in the interaction made between the user and the information processing apparatus after the user speech.

1. An information processing apparatus comprising: a voice recognition section that executes a voice recognition process on a user speech; and a learning processing section that executes a process of updating a degree of confidence, on a basis of an interaction made between a user and the information processing apparatus after the user speech, the degree of confidence being an evaluation value indicating reliability of a voice recognition result of the user speech.

2. The information processing apparatus according to claim 1, wherein the learning processing section executes the process of updating the degree of confidence, by analyzing context consistency or subject consistency in the interaction made between the user and the information processing apparatus ...
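
A minimal sketch of the confidence-update idea, assuming consistency scores in [0, 1] are already available from some analysis of the later dialog (the hypotheses and scores here are invented):

```python
def update_confidences(candidates, consistency):
    """candidates: {hypothesis: confidence}; consistency: {hypothesis: [0,1]}.
    Reweight each recognition hypothesis by how consistent the subsequent
    interaction turned out to be with it, then renormalize."""
    scored = {h: c * consistency.get(h, 0.5) for h, c in candidates.items()}
    total = sum(scored.values()) or 1.0
    return {h: s / total for h, s in scored.items()}

# two competing hypotheses for one user utterance
cands = {"play some jazz": 0.55, "play some chess": 0.45}
# the follow-up interaction stayed on music, not board games
consist = {"play some jazz": 0.9, "play some chess": 0.2}
print(update_confidences(cands, consist))
```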

Publication date: 27-01-2022

LEARNING DEVICE AND PATTERN RECOGNITION DEVICE

Number: US20220028372A1
Assignee: NEC Corporation

The acoustic feature extraction means extracts an acoustic feature, using predetermined parameters, from an acoustic pattern obtained as a result of processing on an acoustic signal. The language vector calculation means calculates a language vector from a given label that represents an attribute of a source of the acoustic signal and that is associated with the acoustic pattern. The similarity calculation means calculates a similarity between the acoustic feature and the language vector. The parameter update means learns parameters so that the similarity becomes larger, and updates the predetermined parameters to the parameters obtained by learning.

1. A learning device comprising: an acoustic feature extraction unit that extracts an acoustic feature, using predetermined parameters, from an acoustic pattern obtained as a result of processing on an acoustic signal; a language vector calculation unit that calculates a language vector from a given label that represents an attribute of a source of the acoustic signal and that is associated with the acoustic pattern; a similarity calculation unit that calculates a similarity between the acoustic feature and the language vector; and a parameter update unit that learns parameters so that the similarity becomes larger, and updates the predetermined parameters to the parameters obtained by learning.

2. The learning device according to claim 1, wherein the given label is defined for each hierarchy of category of the attribute of the source, wherein the learning device comprises, for each hierarchy of category: a parameter storage unit that stores the predetermined parameters; the acoustic feature extraction unit; the language vector calculation unit; the similarity calculation unit; and the parameter update unit; wherein the acoustic feature extraction unit of the highest hierarchy extracts the acoustic feature from a given acoustic pattern, using parameters stored in the parameter storage unit corresponding to the acoustic feature ...

Publication date: 11-01-2018

PATTERN RECOGNITION DEVICE, PATTERN RECOGNITION METHOD, AND COMPUTER PROGRAM PRODUCT

Number: US20180012108A1
Author: Ono Soichiro

According to an embodiment, a pattern recognition device recognizes a pattern of an input signal by converting the input signal to a feature vector and matching the feature vector with a recognition dictionary. The recognition dictionary includes a dictionary subspace basis vector for expressing a dictionary subspace which is a subspace of a space of the feature vector, and a plurality of probability parameters for converting similarity calculated from the feature vector and the dictionary subspace into likelihood. The device includes a recognition unit configured to calculate the similarity using a quadratic polynomial of a value of an inner product of the feature vector and the dictionary subspace basis vector, and calculate the likelihood using the similarity and an exponential function of a linear sum of the probability parameters. The recognition dictionary is trained by using an expectation maximization method using a constraint condition between the probability parameters.

1. A pattern recognition device that recognizes a pattern of an input signal by converting the input signal to a feature vector and matching the feature vector with a recognition dictionary, wherein the recognition dictionary includes a dictionary subspace basis vector for expressing a dictionary subspace which is a subspace of a space of the feature vector, and a plurality of probability parameters for converting similarity calculated from the feature vector and the dictionary subspace into likelihood, the device comprising: a recognition unit configured to calculate the similarity using a quadratic polynomial of a value of an inner product of the feature vector and the dictionary subspace basis vector, and calculate the likelihood using the similarity and an exponential function of a linear sum of the probability parameters, wherein the recognition dictionary is trained by using an expectation maximization method using a constraint condition between the probability parameters.

2. The device ...
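
The similarity described here (a quadratic polynomial of inner products with the basis vectors) matches the classical subspace method; a minimal numpy sketch, with the probability parameters a and b chosen arbitrarily for illustration:

```python
import numpy as np

def subspace_similarity(f, basis):
    """Similarity = sum of squared inner products with the dictionary
    subspace basis vectors (a quadratic polynomial of the inner products)."""
    f = f / np.linalg.norm(f)
    return float(sum(np.dot(f, u) ** 2 for u in basis))

def likelihood(sim, a, b):
    """Map similarity to likelihood via an exponential of a linear form;
    a and b play the role of the trained probability parameters."""
    return float(np.exp(a * sim + b))

basis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]  # orthonormal
f = np.array([0.6, 0.8, 0.0])
s = subspace_similarity(f, basis)
print(s, likelihood(s, a=3.0, b=-3.0))
```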

Publication date: 03-02-2022

METHOD AND ELECTRONIC DEVICE FOR PROVIDING SIGN LANGUAGE

Number: US20220036625A1

A method for providing sign language is disclosed. The method includes receiving, by an electronic device, a natural language information input from at least one source for conversion into sign language. The natural language information input includes at least one sentence. The method further includes predicting, by the electronic device, an emphasis score for each word of the at least one sentence based on acoustic components. The method further includes rephrasing, by the electronic device, the at least one sentence based on the emphasis score of each of the words. The method further includes converting, by the electronic device, the at least one rephrased sentence into the sign language. The method further includes delivering, by the electronic device, the sign language.

1. A method for providing sign language, the method comprising: receiving, by an electronic device, a natural language information input from at least one source for conversion into the sign language, wherein the natural language information input comprises at least one sentence; predicting, by the electronic device, an emphasis score for each word of the at least one sentence based on acoustic components; rephrasing, by the electronic device, the at least one sentence based on the emphasis score of each of the words; converting, by the electronic device, the at least one rephrased sentence into sign language; and delivering, by the electronic device, the sign language.

2. The method as claimed in claim 1, wherein the method comprises: determining, by the electronic device, a sound direction corresponding to at least one word from the plurality of words of the at least one input sentence; and automatically displaying, by the electronic device, an indication indicating the sound direction while delivering the at least one word of the at least one rephrased sentence in the sign language.

3. The method as claimed in claim 1, wherein predicting, by the electronic device, an emphasis score ...

Publication date: 21-01-2016

VOICE SIGNAL MODULATION SERVICE FOR GEOGRAPHIC AREAS

Number: US20160019912A1

Modulating a voice signal is provided. The voice signal corresponding to a voice communication is received from a sending voice communication device via a network. Voice signal features corresponding to the voice communication are extracted. A set of voice signal filters is selected to modulate the extracted voice signal features corresponding to the voice communication to an average voice signal associated with the geographic area for which the voice communication is destined. The voice signal features corresponding to the voice communication are modulated by applying the selected set of voice signal filters to generate the average voice signal associated with the geographic area for which the voice communication is destined.

Publication date: 03-02-2022

SPEECH ASSESSMENT USING DATA FROM EAR-WEARABLE DEVICES

Number: US20220036878A1

A computing system may store user profile information of a user of an ear-wearable device, where the user profile information includes parameters that control operation of the ear-wearable device. The computing system may also obtain audio data from one or more sensors that are included in the ear-wearable device and determine whether to generate speech assessment data based on the user profile information of the user and audio data. In some examples, the computing system may compare one or more acoustic parameters determined based on the audio data with an acoustic criterion determined based on the user profile information of the user. If one or more acoustic parameters satisfy the acoustic criterion, the computing system may generate speech assessment data based on the determination.

1. A method comprising: storing user profile information of a user of an ear-wearable device, wherein the user profile information comprises parameters that control operation of the ear-wearable device; obtaining audio data from one or more sensors that are included in the ear-wearable device; determining whether to generate speech assessment data based on the user profile information of the user and the audio data, wherein the speech assessment data provides information regarding speech of the user; and generating the speech assessment data based on the determination to generate the speech assessment data.

2. The method of claim 1, wherein determining whether to generate the speech assessment data based on the user profile information of the user and the audio data further comprises: determining whether to generate speech assessment data based on sensor data or location data.

3. The method of claim 1, wherein determining whether to generate the speech assessment data based on the user profile information of the user and the audio data comprises: determining one or more acoustic parameters based on the audio data; determining an acoustic criterion based on the user profile information of the ...

Publication date: 03-02-2022

USER IDENTITY VERIFICATION USING VOICE ANALYTICS FOR MULTIPLE FACTORS AND SITUATIONS

Number: US20220036905A1
Author: Robert O. Keith, Jr.

A security platform architecture is described herein: a user identity platform architecture which uses a multitude of biometric analytics to create an identity token unique to an individual human. This token is derived from biometric factors such as human behaviors, motion analytics, and human physical characteristics like facial patterns, voice recognition prints, usage-of-device patterns, and user location actions, which can derive a token or be used as a dynamic password identifying the unique individual with high calculated confidence. Because of the dynamic nature and the many different factors, this method is extremely difficult to spoof or hack by malicious actors or malware software.

1. A method programmed in a non-transitory memory of a device comprising: acquiring voice information from a user; acquiring additional information related to the voice information; analyzing the voice information and the additional information; and performing a function based on the analysis of the voice information and additional information.

2. The method of claim 1, wherein the voice information includes one or more tones selected from warm, clear, soft, scratchy, mellow, or breathiness.

3. The method of claim 1, wherein the voice information includes voice qualities including: pitch, vocal fry, strength, rhythm, resonance, tempo, texture, or inflections.

4. The method of claim 1, wherein analyzing the voice information and the additional information includes machine learning.

5. The method of claim 1, wherein the voice information and the additional information are acquired simultaneously.

6. The method of claim 1, wherein the additional information includes situational information, biometric information, behavior information, or environmental information.

7. The method of claim 6, wherein the situational information is acquired using one or more acquisition components of the device or by ...

Publication date: 17-01-2019

APPARATUS AND METHOD FOR GENERATING OLFACTORY INFORMATION RELATED TO MULTIMEDIA CONTENT

Number: US20190019033A1

An apparatus for generating olfactory information related to multimedia content may comprise a processor. The processor may receive multimedia content, extract an odor image or an odor sound included in the multimedia content, and generate representative data related to the odor image or the odor sound by describing information on the extracted odor image or odor sound in a data format sharable by a media thing.

1. An olfactory information generator which generates olfactory information sharable between the real world and at least one virtual world, the olfactory information generator comprising a processor, wherein the processor receives multimedia content, extracts an odor image or an odor sound included in the multimedia content, and generates representative data related to the odor image or the odor sound by describing information on the extracted odor image or odor sound in a data format sharable by a media thing.

2. The olfactory information generator of claim 1, wherein the processor analyzes the extracted odor image or odor sound and generates text-based label information capable of describing an odor of the odor image or the odor sound through a semantic evaluation or an abstraction process related to the analyzed odor image or odor sound.

3. The olfactory information generator of claim 2, wherein the processor updates the label information of the extracted odor image or odor sound by applying a pattern recognition technique to odor image or odor sound data included in a database related to the extracted odor image or odor sound.

4. The olfactory information generator of claim 1, wherein the processor extracts each of a plurality of odor images or odor sounds included in the multimedia content and generates the representative data by using information on each of the plurality of extracted odor images or odor sounds, with a weight.

5. The olfactory information generator of claim 1, wherein the processor generates the representative data by using ...

Publication date: 21-01-2021

Speech Analysis System

Number: US20210020164A1
Author: SEKINE Kiyoshi

To provide a voice analysis system capable of performing voice recognition with higher accuracy. A voice analysis system including a first voice analysis terminal and a second voice analysis terminal, the first voice analysis terminal obtaining first conversation information, and the second voice analysis terminal obtaining second conversation information, wherein the voice analysis system comprises a conversation category selection unit which compares the number of related words included in the first conversation information and the number of related words included in the second conversation information, in each conversation category, and adopts the conversation category with the larger number of related words as a correct conversation category.

1. A voice analysis system comprising: a first voice analysis terminal; and a second voice analysis terminal, wherein the first voice analysis terminal is a terminal that includes: a first term analysis unit configured to analyze a word included in a conversation to obtain first conversation information; a first conversation storage unit configured to store the first conversation information analyzed by the first term analysis unit; an analysis unit configured to analyze the first conversation information stored by the first conversation storage unit; a presentation storage unit configured to store a plurality of presentation materials; a related term storage unit configured to store related terms related to the respective presentation materials stored in the presentation storage unit; and a display unit configured to display any of the presentation materials stored by the presentation storage unit, and the second voice analysis terminal is a terminal that includes: a second term analysis unit configured to analyze the word included in the conversation to obtain second conversation information; and a second conversation storage unit configured to store the second conversation information analyzed by the second term analysis unit, ...

Publication date: 10-02-2022

ACOUSTIC EVENT DETECTION SYSTEM AND METHOD

Number: US20220044698A1
Author: HUANG HUNG-PIN

An acoustic event detection system and a method are provided. The system includes a voice activity detection subsystem, a database, and an acoustic event detection subsystem. The voice activity detection subsystem includes a voice receiving module, a feature extraction module, and a first determination module. The voice receiving module receives an original sound signal, the feature extraction module extracts a plurality of features from the original sound signal, and the first determination module executes a first classification process to determine whether or not the plurality of features match to a start-up voice. The acoustic event detection subsystem includes a second determination module and a function response module. The second determination module executes a second classification process to determine whether the features match to at least one of a plurality of predetermined voices. The function response module executes one of functions corresponding to the predetermined voices that is matched.

1. An acoustic event detection system, comprising: a voice activity detection subsystem, including: a voice receiving module configured to receive an original sound signal; a feature extraction module configured to extract a plurality of features from the original sound signal; and a first determination module configured to execute a first classification process to determine whether or not the plurality of features match to a start-up voice; a database configured to store the plurality of extracted features; and a second determination module configured to, in response to the first determination module determining that the plurality of features match the start-up voice, execute a second classification process to determine whether or not the plurality of features match to at least one of a plurality of predetermined voices; and a function response module configured to, in response to the second determination module determining that the plurality of features ...
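
A toy two-stage sketch of the gate-then-classify structure, using a crude energy gate as a stand-in for the first classification process and a zero-crossing heuristic as a stand-in for the second (thresholds and labels are invented):

```python
import numpy as np

def frame_features(signal, frame=256):
    """Crude per-frame features: log-energy and zero-crossing rate."""
    feats = []
    for i in range(0, len(signal) - frame, frame):
        w = signal[i:i + frame]
        energy = np.log(np.sum(w ** 2) + 1e-12)
        zcr = np.mean(np.abs(np.diff(np.sign(w)))) / 2
        feats.append((energy, zcr))
    return feats

def detect_events(signal, classify):
    """Stage 1 gates frames (here: an energy threshold stands in for the
    start-up decision); stage 2 classifies gated frames into event types."""
    events = []
    for energy, zcr in frame_features(signal):
        if energy > -5.0:                 # stage-1 gate
            events.append(classify(energy, zcr))
    return events

def toy_classifier(energy, zcr):
    return "tone" if zcr < 0.2 else "noise-like"

rng = np.random.default_rng(0)
sig = np.concatenate([0.001 * rng.standard_normal(2048),
                      0.5 * np.sin(2 * np.pi * 440 * np.arange(2048) / 16000)])
print(detect_events(sig, toy_classifier))
```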

Publication date: 24-01-2019

METHOD AND SYSTEM FOR FACILITATING DECOMPOSITION OF SOUND SIGNALS

Number: US20190027161A1
Author: Sharp Michael

Disclosed is a method of facilitating decomposition of sound signals. The method includes receiving multiple sound signals from multiple communication devices communicatively coupled to multiple sound recording devices. Further, the method includes processing multiple sound signals using at least one sound processing algorithm. Further, the method includes generating multiple sound layers corresponding to multiple sound signals based on the processing. Further, the method includes generating a visualization based on generating multiple sound layers. Further, the method includes transmitting the visualization corresponding to multiple sound layers to a user device. Further, the method includes storing multiple sound layers in association with multiple sound layer identifiers. Further, the method includes receiving a request for a sound layer from the user device. Further, the method includes retrieving the sound layer based on the request including the sound layer identifier. Further, the method includes transmitting the sound layer to the user device.

1. A system for facilitating decomposition of sound signals, the system comprising: a plurality of sound recording devices located in a plurality of locations of a physical space, wherein each sound recording device is configured to capture at least one sound in at least one direction; and a central communication device configured for: communicating with each of the plurality of communication devices; transmitting a visualization corresponding to a plurality of sound layers to a user device; receiving a request for a sound layer from the user device, wherein the request comprises a sound layer identifier; and transmitting the sound layer to the user device; processing the plurality of sound signals using at least one sound processing algorithm; generating a plurality of sound layers corresponding to the plurality of sound signals based on the processing, wherein a sound layer comprises a sound ...

Publication date: 24-01-2019

Signal processing method of audio sensing device, and audio sensing system

Number: US20190028799A1
Assignee: SAMSUNG ELECTRONICS CO LTD

A signal processing method of an audio sensing device is provided. The audio sensing device includes a plurality of resonators, at least some of the plurality of resonators having different frequency bands. The method includes setting a plurality of time frames corresponding to the plurality of resonators, and calculating a sound feature for each of the plurality of time frames, the sound feature being calculated based on an audio signal detected by each of the plurality of the resonators, wherein the plurality of time frames are set independently for each of the frequency bands, and at least some of the plurality of time frames are set to have different time intervals.

Publication date: 29-01-2015

NOISE ESTIMATION APPARATUS, NOISE ESTIMATION METHOD, NOISE ESTIMATION PROGRAM, AND RECORDING MEDIUM

Number: US20150032445A1

A noise estimation apparatus which estimates a non-stationary noise component on the basis of the likelihood maximization criterion is provided. The noise estimation apparatus obtains the variance of a noise signal that causes a large value to be obtained by weighted addition of the sums, each of which is obtained by adding the product of the log likelihood of a model of an observed signal expressed by a Gaussian distribution in a speech segment and a speech posterior probability in each frame, and the product of the log likelihood of a model of an observed signal expressed by a Gaussian distribution in a non-speech segment and a non-speech posterior probability in each frame, by using complex spectra of a plurality of observed signals up to the current frame.

1. A noise estimation apparatus which obtains a variance of a noise signal that causes a large value to be obtained by weighted addition of sums, each of which is obtained by adding a product of a log likelihood of a model of an observed signal expressed by a Gaussian distribution in a speech segment and a speech posterior probability in each frame, and a product of a log likelihood of a model of an observed signal expressed by a Gaussian distribution in a non-speech segment and a non-speech posterior probability in each frame, by using complex spectra of a plurality of observed signals up to a current frame.

2. The noise estimation apparatus according to claim 1, wherein the variance of the noise signal, a speech prior probability, a non-speech prior probability, and a variance of a desired signal that cause a large value to be obtained by weighted addition of the sums, each of which is obtained by adding the product of the log likelihood of the model of the observed signal expressed by the Gaussian distribution in the speech segment and the speech posterior probability in each frame, and the product of the log likelihood of the model of the observed signal expressed by the ...
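
For the non-speech term of such a criterion, the maximizing noise variance reduces to a posterior-weighted average of the observed power spectra; a minimal numpy sketch under that simplification (the full criterion in the claim also involves the speech-segment terms, which are omitted here):

```python
import numpy as np

def update_noise_variance(power_spec, p_nonspeech):
    """One EM-style update: the noise variance maximizing the non-speech
    part of the weighted log-likelihood is the posterior-weighted average
    of observed power spectra.
    power_spec: (frames, bins) array of |X|^2 values;
    p_nonspeech: (frames,) non-speech posterior per frame."""
    w = p_nonspeech[:, None]
    return (w * power_spec).sum(axis=0) / (w.sum() + 1e-12)

frames = np.abs(np.random.default_rng(1).standard_normal((100, 8))) ** 2
post = np.linspace(1.0, 0.0, 100)  # early frames judged non-speech
print(update_noise_variance(frames, post))
```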

Publication date: 28-01-2021

Systems and methods for detecting inmate to inmate conference calls

Number: US20210029242A1
Author: Stephen Lee Hodge
Assignee: Global Tel Link Corp

A system for detecting inmate to inmate conference calls in a correctional facility is disclosed herein. The system includes a database and a conference call detection server, wherein the conference call detection server is configured to monitor a plurality of inmate communications, convert an audio signal of each inmate communication to a frequency domain signal, identify frequency data comprising one or more frequency peaks and corresponding frequency values in the frequency domain signal for each inmate communication, generate a record comprising the frequency data for each inmate communication, resulting in a plurality of records, store the plurality of records in the database, detect an inmate to inmate conference call by matching a frequency subset of a new inmate communication with frequency data in a detected record in the database, and verify the inmate to inmate conference call by matching audio with voice biometric samples.
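
A toy sketch of the frequency-peak matching idea, assuming each call is summarized by its strongest FFT peak frequencies and two simultaneous calls are flagged when those summaries largely coincide (the record format and overlap threshold are invented):

```python
import numpy as np

def peak_record(audio, sr, n_peaks=5):
    """Summarize a call's audio by its strongest spectral peak frequencies."""
    spec = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), 1 / sr)
    idx = np.argsort(spec)[-n_peaks:]
    return set(np.round(freqs[idx]).astype(int))

def is_conference(record_a, record_b, min_overlap=3):
    """Flag a likely three-way call when two 'independent' calls share
    most of their spectral peaks at the same time."""
    return len(record_a & record_b) >= min_overlap

sr = 8000
t = np.arange(sr) / sr
call = np.sin(2 * np.pi * 350 * t) + np.sin(2 * np.pi * 440 * t)
# an attenuated copy of the same audio, as heard on the second line
print(is_conference(peak_record(call, sr), peak_record(call * 0.7, sr)))
```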

Publication date: 24-02-2022

REALISTIC ARTIFICIAL INTELLIGENCE-BASED VOICE ASSISTANT SYSTEM USING RELATIONSHIP SETTING

Number: US20220059080A1
Assignee: O2O CO., LTD.

A voice conversation service is provided. After user information is input and an initial response character for call word recognition is set, when the call word or a voice command is input, the call word is recognized, the voice command is analyzed, the user's emotion is identified through acoustic analysis, and a facial image of the user captured by a camera is recognized, with the user's situation and emotion identified through gesture recognition. The initial response character set on the basis of the recognized call word is then displayed through a display unit, a voice conversation object and a surrounding environment are determined by setting a relationship between the voice command, user information, and emotion expression information, and after the determined voice conversation object is made into a character, voice features are applied to provide user-customized image and voice feedback.

1. A realistic artificial intelligence-based voice assistant system using relationship setting, as a system capable of providing a realistic artificial intelligence (AI) voice assistant using relationship setting, the system comprising: a user basic information input unit that inputs user information and sets an initial response character according to call word recognition; a call word setting unit that sets a voice command call word; a voice command analysis unit that analyzes a voice command uttered by a user and grasps the user's emotions through sound analysis; an image processing unit that recognizes the user's facial image captured through a camera and grasps the user's situation and emotions through gesture recognition; and a relationship setting unit that learns image information based on user interest information and a voice command keyword acquired from the user basic information input unit by a machine learning algorithm to derive a voice conversation object, applies a voice feature matched to the derived voice ...

Publication date: 03-03-2022

Lung health sensing through voice analysis

Number: US20220061694A1
Assignee: Hill Rom Services Pte Ltd

A patient monitoring system includes a microphone that collects audio data from a patient. The audio data is used to generate audio characteristics for categorization of the audio data and analysis of the audio data to determine a patient health status. The audio characteristics and the patient health status are tracked over time and utilized to monitor a respiratory condition of the patient. The system determines the patient health status based on a comparison of the audio characteristics with a database of audio characteristics associated with recorded health statuses of various patients. The system generates a report of the current patient health status based on the database of audio characteristics and associated health statuses.

Publication date: 15-02-2018

Apparatuses, methods and systems for a digital conversation management platform

Number: US20180046923A1
Assignee: Newvaluexchange Ltd

The APPARATUSES, METHODS AND SYSTEMS FOR A DIGITAL CONVERSATION MANAGEMENT PLATFORM ("DCM-Platform") transforms digital dialogue from consumers, client demands, and Internet search inputs via DCM-Platform components into tradable digital assets and client-needs-based artificial intelligence campaign plan outputs. In one implementation, the DCM-Platform may capture and examine conversations between individuals and artificial intelligence conversation agents. These agents may be viewed as assets. One can measure the value and performance of these agents by assessing their performance and ability to generate revenue from prolonging conversations and/or ability to effect sales through conversations with individuals.

Publication date: 03-03-2022

Method for operating a hearing device based on a speech signal, and hearing device

Number: US20220068293A1
Assignee: Sivantos Pte Ltd

A method for operating a hearing device on the basis of a speech signal. An acousto-electric input transducer of the hearing device records a sound containing the speech signal from the surroundings of the hearing device and converts the sound into an input audio signal. A signal processing operation generates an output audio signal based on the input audio signal. At least one articulatory and/or prosodic feature of the speech signal is quantitatively acquired through analysis of the input audio signal by way of the signal processing operation, and a quantitative measure of a speech quality of the speech signal is derived on the basis of the acquired feature. At least one parameter of the signal processing operation for generating the output audio signal based on the input audio signal is set on the basis of the quantitative measure of the speech quality of the speech signal.

Publication date: 25-02-2021

APPARATUS AND METHOD FOR PROVIDING A FINGERPRINT OF AN INPUT SIGNAL

Number: US20210056136A1

Embodiments provide an apparatus for providing a fingerprint of an input signal, wherein the apparatus is configured to determine intensity values for a plurality of time-frequency regions of the input signal, and to compare the intensity values associated with different time-frequency regions of the plurality of time-frequency regions, to obtain individual values of the fingerprint based on the comparison of intensity values associated with two time-frequency regions.

1. An apparatus for providing a fingerprint of an input signal, wherein the apparatus is configured to determine intensity values for a plurality of time-frequency regions of the input signal, wherein the apparatus is configured to compare the intensity values associated with different time-frequency regions of the plurality of time-frequency regions, to acquire individual values of the fingerprint based on the comparison of intensity values associated with two time-frequency regions, wherein the plurality of time-frequency regions are defined by a rotating kernel, rotating around a spectral bin of a time-frequency representation of the input signal.

2. The apparatus according to claim 1, wherein the plurality of time-frequency regions can overlap each other.

3. The apparatus according to claim 1, wherein the plurality of time-frequency regions are centered around the same frequency of the input signal.

4. The apparatus according to claim 1, wherein the plurality of time-frequency regions are centered around the same spectral bin of a time-frequency representation of the input signal.

5. The apparatus according to claim 4, wherein the rotating kernel extends over at least two spectral bins of the time-frequency representation of the input signal.

6. The apparatus according to claim 1, wherein the intensity values are energy values.

7. The apparatus according to claim 1, wherein the apparatus is configured to compare the intensity values associated with different time-frequency ...
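
A minimal sketch of deriving fingerprint bits from pairwise intensity comparisons, using plain rectangular regions in place of the patent's rotating kernel (region layout and pair choice are invented for illustration):

```python
import numpy as np

def fingerprint(spectrogram, pairs):
    """One fingerprint bit per region pair: 1 if the first region is more
    intense than the second. Regions are (t0, t1, f0, f1) index slices."""
    def energy(r):
        t0, t1, f0, f1 = r
        return spectrogram[t0:t1, f0:f1].sum()
    return [int(energy(a) > energy(b)) for a, b in pairs]

S = np.abs(np.random.default_rng(2).standard_normal((64, 32)))
# compare neighbouring time regions around the same frequency band
pairs = [((t, t + 4, f, f + 4), (t + 4, t + 8, f, f + 4))
         for t in range(0, 56, 8) for f in range(0, 28, 8)]
print(fingerprint(S, pairs))
```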

Publication date: 22-02-2018

ENHANCED ACCURACY OF USER PRESENCE STATUS DETERMINATION

Number: US20180052985A1

Technologies are described herein for enhancing a user presence status determination. Visual data may be received from a depth camera configured to be arranged within a three-dimensional space. A current user presence status of a user in the three-dimensional space may be determined based on the visual data. A previous user presence status of the user may be transformed to the current user presence status, responsive to determining the current user presence status of the user.

1. A method for enhancing a user presence status determination, the method comprising: receiving user presence data for a user, the user presence data comprising one or more of login data, input device data, meeting information from a calendar, or information indicative of usage of a mobile device associated with the user; determining for the user, based at least in part on the user presence data, a first user presence status from a plurality of user presence statuses; receiving at least one of visual data from a camera or audio data from a microphone or location data from a geolocation system; determining for the user, based at least in part on at least one of the visual data or the audio data or the location data, a second user presence status from the plurality of user presence statuses, the second user presence status being different from the first user presence status; and updating the first user presence status of the user to the second user presence status, the second user presence status being accessible by additional users.

2. The method of claim 1, wherein receiving the location data from the geolocation system includes receiving the location data from a global positioning system (GPS) receiver.

3. The method of claim 1, wherein receiving the location data from the geolocation system comprises receiving the location data from an Internet Protocol (IP) address tracking system.

4. The method of claim 1, wherein receiving the visual data from the camera comprises receiving the visual data ...

Publication date: 10-03-2022

AUDIO PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Number: US20220076692A1
Author: DENG Shuo

The embodiments of this application disclose an audio processing method and apparatus, an electronic device, and a storage medium. In the embodiments of this application, a current playback environment of audio may be obtained; audio recognition may be performed on ambient sound of the current playback environment in a case that the current playback environment is in a foreground state; foreground sound in the ambient sound may be determined according to an audio recognition result; the foreground sound in the ambient sound may be classified to determine a type of the foreground sound; and audio mixing may be performed on the foreground sound and the audio to obtain mixed playback sound based on the type of the foreground sound.

1. An audio processing method, executed by an electronic device, and comprising: obtaining a current playback environment of audio; performing audio recognition on ambient sound of the current playback environment in a case that the current playback environment is in a foreground state; determining foreground sound in the ambient sound according to an audio recognition result; classifying the foreground sound in the ambient sound to determine a type of the foreground sound; and performing audio mixing on the foreground sound and the audio to obtain a mixed playback sound based on the type of the foreground sound.

2. The method according to claim 1, wherein the performing audio recognition on ambient sound of the current playback environment in a case that the current playback environment is in a foreground state comprises: sampling the ambient sound of the current playback environment in a case that the current playback environment is in the foreground state; extracting a Mel-frequency cepstrum coefficient feature of the ambient sound obtained by the sampling, to obtain a Mel feature of the ambient sound; and performing audio recognition on the Mel feature of the ambient sound by using an adaptive discriminant network.

3. The method according to ...

Publication date: 01-03-2018

EFFICIENT APPARATUS AND METHOD FOR AUDIO SIGNATURE GENERATION USING RECOGNITION HISTORY

Number: US20180062778A1

Audio information is monitored by a user device that performs audio content recognition of any received audio content. The user device includes a scheduling logic unit, a probe, and an audio signature generator. The scheduling logic unit maintains a set of scheduling rules that define conditions that were present when previous audio content recognition of audio content received by the user device was successful. The scheduling logic unit receives currently present conditions of the user device, and compares the currently present conditions to the set of scheduling rules to determine if the currently present conditions match any scheduling rules. The user device captures ambient audio content via the probe and generates audio signatures of the captured audio content using the audio signature generator if a match occurs, and inhibits capturing audio content by the user device and/or inhibits generating audio signatures if a match does not occur.

1. A method for monitoring audio information by a user device, the user device performing audio content recognition of any received audio content, the user device including (i) a scheduling logic unit, (ii) a probe that captures ambient audio content, and (iii) an audio signature generator that generates audio signatures for subsequent use in audio content recognition, the method comprising:
(a) maintaining in the scheduling logic unit a set of scheduling rules that define conditions that were present when previous audio content recognition of audio content received by the user device is determined as being successful, the determination of successful audio content recognition being if an audio signature of the audio content previously received by the user device matches a previously stored audio signature, wherein the previous audio content is audio content of media content that is broadcast to multiple users, and wherein the set of scheduling rules are not associated with scheduled broadcast times of any audio content;
(b ...
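
A minimal sketch of the rule-matching gate, with an invented condition schema (hour, location, charging state); the patent does not fix the rule representation, so everything below is illustrative:

```python
# Hypothetical scheduling rules: conditions present when past audio
# content recognitions succeeded.
RULES = [
    {"hour": 20, "location": "living_room", "charging": False},
    {"hour": 8, "location": "kitchen", "charging": True},
]

def should_capture(current: dict, tolerate_hours: int = 1) -> bool:
    """Capture and fingerprint audio only when the current device
    conditions match a stored rule; otherwise inhibit to save power."""
    for rule in RULES:
        if (abs(current["hour"] - rule["hour"]) <= tolerate_hours
                and current["location"] == rule["location"]
                and current["charging"] == rule["charging"]):
            return True
    return False

print(should_capture({"hour": 21, "location": "living_room", "charging": False}))
```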

Publication date: 17-03-2022

SYSTEMS AND METHODS FOR VOICE AUDIO DATA PROCESSING

Number: US20220084525A1
Author: GUO Yunsan, YU Yichen

The present disclosure may provide a voice audio data processing system. The voice audio data processing system may obtain voice audio data, which includes one or more voices, each being respectively associated with one of one or more subjects. For one of the one or more voices and the subject associated with the voice, the voice audio processing system may generate a text based on the voice audio data. The text may have one or more sizes, each size corresponding to one of one or more volumes of the voice. The text may have one or more colors, each color corresponding to one of one or more emotion types of the voice.

1. A system, comprising: at least one storage device including a set of instructions; and at least one processor in communication with the at least one storage device, wherein when executing the set of instructions, the at least one processor is configured to cause the system to: obtain voice audio data, which includes one or more voices, each being respectively associated with one of one or more subjects; and for one of the one or more voices and the subject associated with the voice, generate a text based on the voice audio data, wherein: the text has one or more sizes, each size corresponding to one of one or more volumes of the voice, and the text has one or more colors, each color corresponding to one of one or more emotion types of the voice.

2. The system of claim 1, wherein the at least one processor is further configured to cause the system to instruct a display device to display the text.

3. The system of claim 2, wherein to determine one of the one or more emotion types and the color corresponding to the emotion type, the at least one processor is configured to cause the system to: determine, based on the voice audio data, the emotion type with a trained emotion determination model; and determine the color corresponding to the emotion type based on the emotion type.

4. ...

Publication date: 12-03-2015

METHOD AND SYSTEM FOR AUTOMATICALLY DETECTING MORPHEMES IN A TASK CLASSIFICATION SYSTEM USING LATTICES

Number: US20150073792A1
Author: Riccardi Giuseppe

The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.

1. A method comprising: recognizing phonemes using a body movement and a current phonotactic model, to yield recognized phonemes; creating, via a processor, a new phonotactic model using morphemes detected from: the recognized phonemes; and a prior probability distribution associated with a domain; and replacing the current phonotactic model with the new phonotactic model in a database.

2. The method of claim 1, wherein the new phonotactic model is associated with a second domain which is distinct from the domain of the prior probability distribution.

3. The method of claim 1, wherein the recognizing of the phonemes is further based on an environment of a user.

4. The method of claim 1, wherein the morphemes are expressed in multimodal form.

5. The method of claim 1, wherein the morphemes in the user input are derived from an action of a user.

6. The method of claim 5, wherein the action of the user comprises a focus of attention of the user.

7. The method of claim 1, wherein the new phonotactic model is used when recognizing phonemes for a future input from a user.

8. The method of claim 1, operating in conjunction with one of a telephone network, Internet, an intranet, a cable television network, a local area network, and a wireless communication ...

Publication date: 15-03-2018

AUDIO IDENTIFICATION BASED ON DATA STRUCTURE

Number: US20180075140A1

Example systems and methods represent audio using a sequence of two-dimensional (2D) Fourier transforms (2DFTs), and such a sequence may be used by a specially configured machine to perform audio identification, such as for cover song identification. Such systems and methods are robust to timbral changes, time skews, and pitch skews. In particular, a special data structure provides a time-series representation of audio, and this time-series representation is robust to key changes, timbral changes, and small local tempo deviations. Accordingly, the systems and methods described herein analyze cross-similarity between these time-series representations. In some example embodiments, such systems and methods extract features from an audio fingerprint and calculate a distance measure that is robust and invariant to changes in musical structure.

1. A computerized method comprising: accessing, using one or more hardware processors, reference audio to be represented by a reference data structure to be generated and stored in a reference database; generating, using the one or more hardware processors, the reference data structure from the reference audio by at least: performing a constant Q transform on multiple reference time slices of the reference audio; binarizing the constant Q transformed reference time slices of the reference audio; performing a two-dimensional Fourier transform on multiple reference time windows within the binarized and constant Q transformed reference time slices of the reference audio to obtain two-dimensional Fourier transforms of the reference time windows; and sequentially ordering the two-dimensional Fourier transforms of the reference time windows in the reference data structure; and creating, within the reference database, a data association between the reference audio and the generated reference data structure that includes the sequentially ordered two-dimensional Fourier transforms of the reference time windows, the created data ...
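
A minimal numpy sketch of the binarize-then-2DFT sequence, with an ordinary magnitude spectrogram standing in for the constant-Q transform (window and hop sizes are arbitrary):

```python
import numpy as np

def twodft_sequence(spec, win=32, hop=16):
    """Binarize a (bins x frames) magnitude spectrogram against each bin's
    median, then take the 2D Fourier transform magnitude of sliding time
    windows. |2DFT| is tolerant to key shifts and small time offsets."""
    binary = (spec > np.median(spec, axis=1, keepdims=True)).astype(float)
    seq = []
    for t in range(0, spec.shape[1] - win + 1, hop):
        patch = binary[:, t:t + win]
        seq.append(np.abs(np.fft.fft2(patch)))
    return np.stack(seq)

spec = np.abs(np.random.default_rng(3).standard_normal((48, 128)))
print(twodft_sequence(spec).shape)  # (num_windows, 48, 32)
```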

Publication date: 07-03-2019

REAL-TIME VOCAL FEATURES EXTRACTION FOR AUTOMATED EMOTIONAL OR MENTAL STATE ASSESSMENT

Number: US20190074028A1
Author: Howard Newton

Embodiments of the present systems and methods may provide techniques for extracting vocal features from voice signals to determine an emotional or mental state of one or more persons, such as to determine a risk of suicide and other mental health issues. For example, as a person's mental state may indirectly alter his or her speech, suicidal risk in, for example, hotline calls may be determined through speech analysis. In embodiments, such techniques may include preprocessing of the original recording, vocal feature extraction, and prediction processing. For example, in an embodiment, a computer-implemented method of determining an emotional or mental state of a person comprises acquiring an audio signal relating to a conversation including the person, extracting signal components relating to an emotional or mental state of at least the person, and outputting information characterizing the extracted emotional or mental state of the person.

1. A computer-implemented method of determining an emotional or mental state of a person, the method comprising: acquiring an audio signal relating to a conversation including the person; extracting signal components relating to an emotional or mental state of at least the person; and outputting information characterizing the extracted emotional or mental state of the person.

2. The method of claim 1, wherein acquiring the audio signal relating to a conversation comprises: recording a conversation between a caller to a suicide help line and a counselor of the suicide help line.

3. The method of claim 1, wherein the signal components relating to emotional intent of at least one party comprise: extracting signal features from the audio signal comprising discriminative speech indicators, which differentiate between speech and silence; determining which extracted signal features to use; and enhancing the robustness of the determination against background noise.

4. The method of claim 3, wherein: determining which extracted ...

15-03-2018 publication date

ANALYZING CHANGES IN VOCAL POWER WITHIN MUSIC CONTENT USING FREQUENCY SPECTRUMS

Number: US20180075866A1
Assignee: Microsoft Technology Licensing, LLC

Technologies are described for identifying familiar or interesting parts of music content by analyzing changes in vocal power using frequency spectrums. For example, a frequency spectrum can be generated from digitized audio. Using the frequency spectrum, the harmonic content and percussive content can be separated. The vocal content can then be separated from the harmonic and/or percussive content. The vocal content can then be processed to identify surge points in the digitized audio. In some implementations, the vocal content is included in the harmonic content during the separation procedure and is then separated from the harmonic content. 1.-11. (canceled) 12. A method, implemented by a computing device, the method comprising: obtaining audio music content in a digitized format; generating a frequency spectrum of at least a portion of the music content; analyzing the frequency spectrum to separate harmonic content and percussive content; using results of the analysis, generating an audio track representing vocal content within the music content; processing the audio track representing vocal content to identify at least one surge point within the music content; and outputting an indication of the at least one surge point. 13. The method of claim 12, wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content. 14. The method of claim 12, wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: in a first pass: generating the frequency spectrum using a short-time Fourier transform (STFT) with a first frequency resolution; and performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content; and in a second pass: applying an STFT with a second frequency resolution to the harmonic content produced in the first pass; and performing median filtering to ...
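
Median filtering is the separation mechanism named in claims 13-14: medians taken along the time axis suppress percussive transients (leaving harmonic content), while medians taken along the frequency axis suppress sustained tones (leaving percussive content). A minimal SciPy sketch follows; the kernel sizes and the soft-mask formulation are illustrative assumptions, and the vocal track would subsequently be pulled from the harmonic component.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft

def harmonic_percussive(x, fs):
    _, _, Z = stft(x, fs=fs, nperseg=1024)
    S = np.abs(Z)                        # (freq bins, time frames)
    H = median_filter(S, size=(1, 17))   # median along time -> harmonic
    P = median_filter(S, size=(17, 1))   # median along frequency -> percussive
    mask_h = H / (H + P + 1e-10)         # soft masks split each bin's energy
    mask_p = P / (H + P + 1e-10)
    return mask_h * S, mask_p * S        # harmonic part, percussive part
```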

16-03-2017 publication date

Systems and methods for managing, analyzing, and providing visualizations of multi-party dialogs

Number: US20170078479A1
Assignee: Cogito Corp

Systems and methods are provided for managing and analyzing multi-party dialogs (e.g., calls) between communication devices. A digital connection is established with each of a plurality of communication devices. The connection between the communication devices is switched from a POTS connection to digital connections, enabling the communication devices to communicate with each other via the computing device over the digital connections. Audio signals that are part of a multi-party dialog between users of the plurality of communication devices are received. The received audio signals are split into corresponding first signals and second signals. The first signals are transmitted to the plurality of communication devices, and the second signals are analyzed to produce measurements of their features. Feedback data is transmitted to at least one of the plurality of communication devices.

22-03-2018 publication date

METHODS AND SYSTEM FOR REDUCING FALSE POSITIVE VOICE PRINT MATCHING

Number: US20180082690A1
Assignee:

The methods, apparatus, and systems described herein are designed to reduce false positive voice print matching with fraudulent callers. A voice print of a call is created and compared to known voice prints to determine if it matches one or more of the known voice prints, and to transaction data associated with a database of voice prints. The methods include pre-processing to separate speech from non-speech, selecting the audio elements that affect the voice print the most, and/or generating a first score based on the number of selected audio elements matching audio elements of a voice print from the plurality of fraudulent speakers, determining if the first score exceeds a predetermined threshold score for the fraudulent speaker, and comparing the selected audio elements for the unknown caller, where the score exceeds the predetermined threshold score, to the voice prints associated with the customer account. 1. A method of reducing false positive matches in voice prints which comprises: receiving an audio communication from an unknown caller, separating a first portion of the audio communication into silent and non-silent segments, and evaluating the non-silent segments to determine which portions thereof are speech or non-speech; generating a plurality of parameters that determine what is speech and non-speech in the non-silent segments; using the generated parameters to determine what is speech and non-speech for at least the remainder of the telephonic communication; comparing the speech to selected audio elements of a background model that characterizes the speech of the unknown caller relative to a plurality of other audio elements of the background model; comparing the selected audio elements of the speech to matching audio elements of a recorded voice print from a plurality of fraudulent speakers to determine whether the speech belongs to a fraudulent speaker; generating a first score based on the number of selected audio elements matching audio ...

25-03-2021 publication date

SERIAL FFT-BASED LOW-POWER MFCC SPEECH FEATURE EXTRACTION CIRCUIT

Number: US20210090553A1
Author: SHAN Weiwei, ZHU Lixuan
Assignee:

Disclosed is a serial FFT-based low-power MFCC speech feature extraction circuit, belonging to the technical field of calculating, reckoning, or counting. The circuit is oriented toward intelligent applications and adapts the MFCC algorithm to a hardware circuit design; a serial FFT algorithm and approximate multiplication are used throughout, greatly reducing circuit area and power. The entire circuit includes a preprocessing module, a framing and windowing module, an FFT module, a Mel filtration module, and a logarithm and DCT module. The improved FFT algorithm processes data in a serial pipeline manner, making effective use of the duration of each audio frame and thereby reducing the storage area and operating frequency of the circuit while still meeting the output requirement. 1. A serial FFT-based low-power MFCC speech feature extraction circuit, comprising: a pre-emphasis module for preprocessing an input speech sequence; a framing and windowing module for performing framing and windowing operations on the pre-processed speech sequence; an FFT module for performing the Fourier transform layer by layer and group by group on the framed and windowed sequence data and then outputting complex data subjected to bit permutation, wherein each layer of the Fourier transform performs two rounds of serial grouping followed by butterfly operations on the input data, and outputs the product of the last butterfly operation's output data and a twiddle factor to the next layer of the Fourier transform; a Mel filtration module for extracting the energy value of the complex output of the FFT module and performing multi-stage Mel filtration on the energy value to obtain a Mel value; a logarithm module for taking the base-2 logarithm of the Mel value through a lookup table; and a DCT module for performing a DCT on the base-2 logarithm of the Mel value. 2. The serial FFT-based low-power ...
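
A software model of the same MFCC chain (pre-emphasis, windowing, FFT, mel filtering, base-2 logarithm, DCT) helps make the hardware pipeline concrete. The sketch below uses NumPy's FFT rather than the patent's serial pipelined FFT, and the frame length and filterbank size are assumed values.

```python
import numpy as np

def mfcc(frame, sr=16000, n_mels=26, n_ceps=13, pre=0.97):
    # Pre-emphasis and Hamming window
    x = np.append(frame[0], frame[1:] - pre * frame[:-1])
    x = x * np.hamming(len(x))
    nfft = len(x)
    spec = np.abs(np.fft.rfft(x)) ** 2            # power spectrum
    # Triangular filterbank with edges evenly spaced on the mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((nfft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mels, nfft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log2(fbank @ spec + 1e-10)        # base-2 log, as in the patent
    # DCT-II to decorrelate; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return dct @ logmel

print(mfcc(np.random.randn(512)))
```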

25-03-2021 publication date

METHOD AND DEVICE FOR ANALYZING REAL-TIME SOUND

Number: US20210090593A1
Author: Park Han, Ryu Myeong Hoon
Assignee:

A real-time sound analysis device according to an embodiment of the present disclosure includes: an input unit for collecting a sound generated in real time; a signal processor for processing the collected real-time sound data for easy machine learning; a first trainer for training a first function for distinguishing sound category information by learning the previously collected sound data in a machine learning manner; and a first classifier for classifying the signal-processed sound data into a sound category by using the first function. According to an embodiment of the present disclosure, it is possible to learn the category and cause of a sound collected in real time based on machine learning, and more accurate prediction of the category and cause of the sound collected in real time is possible. 1. A real-time sound analysis device based on artificial intelligence, the real-time sound analysis device comprising: an input unit configured to collect a sound generated in real time; a signal processor configured to process collected real-time sound data for easy machine learning; a first trainer configured to train a first function for distinguishing sound category information by learning previously collected sound data in a machine learning manner; and a first classifier configured to classify sound data signal-processed by the first function into a sound category. 2. The real-time sound analysis device of claim 1, comprising: a first communicator configured to transmit and receive information about sound data, wherein the first communicator transmits the signal-processed sound data to an additional analysis device. 3. The real-time sound analysis device of claim 2, wherein the first communicator receives a result of analyzing a sound cause through a second function trained by deep learning from the additional analysis device. 4. The real-time sound analysis device of claim 1, wherein the first trainer complements the first function by learning the real-time sound data in a ...

19-03-2020 publication date

AUTOMATIC DETERMINATION OF TIMING WINDOWS FOR SPEECH CAPTIONS IN AN AUDIO STREAM

Number: US20200090678A1
Assignee:

The technology disclosed herein may determine timing windows for speech captions of an audio stream. In one example, the technology may involve accessing audio data comprising a plurality of segments; determining, by a processing device, that one or more of the plurality of segments comprise speech sounds; identifying a time duration for the speech sounds; and providing a user interface element corresponding to the time duration for the speech sounds, wherein the user interface element indicates an estimate of a beginning and ending of the speech sounds and is configured to receive caption text associated with the speech sounds of the audio data. 1. A method comprising:accessing audio data comprising a plurality of segments;determining, by a processing device, that one or more of the plurality of segments comprise speech sounds;identifying a time duration for the speech sounds; andproviding a user interface element corresponding to the time duration for the speech sounds, wherein the user interface element indicates an estimate of a beginning and ending of the speech sounds and is configured to receive caption text associated with the speech sounds of the audio data.2. The method of claim 1 , further comprising:inputting the plurality of segments of the audio data into a speech classifier for classification, wherein the speech classifier generates a set of raw scores representing likelihoods that respective segments include occurrences of a speech sound;generating binary scores for the audio data based on the set of raw scores, wherein one of the binary scores is generated based on an aggregation of raw scores from consecutive series of the segments of the audio data; andgenerating a timing window for one or more of the speech sounds in the audio data based on the binary scores, wherein the timing window indicates the estimate of a beginning time and an ending time of the one or more speech sounds in the audio data.3. The method of claim 2 , wherein inputting the ...
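
Claim 2's flow (raw classifier scores, aggregation over a consecutive series of segments, binary scores, then a timing window) can be illustrated directly. In this sketch the smoothing window, threshold, and segment duration are assumed values, and the raw scores stand in for a speech classifier's output.

```python
import numpy as np

def timing_windows(raw_scores, seg_dur=0.1, win=5, thresh=0.5):
    # Binarize each segment using the average over a consecutive series
    kernel = np.ones(win) / win
    binary = np.convolve(raw_scores, kernel, mode="same") > thresh
    # Collapse runs of speech-positive segments into (start, end) estimates
    windows, start = [], None
    for i, b in enumerate(binary):
        if b and start is None:
            start = i
        elif not b and start is not None:
            windows.append((start * seg_dur, i * seg_dur))
            start = None
    if start is not None:
        windows.append((start * seg_dur, len(binary) * seg_dur))
    return windows  # each window can anchor a caption-text UI element
```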

01-04-2021 publication date

LINEAR PREDICTION ANALYSIS DEVICE, METHOD, PROGRAM, AND STORAGE MEDIUM

Number: US20210098009A1

An autocorrelation calculation unit 21 calculates an autocorrelation R_o(i) from an input signal. A prediction coefficient calculation unit 23 performs linear prediction analysis by using a modified autocorrelation R′_o(i) obtained by multiplying a coefficient w_o(i) by the autocorrelation R_o(i). It is assumed here, for each order i of some orders i at least, that the coefficient w_o(i) corresponding to the order i is in a monotonically increasing relationship with an increase in a value that is negatively correlated with a fundamental frequency of the input signal of the current frame or a past frame. 1. A linear prediction analysis method of obtaining, in each frame, which is a predetermined time interval, coefficients to be transformed to linear prediction coefficients corresponding to an input time-series signal, the linear prediction analysis method comprising: a step of receiving the input time-series signal, the time-series signal being a speech signal or an acoustic signal; an autocorrelation calculation step of calculating an autocorrelation R_o(i) between an input time-series signal X_o(n) of a current frame and an input time-series signal X_o(n−i) i samples before the input time-series signal X_o(n) or an input time-series signal X_o(n+i) i samples after the input time-series signal X_o(n), for each i of i = 0, 1, ..., P_max at least; and a prediction coefficient calculation step of calculating coefficients to be transformed to first-order to P_max-order linear prediction coefficients, by using a modified autocorrelation R′_o(i) obtained by multiplying a coefficient w_o(i) by the autocorrelation R_o(i) for each i, wherein a coefficient table t0 stores a coefficient w_t0(i), a coefficient table t1 stores a coefficient w_t1(i), and a coefficient table t2 stores a coefficient w_t2(i), ...
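
In signal-processing terms, multiplying the autocorrelation by w(i) before solving for the prediction coefficients is lag windowing, after which the standard Levinson-Durbin recursion applies. The sketch below realizes the abstract's monotonic relationship with a Gaussian lag window whose width grows with the estimated pitch period; that continuous window shape is an assumption standing in for the patent's table-based selection of w(i).

```python
import numpy as np

def lpc_with_lag_window(x, order, pitch_period):
    n = len(x)
    R = np.array([np.dot(x[:n - i], x[i:]) for i in range(order + 1)])
    # Lag window: for a fixed order i, w(i) rises toward 1 as the pitch
    # period (a value negatively correlated with fundamental frequency) grows
    w = np.exp(-0.5 * (np.arange(order + 1) / float(pitch_period)) ** 2)
    Rp = R * w
    # Levinson-Durbin recursion on the modified autocorrelation R'(i)
    a = np.zeros(order + 1)
    a[0], err = 1.0, Rp[0]
    for i in range(1, order + 1):
        k = -(Rp[i] + np.dot(a[1:i], Rp[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # RHS evaluated before assignment
        a[i] = k
        err *= 1.0 - k * k
    return a  # a[0] = 1, a[1:] are the prediction coefficients
```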

28-03-2019 publication date

SYSTEM AND METHOD FOR CLUSTER-BASED AUDIO EVENT DETECTION

Number: US20190096424A1
Assignee:

Methods, systems, and apparatuses for audio event detection, where the determination of a type of sound data is made at the cluster level rather than at the frame level. The techniques provided are thus more robust to the local behavior of features of an audio signal or audio recording. The audio event detection is performed by using Gaussian mixture models (GMMs) to classify each cluster or by extracting an i-vector from each cluster. Each cluster may be classified based on an i-vector classification using a support vector machine or probabilistic linear discriminant analysis. The audio event detection significantly reduces potential smoothing error and avoids any dependency on accurate window-size tuning. Segmentation may be performed using a generalized likelihood ratio and a Bayesian information criterion, and the segments may be clustered using hierarchical agglomerative clustering. Audio frames may be clustered using K-means and GMMs. 1. A computer-implemented method for audio event detection , comprising:partitioning, by a computer, an audio signal into a plurality of audio frames;clustering, by the computer, the plurality of audio frames into a plurality of clusters containing audio frames having similar features; anddetecting, by the computer utilizing a supervised classifier, an audio event in at least one cluster of the plurality of clusters.2. The computer-implemented method of claim 1 , wherein the computer utilizes K-means to partition the audio signal into the plurality of audio frames.3. The computer-implemented method of claim 1 , wherein the computer utilizes at least one Gaussian mixture model to cluster the plurality of audio frames to the plurality of clusters.4. The computer-implemented method of claim 1 , further comprising:extracting, by the computer, an i-vector for the at least one cluster; anddetecting, by the computer, the audio event in the at least one cluster based upon the extracted i-vector.5. The computer-implemented method of claim ...
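
The cluster-then-classify idea can be sketched with scikit-learn: frames are grouped by K-means, and each whole cluster (not each frame) is scored against pretrained event and background GMMs. The model sizes and the use of GMM likelihoods, rather than the i-vector variant the claims also cover, are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def detect_events(frame_features, event_gmm, background_gmm, n_clusters=8):
    # frame_features: (n_frames, n_dims); the two GMMs are pretrained
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(frame_features)
    flagged = []
    for c in range(n_clusters):
        cluster = frame_features[labels == c]
        if len(cluster) == 0:
            continue
        # Decide per cluster: compare average log-likelihoods over all
        # of the cluster's frames, which damps frame-level noise
        if event_gmm.score(cluster) > background_gmm.score(cluster):
            flagged.append(c)
    return flagged  # indices of clusters flagged as containing the event
```

Scoring a whole cluster at once is what reduces the smoothing error the abstract mentions: a single noisy frame cannot flip the decision the way it can in frame-level detection.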

08-04-2021 publication date

PHONEME SOUND BASED CONTROLLER

Number: US20210104225A1
Author: Borgeat Frédéric
Assignee:

Disclosed herein is a phoneme sound based controller apparatus including: a sound input for receiving a sound signal; a phoneme sound detection module connected to the sound input to determine if at least one phoneme is detected in the sound signal; a dictionary containing at least one word, the word including at least one syllable, the syllable including the at least one phoneme; a grammar containing at least one rule, the at least one rule containing the at least one word, the at least one rule further containing at least one control action. At least one control action is taken if the at least one phoneme is detected in the sound input signal by the phoneme sound detection module. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. 2. The apparatus according to claim 1 , further comprising a detection output for providing a signal representing the determination by the phoneme sound detection module.3. The apparatus according to claim 2 , further comprising a speech recognition engine connected to the sound input claim 2 , the speech recognition engine providing a speech recognition context including the at least one word if the speech recognition engine recognizes the presence of the at least one word in the sound input.4. The apparatus according to claim 3 , further comprising a result output claim 3 , the result output including the at least one word if the detection output indicates that the at least one phoneme is detected in the input signal and the at least one word is recognized in the sound input.5. The apparatus according to claim 2 , further comprising a result output claim 2 , the result output including the at least one word if the detection output indicates that the at least one phoneme is detected in the input signal.6. The apparatus according to claim 1 , wherein the phoneme sound detection ...

23-04-2015 publication date

APPARATUSES, METHODS AND SYSTEMS FOR A DIGITAL CONVERSATION MANAGEMENT PLATFORM

Number: US20150112666A1
Assignee:

The APPARATUSES, METHODS AND SYSTEMS FOR A DIGITAL CONVERSATION MANAGEMENT PLATFORM (“DCM-Platform”) transforms digital dialogue from consumers, client demands, and Internet search inputs via DCM-Platform components into tradable digital assets and client-needs-based artificial intelligence campaign plan outputs. In one implementation, the DCM-Platform may capture and examine conversations between individuals and artificial intelligence conversation agents. These agents may be viewed as assets. One can measure the value and performance of these agents by assessing their ability to generate revenue from prolonging conversations and/or to effect sales through conversations with individuals. 1. A digital conversation generating processor-implemented method, comprising: instantiating a conversational artificial-intelligence agent; identifying an individual target for conversation; initiating a conversation with the individual target by the artificial-intelligence agent by providing a first portion of a conversational dialogue to the individual target; recording a response from the individual target to the first portion of the conversational dialogue; and responding to the response from the individual target with a next contextual portion of the conversational dialogue. 2.-20. (canceled) This application is a continuation of and hereby claims priority under 35 USC §120 to U.S. patent application Ser. No. 13/887,115, filed May 3, 2013 and entitled “Apparatuses, Methods And Systems For A Digital Conversation Management Platform”, which in turn claims priority under 35 USC §120 to co-pending U.S. non-provisional patent application Ser. No. 13/558,914, filed on Jul. 26, 2012, entitled “Apparatuses, Methods And Systems For A Digital Conversation Management Platform”, and is a continuation of co-pending U.S. non-provisional patent application Ser. No. 13/013,158, filed on Jan. 25, 2011, entitled “Apparatuses, Methods And Systems For A Digital Conversation ...

29-04-2021 publication date

Method and apparatus for speech analysis

Number: US20210125627A1
Author: Dahae KIM
Assignee: LG ELECTRONICS INC

Disclosed are a method and apparatus for speech analysis. The speech analysis apparatus and a server are capable of communicating with each other in a 5G communication environment by executing mounted artificial intelligence (AI) algorithms and/or machine learning algorithms. The speech analysis method and apparatus may collect and analyze speech data to build a database of structured speech data.

02-04-2020 publication date

SOUND PLAYBACK INTERVAL CONTROL METHOD, SOUND PLAYBACK INTERVAL CONTROL PROGRAM, AND INFORMATION PROCESSING APPARATUS

Number: US20200104652A1
Author: Sankoda Satoru
Assignee: FUJITSU LIMITED

A sound playback interval control method performed by a computer is provided for a speech recognition system. The method includes: arranging and displaying a word block subjected to correction and confirmation in a central portion of a first area on a display screen, the first area being an area in which a plurality of word blocks generated by using morphological analysis from a character string obtained by speech recognition are displayed, and performing playback control on sound of the word block subjected to correction and confirmation displayed in the first area. 1. A sound playback interval control method performed by a computer , the method comprising:arranging and displaying a word block subjected to correction and confirmation in a first area on a display screen, the first area being an area in which a plurality of word blocks generated by using morphological analysis from a character string obtained by speech recognition are displayed, andperforming playback control on sound of the word block subjected to correction and confirmation displayed in the first area.2. The sound playback interval control method according to claim 1 , wherein the word block subjected to correction and confirmation is arranged in a central portion of the first area.3. The sound playback interval control method according to claim 2 , whereinin the arranging and displaying, in response to an operation of confirming the word block subjected to correction and confirmation, the word block arranged in the central portion of the first area is changed to a next word block, andin the process of performing playback control, playback control is performed on sound of an interval corresponding to the first area in which the next word block is arranged in the central portion after the change.4. The sound playback interval control method according to claim 1 , the method further comprising:displaying a character string obtained by speech recognition for a range broader than the plurality of word ...

05-05-2016 publication date

COMPUTERIZED TOOL FOR CREATING VARIABLE LENGTH PRESENTATIONS

Number: US20160124909A1
Assignee:

A computer based tool and method for automatically producing, from an existing presentation, a new presentation that fits within a specific presentation duration based on the priority associated with each element within the existing presentation and the presentation time for each individual element. 1. A computer based tool for automatically generating a modified presentation of a specified presentation time based upon an original presentation of a longer presentation time than the specified presentation time , comprising:one or more processors coupled to non-transient program and data storage; and retrieve the original presentation comprising a plurality of slides, each having at least one element;', 'assign a priority to each element within the plurality of slides;', 'calculate a speech rate based upon a speech sample obtained from a user;', 'determine a presentation time associated with each element based on the calculated speech rate; and', 'automatically generate the modified presentation of the specified presentation time based on the assigned priority of each element and the determined presentation time associated with each element, wherein higher priority elements are included in the modified presentation before lower priority elements., 'a non-transient program executable by the one or more processors to cause the one or more processors to2. The tool of claim 1 , wherein the speech rate is calculated by measuring the production of syllables in the speech sample.3. The tool of claim 2 , wherein the speech rate is calculated by measuring the production of words in the speech sample.4. A computer based tool for automatically analyzing an original presentation having an original presentation timing length and generating a new presentation that fits within a specific modified presentation timing length claim 2 , comprising:one or more processors coupled to non-transient program and data storage; and retrieve a first presentation comprising at least two slides, ...
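
The selection logic the claims describe reduces to: derive a speech rate from a sample, price each element by that rate, and admit elements in priority order until the time budget is exhausted. The sketch below assumes word counts as the timing basis and hypothetical field names (priority, word_count).

```python
def fit_presentation(elements, sample_words, sample_seconds, budget_seconds):
    rate = sample_words / sample_seconds          # words per second from sample
    for e in elements:
        e["duration"] = e["word_count"] / rate    # per-element presentation time
    chosen, remaining = [], budget_seconds
    # Higher-priority elements are admitted first (1 = highest priority)
    for e in sorted(elements, key=lambda e: e["priority"]):
        if e["duration"] <= remaining:
            chosen.append(e)
            remaining -= e["duration"]
    return chosen

slides = [{"priority": 1, "word_count": 40},
          {"priority": 2, "word_count": 120},
          {"priority": 3, "word_count": 60}]
print(fit_presentation(slides, sample_words=150, sample_seconds=60,
                       budget_seconds=90))
```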

27-05-2021 publication date

ELECTRONIC DEVICE AND METHOD FOR CONTROLLING THE SAME, AND STORAGE MEDIUM

Number: US20210158801A1
Author: PARK Jihun, SEOK Dongheon
Assignee: SAMSUNG ELECTRONICS CO., LTD.

Disclosed is an electronic device recognizing an utterance voice in units of individual characters. The electronic device includes: a voice receiver; and a processor configured to: obtain a recognition character converted from a character section of a user voice received through the voice receiver, and recognize a candidate character having high acoustic feature related similarity with the character section among a plurality of acquired candidate characters as an utterance character of the character section based on a confusion possibility with the acquired recognition character. 1. An electronic device , comprising:a voice receiver; and obtain a recognition character converted from a character section of a user voice received through the voice receiver, and', 'identify a candidate character having a high acoustic feature related similarity with the character section among acoustic features of a plurality of candidate characters as an utterance character of the character section based on a confusion possibility with the obtained recognition character., 'a processor configured to2. The electronic device of claim 1 , wherein the processor is configured to:convert the user voice received through the voice receiver into a character string, anddivide the character string into each character.3. The electronic device of claim 2 , wherein the processor is configured to analyze whether a pause section exists between characters of the character string.4. The electronic device of claim 3 , wherein the processor is configured to assign a lower weight to the confusion possibility of the recognition character in which the pause section exists than when no pause section exists.5. The electronic device of claim 1 , further comprising:a memory,wherein the processor is configured to store history information associated with an identification result of the candidate character in the memory.6. The electronic device of claim 5 , wherein the processor is configured to detect a confusion ...

27-05-2021 publication date

METHOD AND DEVICE FOR EVALUATING PERFORMANCE OF SPEECH ENHANCEMENT ALGORITHM, AND COMPUTER-READABLE STORAGE MEDIUM

Number: US20210158832A1
Assignee:

A method for evaluating performance of a speech enhancement algorithm includes: acquiring a first speech signal including noise and a second speech signal including noise, wherein the first speech signal is acquired from a near-end audio acquisition device close to a sound source, the second speech signal is acquired from a far-end audio acquisition device far from the sound source, and the near-end audio acquisition device is closer to the sound source than the far-end audio acquisition device; acquiring a pseudo-pure speech signal based on the first speech signal and the second speech signal, as a reference speech signal; enhancing the second speech signal by using a preset speech enhancement algorithm, to obtain a denoised speech signal to be tested; and acquiring a correlation coefficient between the reference speech signal and the speech signal to be tested, for evaluating the speech enhancement algorithm. 1. A method for evaluating performance of a speech enhancement algorithm , comprising:acquiring a first speech signal including noise and a second speech signal including noise, wherein the first speech signal is acquired from a near-end audio acquisition device close to a sound source, the second speech signal is acquired from a far-end audio acquisition device far from the sound source, and the near-end audio acquisition device is closer to the sound source than the far-end audio acquisition device;acquiring a pseudo-pure speech signal based on the first speech signal and the second speech signal, as a reference speech signal;enhancing the second speech signal by using a preset speech enhancement algorithm, to obtain a denoised speech signal to be tested; andacquiring a correlation coefficient between the reference speech signal and the denoised speech signal to be tested, wherein the correlation coefficient is used for evaluating the speech enhancement algorithm.2. The method according to claim 1 , wherein acquiring the pseudo-pure speech signal based on ...
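
The evaluation step reduces to a correlation coefficient between the reference (pseudo-pure) signal and the denoised output. A minimal sketch, assuming the two signals are already time-aligned and of equal length:

```python
import numpy as np

def evaluation_score(reference, denoised):
    # Pearson correlation between reference and denoised signals;
    # a value near 1.0 indicates the enhancement preserved the speech
    return np.corrcoef(reference, denoised)[0, 1]
```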

19-05-2016 publication date

LINEAR PREDICTION ANALYSIS DEVICE, METHOD, PROGRAM, AND STORAGE MEDIUM

Number: US20160140975A1

An autocorrelation calculation unit calculates an autocorrelation R_O(i) from an input signal. A prediction coefficient calculation unit performs linear prediction analysis by using a modified autocorrelation R′_O(i) obtained by multiplying a coefficient w_O(i) by the autocorrelation R_O(i). It is assumed here, for each order i of some orders i at least, that the coefficient w_O(i) corresponding to the order i is in a monotonically increasing relationship with an increase in a value that is negatively correlated with a fundamental frequency of the input signal of the current frame or a past frame. 1. A linear prediction analysis method of obtaining, in each frame, which is a predetermined time interval, coefficients that can be transformed to linear prediction coefficients corresponding to an input time-series signal, the linear prediction analysis method comprising: an autocorrelation calculation step of calculating an autocorrelation R_O(i) between an input time-series signal X_O(n) of a current frame and an input time-series signal X_O(n−i) i samples before the input time-series signal X_O(n) or an input time-series signal X_O(n+i) i samples after the input time-series signal X_O(n), for each i of i = 0, 1, ..., P_max at least; and a prediction coefficient calculation step of calculating coefficients that can be transformed to first-order to P_max-order linear prediction coefficients, by using a modified autocorrelation R′_O(i) (i = 0, 1, ..., P_max) obtained by multiplying a coefficient w_O(i) (i = 0, 1, ..., P_max) by the autocorrelation R_O(i) (i = 0, 1, ..., P_max) for each i, for each order i of some orders i at least, the coefficient w_O(i) corresponding to the order i being in a monotonically increasing relationship with an increase in a period, a quantized value of the period, or a value that is negatively correlated with a fundamental frequency based on the input time-series signal of the ...

28-05-2015 publication date

VOICE INPUT CORRECTION

Number: US20150149163A1
Assignee:

An embodiment provides a method, including: accepting, at an audio receiver of an information handling device, voice input of a user; interpreting, using a processor, the voice input; thereafter receiving, at the audio receiver, repeated voice input of the user; identifying a correction using the repeated voice input; and correcting, using the processor, the voice input using the repeated voice input, wherein the corrective voice input does not include a predetermined voice command. Other aspects are described and claimed. 1. A method, comprising: accepting, at an audio receiver of an information handling device, voice input of a user; interpreting, using a processor, the voice input; thereafter receiving, at the audio receiver, repeated voice input of the user; identifying a correction using the repeated voice input; and correcting, using the processor, the voice input using the repeated voice input, wherein the corrective voice input does not include a predetermined voice command. 2. The method of claim 1, further comprising receiving the repeated voice input more than once. 3. The method of claim 2, further comprising refining the correction based on additional repeated voice input. 4. The method of claim 1, wherein the correcting comprises identifying in the repeated voice input an audio characteristic that differs from an audio characteristic of the voice input of the user. 5. The method of claim 4, wherein the correcting further comprises analyzing the audio characteristic of the repeated voice input to determine a correction is needed. 6. The method of claim 5, wherein the correcting further comprises analyzing the audio of the repeated voice input to determine the correction. 7. The method of claim 5, further comprising displaying an interpretation of the voice input to the user, wherein the correcting takes place after the displaying. 8. The method of claim 1, further comprising prompting the user for repeated voice input, wherein the prompting takes place prior to ...

28-05-2015 publication date

DYNAMIC SELECTION AMONG ACOUSTIC TRANSFORMS

Number: US20150149167A1
Assignee: GOOGLE INC.

Aspects of this disclosure are directed to accurately transforming speech data into one or more word strings that represent the speech data. A speech recognition device may receive the speech data from a user device and an indication of the user device. The speech recognition device may execute a speech recognition algorithm using one or more user and acoustic condition specific transforms that are specific to the user device and an acoustic condition of the speech data. The execution of the speech recognition algorithm may transform the speech data into one or more word strings that represent the speech data. The speech recognition device may estimate which one of the one or more word strings more accurately represents the received speech data. 1. A method comprising:receiving speech data from a user device;receiving an indication of the user device;executing a speech recognition algorithm that selectively retrieves, from one or more storage devices, a plurality of pre-stored user and acoustic condition specific transforms based on the received indication of the user device, and that utilizes the received speech data as an input into pre-stored mathematical models of the retrieved plurality of pre-stored user and acoustic condition specific transforms to convert the received speech data into one or more word strings that each represent at least a portion of the received speech data, wherein each one of the plurality of pre-stored user and acoustic condition specific transforms is a transform that is both specific to the user device and specific to one acoustic condition from among a plurality of different acoustic conditions, wherein each of the different acoustic conditions comprises a context in which the speech data could have been provided, and wherein each of the plurality of pre-stored user and acoustic condition specific transforms and each of the pre-stored mathematical models that are utilized to convert the received speech data into the one or more word ...

09-05-2019 publication date

AUTOMATIC AUDIO DUCKING WITH REAL TIME FEEDBACK BASED ON FAST INTEGRATION OF SIGNAL LEVELS

Number: US20190138262A1
Assignee:

Various embodiments describe audio signal processing. In an example, a computer system generates metrics, such as RMS levels, for audio slices from a foreground audio signal. A summed-area table is generated from the metrics. An observation window is used to determine whether to add a key frame or not. The observation window includes a set of audio slices. A total metric, such as an average RMS level, is computed for the audio slices in the observation window. Based on the total metric, the computer system adds a key frame. The key frame references audio ducking parameters applicable to a background audio signal. 1. A computer-implemented method for audio signal processing, the method comprising: accessing, by a multimedia editing application hosted on a computing device, a first audio signal associated with a foreground label, wherein the foreground label indicates that the first audio signal is a foreground audio signal; generating, by the multimedia editing application, metrics corresponding to audio slices of the first audio signal and indicating values for an audio property of the first audio signal, wherein each metric corresponds to an audio slice, indicates a value for the audio property in the audio slice, and is generated based on an audio signal of the audio slice; computing, by the multimedia editing application, a total metric for an audio slice based on a set of the metrics corresponding to a set of the audio slices, wherein the set of the audio slices includes the audio slice; and adding, by the multimedia editing application, a key frame to a track based on the total metric, wherein the track organizes a presentation of the first audio signal and of a second audio signal having a background label, wherein a location of the key frame corresponds to a location of the audio slice on the track, and wherein the key frame indicates a change to the audio property of the second audio signal at the location on the track. 2. The computer-implemented method of ...
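
The "fast integration" is the key trick: with a prefix sum over per-slice RMS values (the one-dimensional analogue of a summed-area table), any observation window's average RMS costs two lookups instead of a rescan. The slice length, window size, and ducking threshold below are assumed values.

```python
import numpy as np

def keyframe_positions(samples, slice_len=1024, win=8, thresh=0.05):
    # samples: 1-D float array of the foreground signal
    n = len(samples) // slice_len
    rms = np.sqrt(np.mean(
        samples[:n * slice_len].reshape(n, slice_len) ** 2, axis=1))
    prefix = np.concatenate(([0.0], np.cumsum(rms)))  # summed-area table (1-D)
    keys = []
    for i in range(n - win + 1):
        avg = (prefix[i + win] - prefix[i]) / win     # O(1) window average
        if avg > thresh:
            keys.append(i * slice_len)                # duck the background here
    return keys
```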

30-04-2020 publication date

SYSTEMS AND METHODS FOR AUTOMATIC DETERMINATION OF INFANT CRY AND DISCRIMINATION OF CRY FROM FUSSINESS

Number: US20200135229A1
Assignee: LENA Foundation

A method including receiving one or more datasets of audio data of a key child captured in a natural sound environment of the key child. The method also includes segmenting each of the one or more datasets of audio data to create audio segments. The audio segments include cry-related segments and non-cry segments. The method additionally includes determining periods of the cry-related segments that satisfy one or more threshold non-sparsity criteria. The method further includes performing a classification on the periods to classify each of the periods as either a cry period or a fussiness period. Other embodiments are described. 1. A system comprising:one or more processors; and receiving one or more datasets of audio data of a key child captured in a natural sound environment of the key child;', 'segmenting each of the one or more datasets of audio data to create audio segments, the audio segments comprising cry-related segments and non-cry segments;', 'determining periods of the cry-related segments that satisfy one or more threshold non-sparsity criteria; and', 'performing a classification on the periods to classify each of the periods as either a cry period or a fussiness period., 'one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform2. The system of claim 1 , wherein the computing instructions are further configured to perform:outputting metrics based on the classification of the periods.3. The system of claim 1 , wherein the periods are separated by gaps of at least a first time-duration threshold without any cry-related segments.4. The system of claim 3 , wherein the first time-duration threshold is approximately 15 seconds to approximately 5 minutes.5. The system of claim 1 , wherein the one or more threshold non-sparsity criteria comprise a first non-sparsity criterion in which each of the periods that satisfies the first non-sparsity criterion has a period duration ...
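
The period-building rule in claims 3-4 (cry-related segments merge into one period unless separated by a gap of at least a time-duration threshold) is a simple interval merge. The sketch below uses 15 seconds, one endpoint of the claimed range:

```python
def cry_periods(segments, gap=15.0):
    # segments: time-sorted list of (start, end) seconds of cry-related audio
    periods = []
    for start, end in segments:
        if periods and start - periods[-1][1] < gap:
            periods[-1][1] = max(periods[-1][1], end)  # extend current period
        else:
            periods.append([start, end])               # start a new period
    return [tuple(p) for p in periods]

print(cry_periods([(0, 2), (5, 7), (40, 43)]))  # -> [(0, 7), (40, 43)]
```

Each resulting period would then be classified as cry or fussiness, which is the discrimination step the abstract describes.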

16-05-2019 publication date

Vocal Cord Stroboscopy

Number: US20190142264A1
Assignee: KARL STORZ Imaging, Inc.

A system, method, scope device, and camera control module device for a stroboscopic laryngoscope, enabling scopes to operate with a rolling shutter-type image sensor. With selected pulsing of a strobe light, subset image data from adjacent rolling-shutter frames is selected, gain-compensated for the missing light, and combined into a single new video frame. 1. A medical scope system comprising: a stroboscopic laryngoscope; a microphone; a camera control module including an electronic controller communicatively coupled to the stroboscopic laryngoscope and the microphone; a display device communicatively coupled to the camera control module; and wherein the camera control module electronic controller is operable for: measuring a patient's vocalization with the microphone and determining a base frequency of the vocalization; during the patient's vocalization, causing a light emitter of the stroboscopic laryngoscope to pulse at a timing interval selected based on the base frequency of the vocalization; reading image data from an image sensor array of the stroboscopic laryngoscope according to a rolling shutter process including: (a) creating two or more light emitter pulses during first and second adjacent image frames, (b) reading the image data from lines of the image sensor array offset in time such that at least two of the two or more light emitter pulses each expose sensor pixels in both the first and second image frames simultaneously, and (c) selecting a first subset of the image data from the first frame and a second subset of the image sensor data from the second frame, the second subset including a different frame portion than the first subset, the first and second subsets including data resulting from the simultaneous exposure of the first and second frames; and combining the image data from the first and second subsets to create a combined image frame based on the first and second frames. 2. The system of claim 1, in which the first subset includes only data scanned ...

04-06-2015 publication date

Methods and apparatus for identifying fraudulent callers

Number: US20150154961A1
Assignee: Mattersight Corp

The methods, apparatus, and systems described herein are designed to identify fraudulent callers. A voice print of a call is created and compared to known voice prints to determine if it matches one or more of the known voice prints, and to transaction data associated with a database of voice prints. The methods include pre-processing to separate speech from non-speech, selecting the elements that affect the voice print the most, and/or computing an adjustment factor based on the scores of each received voice print against known voice prints.

02-06-2016 publication date

METHOD FOR IMPROVING ACOUSTIC MODEL, COMPUTER FOR IMPROVING ACOUSTIC MODEL AND COMPUTER PROGRAM THEREOF

Number: US20160155438A1
Assignee:

Embodiments include methods and systems for improving an acoustic model. Aspects include acquiring a first standard deviation value by calculating standard deviation of a feature from first training data and acquiring a second standard deviation value by calculating standard deviation of a feature from second training data acquired in a different environment from an environment of the first training data. Aspects also include creating a feature adapted to an environment where the first training data is recorded, by multiplying the feature acquired from the second training data by a ratio obtained by dividing the first standard deviation value by the second standard deviation value. Aspects further include reconstructing an acoustic model constructed using training data acquired in the same environment as the environment of the first training data using the feature adapted to the environment where the first training data is recorded. 1. (canceled)2. (canceled)3. (canceled)4. (canceled)5. (canceled)6. (canceled)7. (canceled)8. (canceled)9. A computer for improving an acoustic model , comprising:a standard deviation value calculating unit for calculating standard deviation of a first feature from first training data to acquire a first standard deviation value and calculating standard deviation of a second feature from second training data acquired in a different environment from an environment of the first training data to acquire a second standard deviation value;a feature creating unit for creating a modified feature adapted to the environment where the first training data is recorded, by multiplying the second feature acquired from the second training data by a ratio obtained by dividing the first standard deviation value by the second standard deviation value; andan acoustic model reconstructing unit for reconstructing an acoustic model constructed using training data acquired in the same environment as the environment of the first training data, using the modified ...
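
The adaptation itself is a single scaling: multiply the second environment's feature by the ratio of the two standard deviations so it matches the first (recording) environment. A minimal sketch, with the two training sets passed in as arrays:

```python
import numpy as np

def adapt_feature(feature2, train1, train2):
    # Scale the second environment's feature into the first environment:
    # ratio = (std of first training data) / (std of second training data)
    ratio = np.std(train1) / np.std(train2)
    return feature2 * ratio
```

The rescaled features can then be used to reconstruct the acoustic model built from data recorded in the first environment, as the abstract describes.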

31-05-2018 publication date

INFORMATION PROCESSING DEVICE, METHOD OF INFORMATION PROCESSING, AND PROGRAM

Number: US20180150279A1
Assignee:

Provided is a technology that allows a user to find out whether speech is uttered at a volume at which speech recognition can be performed. An information processing device includes: a determination portion configured to determine a user-uttered speech volume on the basis of input speech; and a display controller configured to control a display portion so that the display portion displays a display object. The display controller causes the display portion to display a first motion object moving toward the display object when the user-uttered speech volume exceeds a speech recognizable volume. 1. An information processing device comprising: a determination portion configured to determine a user-uttered speech volume on the basis of input speech; and a display controller configured to control a display portion so that the display portion displays a display object, wherein the display controller causes the display portion to display a first motion object moving toward the display object when the user-uttered speech volume exceeds a speech recognizable volume. 2. The information processing device according to claim 1, wherein the determination portion determines a user-uttered speech source direction, and the display controller causes the display portion to display the first motion object on the basis of the user-uttered speech source direction. 3. The information processing device according to claim 2, further comprising: a speech recognition portion configured to acquire a recognition string by performing speech recognition on input speech from the user-uttered speech source direction. 4. The information processing device according to claim 3, wherein the display controller causes the display portion to display the recognition string. 5. The information processing device according to claim 1, wherein the determination portion determines a noise volume on the basis of the input speech, and the display controller causes the display portion to display a second ...

15-09-2022 publication date

SYSTEMS AND METHODS FOR AUTHENTICATION USING SOUND-BASED VOCALIZATION ANALYSIS

Number: US20220293123A1
Assignee: Covid Cough, Inc.

Systems and methods of the present disclosure enable authentication and/or anomaly detection using machine learning-based modelling. Audio recordings that represent audio from forced cough vocalizations are received from a user device. One or more audio filters extract forced cough vocalization recordings from the audio recordings, and signal data signatures representative of the forced cough vocalization recordings are generated. Gaussian mixture models are produced for each unique combination of the signal data signatures, where each unique combination includes a group of model baselines and a test match baseline. Each Gaussian mixture model is used to produce a match value for the associated test match baseline based on the associated model baselines, and a statistical score is determined for each match value. One or more baseline Gaussian mixture models are determined based on the statistical score and stored in a user profile. 1. A method comprising: receiving, by at least one processor, a plurality of audio recordings from a user device associated with a user, wherein the plurality of audio recordings represent audio from a plurality of forced cough vocalizations produced by the user; utilizing, by the at least one processor, at least one audio filter to extract a plurality of forced cough vocalization recordings from the plurality of audio recordings; generating, by the at least one processor, a plurality of signal data signatures representative of the plurality of forced cough vocalization recordings; generating, by the at least one processor, a plurality of combinations of signal data signatures, wherein each combination of signal data signatures of the plurality of combinations of signal data signatures comprises a unique combination of a first group of non-repeated signal data signatures of the plurality of signal data signatures and a second group of non-repeated signal data signatures of the plurality of signal data signatures; generating, by ...
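
Per unique combination, the method fits a Gaussian mixture on the group of model baselines and scores the held-out test match baseline against it. A sketch with scikit-learn, where the component count and signature dimensionality are assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def match_value(model_baselines, test_baseline, n_components=4):
    # model_baselines: (n_signatures, n_dims) array of baseline signatures;
    # test_baseline: (n_dims,) signature held out from this combination
    gmm = GaussianMixture(n_components=n_components).fit(model_baselines)
    # Average log-likelihood of the test baseline under the fitted mixture
    return gmm.score(test_baseline.reshape(1, -1))
```

Repeating this over all combinations yields the set of match values from which the statistical scores, and ultimately the stored baseline models, are selected.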

31-05-2018 publication date

METHOD AND DEVICE FOR SEARCHING ACCORDING TO SPEECH BASED ON ARTIFICIAL INTELLIGENCE

Number: US20180151183A1
Author: LI CHAO, LI Xiangang, SUN Jue
Assignee:

A method and a device for searching according to a speech based on artificial intelligence are provided. The method includes: identifying an input speech of a user to determine whether the input speech is a child speech; filtrating a searched result obtained according to the input speech to obtain a filtrated searched result, if the input speech is the child speech; and feeding the filtrated searched result back to the user. 1. A method for searching according to a speech based on artificial intelligence , comprising:identifying, by at least one computing device, an input speech of a user to determine whether the input speech is a child speech;filtrating, by the at least one computing device, a searched result obtained according to the input speech to obtain a filtrated searched result, if the input speech is the child speech; andfeeding, by the at least one computing device, the filtrated searched result back to the user.2. The method according to claim 1 , wherein filtrating claim 1 , by the at least one computing device claim 1 , a searched result obtained according to the input speech comprises:converting, by the at least one computing device, the input speech into a text content;obtaining, by the at least one computing device, the searched result by searching according to the text content; andfiltrating, by the at least one computing device, the searched result to remove a sensitive content unsuitable for a child.3. The method according to claim 2 , wherein obtaining claim 2 , by the at least one computing device claim 2 , the searched result by searching according to the text content comprises:searching, by the at least one computing device, according to the text content in a first database pre-established for children; andsearching, by the at least one computing device, according to the text content in a second database to obtain the searched result, if no content related to the input speech is searched in the first database.4. The method according to claim 1 ...

07-05-2020 publication date

METHODS AND DEVICES FOR OBTAINING AN EVENT DESIGNATION BASED ON AUDIO DATA

Number: US20200143823A1
Assignee: MINUT AB

A method performed by a processing node, comprising the steps of: i. obtaining, from at least one communication device, audio data associated with a sound and storing the audio data in the processing node, ii. obtaining an event designation associated with the sound and storing the event designation in the processing node, iii. determining a model which associates the audio data with the event designation and storing the model, and iv. providing the model to the communication device. A method performed by the communication device, as well as a processing node, a communication device, a system and computer programs for performing the methods are also described. 1.-18. (canceled) 19. A method performed by a processing node, comprising the steps of: i. obtaining first audio data from at least one communication device, and storing the first audio data in the processing node; ii. obtaining an event designation associated with the first audio data, and storing the event designation in the processing node; iii. determining a model that associates the first audio data with the event designation, and storing the model; and iv. providing the model to the communication device. 20. The method according to claim 19, wherein: step (i) comprises obtaining a first plurality of audio data from a plurality of communication devices, and storing the first plurality of audio data in the processing node; step (ii) comprises obtaining a first plurality of event designations associated with the first plurality of audio data, and storing the first plurality of event designations in the processing node; step (iii) comprises determining a first plurality of models, each model associating one of the first plurality of audio data with one of the first plurality of event designations, and storing the first plurality of models; and step (iv) comprises providing the first plurality of models to the plurality of communication devices. 22. ...

01-06-2017 publication date

Information processing apparatus, computer readable storage medium, and information processing method

Number: US20170154639A1
Assignee: Fujitsu Ltd

An information processing apparatus including: a memory, and a processor coupled to the memory and the processor configured to: detect a plurality of sounds in sound data captured in a space within a specified period, classify the plurality of sounds into a plurality of kinds of sound based on similarities of the plurality of sounds respectively, and determine a state of a person in the space within the specified period based on counts of the plurality of kinds of sound.

17-06-2021 publication date

SOUND MONITORING SYSTEM

Number: US20210183227A1
Assignee: CONSERVATION LABS, INC.

Converting a sound to a sound signature and then interpreting the signature based on a machine learning analytical approach. Generally, “interpreting” means quantifying and classifying. A system for identifying statuses of one or more target objects may comprise a device for observing sounds comprising a sound detector, a housing affixing the sound detector on, or in the vicinity of, a target object, a processor, a power supply, and a device interface. The system may further comprise a data transmitter, a remote server for receiving data from one or more devices for observing sound of a target object and/or the surrounding environment, a plurality of server-side applications applying analytical operations to the data, and a plurality of end-user devices for accessing the data through a plurality of user interfaces. The status identification system can be used to detect statuses and events of target objects. 1. A device for identifying the status of a target object by observing sound in the human audible range, comprising: a housing; a first microphone mounted in the housing and located in a vicinity of the target object; a structure comprising a sound-isolating material mounted in the housing; a processor; and a power source; wherein the sound is generated by the target object or the surrounding environment; wherein the microphone detects a sound in a human audible range in the vicinity of the target object and converts the sound to digital data; and wherein the device identifies a status of the target object by applying a plurality of machine learning algorithms to the digital data. 2. The device of claim 1, wherein the sound is generated by the target object and the surrounding environment. 3. The device of claim 1, wherein the first microphone is facing the target object and the structure comprising a sound-isolating material is a sound chamber in contact with the target object. 4. The device of claim 1, further comprising a second microphone that is mounted in the ...

Read more
16-05-2019 publication date

Systems and methods for detecting inmate to inmate conference calls

Number: US20190149655A1
Author: Stephen Lee Hodge
Assignee: Global Tel Link Corp

A system for detecting inmate to inmate conference calls in a correctional facility is disclosed herein. The system includes a database and a conference call detection server, wherein the conference call detection server is configured to monitor a plurality of inmate communications, convert an audio signal of each inmate communication to a frequency domain signal, identify frequency data comprising one or more frequency peaks and corresponding frequency values in the frequency domain signal for each inmate communication, generate a record comprising the frequency data for each inmate communication, resulting in a plurality of records, store the plurality of records in the database, detect an inmate to inmate conference call by matching a frequency subset of a new inmate communication with frequency data in a detected record in the database, and verify the inmate to inmate conference call by matching audio with voice biometric samples.
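
A toy version of the frequency-matching idea: reduce each monitored call to the frequencies of its strongest spectral peaks and declare a conference call when a new call's peaks line up with a stored record. The peak count and tolerance below are invented for illustration.

    import numpy as np

    def peak_record(audio, rate, n_peaks=2):
        """Reduce a mono signal to the frequencies of its strongest spectral peaks."""
        spectrum = np.abs(np.fft.rfft(audio))
        freqs = np.fft.rfftfreq(len(audio), d=1.0 / rate)
        return np.sort(freqs[np.argsort(spectrum)[-n_peaks:]])

    def matches(record_a, record_b, tol_hz=5.0):
        """Match when every stored peak has a counterpart within tol_hz."""
        return all(np.min(np.abs(record_b - f)) <= tol_hz for f in record_a)

    rate = 8000
    t = np.arange(rate) / rate
    call_1 = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
    call_2 = np.sin(2 * np.pi * 441 * t) + 0.5 * np.sin(2 * np.pi * 1001 * t)
    print(matches(peak_record(call_1, rate), peak_record(call_2, rate)))  # True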

Read more
07-06-2018 publication date

SYSTEM AND COMPUTER-BASED METHOD FOR SIMULATING A HUMAN-LIKE CONTROL BEHAVIOUR IN AN ENVIRONMENTAL CONTEXT

Number: US20180157957A1
Assignee:

A computer-based method for simulating a human-like decision in an environmental context, comprising: capturing environmental data with at least one sensor; realising a computer-based method for realising a bi-directional compression of high-dimensional data by compressing the data into a lower-dimensional map; if environmental data are captured during a learning phase of a computer-based model, evaluating the map of compressed data by determining the quality of the map by how well it separates data with different properties, the captured data corresponding to known pre-recorded data that have been pre-evaluated; and, if environmental data are captured after the learning phase, adding a new point to the compressed data and generating a signal indicating which human-like decision to use to correspond to the state of the operator. 2. The computer-based method of claim 1, further comprising an extraction of attributes from at least one data before initiating the bi-directional compression. 3. The computer-based method according to claim 1, comprising a compression of a new point m belonging to the first space x onto the second space y by solving the following equation:

ỹ = argmin_{t ∈ yⁿ} d(A₁(μ_x(P, m)), A₂(μ_y(Q, t)))   (Equation 2)

and a decompression of a new point t belonging to the second space y onto the first space x by solving the following equation:

x̃ = argmin_{m ∈ xⁿ} d(A₁(μ_x(P, m)), A₂(μ_y(Q, t)))   (Equation 3)

where μ_x(P, m) and μ_y(Q, t) are the distance vectors between the points of BiMap and the new points in the corresponding respective spaces, BiMap being a correspondence between points of the first space x and points of the second space y represented by Equation 2 and Equation 3, which Q points are calculated from points P by solving Equation 1. 4. A computer-based ...
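
A small numerical sketch of the argmin placement in Equation 2, under loose assumptions: Euclidean distance for d, identity transforms for A₁ and A₂, and a brute-force search over candidate map positions. The anchor points P (in x) and Q (in y) are made up for illustration.

    import numpy as np

    # Anchor correspondence: P in the high-dimensional space x, Q in the 2-D map y.
    P = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 2.0]])
    Q = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

    def compress(m, candidates):
        """Place a new point m on the map: choose the candidate t whose distance
        profile to Q best matches m's distance profile to P (cf. Equation 2)."""
        mu_x = np.linalg.norm(P - m, axis=1)                       # mu_x(P, m)
        mu_y = np.linalg.norm(Q[None, :, :] - candidates[:, None, :], axis=2)
        return candidates[np.argmin(np.linalg.norm(mu_y - mu_x, axis=1))]

    grid = np.array([[a, b] for a in np.linspace(0, 1, 11) for b in np.linspace(0, 1, 11)])
    print(compress(np.array([1.0, 1.0, 1.0]), grid))   # lands close to Q[1] = [1, 0]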

Read more
22-09-2022 publication date

Unsupervised keyword spotting and word discovery for fraud analytics

Number: US20220301554A1
Author: Hrishikesh RAO
Assignee: Pindrop Security Inc

Embodiments described herein provide for a computer that detects one or more keywords of interest using acoustic features, to detect or query commonalities across multiple fraud calls. Embodiments described herein may implement unsupervised keyword spotting (UKWS) or unsupervised word discovery (UWD) in order to identify commonalities across a set of calls, where both UKWS and UWD employ Gaussian Mixture Models (GMM) and one or more dynamic time-warping algorithms. A user may indicate a training exemplar or occurrence of call-specific information, referred to herein as “a named entity,” such as a person's name, an account number, account balance, or order number. The computer may perform a redaction process that computationally nullifies the import of the named entity in the modeling processes described herein.
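
Both UKWS and UWD lean on alignment costs of the dynamic time-warping family; below is a generic textbook DTW between two frame sequences, shown only to give the flavor of the matching step, not Pindrop's implementation.

    import numpy as np

    def dtw_distance(a, b):
        """Classic dynamic time warping between two feature sequences (rows = frames)."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    # Two renditions of the "same word", one spoken more slowly.
    fast = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
    slow = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]])
    print(dtw_distance(fast, slow))   # 0.0 here: lengths differ, content aligns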

Read more
23-05-2019 publication date

SIGNAL DETECTION DEVICE, SIGNAL DETECTION METHOD, AND SIGNAL DETECTION PROGRAM

Number: US20190156853A1
Assignee: NEC Corporation

A signal detection device includes a compression unit which compresses an activation matrix by adding to each column an element of the row corresponding to a basis mapped to the information of the same acoustic element in an activation matrix computed by non-negative matrix factorization using a basis matrix, using the information of acoustic elements making up an acoustic event, mapped to the basis constituting the basis matrix. 1. A signal detection device comprising: a compression unit configured to compress an activation matrix by adding to each column an element of the row corresponding to a basis mapped to the information of the same signal element in the activation matrix computed by non-negative matrix factorization using a basis matrix, using the information of signal elements making up a signal pattern, mapped to the basis constituting the basis matrix. 2. The signal detection device according to claim 1, comprising: a detection unit configured to detect a signal pattern included in a signal corresponding to a spectrogram formed of the activation matrix using the compressed activation matrix and a detection model used to detect the signal pattern. 3. The signal detection device according to claim 1, comprising: a generation unit configured to generate the basis matrix including the bases respectively corresponding to the signal elements by performing the non-negative matrix factorization to the spectrogram including the signal element forming the signal pattern so as to satisfy a predetermined condition. 4. The signal detection device according to claim 3, comprising: an analysis unit configured to perform the non-negative matrix factorization to a spectrogram associated with information indicating whether the corresponding signal is a signal to be detected using the basis matrix generated by the generation unit. 5. The signal detection device according to claim 4, comprising: a learning unit configured to learn the detection model using the activation matrix ...
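
The compression step itself is a per-column (per-frame) sum of activation rows whose bases share an acoustic element; a minimal numpy sketch follows, with an invented activation matrix and basis-to-element mapping.

    import numpy as np

    # Activation matrix U: rows = spectral bases, columns = time frames.
    U = np.array([[0.2, 0.0, 0.1],
                  [0.3, 0.5, 0.0],
                  [0.0, 0.4, 0.6],
                  [0.1, 0.1, 0.1]])
    basis_to_element = np.array([0, 0, 1, 1])   # acoustic element behind each basis

    def compress_activations(U, basis_to_element):
        """Add, frame by frame, the rows whose bases map to the same element."""
        V = np.zeros((basis_to_element.max() + 1, U.shape[1]))
        for row, element in zip(U, basis_to_element):
            V[element] += row
        return V

    print(compress_activations(U, basis_to_element))
    # [[0.5 0.5 0.1]
    #  [0.1 0.5 0.7]]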

Read more
18-06-2015 publication date

APPARATUSES, METHODS AND SYSTEMS FOR A DIGITAL CONVERSATION MANAGEMENT PLATFORM

Number: US20150170671A1
Assignee:

The APPARATUSES, METHODS AND SYSTEMS FOR A DIGITAL CONVERSATION MANAGEMENT PLATFORM (“DCM-Platform”) transforms digital dialogue from consumers, client demands, and Internet search inputs via DCM-Platform components into tradable digital assets and client-needs-based artificial intelligence campaign plan outputs. In one implementation, the DCM-Platform may capture and examine conversations between individuals and artificial intelligence conversation agents. These agents may be viewed as assets. One can measure the value and performance of these agents by assessing their performance and ability to generate revenue from prolonging conversations and/or ability to effect sales through conversations with individuals. 1.-20. (canceled) 21. A digital conversation generation processor-implemented method, comprising: creating an interactive computer dialogue agent; populating the created interactive computer dialogue agent application on a dialogue platform; receiving a dialogue action from an individual target via the dialogue platform; generating a dialogue line in response to the received dialogue action via the interactive computer dialogue agent; recording an interactive dialogue comprising the dialogue action and the generated dialogue line; determining a plurality of parameters associated with the interactive dialogue; allocating a value point to each dialogue element of the interactive dialogue; receiving pricing information from an ad exchange; adjusting the allocated value point of each dialogue element based on the received pricing information; creating a digital conversation asset comprising at least the retrieved interactive dialogue associated with the allocated value point to each dialogue element of the interactive dialogue; instantiating the created digital conversation asset; determining a key term of the digital conversation asset; determining a value for the digital conversation asset at least based on the determined key term of the digital conversation asset and ...

Read more
24-06-2021 publication date

METHOD AND APPARATUS FOR PROCESSING DATA

Number: US20210192288A1
Assignee:

Embodiments of the present disclosure provide a method and apparatus for processing data. The method may include: acquiring a sample set; inputting a plurality of target samples in the sample set into a pre-trained first natural language processing model, respectively, to obtain prediction results output from the pre-trained first natural language processing model; determining the obtained prediction results as labels of the target samples in the plurality of target samples, respectively; and training a to-be-trained second natural language processing model, based on the plurality of target samples and the labels of the target samples to obtain a trained second natural language processing model, parameters in the first natural language processing model being more than parameters in the second natural language processing model. 1. A method for processing data, the method comprising: acquiring a sample set, wherein samples in the sample set are unlabeled sentences; inputting a plurality of target samples in the sample set into a pre-trained first natural language processing model, respectively, to obtain prediction results output from the pre-trained first natural language processing model; determining the obtained prediction results as labels of the target samples in the plurality of target samples, respectively; and training a to-be-trained second natural language processing model, based on the plurality of target samples and the labels of the target samples to obtain a trained second natural language processing model, wherein parameters in the first natural language processing model are more than parameters in the second natural language processing model. 2. The method according to claim 1, wherein the label of the target sample is used to indicate a probability that the target sample belongs to any one of at least two types. 3. The method according to claim 1, wherein the method further comprises: replacing a target word of the sample in the sample set with a ...
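
A schematic of the label-then-train loop: the large model's predictions become soft labels for the unlabeled sentences, and a much smaller model is fit to them. The keyword "teacher", the bag-of-words student, and the training details are all stand-ins, not the disclosure's models.

    import numpy as np

    def teacher_predict(sentences):
        """Stand-in for the large pre-trained model: P(class=1) per sentence."""
        return np.array([0.9 if "refund" in s else 0.1 for s in sentences])

    unlabeled = ["i want a refund", "great service", "refund please", "love it"]
    soft_labels = teacher_predict(unlabeled)        # step 1: predictions as labels

    # Step 2: train a far smaller model (bag-of-words logistic regression).
    vocab = sorted({w for s in unlabeled for w in s.split()})
    X = np.array([[s.split().count(w) for w in vocab] for s in unlabeled], float)
    w = np.zeros(len(vocab))
    for _ in range(2000):                           # gradient descent, logistic loss
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= 0.5 * X.T @ (p - soft_labels) / len(unlabeled)

    print(np.round(1.0 / (1.0 + np.exp(-X @ w)), 2))  # approximately the soft labels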

Read more
24-06-2021 publication date

METHOD AND APPARATUS FOR VOICE CONVERSION AND STORAGE MEDIUM

Number: US20210193160A1
Assignee:

The present disclosure discloses a voice conversion method. The method includes: obtaining a to-be-converted voice, and extracting acoustic features of the to-be-converted voice; obtaining a source vector corresponding to the to-be-converted voice from a source vector pool, and selecting a target vector corresponding to the target voice from the target vector pool; obtaining acoustic features of the target voice output by the voice conversion model by using the acoustic features of the to-be-converted voice, the source vector corresponding to the to-be-converted voice, and the target vector corresponding to the target voice as an input of the voice conversion model; and obtaining the target voice by converting the acoustic features of the target voice using a vocoder. In addition, a voice conversion apparatus and a storage medium are also provided. 1. A voice conversion method, comprising steps of: obtaining a to-be-converted voice, and extracting acoustic features of the to-be-converted voice; obtaining a source vector corresponding to the to-be-converted voice from a source vector pool, and selecting a target vector corresponding to the target voice from the target vector pool; obtaining acoustic features of the target voice output by the voice conversion model by using the acoustic features of the to-be-converted voice, the source vector corresponding to the to-be-converted voice, and the target vector corresponding to the target voice as an input of the voice conversion model; and obtaining the target voice by converting the acoustic features of the target voice using a vocoder. 2. The method of claim 1, wherein the step of obtaining the source vector corresponding to the to-be-converted voice from the source vector pool, and selecting the target vector corresponding to the target voice from the target vector pool comprises: obtaining a source voice identifier corresponding to the to-be-converted voice, and obtaining the source vector corresponding to the ...

Read more
24-06-2021 publication date

AUDIO RECOGNITION METHOD, DEVICE AND SERVER

Number: US20210193167A1
Author: Jiang Tao
Assignee:

An audio recognition method, comprising: acquiring an audio file to be recognized; extracting audio feature information of the audio file to be recognized, the audio feature information including audio fingerprints; searching, in a fingerprint index database, audio attribute information matched with the audio feature information, the fingerprint index database including an audio fingerprint set in which invalid audio fingerprint removal has been performed on audio sample data. As the audio fingerprint set in the fingerprint index database has been subjected to invalid audio fingerprint removal of audio sample data, the storage space of audio fingerprints in the fingerprint index database can be reduced, and the audio recognition efficiency can be improved. Further provided are an audio recognition device and a server. 1. An audio recognition method, comprising: acquiring an audio file to be recognized; extracting audio feature information of the audio file to be recognized, wherein the audio feature information comprises audio fingerprints; and searching audio attribute information matched with the audio feature information, in a fingerprint index database; wherein the fingerprint index database comprises an audio fingerprint set in which invalid audio fingerprints have been removed from audio sample data. 2. The audio recognition method according to claim 1, wherein the fingerprint index database comprises the audio fingerprint set in which invalid audio fingerprints have been removed from the audio sample data by a classifier. 3. The audio recognition method according to claim 2, wherein the classifier is established through the following operations: extracting feature point data of audio data in a training data set as first feature point data; performing an audio attack on the audio data in the training data set, and extracting feature point data of audio data in the training data set after performing the audio attack as second feature point data; comparing ...

Read more
30-05-2019 publication date

SOUND PROCESSING DEVICE AND METHOD

Number: US20190164534A1
Assignee:

The present technology relates to a sound processing device and a method that can present progress of sound reproduction. The sound processing device includes a control unit for controlling a sound output that aurally expresses progress of sound reproduction with respect to an entirety of the sound reproduction according to the reproduction of a sound. The present technology can be applied to a sound speech progress presentation UI system. 1. A sound processing device, comprising a control unit configured to control a sound output that aurally expresses progress of sound reproduction with respect to an entirety of the sound reproduction according to reproduction of a sound. 2. The sound processing device according to claim 1, wherein the sound is a spoken sound based on a speech text. 3. The sound processing device according to claim 1, wherein the control unit controls the sound output that expresses the progress by using a sound image position. 4. The sound processing device according to claim 3, wherein the control unit controls the sound output in which an orientation position of a sound image differs in each reproduction section including a speech of a presentation item and the sound image moves toward a predetermined direction according to the progress of the sound reproduction. 5. The sound processing device according to claim 4, wherein the control unit identifies the reproduction section corresponding to a specified direction on the basis of metadata including information indicating reproduction start time of the reproduction section of the sound and information related to a direction of the sound image in the reproduction section, and operates to start reproducing the sound from the specified reproduction section. 6. The sound processing device according to claim 5, wherein a range including the direction of the sound image in the reproduction section is defined for each reproduction section so that the reproduction section including the presentation item with ...

Read more
21-05-2020 publication date

Behavior identification method, behavior identification device, non-transitory computer-readable recording medium recording therein behavior identification program, machine learning method, machine learning device, and non-transitory computer-readable recording medium recording therein machine learning program

Number: US20200160218A1
Author: Ko Mizuno, Kousuke ITAKURA

In a behavior identification method, surrounding sound is acquired, a feature value that is specified by a spectrum pattern included in spectrum information generated from sound made by a person performing a predetermined behavior is extracted from the sound acquired, the predetermined behavior is identified by the feature value, and information indicating the predetermined behavior identified is output.

Read more
01-07-2021 publication date

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM

Number: US20210201929A1
Author: Uesaka Toshimitsu
Assignee: SONY CORPORATION

Advice information indicating an action that a user should take to succeed in retried speech recognition is generated and presented. An information processing apparatus therefore includes a speech recognition success/failure determination unit that determines success or failure of speech recognition of a user's speech input, a normal response generation unit that generates normal response information to be presented to the user in a case where it is determined in the determination that the speech recognition has succeeded, and an advice information generation unit that generates advice information to be presented to the user in a case where it is determined in the determination that the speech recognition has failed due to a surrounding environment of the user. 1. An information processing apparatus comprising: a speech recognition success/failure determination unit that determines success or failure of speech recognition of a user's speech input; a normal response generation unit that generates normal response information to be presented to the user in a case where it is determined in the determination that the speech recognition has succeeded; and an advice information generation unit that generates advice information to be presented to the user in a case where it is determined in the determination that the speech recognition has failed due to a surrounding environment of the user. 2. The information processing apparatus according to claim 1, further comprising a response control unit that selects the normal response information in a case where a result indicating that the speech recognition has succeeded has been acquired as a result of the determination, and selects the advice information in a case where a result indicating that the speech recognition has failed has been acquired. 3. The information processing apparatus according to claim 2, further comprising a response information presentation unit that presents information selected by the response control unit to ...

Read more
01-07-2021 publication date

REAL-TIME VERBAL HARASSMENT DETECTION SYSTEM

Number: US20210201934A1
Author: Han Kun, Lyu Ying
Assignee:

In some cases, a verbal harassment detection system may use machine learning models to detect verbal harassment in real-time or near real-time. The system may receive an audio segment comprising a portion of audio captured by a microphone located within a vehicle. Further, the system may convert the audio segment to a text segment. The system may provide at least the text segment to a prediction model associated with verbal harassment detection to obtain a harassment prediction. Further, the system may provide the audio segment to an emotion detector to obtain a detected emotion of a speaking user that made an utterance included in the audio segment. Based at least in part on the harassment prediction and the detected emotion, the system may automatically, and without user intervention, determine whether a user is being harassed. 1. A computer-implemented method of predicting an occurrence of harassment of a user of a ride-sharing application, the computer-implemented method comprising, as implemented by an interactive computing system comprising one or more hardware processors and configured with specific computer-executable instructions: receiving an audio segment comprising a portion of audio captured by a microphone located within a vehicle providing a ride to a user of a ride-sharing application associated with a ride-sharing service; converting the audio segment to a text segment; accessing a prediction model associated with verbal harassment detection; providing at least the text segment to the prediction model to obtain a harassment prediction; providing the audio segment to an emotion detector to obtain a detected emotion of a speaking user that made an utterance included in the audio segment; and determining, based at least in part on the harassment prediction and the detected emotion, that the user is being harassed. 2. The computer-implemented method of claim 1, wherein the prediction model comprises at least one of a hierarchical attention ...
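
A toy final decision rule combining the two signals named above; the thresholds, emotion labels, and stubbed detector outputs are invented for illustration.

    def is_harassment(harassment_prob, detected_emotion,
                      confident=0.8, borderline=0.5,
                      hostile_emotions=("anger", "disgust")):
        """Flag when the text model alone is confident, or when a borderline
        text score coincides with a hostile emotion heard in the audio."""
        if harassment_prob >= confident:
            return True
        return harassment_prob >= borderline and detected_emotion in hostile_emotions

    print(is_harassment(0.9, "neutral"))  # True: text model alone suffices
    print(is_harassment(0.6, "anger"))    # True: moderate score + hostile emotion
    print(is_harassment(0.6, "calm"))     # False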

Read more
01-07-2021 publication date

METHOD AND SYSTEMS FOR SPEECH THERAPY COMPUTER-ASSISTED TRAINING AND REPOSITORY

Number: US20210202096A1
Assignee:

A computerized system and method of improving a speech therapy process include receiving a signal representing an utterance spoken by the patient, determining a first score reflecting a correlation of the signal and a patient acoustic model, determining a second score reflecting a correlation of the signal and the reference acoustic model, determining a progress metric reflecting a difference between the first score and the second score and responsively providing in real-time an indication of progress to the patient, wherein the indication is presented in at least one of an audio and a visual manner. 1. A computing system comprising at least one processor and at least one memory communicatively coupled to the at least one processor, the memory comprising computer-readable instructions that when executed by the at least one processor cause the computing system to implement a method of speech therapy assessment comprising: training a first acoustic model according to speech of a patient; training a second acoustic model according to speech of a reference speaker; receiving a signal representing an utterance spoken by the patient; determining a first score reflecting a correlation of the signal and the first speech recognition acoustic model; determining a second score reflecting a correlation of the signal and the second speech recognition acoustic model; determining a progress metric reflecting a relationship between the first score and the second score; and, responsively to the progress metric, providing in real-time an indication of progress to the patient, wherein the indication is presented in at least one of an audio and a visual manner. 2. The computer system of claim 1, wherein the processor is further configured to determine a third score reflecting a correlation of the signal and an acoustic model trained by recent speech of the patient, and to apply the third score to determine an equipment problem. 3. A computing system comprising at least one processor ...

Read more
23-06-2016 publication date

METHOD FOR IMPROVING ACOUSTIC MODEL, COMPUTER FOR IMPROVING ACOUSTIC MODEL AND COMPUTER PROGRAM THEREOF

Number: US20160180836A1
Assignee:

Embodiments include methods and systems for improving an acoustic model. Aspects include acquiring a first standard deviation value by calculating standard deviation of a feature from first training data and acquiring a second standard deviation value by calculating standard deviation of a feature from second training data acquired in a different environment from an environment of the first training data. Aspects also include creating a feature adapted to an environment where the first training data is recorded, by multiplying the feature acquired from the second training data by a ratio obtained by dividing the first standard deviation value by the second standard deviation value. Aspects further include reconstructing an acoustic model constructed using training data acquired in the same environment as the environment of the first training data using the feature adapted to the environment where the first training data is recorded. 1. A method for improving an acoustic model, comprising: acquiring, by a computer, a first standard deviation value by calculating standard deviation of a first feature from first training data acquired in a first environment; acquiring, by the computer, a second standard deviation value by calculating standard deviation of a second feature from second training data acquired in a second environment; calculating, by the computer, a modified first feature, by multiplying the second feature acquired from the second training data by a ratio obtained by dividing the first standard deviation value by the second standard deviation value; and reconstructing, by the computer, an acoustic model constructed using training data acquired in the first environment, using the modified first feature. 2. The method according to claim 1, wherein the first feature is one of a cepstrum and log mel filter bank output. 3. The method according to claim 1, wherein the amount of the first training data is smaller than an amount of the second training data. 4. The ...
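
The adaptation itself is one multiplication; a numpy sketch follows, assuming per-dimension standard deviations over zero-mean features (mean handling is not spelled out in the abstract).

    import numpy as np

    rng = np.random.default_rng(0)
    first = rng.normal(0.0, 2.0, size=(100, 13))     # scarce target-environment data
    second = rng.normal(0.0, 5.0, size=(10000, 13))  # plentiful other-environment data

    sigma1 = first.std(axis=0)     # first standard deviation value
    sigma2 = second.std(axis=0)    # second standard deviation value

    adapted = second * (sigma1 / sigma2)   # spread now matches the target environment
    print(np.round(adapted.std(axis=0)[:3], 2), np.round(sigma1[:3], 2))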

Read more
23-06-2016 publication date

CONCISE DYNAMIC GRAMMARS USING N-BEST SELECTION

Number: US20160180842A1
Assignee:

A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar. 1. A method comprising: receiving speech from a user; recognizing the speech to yield a recognized identifier having a first character and a second character; for the first character: identifying a first confusion set of characters having a probability of being confused with the first character, the first confusion set of characters having one or more substitute first confusion characters; iteratively substituting the first character with each respective one or more substitute first confusion character in the first confusion set of characters to yield N first selection identifiers; for the second character: identifying a second confusion set of characters having a probability of being confused with the second character, the second confusion set of characters having one or more substitute second confusion characters; iteratively substituting the second character with each respective one or more substitute second confusion character in the second confusion set of characters to yield M second selection identifiers; combining the N first selection identifiers and the M second selection identifiers into S selection identifiers; forming a new grammar based on a match of the S selection identifiers with at least ...
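
A sketch of the selection-identifier expansion for a short identifier, with tiny hand-made confusion sets; a real system would derive these sets from the recognizer's confusability statistics.

    # Characters the recognizer plausibly confuses (illustrative sets only).
    CONFUSION = {"B": ["B", "D", "P", "V"], "M": ["M", "N"], "8": ["8", "A", "H"]}

    def selection_identifiers(recognized):
        """Substitute confusable characters one position at a time."""
        out = set()
        for i, ch in enumerate(recognized):
            for sub in CONFUSION.get(ch, [ch]):
                out.add(recognized[:i] + sub + recognized[i + 1:])
        return out

    reference_ids = {"DM8", "BN8", "XY1"}        # identifiers known to the system
    dynamic_grammar = selection_identifiers("BM8") & reference_ids
    print(sorted(dynamic_grammar))               # -> ['BN8', 'DM8']

The intersection with the pre-stored reference identifiers is what makes the resulting grammar both concise and dynamic: only plausible, actually existing identifiers survive for the second recognition pass.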

Read more
22-06-2017 publication date

TECHNOLOGIES FOR ROBUST CRYING DETECTION USING TEMPORAL CHARACTERISTICS OF ACOUSTIC FEATURES

Number: US20170178667A1
Assignee:

Technologies for identifying sounds are disclosed. A sound identification device may capture sound data, and split the sound data into frames. The sound identification device may then determine an acoustic feature vector for each frame, and determine parameters based on how each acoustic feature varies over the duration of time corresponding to the frames. The sound identification device may then determine if the sound matches a pre-defined sound based on the parameters. In one embodiment, the sound identification device may be a baby monitor, and the pre-defined sound may be a baby crying. 1. A sound identification device for identifying sounds, the sound identification device comprising: a sound data capture module to acquire sound data; a sound frame determination module to determine a plurality of frames of sound data based on the sound data; and a sound identification module to: determine an acoustic feature matrix having two dimensions and comprising a plurality of first-dimension vectors and a plurality of second-dimension vectors, wherein each first-dimension vector of the plurality of first-dimension vectors corresponds to a corresponding frame of the plurality of frames and each second-dimension vector of the plurality of second-dimension vectors comprises an acoustic feature vector associated with the corresponding frame, and wherein each first-dimension vector of the plurality of first-dimension vectors is associated with a different acoustic feature; determine a plurality of temporal parameters for each first-dimension vector of the plurality of first-dimension vectors; and determine, based on the pluralities of temporal parameters, whether the sound data corresponds to a pre-defined sound. 2. The sound identification device of claim 1, wherein each acoustic feature vector of the acoustic feature matrix comprises mel-frequency cepstrum coefficients. 3. The sound identification device of claim 1, wherein the pre-defined sound is a cry of an infant. 4. The ...
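
A sketch of the temporal-parameter step: build a frames-by-features matrix, then summarize how each feature evolves over time. The random stand-in for MFCCs and the chosen statistics (mean, variance, slope) are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    frames = rng.normal(size=(50, 13))      # stand-in for 50 frames x 13 MFCCs

    def temporal_parameters(feature_matrix):
        """Per acoustic feature (column): mean, variance, and linear slope."""
        t = np.arange(feature_matrix.shape[0])
        params = [(col.mean(), col.var(), np.polyfit(t, col, 1)[0])
                  for col in feature_matrix.T]
        return np.array(params)

    params = temporal_parameters(frames)
    print(params.shape)   # (13, 3): a compact input for the pre-defined-sound test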

Read more
08-07-2021 publication date

Model Evaluation Method and Device, and Electronic Device

Number: US20210210112A1
Assignee:

A model evaluation method includes obtaining M first audio signals synthesized by using a first to-be-evaluated speech synthesis model, and obtaining N second audio signals generated through recording; performing voiceprint extraction on each of the M first audio signals to obtain M first voiceprint features; performing voiceprint extraction on each of the N second audio signals to obtain N second voiceprint features; clustering the M first voiceprint features to obtain K first central features; clustering the N second voiceprint features to obtain J second central features; counting the cosine distances between the K first central features and the J second central features to obtain a first distance; and evaluating the first to-be-evaluated speech synthesis model based on the first distance. 1. A model evaluation method, comprising: obtaining M first audio signals synthesized by using a first to-be-evaluated speech synthesis model, and obtaining N second audio signals generated through recording; performing voiceprint extraction on each of the M first audio signals to obtain M first voiceprint features, and performing voiceprint extraction on each of the N second audio signals to obtain N second voiceprint features; clustering the M first voiceprint features to obtain K first central features, and clustering the N second voiceprint features to obtain J second central features; counting cosine distances between the K first central features and the J second central features to obtain a first distance; and evaluating the first to-be-evaluated speech synthesis model based on the first distance; wherein M, N, K and J are positive integers greater than 1, M is greater than K, and N is greater than J. 2. The method of claim 1, wherein the step of counting the cosine distances between the K first central features and the J second central features to obtain the first distance comprises: for every first central feature, calculating the cosine distance between the ...
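
A numerical sketch of the metric: cluster both voiceprint sets, then average the cosine distances between the two sets of centroids. The toy k-means, the synthetic data, and the K and J values are invented for illustration.

    import numpy as np

    def kmeans(X, k, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        return centers

    def mean_cosine_distance(A, B):
        A = A / np.linalg.norm(A, axis=1, keepdims=True)
        B = B / np.linalg.norm(B, axis=1, keepdims=True)
        return float(np.mean(1.0 - A @ B.T))

    rng = np.random.default_rng(2)
    synthesized = rng.normal(0, 1, (200, 8)) + 0.2   # M synthesized voiceprints
    recorded = rng.normal(0, 1, (300, 8)) + 0.25     # N recorded voiceprints
    print(mean_cosine_distance(kmeans(synthesized, 4), kmeans(recorded, 3)))
    # A smaller first distance = synthesized voices sit closer to real recordings.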

Read more
04-06-2020 publication date

Device, system and method for providing audio summarization data from video

Number: US20200177655A1
Assignee: Motorola Solutions Inc

A device, system and method for providing audio summarization data from video is provided. A portable media streaming device: transmits, to one or more receiving terminals, first frames of video captured by a video camera; determines, after transmission of the first frames of the video, that a signal strength associated with the first network falls below a predetermined threshold; generates audio summarization data corresponding to one or more objects of interest identified in second frames of the video, the second frames captured by the video camera after the first frames; selects a portion of the audio summarization data based on a context associated with a receiving terminal of the one or more receiving terminals; and transmits, using the communication unit, via one or more of the first network and a second network, the portion of the audio summarization data to the receiving terminal.

Read more
16-07-2015 publication date

Detecting distorted audio signals based on audio fingerprinting

Number: US20150199974A1
Assignee: Facebook Inc

An audio identification system generates a probe audio fingerprint of an audio signal and determines amount of pitch shifting in the audio signal based on analysis of correlation between the probe audio fingerprint and a reference audio fingerprint. The audio identification system applies a time-to-frequency domain transform to frames of the audio signal and filters the transformed frames. The audio identification system applies a two-dimensional discrete cosine transform (DCT) to the filtered frames and generates the probe audio fingerprint from a selected number of DCT coefficients. The audio identification system calculates a DCT sign-only correlation between the probe audio fingerprint and the reference audio fingerprint, and the DCT sign-only correlation closely approximates the similarity between the audio characteristics of the probe audio fingerprint and those of the reference audio fingerprint. Based on the correlation analysis, the audio identification system determines the amount of pitch shifting in the audio signal.
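
The comparison hinges on correlating only the signs of DCT coefficients; a numpy sketch with a naive DCT-II follows, since the abstract does not give the exact fingerprint layout. The matching pair should score clearly higher than the unrelated one.

    import numpy as np

    def dct2(x):
        """Naive DCT-II of a 1-D signal (fine for a short sketch)."""
        N, n = len(x), np.arange(len(x))
        return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                         for k in range(N)])

    def sign_only_correlation(a, b):
        """Correlate only the signs of the two DCT coefficient sets."""
        return float(np.mean(np.sign(dct2(a)) * np.sign(dct2(b))))

    t = np.linspace(0, 1, 256, endpoint=False)
    probe = np.sin(2 * np.pi * 8 * t)
    noisy_copy = probe + 0.05 * np.random.default_rng(3).normal(size=256)
    unrelated = np.sin(2 * np.pi * 13 * t + 1.0)
    print(sign_only_correlation(probe, noisy_copy))  # clearly positive
    print(sign_only_correlation(probe, unrelated))   # near zero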

Read more
05-07-2018 publication date

SYSTEM AND METHOD FOR NEURAL NETWORK BASED FEATURE EXTRACTION FOR ACOUSTIC MODEL DEVELOPMENT

Number: US20180190267A1
Assignee:

A system and method are presented for neural network based feature extraction for acoustic model development. A neural network may be used to extract acoustic features from raw MFCCs or the spectrum, which are then used for training acoustic models for speech recognition systems. Feature extraction may be performed by optimizing a cost function used in linear discriminant analysis. General non-linear functions generated by the neural network are used for feature extraction. The transformation may be performed using a cost function from linear discriminant analysis methods which perform linear operations on the MFCCs and generate lower dimensional features for speech recognition. The extracted acoustic features may then be used for training acoustic models for speech recognition systems. 1. A method for training acoustic models in speech recognition systems, wherein the speech recognition system comprises a neural network, the method comprising the steps of: a. extracting acoustic features from a speech signal using the neural network; and b. processing the acoustic features into an acoustic model by the speech recognition system. 2.-24. (canceled) The present invention generally relates to telecommunications systems and methods, as well as automatic speech recognition systems. More particularly, the present invention pertains to the development of acoustic models used in automatic speech recognition systems.

Read more
06-07-2017 publication date

SYSTEM AND METHOD FOR NEURAL NETWORK BASED FEATURE EXTRACTION FOR ACOUSTIC MODEL DEVELOPMENT

Number: US20170193988A1
Assignee:

A system and method are presented for neural network based feature extraction for acoustic model development. A neural network may be used to extract acoustic features from raw MFCCs or the spectrum, which are then used for training acoustic models for speech recognition systems. Feature extraction may be performed by optimizing a cost function used in linear discriminant analysis. General non-linear functions generated by the neural network are used for feature extraction. The transformation may be performed using a cost function from linear discriminant analysis methods which perform linear operations on the MFCCs and generate lower dimensional features for speech recognition. The extracted acoustic features may then be used for training acoustic models for speech recognition systems. 1. A method for training acoustic models in speech recognition systems, wherein the speech recognition system comprises a neural network, the method comprising the steps of: a. extracting acoustic features from a speech signal using the neural network; and b. processing the acoustic features into an acoustic model by the speech recognition system. 2. The method of claim 1, wherein the acoustic features are extracted from Mel Frequency Cepstral Coefficients. 3. The method of claim 1, wherein the features are extracted from a speech signal spectrum. 4. The method of claim 1, wherein the neural network comprises at least one of: activation functions with parameters, prealigned feature data, and training. 5. The method of claim 4, wherein the training is performed using a stochastic gradient descent method on a cost function. 6. The method of claim 5, wherein the cost function is a linear discriminant analysis cost function. 7. The method of claim 1, wherein the extracting of step (a) further comprises the step of optimizing a cost function, wherein the cost function is capable of transforming general non-linear functions generated by the neural network. 8. The ...

Read more
20-06-2019 publication date

Signal processing device, signal processing method, and computer-readable recording medium

Number: US20190188468A1
Assignee: NEC Corp

A signal processing device includes: a basis storage that stores an acoustic event basis group; a model storage that stores an identification model that uses, as a feature amount, a combination of activation levels of spectral bases; an identification signal analysis unit that, upon input of a spectrogram of an acoustic signal for identification, performs sound source separation on the spectrogram by using a spectral basis set that is obtained by appending spectral bases corresponding to an unknown acoustic event that is an acoustic event other than the acoustic event specified as a detection target to the acoustic event basis group and causing only unknown spectral bases within the spectral basis set to be learned, and thereby calculating activation levels of spectral bases of the acoustic events in the spectrogram of the acoustic signal for identification; and a signal identification unit that identifies an acoustic event included in the acoustic signal for identification.

Read more
14-07-2016 publication date

Analysis object determination device and analysis object determination method

Number: US20160203121A1
Assignee: NEC Corp

An analysis subject determination device includes: a demand period detection unit which detects, from data corresponding to audio of a dissatisfaction conversation, a demand utterance period which represents a demand utterance of a first conversation party among a plurality of conversation parties which are carrying out the dissatisfaction conversation; a negation period detection unit which detects, from the data, a negation utterance period which represents a negation utterance of a second conversation party which differs from the first conversation party; and a subject determination unit which, from the data, determines a period with a time obtained from the demand utterance period as a start point and a time obtained from the negation utterance period after the demand utterance period as an end point to be an analysis subject period of a cause of dissatisfaction of the first conversation party in the dissatisfaction conversation.

Read more
21-07-2016 publication date

SIGNAL PROCESSING APPARATUS, SIGNAL PROCESSING METHOD, AND SIGNAL PROCESSING PROGRAM

Number: US20160210987A1
Author: Sugiyama Akihiko
Assignee:

This invention makes it possible to effectively detect an abrupt change in a signal. The signal processing apparatus includes a converter that converts an input signal into a phase component signal in a frequency domain, a first calculator that calculates a first phase gradient as a gradient of the phase of the phase component signal, a second calculator that calculates a second phase gradient using the first phase gradients at a plurality of frequencies, and a determiner that determines the presence of an abrupt change in the input signal based on the first phase gradients and the second phase gradient. 1. A signal processing apparatus comprising: a converter that converts an input signal into a phase component signal in a frequency domain; a first calculator that calculates a first phase gradient of the phase component signal for each of a plurality of frequencies; a second calculator that calculates a second phase gradient at a plurality of frequencies using the first phase gradients; and a determiner that determines presence of an abrupt change in the input signal based on the first phase gradient and the second phase gradient. 2. The signal processing apparatus according to claim 1, wherein said second calculator calculates the second phase gradient at a plurality of frequencies using the first phase gradient, and an amplitude or a power. 3. The signal processing apparatus according to claim 1, wherein said determiner determines the presence of an abrupt change in the input signal based on a similarity between the first phase gradient and the second phase gradient. 4. The signal processing apparatus according to claim 3, wherein said determiner determines that the abrupt change in the signal exists at a frequency at which a difference between the first phase gradient and the second phase gradient does not exceed a predetermined value. 5. The signal processing apparatus according to claim 1, wherein said second calculator calculates an average value of the ...
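
A sketch of the idea for the special case of an impulse-like change: an impulse has linear phase, so the per-frequency phase slope (first gradient) agrees with the cross-frequency aggregate (second gradient). The frame length, tolerance, and median aggregate are illustrative choices, not the claimed calculators.

    import numpy as np

    def abrupt_change_present(frame, tol=0.05, agree_fraction=0.9):
        """Compare per-frequency phase slopes with their cross-frequency aggregate."""
        phase = np.unwrap(np.angle(np.fft.rfft(frame)))
        first = np.diff(phase)          # first phase gradient, per frequency
        second = np.median(first)       # second phase gradient, from all frequencies
        return np.mean(np.abs(first - second) <= tol) > agree_fraction

    click = np.zeros(512)
    click[100] = 1.0                    # impulse: constant phase slope everywhere
    noise = np.random.default_rng(4).normal(size=512)
    print(abrupt_change_present(click))  # True
    print(abrupt_change_present(noise))  # False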

Read more
18-06-2020 publication date

COMPUTER VISION BASED MONITORING SYSTEM AND METHOD

Number: US20200193615A1
Assignee: CHERRY LABS, INC.

A monitoring system includes sensors that monitor activity within a designated territory. The sensors include visual sensors that make video recordings. A local processing system located within or proximate to the designated territory receives signals from the sensors. The local processing system processes and analyzes the signals from the sensors to produce messages that describe activity within the designated territory as monitored by the sensors. The messages do not include audio, visual or other direct identifying information that directly reveal identity of persons within the designated territory. A monitoring station outside the designated territory receives the messages produced by the local processing system and makes the messages available to external observers. 1. A monitoring system comprising: sensors that monitor activity within a designated territory, the sensors including visual sensors that make video recordings; a local processing system located within or proximate to the designated territory, the local processing system receiving signals from the sensors, the local processing system processing and analyzing the signals from the sensors to produce messages that describe activity within the designated territory as monitored by the sensors, the messages not including audio, visual or other direct identifying information that directly reveal identity of persons within the designated territory; and a monitoring station outside the designated territory, the monitoring station receiving the messages produced by the local processing system and making the messages available to external observers. 2. A monitoring system as in claim 1, wherein the messages describe actions performed by a person or animal within the designated territory sufficiently to allow an external observer to determine when external intervention is required. 3. A monitoring system as in claim 1, wherein the monitoring station produces an alarm when the messages describe actions performed ...

Read more
02-10-2014 publication date

INTELLIGENT INTERACTIVE VOICE COMMUNICATION SYSTEM AND METHOD

Number: US20140297272A1
Author: Saleh Fahim
Assignee:

The present invention generally relates to intelligent voice communication systems. Specifically, this invention relates to systems and methods for providing intelligent interactive voice communication services to users of a telephony means. Preferred embodiments of the invention are directed to providing interactive voice communication services in the form of intelligent and interactive automated prank calling services. 1. A web-based system for providing intelligent interactive voice communications, the system comprising: a voice processing module comprising computer-executable code stored in non-volatile memory; a response processing module comprising computer-executable code stored in non-volatile memory; a processor; and a communications means, wherein said voice processing module, said response processing module, said processor, and said communications means are operably connected and are configured to: receive a voice communication from a call participant; identify one or more complex speech elements from said voice communication, wherein said one or more complex speech elements are selected from the group comprising tone, pitch, inflection, pause, tempo, volume, consistency and fluidity; generate a speech analysis based on said one or more complex speech elements; determine a response, wherein said response is based at least in part on said speech analysis; transmit said response via said communications means. 2. The system of claim 1, wherein said response is a complex response type, selected from the group comprising an interruption, a sound response, a third-party contact inclusion and a switch in voice response. 3. The system of claim 2, wherein said complex response type is an interruption that is transmitted concurrently with receipt of said voice communication. 4. The system of claim 2, wherein said speech analysis comprises information selected from the group comprising call participant gender, call participant tone ...

Read more
06-08-2015 publication date

SYSTEMS AND METHODS FOR IDENTIFYING A SOUND EVENT

Number: US20150221321A1
Assignee:

Systems and methods for identifying a perceived sound event are provided. In one exemplary embodiment, the system includes an audio signal receiver, a processor, and an analyzer. The system deconstructs a received audio signal into a plurality of audio chunks, for which one or more sound identification characteristics are determined. One or more distances of a distance vector are then calculated based on one or more of the sound identification characteristics. The distance vector can be a sound gene that serves as an identifier for the sound event. The distance vector for a received audio signal is compared to distance vectors of predefined sound events to identify the source of the received audio signal. A variety of other systems and methods related to sound identification are also provided. 1. A method for identifying a sound event, comprising: receiving a signal from an incoming sound event; deconstructing the signal into a plurality of audio chunks; determining one or more sound identification characteristics of the incoming sound event for one or more audio chunks of the plurality of audio chunks; calculating one or more distances of a distance vector based on one or more of the one or more sound identification characteristics; comparing in real time one or more of the one or more distances of the distance vector of the incoming sound event to one or more commensurate distances of one or more predefined sound events stored in a database; identifying the incoming sound event based on the comparison between the one or more distances of the incoming sound event and the one or more commensurate distances of the plurality of predefined sound events stored in the database; and communicating the identity of the incoming sound event to a user. 2. The method of claim 1, further comprising: prior to determining one or more sound identification characteristics of the incoming sound event for an audio chunk, multiplying the audio chunk by a Hann window; and performing a Discrete ...
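
A sketch of the deconstruction pipeline named in the claims: split the signal into chunks, window each with a Hann window, take a DFT, and summarize the chunks into a small distance vector (the "sound gene"). The particular summary features and the database entries are invented for illustration.

    import numpy as np

    def sound_gene(signal, chunk_len=1024):
        """Per chunk: Hann window + DFT; then summarize variation across chunks."""
        n = len(signal) // chunk_len
        chunks = signal[: n * chunk_len].reshape(n, chunk_len) * np.hanning(chunk_len)
        spectra = np.abs(np.fft.rfft(chunks, axis=1))
        centroid = (spectra * np.arange(spectra.shape[1])).sum(1) / (spectra.sum(1) + 1e-12)
        energy = spectra.sum(1)
        return np.array([centroid.mean(), centroid.std(), energy.mean(), energy.std()])

    def closest_event(gene, database):
        """Compare the incoming gene against predefined genes; return the best label."""
        return min(database, key=lambda name: np.linalg.norm(database[name] - gene))

    rate = 16000
    t = np.arange(rate) / rate
    db = {"tone": sound_gene(np.sin(2 * np.pi * 440 * t)),
          "noise": sound_gene(np.random.default_rng(5).normal(size=rate))}
    print(closest_event(sound_gene(np.sin(2 * np.pi * 445 * t)), db))  # expected: tone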

Read more
25-06-2020 publication date

DATA AIDED METHOD FOR ROBUST DIRECTION OF ARRIVAL (DOA) ESTIMATION IN THE PRESENCE OF SPATIALLY-COHERENT NOISE INTERFERERS

Number: US20200202883A1
Assignee:

A method and apparatus to determine a direction of arrival (DOA) of a talker in the presence of a source of spatially-coherent noise. A time sequence of audio samples that include the spatially-coherent noise is received and buffered. Aided by previously known data, a trigger point is detected in the time sequence of audio samples when the talker begins to talk. The buffered time sequence of audio samples is separated into a noise segment and a signal-plus-noise segment based on the trigger point. For each direction of a plurality of distinct directions: an energy difference is computed for the direction between the noise segment and the signal-plus-noise segment, and the DOA of the talker is selected as the direction of the plurality of distinct directions having a largest of the computed energy differences. 1. A method to determine a direction of arrival (DOA) of a talker in the presence of a source of spatially-coherent noise, the method comprising: receiving and buffering a time sequence of audio samples that include the spatially-coherent noise; detecting, aided by previously known data, a trigger point in the time sequence of audio samples when the talker begins to talk; separating the buffered time sequence of audio samples into a noise segment and a signal-plus-noise segment based on the trigger point; for each direction of a plurality of distinct directions: computing, for the direction, an energy difference between the noise segment and the signal-plus-noise segment; and selecting as the DOA of the talker the direction of the plurality of distinct directions having a largest of the computed energy differences. 2. The method of claim 1, wherein the previously known data comprises a keyword spoken by the talker. 3. The method of claim 1, wherein the previously known data comprises a biometric characteristic of the talker. 4. The method of claim 1, further comprising: extracting speech features from the time sequence of audio samples; and detecting when the time sequence of ...
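
A sketch of the final selection step, assuming per-direction energies have already been measured (for example, by beamforming) on the noise-only and signal-plus-noise segments; the numbers below are invented.

    import numpy as np

    # Beamformer output energy per candidate direction (every 30 degrees), measured
    # before the trigger point (noise only) and after it (talker plus noise).
    directions = np.arange(0, 360, 30)
    noise_energy  = np.array([4.1, 4.0, 3.9, 4.2, 4.0, 4.1, 3.8, 4.0, 4.1, 3.9, 4.0, 4.2])
    speech_energy = np.array([4.3, 4.2, 4.0, 4.4, 9.6, 4.5, 4.0, 4.1, 4.3, 4.0, 4.1, 4.3])

    def estimate_doa(directions, noise_energy, speech_energy):
        """Pick the direction whose energy grew most once the talker started;
        a spatially-coherent noise source inflates both segments and cancels out."""
        return directions[np.argmax(speech_energy - noise_energy)]

    print(estimate_doa(directions, noise_energy, speech_energy))   # -> 120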

Read more
05-08-2021 publication date

STORAGE MEDIUM, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING APPARATUS

Number: US20210241763A1
Author: MOTOSUGI Keigo
Assignee: FUJITSU LIMITED

A non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process includes identifying, for voice data acquired by a microphone provided in an information processing apparatus, a measurement value of a motion sensor or a geomagnetic sensor of the information processing apparatus at a timing when input of the voice data is accepted; and determining switching of a speaker corresponding to the voice data based on the identified measurement value of the motion sensor or the geomagnetic sensor. 1. A non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process comprising: identifying, for voice data acquired by a microphone provided in an information processing apparatus, a measurement value of a motion sensor or a geomagnetic sensor of the information processing apparatus at a timing when input of the voice data is accepted; and determining switching of a speaker corresponding to the voice data based on the identified measurement value of the motion sensor or the geomagnetic sensor. 2. The non-transitory computer-readable storage medium according to claim 1, wherein the determining process includes identifying a speaker corresponding to a range including the identified measurement value of the motion sensor or the geomagnetic sensor by referring to a storage unit that stores the range related to the measurement value of the motion sensor or the geomagnetic sensor in association with information for identifying the speaker. 3. The non-transitory computer-readable storage medium according to claim 1, wherein the process comprises outputting a recognition result obtained by performing voice recognition processing on the voice data. 4. The non-transitory computer-readable storage medium according to claim 1, wherein the microphone is a unidirectional microphone. 5. The non-transitory computer-readable storage medium according to claim 2, wherein the process ...

Read more
16-07-2020 publication date

AUTOMATIC AUDIO DUCKING WITH REAL TIME FEEDBACK BASED ON FAST INTEGRATION OF SIGNAL LEVELS

Number: US20200225907A1
Assignee:

A computer-implemented method for audio signal processing includes analyzing a foreground audio signal to determine metrics corresponding to audio slices of the foreground audio signal. Each such metric indicates a value for an audio property of a respective audio slice. The method further includes computing a total metric for an audio slice as a function of a set of the metrics corresponding to a set of the audio slices including the audio slice. The method further includes adding a key frame to a track based on the total metric. The track includes the foreground audio signal and a background audio signal, and a location of the key frame corresponds to a location of the audio slice on the track. The key frame indicates a change to the audio property of the background audio signal at the location on the track, and the key frame is utilizable for audio ducking. 1. A computer-implemented method for audio signal processing, the method comprising: analyzing a foreground audio signal to determine metrics corresponding to audio slices of the foreground audio signal, wherein each metric corresponds to a respective audio slice and indicates a value for an audio property of the respective audio slice; computing a total metric for an audio slice as a function of a set of the metrics corresponding to a set of the audio slices, wherein the set of the audio slices includes the audio slice; and adding a key frame to a track based on the total metric, wherein the track comprises the foreground audio signal and a background audio signal, wherein a location of the key frame corresponds to a location of the audio slice on the track, and wherein the key frame indicates a change to the audio property of the background audio signal at the location on the track, wherein the key frame added to the track based on the total metric is utilizable for audio ducking, the audio ducking comprising adapting a property of the background audio signal based on a state of the foreground audio signal. 2. ...
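
A sketch of slice metrics, the windowed total, and key-frame placement, assuming RMS loudness as the audio property; the slice length, window size, threshold, and -12 dB duck amount are illustrative.

    import numpy as np

    def ducking_key_frames(voice, rate, slice_sec=0.1, window=5, threshold=0.05):
        """Return (time, gain) key frames that duck the background while the
        smoothed foreground level stays above the threshold."""
        n = int(rate * slice_sec)
        slices = voice[: len(voice) // n * n].reshape(-1, n)
        rms = np.sqrt((slices ** 2).mean(axis=1))         # metric per audio slice
        total = np.convolve(rms, np.ones(window) / window, mode="same")
        frames, ducked = [], False
        for i, value in enumerate(total):                 # total metric per slice
            if value > threshold and not ducked:
                frames.append((i * slice_sec, -12.0))     # duck the background
                ducked = True
            elif value <= threshold and ducked:
                frames.append((i * slice_sec, 0.0))       # restore the background
                ducked = False
        return frames

    rate = 8000
    voice = np.zeros(rate * 3)
    voice[rate : 2 * rate] = np.sin(2 * np.pi * 220 * np.arange(rate) / rate)
    print(ducking_key_frames(voice, rate))  # duck just before 1 s, restore after 2 s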

Publication date: 16-07-2020

Audio source identification

Number: US20200227068A1
Assignee: International Business Machines Corp

Embodiments facilitating audio source identification are provided. A computer-implemented method comprises: receiving, by a device operatively coupled to one or more processors, an audio signal under inspection; generating, by the device, an image of the time-frequency spectrum of the low-frequency and high-frequency components of the audio signal; and identifying, by the device, a source of the audio signal based on the generated image and one or more patterns of time-frequency spectrum, wherein each of the one or more patterns corresponds to the low-frequency and high-frequency features of a specific audio source.
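A rough sketch of the band-split spectrogram idea, assuming a hypothetical 1 kHz boundary between the low- and high-frequency components and a nearest-pattern match; where the embodiments compare images, this sketch reduces each image to a mean spectrum per band.

import numpy as np

def band_spectrogram(signal, sr, n_fft=256, hop=128, split_hz=1000.0):
    # Magnitude time-frequency image, split into low- and high-frequency
    # components at a hypothetical 1 kHz boundary.
    window = np.hanning(n_fft)
    frames = np.stack([signal[i:i + n_fft] * window
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    spec = np.abs(np.fft.rfft(frames, axis=1))      # shape: (time, frequency)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    return spec[:, freqs < split_hz], spec[:, freqs >= split_hz]

def identify_source(signal, sr, patterns):
    # Nearest stored pattern wins; each pattern summarises the low- and
    # high-frequency behaviour of one known source.
    low, high = band_spectrogram(signal, sr)
    feat = np.concatenate([low.mean(axis=0), high.mean(axis=0)])
    return min(patterns, key=lambda name: np.linalg.norm(feat - patterns[name]))

sr = 8000
t = np.arange(sr) / sr
hum = np.sin(2 * np.pi * 120 * t)                   # low-frequency source
hiss = np.sin(2 * np.pi * 3000 * t)                 # high-frequency source
patterns = {name: np.concatenate([b.mean(axis=0) for b in band_spectrogram(s, sr)])
            for name, s in [("hum", hum), ("hiss", hiss)]}
print(identify_source(hum + 0.05 * np.random.default_rng(1).normal(size=sr), sr, patterns))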

Publication date: 23-08-2018

METHOD AND SYSTEM FOR DETECTING AN AUDIO EVENT FOR SMART HOME DEVICES

Number: US20180239967A1
Assignee:

This application discloses a method implemented by an electronic device to detect a signature event (e.g., a baby cry event) associated with an audio feature (e.g., baby sound). The electronic device obtains a classifier model from a remote server. The classifier model is determined according to predetermined capabilities of the electronic device and ambient sound characteristics of the electronic device, and distinguishes the audio feature from a plurality of alternative features and ambient noises. When the electronic device obtains audio data, it splits the audio data into a plurality of sound components, each associated with a respective frequency or frequency band and including a series of time windows. The electronic device further extracts a feature vector from the sound components, classifies the extracted feature vector to obtain a probability value according to the classifier model, and detects the signature event based on the probability value.

1. A method for detecting a signature event associated with an audio feature, comprising: on an electronic device having one or more processors and memory storing one or more programs for execution by the one or more processors, automatically and without user intervention: obtaining from a remote server a classifier model that distinguishes an audio feature from a plurality of alternative features and ambient noises, wherein the classifier model is determined by the remote server according to a number of false positives generated by the classifier model; obtaining audio data associated with an audio signal; extracting a feature vector from a plurality of sound components of the audio data, the feature vector including a plurality of elements that are arranged according to a predetermined order; classifying the extracted feature vector based on the classifier model to obtain a probability value indicating whether the audio signal includes the audio feature; and detecting the signature event associated with the audio ...
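The extraction-and-classification steps can be sketched as follows. The band layout, window energies as vector elements, and the logistic stand-in classifier are all hypothetical choices; in the patent the classifier model is trained remotely and merely consumed on the device.

import math

def extract_feature_vector(band_components):
    # Elements arranged in a predetermined order: for each frequency-band
    # component, the mean energy of each of its time windows.
    vec = []
    for band in band_components:        # band = list of time windows
        for window in band:             # window = list of samples
            vec.append(sum(x * x for x in window) / len(window))
    return vec

def classify(feature_vector, model):
    # Stand-in classifier: a logistic score plays the "probability value".
    z = model["bias"] + sum(w * f for w, f in zip(model["weights"], feature_vector))
    return 1.0 / (1.0 + math.exp(-z))

def detect_signature_event(band_components, model, threshold=0.8):
    p = classify(extract_feature_vector(band_components), model)
    return p >= threshold, p

# Two bands with two time windows each; model weights are illustrative only.
bands = [[[0.1, 0.2], [0.8, 0.9]],      # low band
         [[0.7, 0.6], [0.9, 1.0]]]      # high band
model = {"weights": [1.0, 2.0, 2.0, 3.0], "bias": -3.0}
print(detect_signature_event(bands, model))   # (True, ~0.885)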

Publication date: 23-08-2018

Personalized assistance system based on emotion

Number: US20180240454A1
Assignee: Sony Corp

A personalized voice assistance system and method to provide personalized, emotion-based assistance to a first individual from a group of individuals are disclosed. The personalized voice assistance system detects an activity of the first individual in a first time period in a defined area. A requirement for assistance for the first individual may be determined based on the detected activity in the defined area. The personalized voice assistance system may further compute, based on the detected activity, an emotional reaction of a second individual from the group of individuals. The emotional reaction of the second individual may be computed for the determined requirement for assistance for the first individual. The personalized voice assistance system may further generate an output voice similar to the second individual's voice, based at least on the computed emotional reaction, to assist the first individual.
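The abstract describes a control flow rather than concrete algorithms, so the sketch below stubs out every step; all names, decision rules, and return values are hypothetical.

def detect_activity(person):
    return "fell_down"                       # stub: e.g. from home sensors

def requires_assistance(activity):
    return activity == "fell_down"           # stub decision rule

def compute_emotional_reaction(person, activity):
    return "concerned"                       # stub emotion estimate

def synthesize_voice(person, emotion, text):
    return f"[voice like {person}, {emotion}] {text}"   # stub TTS output

def assist(first_individual, group):
    # The flow from the abstract: detect activity, decide whether assistance
    # is required, compute the helper's emotional reaction, then speak in a
    # voice similar to the helper's.
    activity = detect_activity(first_individual)
    if not requires_assistance(activity):
        return None
    helper = group[0]                        # stub choice of second individual
    reaction = compute_emotional_reaction(helper, activity)
    return synthesize_voice(helper, reaction, "Are you alright?")

print(assist("grandfather", ["daughter", "neighbour"]))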

Publication date: 13-11-2014

Speech Signal Enhancement Using Visual Information

Number: US20140337016A1
Assignee: Nuance Communications Inc

Visual information is used to alter or set an operating parameter of an audio signal processor, other than a beamformer. A digital camera captures visual information about a scene that includes a human speaker and/or a listener. The visual information is analyzed to ascertain information about acoustics of a room. A distance between the speaker and a microphone may be estimated, and this distance estimate may be used to adjust an overall gain of the system. Distances among, and locations of, the speaker, the listener, the microphone, a loudspeaker and/or a sound-reflecting surface may be estimated. These estimates may be used to estimate reverberations within the room and adjust aggressiveness of an anti-reverberation filter, based on an estimated ratio of direct to indirect (reverberated) sound energy expected to reach the microphone. In addition, orientation of the speaker or the listener, relative to the microphone or the loudspeaker, can also be estimated, and this estimate may be used to adjust frequency-dependent filter weights to compensate for uneven frequency propagation of acoustic signals from a mouth, or to a human ear, about a human head.
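Two of the described adjustments lend themselves to a short sketch: deriving make-up gain from the estimated speaker-to-microphone distance (free-field level falls roughly 6 dB per doubling of distance), and mapping an estimated direct-to-reverberant ratio to the aggressiveness of an anti-reverberation filter. The reference distance and the DRR-to-aggressiveness mapping below are assumptions, not values from the patent.

import math

def gain_for_distance(distance_m, reference_m=0.5):
    # Free-field level falls about 6 dB per doubling of distance; compensate
    # with make-up gain relative to a hypothetical 0.5 m reference distance.
    return 20.0 * math.log10(distance_m / reference_m)

def dereverb_aggressiveness(direct_energy, reverberant_energy):
    # Map the estimated direct-to-reverberant ratio (in dB) to a 0..1 filter
    # setting: the weaker the direct path, the more aggressive the filter.
    drr_db = 10.0 * math.log10(direct_energy / reverberant_energy)
    return min(1.0, max(0.0, (10.0 - drr_db) / 20.0))   # hypothetical mapping

print(round(gain_for_distance(2.0), 1))                 # 12.0 dB of make-up gain
print(round(dereverb_aggressiveness(1.0, 2.0), 2))      # 0.65: fairly aggressive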

Publication date: 09-09-2021

SYSTEMS AND METHODS OF SPEAKER-INDEPENDENT EMBEDDING FOR IDENTIFICATION AND VERIFICATION FROM AUDIO

Number: US20210280171A1
Authors: KHOURY Elie, PHATAK Kedar
Assignee:

Embodiments described herein provide for audio processing operations that evaluate characteristics of audio signals that are independent of the speaker's voice. A neural network architecture trains and applies discriminatory neural networks tasked with modeling and classifying speaker-independent characteristics. The task-specific models generate or extract feature vectors from input audio data based on the trained embedding extraction models. The embeddings from the task-specific models are concatenated to form a deep-phoneprint vector for the input audio signal. The DP vector is a low-dimensional representation of each of the speaker-independent characteristics of the audio signal and is applied in various downstream operations.

1. A computer-implemented method comprising: applying, by a computer, a plurality of task-specific machine learning models on an inbound audio signal having one or more speaker-independent characteristics to extract a plurality of speaker-independent embeddings for the inbound audio signal; extracting, by the computer, a deep phoneprint (DP) vector for the inbound audio signal based upon the plurality of speaker-independent embeddings extracted for the inbound audio signal; and applying, by the computer, one or more post-modeling operations on the plurality of speaker-independent embeddings extracted for the inbound audio signal to generate one or more post-modeling outputs for the inbound audio signal.

2. The method according to claim 1, further comprising: extracting, by the computer, a plurality of features from a training audio signal, the plurality of features including at least one of a spectro-temporal feature and metadata associated with the training audio signal; and extracting, by the computer, the plurality of features from the inbound audio signal.

3. The method according to claim 2, wherein the metadata includes at least one of: a microphone-type used to capture the training audio signal, a device type from which the ...
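A toy sketch of the concatenation step: three stand-in "task-specific models" (reduced here to fixed linear projections, where the real system trains neural networks) each produce an 8-dimensional embedding, and the DP vector is their concatenation. The dimensions and task names are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for three trained task-specific models (e.g. device type, codec,
# network type); each is reduced here to a fixed linear projection.
W_DEVICE, W_CODEC, W_NETWORK = (rng.normal(size=(40, 8)) for _ in range(3))

def embed(features, weights):
    # One task-specific embedding extractor (a real system would apply a
    # trained neural network here).
    return features @ weights

def deep_phoneprint(features):
    # Concatenate the task-specific speaker-independent embeddings into a
    # single DP vector.
    return np.concatenate([embed(features, w) for w in (W_DEVICE, W_CODEC, W_NETWORK)])

features = rng.normal(size=40)      # e.g. spectro-temporal features of a call
dp = deep_phoneprint(features)
print(dp.shape)                     # (24,): three 8-dimensional embeddings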

Publication date: 09-09-2021

Voice Response Method and Device, and Smart Device

Number: US20210280172A1
Assignee: Beijing Orion Star Technology Co Ltd

A voice response method, apparatus, and intelligent device are disclosed. The method includes: receiving voice information sent by a user; determining whether the voice information contains a wake-up word; and, if so, outputting a response voice according to a preset response rule. Thus, if there is a wake-up word in voice information received by the intelligent device, the device outputs a response voice according to the preset response rule. That is, after the user utters a wake-up word, the intelligent device responds with a voice output, so the user knows immediately that the device has been woken up, which makes for a better experience.
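A minimal sketch of the claimed flow, assuming the voice information has already been transcribed to text; the wake-up words and the preset response rule are hypothetical.

WAKE_WORDS = ("hey device", "ok device")            # hypothetical wake-up words
RESPONSES = {"hey device": "Yes?", "ok device": "I'm listening."}

def respond(voice_text):
    # Output a response voice (here, text) according to a preset response
    # rule whenever the received voice information contains a wake-up word.
    text = voice_text.lower()
    for wake in WAKE_WORDS:
        if wake in text:
            return RESPONSES[wake]
    return None                                     # no wake-up word: stay silent

print(respond("Hey device, play some music"))       # "Yes?"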

Publication date: 08-08-2019

SYSTEMS AND METHODS FOR CLUSTER-BASED VOICE VERIFICATION

Number: US20190245967A1
Assignee: Capital One Services, LLC

Systems for caller identification and authentication may include an authentication server. The authentication server may be configured to receive audio data including speech of a plurality of telephone calls, use audio data for at least a subset of the plurality of telephone calls to store a plurality of known characteristics each associated with a specific demographic, and/or use audio data for at least one of the plurality of telephone calls to identify the telephone caller making the call based on determining which of the plurality of known characteristics is most similar to the audio data of the caller.

1. A method comprising: receiving, by a processor of an authentication server, audio data including speech of a user; analyzing, by the processor, the audio data to identify at least one characteristic of the speech of the user; associating, by the processor, the at least one characteristic with a cluster based on a comparison with a plurality of known characteristics, each known characteristic being associated with at least one cluster; receiving, by the processor, data indicative of a purported identity of the user; comparing, by the processor, the data indicative of the purported identity to data indicative of the at least one cluster; and identifying, by the processor, the user as at least one of: likely having the purported identity, in response to determining that the data indicative of the purported identity matches the data indicative of the at least one cluster; and unlikely to have the purported identity, in response to determining that the data indicative of the purported identity matches data indicative of a different cluster.

2. The method of claim 1, wherein: the at least one characteristic of the speech of the user comprises a plurality of words; each known characteristic comprises a plurality of associated words; and the associating comprises determining a similarity of the plurality of words and the plurality of associated words of the ...
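A toy sketch of the cluster-based check in claim 1, assuming speech characteristics are represented as small numeric vectors and clusters by their centroids; all names and values are hypothetical.

def nearest_cluster(characteristic, clusters):
    # clusters: name -> centroid; pick the closest by squared distance.
    def sqdist(name):
        return sum((a - b) ** 2 for a, b in zip(characteristic, clusters[name]))
    return min(clusters, key=sqdist)

def verify(characteristic, purported_identity, clusters, identity_to_cluster):
    # The caller is likely who they claim to be only if their speech lands in
    # the cluster associated with the purported identity.
    return nearest_cluster(characteristic, clusters) == identity_to_cluster[purported_identity]

clusters = {"northeast_us": [0.1, 0.9], "southern_us": [0.8, 0.2]}
identity_to_cluster = {"alice": "northeast_us"}
print(verify([0.15, 0.85], "alice", clusters, identity_to_cluster))   # True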

Publication date: 06-09-2018

SOUND-RECOGNITION SYSTEM BASED ON A SOUND LANGUAGE AND ASSOCIATED ANNOTATIONS

Number: US20180254054A1
Assignee: OtoSense Inc.

The disclosed embodiments provide a system for recognizing a sound event in raw sound. During operation, the system receives the raw sound, wherein the raw sound comprises a sequence of digital samples of sound. Next, the system segments the raw sound into a sequence of tiles, wherein each tile comprises a set of consecutive digital samples. The system then converts the sequence of tiles into a sequence of snips, wherein each snip includes a symbol representing an associated tile in the sequence of tiles. Next, the system generates annotations for the sequence of snips and the raw sound, wherein each annotation specifies a property associated with one or more snips in the sequence of snips or the raw sound. Finally, the system recognizes the sound event based on the generated annotations.

1. A method for recognizing a sound event in raw sound, comprising: receiving the raw sound, wherein the raw sound comprises a sequence of digital samples of sound; segmenting the raw sound into a sequence of tiles, wherein each tile comprises a set of consecutive digital samples; converting the sequence of tiles into a sequence of snips, wherein each snip includes a symbol representing an associated tile in the sequence of tiles, wherein each snip takes up less space than an associated tile, wherein each snip is stored in a canonical representation, and wherein the sequence of snips is searchable; generating annotations for the sequence of snips and the raw sound, wherein each annotation specifies a property associated with one or more snips in the sequence of snips or the raw sound; and recognizing the sound event based on the generated annotations.

2. The method of claim 1, wherein converting the sequence of tiles into the sequence of snips comprises: identifying tile features for each tile in the sequence of tiles; performing a clustering operation based on the tile features to identify clusters of tiles and to associate each tile with a cluster; associating each identified cluster ...
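A compact sketch of the tile-to-snip conversion in claim 2, using per-tile energy as the tile feature and a tiny 1-D k-means as the clustering operation; the tile length, feature choice, and cluster count are illustrative assumptions.

def tile_energy(tile):
    # A single scalar tile feature: mean energy of the tile's samples.
    return sum(x * x for x in tile) / len(tile)

def kmeans_1d(values, k, iters=10):
    # Tiny 1-D k-means standing in for the clustering operation.
    centers = sorted(values)[::max(1, len(values) // k)][:k]
    while len(centers) < k:
        centers.append(centers[-1])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def to_snips(raw_sound, tile_len=4, k=3):
    # Segment into fixed-size tiles, cluster the per-tile feature, and emit
    # one symbol per cluster: a canonical, searchable sequence of snips.
    tiles = [raw_sound[i:i + tile_len]
             for i in range(0, len(raw_sound) - tile_len + 1, tile_len)]
    energies = [tile_energy(t) for t in tiles]
    centers = kmeans_1d(energies, k)
    symbols = "abcdefghij"
    return "".join(symbols[min(range(k), key=lambda i: abs(e - centers[i]))]
                   for e in energies)

print(to_snips([0, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0, 9, 9, 9, 9]))  # "babc"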

Publication date: 06-09-2018

RADIO COMMUNICATION DEVICE

Number: US20180254055A1
Author: NAKANO Manabu
Assignee:

A transmission controller monitors a sound pressure determination signal and a distance determination signal. The transmission controller controls a transmission voice processor to start an operation of generating a transmission voice signal when the distance determination signal indicates that the distance is equal to or less than a first distance. The transmission controller starts an operation of determining a sound pressure of a voice signal when the distance determination signal indicates that the distance is equal to or less than a second distance shorter than the first distance. The transmission controller supplies a transmission control signal to a transmission circuit so that the transmission circuit transmits the transmission voice signal as a radio wave when the sound pressure determination signal indicates that the sound pressure is equal to or greater than a predetermined threshold value.

1. A radio communication device comprising: a sound pressure determination unit configured to determine a sound pressure of a voice signal output from a microphone, and to generate a sound pressure determination signal; a distance determination unit configured to determine a distance from the radio communication device to a user of the radio communication device based on a detection value that is generated by a distance sensor and corresponds to the distance from the radio communication device to the user, and to generate a distance determination signal; a transmission voice processor configured to implement voice processing for the voice signal output from the microphone, and to generate a transmission voice signal; a transmission circuit configured to transmit the transmission voice signal as a radio wave; and a transmission controller configured to monitor the sound pressure determination signal and the distance determination signal, wherein the transmission controller: controls the transmission voice processor to start operations of implementing the voice ...
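The staged control logic maps naturally to a small decision function. The distances and the pressure threshold below are hypothetical; the abstract only fixes their ordering (the second distance is shorter than the first).

FIRST_DISTANCE_CM = 30.0    # start generating the transmission voice signal
SECOND_DISTANCE_CM = 10.0   # start checking sound pressure (shorter range)
PRESSURE_THRESHOLD = 0.2    # transmit at or above this level

def control_step(distance_cm, sound_pressure):
    # Voice processing starts inside the first distance, pressure checking
    # inside the shorter second distance, and transmission occurs only when
    # the checked pressure reaches the threshold.
    processing = distance_cm <= FIRST_DISTANCE_CM
    checking = distance_cm <= SECOND_DISTANCE_CM
    transmit = processing and checking and sound_pressure >= PRESSURE_THRESHOLD
    return {"voice_processing": processing,
            "pressure_check": checking,
            "transmit": transmit}

print(control_step(25.0, 0.5))  # processing only: user not yet close enough
print(control_step(8.0, 0.5))   # close and loud enough: transmit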
